ShenLab-Genomics / Anchored-Fusion

GNU General Public License v3.0

The program reports an error #1

Open QiQi77777111 opened 5 months ago

QiQi77777111 commented 5 months ago

![Uploading 微信图片_20240401165555.png…]() Excuse me, I would like to ask you a question. How can I solve the error shown in the picture?

QiQi77777111 commented 5 months ago

I'm encountering an error that says: ModuleNotFoundError: No module named 'Model'. How can I fix this?

withermatt commented 2 months ago

I have the same issue. There is also a problem at line 4 of functions.py. Is 'Bio' Biopython?

Sogand65 commented 1 month ago

Hi,

Thanks for the package you provided. I want to use it for my single-cell ALL dataset, and I started with your test samples. However, there are some errors in Anchored_Fusion.py, as mentioned above. So far I have hit three. The first two (the "Model" error and the undefined name "np") can be handled by changing lines in Anchored_Fusion.py as shown below, but I could not figure out the third.

1- from Model import Train_model, Test_model --> change to: from model import Train_model, Test_model

2- import numpy --> change to: import numpy as np

3- After resolving these, I get another error:

    python Anchored_Fusion.py --file_anchored_cds=test/target_gene.fasta --fastq1=test/test_sample_1.fastq.gz --fastq2=test/test_sample_1.fastq.gz --out_folder=output_dir --file_ref_seq=../scFusion/data/hg19.fa --file_ref_ann=../scFusion/data/gencode.v46lift37.annotation.gtf
    Traceback (most recent call last):
      File "/Users/sssajedi/Library/CloudStorage/OneDrive-InsideMDAnderson/proj/Anchored-Fusion/Anchored_Fusion.py", line 89, in <module>
        gene_co.Build_dic(args.file_ref_ann)
      File "/Users/sssajedi/Library/CloudStorage/OneDrive-InsideMDAnderson/proj/Anchored-Fusion/functions.py", line 17, in Build_dic
        if arr[2] == "exon":
           ~~~^^^
    IndexError: list index out of range

I would really appreciate it if you could help us with these errors!

Thanks,
Sogi

Tcooler commented 4 weeks ago

The IndexError in Build_dic occurs because the GTF file you are using is in a different format from the one we use. You can try gencode.v42.chr_patch_hapl_scaff.annotation.gtf instead; the download link is https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_42/gencode.v42.chr_patch_hapl_scaff.annotation.gtf.gz

In addition, thank you for your corrections. We have fixed the problems you mentioned.
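Before a long run, it can help to scan an annotation file for lines that would break a column lookup such as `arr[2]`. The sketch below is only an illustration and is not the code in functions.py: it assumes the standard nine tab-separated GTF columns, and the helper name `find_malformed_gtf_lines` is hypothetical. It skips `#` header lines and reports any other line whose column count is not nine.

```python
import gzip
from pathlib import Path

# Standard GTF columns: seqname, source, feature, start, end,
# score, strand, frame, attributes.
GTF_COLUMNS = 9


def find_malformed_gtf_lines(path, limit=10):
    """Return up to `limit` (line_number, text) pairs for lines that are not
    comments and do not have nine tab-separated columns.

    Hypothetical helper for illustration; not part of Anchored-Fusion.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    problems = []
    with opener(path, "rt") as handle:
        for number, line in enumerate(handle, start=1):
            text = line.rstrip("\n")
            if text.startswith("#"):
                continue  # header/comment lines carry no feature record
            if len(text.split("\t")) != GTF_COLUMNS:
                problems.append((number, text))
                if len(problems) >= limit:
                    break
    return problems
```

An empty list from `find_malformed_gtf_lines("annotation.gtf")` means every record has nine columns; it does not guarantee that the file matches whatever else the package expects from the annotation.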

Tcooler commented 4 weeks ago

> I have the same issue. Also line 4 of functions.py. Is 'Bio' Biopython?

Yes, 'Bio' is Biopython. Thanks for the correction; we have added information about this package to the README.
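A quick way to check for missing packages before running the pipeline is to ask Python whether each import name can be found. This is a minimal sketch, not part of the package: `missing_modules` is a hypothetical helper, and the list of names is an assumption. `Bio` (the import name of Biopython) and `numpy` come from this thread, while `torch` is a guess based on the `model.pt` file.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed dependency list; adjust it to match the README.
    required = ["Bio", "numpy", "torch"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

Note that the name to install can differ from the name to import: Biopython is installed as `biopython` but imported as `Bio`.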

Tcooler commented 4 weeks ago

> I'm encountering an error that says: ModuleNotFoundError: No module named 'Model'. How can I fix this?

We fixed this issue in an update by renaming model.py to Model.py; you can download all the files again.

If that is too much of a hassle, you can instead keep the files you already have and edit the imports yourself. In Anchored_Fusion.py, change line 5 from 'from Model import Train_model, Test_model' to 'from model import Train_model, Test_model', and change line 7 to 'import numpy as np'. Make the same two changes to lines 5 and 7 of Anchored_Fusion_singlecell.py.
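Python import names are case-sensitive, which is why `Model` and `model` are treated as different modules. If you would rather not edit the import by hand for each download, one option is to try both spellings. The helper below is a hypothetical sketch, not something the package provides.

```python
import importlib


def import_first_available(candidates):
    """Import and return the first module in `candidates` that exists.

    Raises ModuleNotFoundError if none of the names can be imported.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError as exc:
            if exc.name != name:
                # The module itself exists, but one of its own imports
                # is missing; do not hide that error.
                raise
    raise ModuleNotFoundError(
        f"none of these modules could be imported: {candidates}"
    )
```

In Anchored_Fusion.py this could be used as `model_module = import_first_available(["Model", "model"])`, followed by `Train_model = model_module.Train_model` and `Test_model = model_module.Test_model`.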

Sogand65 commented 4 weeks ago

Hi Tcooler,

Thanks for the corrections and your response. I have the following two questions:

1- For the .fa reference genome, I assume I should download the corresponding version, which is the genome sequence (GRCh38.p13.genome.fa)?

2- I am new to fusion analysis, and I would appreciate it if you could provide some information on the output_dir; I could not find any in the README. How can we proceed further with gene expression data that contains all genes as well as fusion genes and cell info?

Regarding my second question, I ran the test data and got:

    python Anchored_Fusion.py --file_anchored_cds=test/target_gene.fasta --fastq1=test/test_sample_1.fastq.gz --fastq2=test/test_sample_1.fastq.gz --out_folder=output_dir --file_ref_seq=../RefData/refGenome_GRCH38.p13/GRCh38.p13.genome.fa --file_ref_ann=../RefData/refGenome_GRCH38.p13/gencode.v42.chr_patch_hapl_scaff.annotation.gtf
    Error: positive samples file not found!, not performing filter false positives.
    [bwa_index] Pack FASTA... 0.00 sec
    [bwa_index] Construct BWT for the packed sequence...
    [bwa_index] 0.00 seconds elapse.
    [bwa_index] Update BWT... 0.00 sec
    [bwa_index] Pack forward-only FASTA... 0.00 sec
    [bwa_index] Construct SA from BWT and Occ... 0.00 sec
    [main] Version: 0.7.18-r1243-dirty
    [main] CMD: bwa index output_dir/BCR_fusion/work_dir/BCR_fusion_anchored_gene_sequence.fa
    [main] Real time: 0.009 sec; CPU: 0.008 sec
    [M::bwa_idx_load_from_disk] read 0 ALT contigs
    [M::process] read 22516 sequences (2274116 bp)...
    [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 0, 0, 0)
    [M::mem_pestat] skip orientation FF as there are not enough pairs
    [M::mem_pestat] skip orientation FR as there are not enough pairs
    [M::mem_pestat] skip orientation RF as there are not enough pairs
    [M::mem_pestat] skip orientation RR as there are not enough pairs
    [M::mem_process_seqs] Processed 22516 reads in 0.374 CPU sec, 0.374 real sec
    [main] Version: 0.7.18-r1243-dirty
    [main] CMD: bwa mem -M -t 1 output_dir/BCR_fusion/work_dir/BCR_fusion_anchored_gene_sequence.fa test/test_sample_1.fastq.gz test/test_sample_1.fastq.gz
    [main] Real time: 0.412 sec; CPU: 0.397 sec
    [M::bam2fq_mainloop] discarded 0 singletons
    [M::bam2fq_mainloop] processed 0 reads
    [M::bam2fq_mainloop] discarded 0 singletons
    [M::bam2fq_mainloop] processed 0 reads
    [M::bwa_idx_load_from_disk] read 0 ALT contigs
    [main] Version: 0.7.18-r1243-dirty
    [main] CMD: bwa mem -M -t 1 ../RefData/refGenome_GRCH38.p14/GRCh38.p14.genome.fa output_dir/BCR_fusion/work_dir/BCR_fusion_tmp_1.fastq output_dir/BCR_fusion/work_dir/BCR_fusion_tmp_2.fastq
    [main] Real time: 0.944 sec; CPU: 0.856 sec
    Loaded 3291585349 letters in 706 sequences
    Searched 6783 bases in 1 sequences
    ***** WARNING: File output_dir/BCR_fusion/work_dir/BCR_fusion_tmp_bed.bed has inconsistent naming convention for record: KI270731.1 867 7223 BCR +
    ***** WARNING: File output_dir/BCR_fusion/work_dir/BCR_fusion_tmp_bed.bed has inconsistent naming convention for record: KI270731.1 867 7223 BCR +
    [M::bwa_idx_load_from_disk] read 0 ALT contigs
    [M::process] read 108 sequences (10908 bp)...
    [M::mem_process_seqs] Processed 108 reads in 0.007 CPU sec, 0.007 real sec
    [main] Version: 0.7.18-r1243-dirty
    [main] CMD: bwa mem -M -t 1 ../RefData/refGenome_GRCH38.p14/GRCh38.p14.genome.fa output_dir/BCR_fusion/work_dir/BCR_fusion_del_tmp.fa
    [main] Real time: 0.732 sec; CPU: 0.726 sec
    Loaded 3291585349 letters in 706 sequences
    Searched 5454 bases in 54 sequences
    Loaded 268 letters in 1 sequences
    Searched 6783 bases in 1 sequences
    [bwa_index] Pack FASTA... 0.00 sec
    [bwa_index] Construct BWT for the packed sequence...
    [bwa_index] 0.00 seconds elapse.
    [bwa_index] Update BWT... 0.00 sec
    [bwa_index] Pack forward-only FASTA... 0.00 sec
    [bwa_index] Construct SA from BWT and Occ... 0.00 sec
    [main] Version: 0.7.18-r1243-dirty
    [main] CMD: bwa index output_dir/BCR_fusion/work_dir/BCR_fusion_candidate_gene_sequence.fa
    [main] Real time: 0.009 sec; CPU: 0.006 sec
    Loaded 268 letters in 1 sequences
    Searched 273 bases in 10 sequences
    Loaded 268 letters in 1 sequences
    Searched 68 bases in 1 sequences
    Loaded 6783 letters in 1 sequences
    Searched 234 bases in 3 sequences
    Loaded 3291585349 letters in 706 sequences
    Searched 362 bases in 3 sequences

The run took only 4 minutes, and it printed the message "Error: positive samples file not found!, not performing filter false positives." However, it continued, and I got the following outputs. The model directory is empty, I guess because I was using the existing model, and the gene directory contains:

    ls BCR_fusion/
    BCR_fusion_predictions.txt  model_dir  BCR_fusion_predictions_abridged.txt  work_dir

Does this mean that the run was successful? Thank you!

Best, Sogi

Sogand65 commented 3 weeks ago

Hi,

I still get errors when running the test data, even when I follow the instructions in the README. I would really appreciate your help in solving the problem. The error below occurs when I use your model at data/model.pt:

python Anchored_Fusion.py --file_anchored_cds=test/target_gene.fasta --fastq1=test/test_sample_1.fastq.gz --fastq2=test/test_sample_1.fastq.gz --out_folder=output_dir --file_ref_seq=../RefData/refGenome_GRCH38.p14/GRCh38.p14.genome.fa --file_ref_ann=../RefData/refGenome_GRCH38.p14/gencode.v42.chr_patch_hapl_scaff.annotation.gtf --positive_samples=./data/positive_seq.txt --model_file=./data/model.pt

    [M::bwa_idx_load_from_disk] read 0 ALT contigs
    [M::process] read 22516 sequences (2274116 bp)...
    [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 0, 0, 0)
    [M::mem_pestat] skip orientation FF as there are not enough pairs
    [M::mem_pestat] skip orientation FR as there are not enough pairs
    [M::mem_pestat] skip orientation RF as there are not enough pairs
    [M::mem_pestat] skip orientation RR as there are not enough pairs
    [M::mem_process_seqs] Processed 22516 reads in 2.250 CPU sec, 2.250 real sec
    [main] Version: 0.7.18-r1243-dirty
    [main] CMD: bwa mem -M -t 1 ../RefData/refGenome_GRCH38.p14/GRCh38.p14.genome.fa test/test_sample_1.fastq.gz test/test_sample_1.fastq.gz
    [main] Real time: 3.263 sec; CPU: 3.234 sec
    Loaded 3291585349 letters in 706 sequences
    Searched 202 bases in 2 sequences
    Traceback (most recent call last):
      File "/Users/sssajedi/Library/CloudStorage/OneDrive-InsideMDAnderson/proj/Anchored-Fusion/Anchored_Fusion.py", line 112, in <module>
        Train_model(model_out_name, args.positive_samples,negative_samples,model_file,gpu_number)
      File "/Users/sssajedi/Library/CloudStorage/OneDrive-InsideMDAnderson/proj/Anchored-Fusion/Model.py", line 279, in Train_model
        X_tra = X_tra[List_tra,:,:]
    IndexError: too many indices for tensor of dimension 1

Sogand65 commented 3 weeks ago

Hi,

I tried to use your model on my data. I had 6 FASTQ files from separate lanes of one sample for R1 and 6 for R2. I concatenated and trimmed them into a single R1.fastq.gz and a single R2.fastq.gz and ran your command as instructed, but after a few hours of computation I got the following error:

    Anchored-Fusion sssajedi$ python Anchored_Fusion.py --file_anchored_cds=test/target_gene.fasta --fastq1=~/My_OneDrive/proj/ALL/data/PT44A-GEX_S2_all_R1_trimmed.fastq.gz --fastq2=~/My_OneDrive/proj/ALL/data/PT44A-GEX_S2_all_R2_trimmed.fastq.gz --out_folder=output_ALL_PT44_pretrained --file_ref_seq=/Users/xxx/refGenome_Gencode_human/release_42/GRCh38.p13.genome.fa --file_ref_ann=/Users/xxx/refGenome_Gencode_human/release_42/gencode.v42.chr_patch_hapl_scaff.annotation.gtf --not_train_filter_model --model_file data/model.pt
    [bwa_index] Pack FASTA... 0.00 sec
    [bwa_index] Construct BWT for the packed sequence...
    [bwa_index] 0.00 seconds elapse.
    [bwa_index] Update BWT... 0.00 sec
    [bwa_index] Pack forward-only FASTA... 0.00 sec
    [bwa_index] Construct SA from BWT and Occ... 0.00 sec
    [main] Version: 0.7.18-r1243-dirty
    [main] CMD: bwa index output_ALL_PT44_pretrained/BCR_fusion/work_dir/BCR_fusion_anchored_gene_sequence.fa
    [main] Real time: 0.020 sec; CPU: 0.009 sec
    [M::bwa_idx_load_from_disk] read 0 ALT contigs
    [M::process] read 66324 sequences (10000032 bp)...
    [M::process] read 66328 sequences (10000296 bp)...
    [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 0, 0, 0)
    [M::mem_pestat] skip orientation FF as there are not enough pairs
    [M::mem_pestat] skip orientation FR as there are not enough pairs
    [M::mem_pestat] skip orientation RF as there are not enough pairs
    [M::mem_pestat] skip orientation RR as there are not enough pairs
    [M::mem_process_seqs] Processed 66324 reads in 1.825 CPU sec, 1.767 real sec
    [M::process] read 66334 sequences (10000076 bp)...
    [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 2, 0, 0)
    [M::mem_pestat] skip orientation FF as there are not enough pairs
    [M::mem_pestat] skip orientation FR as there are not enough pairs
    [M::mem_pestat] skip orientation RF as there are not enough pairs
    [M::mem_pestat] skip orientation RR as there are not enough pairs
    [M::mem_process_seqs] Processed 66328 reads in 1.844 CPU sec, 1.763 real sec
    [M::process] read 66322 sequences (10000194 bp)...
    ....
    [M::mem_pestat] low and high boundaries for proper pairs: (1, 761)
    [M::mem_pestat] skip orientation RF as there are not enough pairs
    [M::mem_pestat] skip orientation RR as there are not enough pairs
    [M::mem_process_seqs] Processed 1890 reads in 1.219 CPU sec, 1.219 real sec
    [main] Version: 0.7.18-r1243-dirty
    [main] CMD: bwa mem -M -t 1 /Users/sssajedi/My_OneDrive/proj/RefData/refGenome_Gencode_human/release_42/GRCh38.p13.genome.fa output_ALL_PT44_pretrained/BCR_fusion/work_dir/BCR_fusion_tmp_1.fastq output_ALL_PT44_pretrained/BCR_fusion/work_dir/BCR_fusion_tmp_2.fastq
    [main] Real time: 2.294 sec; CPU: 2.249 sec
    Loaded 3267117988 letters in 639 sequences
    Searched 6783 bases in 1 sequences
    ***** WARNING: File output_ALL_PT44_pretrained/BCR_fusion/work_dir/BCR_fusion_tmp_bed.bed has inconsistent naming convention for record: KI270731.1 867 7223 BCR +
    ***** WARNING: File output_ALL_PT44_pretrained/BCR_fusion/work_dir/BCR_fusion_tmp_bed.bed has inconsistent naming convention for record: KI270731.1 867 7223 BCR +
    [M::bwa_idx_load_from_disk] read 0 ALT contigs
    [M::process] read 8162 sequences (1231213 bp)...
    [M::mem_process_seqs] Processed 8162 reads in 0.974 CPU sec, 0.974 real sec
    [main] Version: 0.7.18-r1243-dirty
    [main] CMD: bwa mem -M -t 1 /Users/xxx/RefData/refGenome_Gencode_human/release_42/GRCh38.p13.genome.fa output_ALL_PT44_pretrained/BCR_fusion/work_dir/BCR_fusion_del_tmp.fa
    [main] Real time: 1.832 sec; CPU: 1.824 sec
    Loaded 3267117988 letters in 639 sequences
    Searched 1124445 bases in 7509 sequences
    Traceback (most recent call last):
      File "/xxx/proj/Anchored-Fusion/Anchored_Fusion.py", line 205, in <module>
        blocks_chr = Find_fine_block(file_anchored_reads_filter,args.file_ref_seq,out_dir_name,gene_co, homo_genes,blocks_chr)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/xxx/proj/Anchored-Fusion/functions.py", line 568, in Find_fine_block
        if block[2] < Block_now[j].start:
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    TypeError: '<' not supported between instances of 'int' and 'str'

I also ran the code on the FASTQ files from a single lane, and it gave me the same error. I would appreciate any input.
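The TypeError above says that, at functions.py line 568, one side of the comparison is an `int` and the other is a `str`. The traceback does not show which value is the string or where it was created. A common source of this kind of mismatch is a coordinate that was read from a text file and never converted to a number. The sketch below illustrates only that general pattern; `Block` and `parse_bed_line` are hypothetical names and not the package's own code.

```python
from dataclasses import dataclass


@dataclass
class Block:
    chrom: str
    start: int
    end: int
    name: str
    strand: str


def parse_bed_line(line):
    """Parse one 5-column, whitespace-separated BED-like record,
    converting the coordinates to int so they compare numerically."""
    chrom, start, end, name, strand = line.split()
    return Block(chrom, int(start), int(end), name, strand)


# Sample record taken from the warning in the log above.
record = parse_bed_line("KI270731.1 867 7223 BCR +")
print(record.start < 1000)  # int-to-int comparison works

try:
    "867" < 1000  # str-to-int comparison raises the same kind of TypeError
except TypeError as exc:
    print(exc)
```

If the string does come from a parsed file, converting it at the point where the file is read keeps every later comparison numeric.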

Best, Sogi