AmpliconSuite / AmpliconSuite-pipeline

A quickstart tool for AmpliconArchitect. Performs all preliminary steps (alignment, CNV calling, seed interval detection) required prior to running AmpliconArchitect. Previously called PrepareAA.

IndexError: list index out of range #29

Closed Cathy94 closed 1 year ago

Cathy94 commented 1 year ago

Hi Jens,

When I run command:

python /home/programs/AmpliconArchitect-master/src/amplified_intervals.py --ref hg19 --bed /home/output//048_D.clean.sorted.dupmark.merged_CNV_CALLS_pre_filtered.bed --bam /home/bam_dir/048_D.clean.sorted.dupmark.merged.bam --gain 4.5 --cnsize_min 50000 --out /home/output/048_AA_CNV_SEEDS

I don't get the file 048_AA_CNV_SEEDS, and instead I get this output:

Traceback (most recent call last):
  File "/home/programs/AmpliconArchitect-master/src/amplified_intervals.py", line 176, in <module>
    if float(a.info[-1]) * a.segdup_uniqueness() > GAIN and a.rep_content() < 2.5:
  File "/home/programs/AmpliconArchitect-master/src/ref_util.py", line 396, in rep_content
    m = interval(duke35[p])
  File "/home/programs/AmpliconArchitect-master/src/ref_util.py", line 149, in __init__
    self.load_line(line, file_format, exclude_info_string=exclude_info_string)
  File "/home/programs/AmpliconArchitect-master/src/ref_util.py", line 201, in load_line
    self.start, self.end = sorted([int(float(ll[1])), int(float(ll[2]))])
IndexError: list index out of range

amplified_intervals.py returned a non-zero exit code. Exiting...

I don't know how to solve this error. Any assistance would be appreciated.

Best, Shuang
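For readers following the traceback: the failing statement indexes fields 1 and 2 of a list named `ll`, so an IndexError here means the line being parsed produced fewer than three fields. The sketch below reproduces that failure mode in isolation. It assumes the line is split on whitespace; `parse_interval` is a hypothetical stand-in and not the actual `load_line` code from ref_util.py.

```python
# Minimal sketch of the failure mode, NOT the real ref_util.py code.
# Assumption: the interval line is split on whitespace into a list `ll`.

def parse_interval(line):
    """Parse 'chrom start end ...' the way the traceback's last frame suggests."""
    ll = line.strip().split()
    chrom = ll[0]
    # Same expression as in the traceback: needs at least three fields.
    start, end = sorted([int(float(ll[1])), int(float(ll[2]))])
    return chrom, start, end


if __name__ == "__main__":
    examples = [
        "chr1\t10000\t10500\t1.0",  # well-formed line
        "chr1\t10000",              # truncated line, e.g. from a partial download
    ]
    for text in examples:
        try:
            print(parse_interval(text))
        except IndexError as err:
            print("IndexError on", repr(text), "->", err)
```

A truncated or otherwise damaged line in one of the reference files read by `rep_content` would fail in this way regardless of the sample's own bed file.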

jluebeck commented 1 year ago

Hi, would you be able to share the input bed file "048_D.clean.sorted.dupmark.merged_CNV_CALLS_pre_filtered.bed"? If you would like, you can email it to me at jluebeck [at] ucsd.edu. I suspect it is the reason for the issue. Was it generated using PrepareAA.py?

Thank you, Jens

Cathy94 commented 1 year ago

Thank you for your reply. The input bed file was generated using PrepareAA.py. I have sent an email to you. Thank you for your help.

Best, Shuang

jluebeck commented 1 year ago

Hi Shuang, unfortunately it seems I did not receive your email (didn't see it in spam folder either). Perhaps instead can you try jluebeck [at] eng.ucsd.edu?

Thanks!

Cathy94 commented 1 year ago

Dear Jens,

Thank you for your help. These are my 'CNV_CALLS_pre_filtered' and 'PAA_stdout.log' files.

I generated them using the AmpliconSuite-pipeline docker, and the command was:

AmpliconSuite-pipeline/docker/run_paa_docker.py -o output_2 -s 048 -t 8 --bam bam/048_D.clean.sorted.dupmark.merged.bam --run_AA --run_AC --ref hg19 --run_as_user

However, the output file '048_finish_flag' reports 'UNSUCCESSFUL'.

Looking forward to your reply, thank you again.

Best wishes, Shuang

赵爽

/usr/local/lib/python3.8/dist-packages/skgenome/intersect.py:11: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import Int64Index
CNVkit 0.9.9
Wrote /home/output/048_cnvkit_output/hg19_cnvkit_filtered_ref.target-tmp.bed with 565907 regions
Wrote /home/output/048_cnvkit_output/hg19_cnvkit_filtered_ref.antitarget-tmp.bed with 0 regions
Running 1 samples in 8 processes (that's 8 processes per bam)
Running the CNVkit pipeline on /home/bam_dir/048_D.clean.sorted.dupmark.merged.bam ...
Processing reads in 048_D.clean.sorted.dupmark.merged.bam
Time: 1388.561 seconds (2169305 reads/sec, 408 bins/sec)
Summary: #bins=565907, #reads=3012211119, mean=5322.8024, min=0.0, max=519344.6129032258
Percent reads in regions: 126.582 (of 2379650319 mapped)
Wrote /home/output/048_cnvkit_output/048_D.clean.sorted.dupmark.merged.targetcoverage.cnn with 565907 regions
Skip processing 048_D.clean.sorted.dupmark.merged.bam with empty regions file /home/output/048_cnvkit_output/hg19_cnvkit_filtered_ref.antitarget-tmp.bed
Wrote /home/output/048_cnvkit_output/048_D.clean.sorted.dupmark.merged.antitargetcoverage.cnn with 0 regions
Processing target: 048_D.clean.sorted.dupmark.merged
Keeping 563846 of 565907 bins
Correcting for GC bias...
Processing antitarget: 048_D.clean.sorted.dupmark.merged
Wrote /home/output/048_cnvkit_output/048_D.clean.sorted.dupmark.merged.cnr with 563846 regions
Segmenting /home/output/048_cnvkit_output/048_D.clean.sorted.dupmark.merged.cnr ...
Segmenting with method 'cbs', significance threshold 1e-06, in 8 processes
Smoothing overshot at 3 / 2012 indices: (-26.679194769204386, -1.7267632042523196) vs. original (-25.97306178037291, 0.2231772587569223)
Smoothing overshot at 3 / 3232 indices: (-26.456721184453123, -0.5195648555922598) vs. original (-25.972475165010884, 0.602736088687462)
Post-processing /home/output/048_cnvkit_output/048_D.clean.sorted.dupmark.merged.cns ...
Wrote /home/output/048_cnvkit_output/048_D.clean.sorted.dupmark.merged.cns with 567 regions
Applying filter 'ci'
Filtered by 'ci' from 567 to 261 rows
Calling copy number with thresholds: -1.1 => 0, -0.25 => 1, 0.2 => 2, 0.7 => 3
Wrote /home/output/048_cnvkit_output/048_D.clean.sorted.dupmark.merged.call.cns with 261 regions
Significant hits in 14026/563846 bins (2.49%)
Wrote /home/output/048_cnvkit_output/048_D.clean.sorted.dupmark.merged.bintest.cns with 14026 regions
/usr/local/lib/python3.8/dist-packages/skgenome/intersect.py:11: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import Int64Index
Segmenting with method 'cbs', significance threshold 0.0001, in 8 processes
Smoothing overshot at 3 / 2012 indices: (-26.679195491987212, -1.7267644403219062) vs. original (-25.9731, 0.223177)
Smoothing overshot at 3 / 3232 indices: (-26.456716583601278, -0.5195647182235745) vs. original (-25.9725, 0.602736)
Wrote /home/output/048_cnvkit_output/048_D.clean.sorted.dupmark.merged.cns with 598 regions
rm: cannot remove '/home/output/048_cnvkit_output//target.bed': No such file or directory
Global ref name is hg19
Traceback (most recent call last):
  File "/home/programs/AmpliconArchitect-master/src/amplified_intervals.py", line 176, in <module>
    if float(a.info[-1]) * a.segdup_uniqueness() > GAIN and a.rep_content() < 2.5:
  File "/home/programs/AmpliconArchitect-master/src/ref_util.py", line 396, in rep_content
    m = interval(duke35[p])
  File "/home/programs/AmpliconArchitect-master/src/ref_util.py", line 149, in __init__
    self.load_line(line, file_format, exclude_info_string=exclude_info_string)
  File "/home/programs/AmpliconArchitect-master/src/ref_util.py", line 201, in load_line
    self.start, self.end = sorted([int(float(ll[1])), int(float(ll[2]))])
IndexError: list index out of range
amplified_intervals.py returned a non-zero exit code. Exiting...

2022-12-30 12:24:31.094816
PrepareAA version 0.1344.1

Matched /home/bam_dir/048_D.clean.sorted.dupmark.merged.bam to reference genome hg19
Running PrepareAA on sample: 048
/home/bam_dir/048_D.clean.sorted.dupmark.merged.bam: 2298508762 + 0 properly paired (96.38% : N/A)

Running CNVKit batch
python3 /home/programs/cnvkit.py batch -m wgs -r /home/data_repo/hg19/hg19_cnvkit_filtered_ref.cnn -p 8 -d /home/output/048_cnvkit_output/ /home/bam_dir/048_D.clean.sorted.dupmark.merged.bam

Running CNVKit segment
python3 /home/programs/cnvkit.py segment /home/output/048_cnvkit_output/048_D.clean.sorted.dupmark.merged.cnr -p 8 -m cbs -o /home/output/048_cnvkit_output/048_D.clean.sorted.dupmark.merged.cns

Cleaning up temporary files
rm /home/output/048_cnvkit_output//tmp.bed /home/output/048_cnvkit_output//.cnn /home/output/048_cnvkit_output//target.bed /home/output/048_cnvkit_output//.bintest.cns
gzip -f /home/output/048_cnvkit_output/048_D.clean.sorted.dupmark.merged.cnr

Running amplified_intervals
python /home/programs/AmpliconArchitect-master/src/amplified_intervals.py --ref hg19 --bed /home/output//048_D.clean.sorted.dupmark.merged_CNV_CALLS_pre_filtered.bed --bam /home/bam_dir/048_D.clean.sorted.dupmark.merged.bam --gain 4.5 --cnsize_min 50000 --out /home/output/048_AA_CNV_SEEDS

Cathy94 commented 1 year ago

Hi, sorry about this. I have sent the email again. Thank you for your patience. Happy new year!

jluebeck commented 1 year ago

Hi Shuang,

Thank you for sending the bed file. I was not able to reproduce this error locally. When running the same commands using the bed file on a testing bam file, it finished without error.

I believe the issue is probably that the AA data repo is not configured properly. Could you please send the output of the following two commands?

ls $AA_DATA_REPO
ls $AA_DATA_REPO/hg19

Cathy94 commented 1 year ago

Hi Jens, Thank you for your reply. The AA data repo included some other files, such as bam and output files. I removed these files and ran the commands again, but it finished with the same error.

Now the AA data repo is:

ls $AA_DATA_REPO
coverage.stats hg19 hg19_indexed.tar.gz hg19.tar.gz

ls $AA_DATA_REPO/hg19
annotations file_list.txt hg19full.fa.amb hg19full.fa.pac human_hg19_september_2011 cancer hg19_centromere.bed hg19full.fa.ann hg19full.fa.sa last_updated.txt conserved.bed hg19_cnvkit_filtered_ref.cnn hg19full.fa.bwt hg19_merged_centromeres_conserved_sorted.bed wgEncodeDukeMapabilityUniqueness35bp_sorted.bedGraph dummy_ploidy.vcf hg19full.fa

The files in the output directory are:

docker_home_manifest.log 048_D.clean.sorted.dupmark.merged_CNV_CALLS_pre_filtered.bed 048_outputs.tar.gz PAA_stdout.log 048_cnvkit_output 048_finish_flag.txt

赵爽

jluebeck commented 1 year ago

Thanks, the data repo appears to be correctly configured, but perhaps there is some issue with the contents of the data repo.

Did you delete the data repo itself and re-download it? Can you check whether the file wgEncodeDukeMapabilityUniqueness35bp_sorted.bedGraph is identical to the one provided in the data repo online? The latest version of the docker image will download the appropriate data repo for you, but you need to remove your local version first so that it is not re-used. That may be a good way to test for the issue.
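One way to carry out the file comparison suggested above is sketched below. It assumes you have a second, freshly downloaded copy of the bedGraph to compare against (no published checksum is assumed here), and that every data line should have at least three whitespace-separated fields with numeric start and end columns. The helper names `sha256_of` and `find_malformed_lines` are hypothetical and not part of AmpliconSuite-pipeline.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_malformed_lines(path, min_fields=3, limit=10):
    """Return up to `limit` (line_number, text) pairs that look damaged.

    A line is reported if it has fewer than `min_fields` whitespace-separated
    fields (blank lines included) or if fields 1 and 2 are not numeric.
    """
    bad = []
    with open(path, "r", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            fields = line.split()
            ok = len(fields) >= min_fields
            if ok:
                try:
                    float(fields[1])
                    float(fields[2])
                except ValueError:
                    ok = False
            if not ok:
                bad.append((lineno, line.rstrip("\n")))
                if len(bad) >= limit:
                    break
    return bad


if __name__ == "__main__":
    import sys

    # Usage (paths are placeholders supplied by the user):
    #   python check_bedgraph.py local.bedGraph fresh_download.bedGraph
    local, fresh = sys.argv[1], sys.argv[2]
    print("identical:", sha256_of(local) == sha256_of(fresh))
    for lineno, text in find_malformed_lines(local):
        print("suspicious line", lineno, repr(text))
```

If the digests differ, or any suspicious lines are reported, removing the local data repo and letting it be downloaded again is the simpler fix.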

I think you have sent me the contents of PAA_stdout.log, but are you able to share the stdout printed to the terminal when running the run_paa_docker.py script?

Lastly, is your bam file coordinate sorted? Is it aligned to hg19 or is it aligned instead to GRCh37?
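Both questions can usually be answered from the BAM header alone. The sketch below parses header text such as that printed by `samtools view -H file.bam`; it reads the `SO:` tag on the `@HD` line and the `SN:` names on the `@SQ` lines. The naming test is only a heuristic, based on the convention that UCSC hg19 names contigs "chr1", "chr2", ... while GRCh37 uses "1", "2", ...; `summarize_sam_header` is a hypothetical helper, not part of the pipeline.

```python
def summarize_sam_header(header_text):
    """Summarize sort order and contig naming from SAM header text."""
    sort_order = None
    contigs = []
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "@HD":
            for field in fields[1:]:
                if field.startswith("SO:"):
                    sort_order = field[3:]
        elif fields[0] == "@SQ":
            for field in fields[1:]:
                if field.startswith("SN:"):
                    contigs.append(field[3:])
    return {
        # True only if the header explicitly declares coordinate sorting.
        "coordinate_sorted": sort_order == "coordinate",
        # Heuristic: hg19-style names all start with "chr".
        "chr_prefixed": bool(contigs) and all(c.startswith("chr") for c in contigs),
        "n_contigs": len(contigs),
    }


if __name__ == "__main__":
    import sys

    # Usage: samtools view -H file.bam | python check_header.py
    print(summarize_sam_header(sys.stdin.read()))
```

A header that is not declared coordinate sorted, or whose contig names lack the "chr" prefix, would point to a sorting or reference mismatch rather than a data repo problem.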

Thanks, Jens

Cathy94 commented 1 year ago

Thank you very much! I re-downloaded the data repo, and the pipeline finished successfully.

Best wishes, Shuang