Closed Cathy94 closed 1 year ago
Hi, would you be able to share the input bed file "048_D.clean.sorted.dupmark.merged_CNV_CALLS_pre_filtered.bed"? If you would like, you can email it to me at jluebeck [at] ucsd.edu. I suspect it is the reason for the issue. Was it generated using PrepareAA.py?
Thank you, Jens
Thank you for your reply. The input bed file was generated using PrepareAA.py. I have sent it to you by email. Thank you for your help.
Best, Shuang
Hi Shuang, unfortunately it seems I did not receive your email (didn't see it in spam folder either). Perhaps instead can you try jluebeck [at] eng.ucsd.edu?
Thanks!
Dear Jens,
Thank you for your help. Here are my 'CNV_CALLS_pre_filtered' bed file and 'PAA_stdout.log'.
I generated the bed file using the AmpliconSuite-pipeline Docker image, with the command:
AmpliconSuite-pipeline/docker/run_paa_docker.py -o output_2 -s 048 -t 8 --bam bam/048_D.clean.sorted.dupmark.merged.bam --run_AA --run_AC --ref hg19 --run_as_user
However, the output file '048_finish_flag' reports 'UNSUCCESSFUL'.
Looking forward to your reply. Thank you again.
Best wishes, Shuang
/usr/local/lib/python3.8/dist-packages/skgenome/intersect.py:11: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import Int64Index
CNVkit 0.9.9
Wrote /home/output/048_cnvkit_output/hg19_cnvkit_filtered_ref.target-tmp.bed with 565907 regions
Wrote /home/output/048_cnvkit_output/hg19_cnvkit_filtered_ref.antitarget-tmp.bed with 0 regions
Running 1 samples in 8 processes (that's 8 processes per bam)
Running the CNVkit pipeline on /home/bam_dir/048_D.clean.sorted.dupmark.merged.bam ...
Processing reads in 048_D.clean.sorted.dupmark.merged.bam
Time: 1388.561 seconds (2169305 reads/sec, 408 bins/sec)
Summary: #bins=565907, #reads=3012211119, mean=5322.8024, min=0.0, max=519344.6129032258
Percent reads in regions: 126.582 (of 2379650319 mapped)
Wrote /home/output/048_cnvkit_output/048_D.clean.sorted.dupmark.merged.targetcoverage.cnn with 565907 regions
Skip processing 048_D.clean.sorted.dupmark.merged.bam with empty regions file /home/output/048_cnvkit_output/hg19_cnvkit_filtered_ref.antitarget-tmp.bed
Wrote /home/output/048_cnvkit_output/048_D.clean.sorted.dupmark.merged.antitargetcoverage.cnn with 0 regions
Processing target: 048_D.clean.sorted.dupmark.merged
Keeping 563846 of 565907 bins
Correcting for GC bias...
Processing antitarget: 048_D.clean.sorted.dupmark.merged
Wrote /home/output/048_cnvkit_output/048_D.clean.sorted.dupmark.merged.cnr with 563846 regions
Segmenting /home/output/048_cnvkit_output/048_D.clean.sorted.dupmark.merged.cnr ...
Segmenting with method 'cbs', significance threshold 1e-06, in 8 processes
Smoothing overshot at 3 / 2012 indices: (-26.679194769204386, -1.7267632042523196) vs. original (-25.97306178037291, 0.2231772587569223)
Smoothing overshot at 3 / 3232 indices: (-26.456721184453123, -0.5195648555922598) vs. original (-25.972475165010884, 0.602736088687462)
Post-processing /home/output/048_cnvkit_output/048_D.clean.sorted.dupmark.merged.cns ...
Wrote /home/output/048_cnvkit_output/048_D.clean.sorted.dupmark.merged.cns with 567 regions
Applying filter 'ci'
Filtered by 'ci' from 567 to 261 rows
Calling copy number with thresholds: -1.1 => 0, -0.25 => 1, 0.2 => 2, 0.7 => 3
Wrote /home/output/048_cnvkit_output/048_D.clean.sorted.dupmark.merged.call.cns with 261 regions
Significant hits in 14026/563846 bins (2.49%)
Wrote /home/output/048_cnvkit_output/048_D.clean.sorted.dupmark.merged.bintest.cns with 14026 regions
/usr/local/lib/python3.8/dist-packages/skgenome/intersect.py:11: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import Int64Index
Segmenting with method 'cbs', significance threshold 0.0001, in 8 processes
Smoothing overshot at 3 / 2012 indices: (-26.679195491987212, -1.7267644403219062) vs. original (-25.9731, 0.223177)
Smoothing overshot at 3 / 3232 indices: (-26.456716583601278, -0.5195647182235745) vs. original (-25.9725, 0.602736)
Wrote /home/output/048_cnvkit_output/048_D.clean.sorted.dupmark.merged.cns with 598 regions
rm: cannot remove '/home/output/048_cnvkit_output//target.bed': No such file or directory
Global ref name is hg19
Traceback (most recent call last):
File "/home/programs/AmpliconArchitect-master/src/amplified_intervals.py", line 176, in <module>
Matched /home/bam_dir/048_D.clean.sorted.dupmark.merged.bam to reference genome hg19
Running PrepareAA on sample: 048
/home/bam_dir/048_D.clean.sorted.dupmark.merged.bam: 2298508762 + 0 properly paired (96.38% : N/A)
Running CNVKit batch
python3 /home/programs/cnvkit.py batch -m wgs -r /home/data_repo/hg19/hg19_cnvkit_filtered_ref.cnn -p 8 -d /home/output/048_cnvkit_output/ /home/bam_dir/048_D.clean.sorted.dupmark.merged.bam
Running CNVKit segment
python3 /home/programs/cnvkit.py segment /home/output/048_cnvkit_output/048_D.clean.sorted.dupmark.merged.cnr -p 8 -m cbs -o /home/output/048_cnvkit_output/048_D.clean.sorted.dupmark.merged.cns
Cleaning up temporary files
rm /home/output/048_cnvkit_output//tmp.bed /home/output/048_cnvkit_output//.cnn /home/output/048_cnvkit_output//target.bed /home/output/048_cnvkit_output//.bintest.cns
gzip -f /home/output/048_cnvkit_output/048_D.clean.sorted.dupmark.merged.cnr
Running amplified_intervals
python /home/programs/AmpliconArchitect-master/src/amplified_intervals.py --ref hg19 --bed /home/output//048_D.clean.sorted.dupmark.merged_CNV_CALLS_pre_filtered.bed --bam /home/bam_dir/048_D.clean.sorted.dupmark.merged.bam --gain 4.5 --cnsize_min 50000 --out /home/output/048_AA_CNV_SEEDS
Hi, sorry about this. I have sent the email again. Thank you for your patience. Happy new year!
Hi Shuang,
Thank you for sending the bed file. I was not able to reproduce this error locally. When I ran the same commands using your bed file and a test BAM file, they finished without error.
I suspect the AA data repo is not configured properly. Could you please send the output of
ls $AA_DATA_REPO
ls $AA_DATA_REPO/hg19
Hi Jens, Thank you for your reply. The AA data repo included some other files, such as bam and output files. I removed these files and ran the commands again, but the run ended with the same error.
Now the AA data repo is:
ls $AA_DATA_REPO
coverage.stats hg19 hg19_indexed.tar.gz hg19.tar.gz
ls $AA_DATA_REPO/hg19
annotations file_list.txt hg19full.fa.amb hg19full.fa.pac human_hg19_september_2011 cancer hg19_centromere.bed hg19full.fa.ann hg19full.fa.sa last_updated.txt conserved.bed hg19_cnvkit_filtered_ref.cnn hg19full.fa.bwt hg19_merged_centromeres_conserved_sorted.bed wgEncodeDukeMapabilityUniqueness35bp_sorted.bedGraph dummy_ploidy.vcf hg19full.fa
The files in the output directory include:
docker_home_manifest.log 048_D.clean.sorted.dupmark.merged_CNV_CALLS_pre_filtered.bed 048_outputs.tar.gz PAA_stdout.log 048_cnvkit_output 048_finish_flag.txt
Thanks, the data repo appears to be correctly configured, but perhaps there is some issue with the contents of the data repo.
Did you delete the data repo itself and re-download it? Can you check whether the file wgEncodeDukeMapabilityUniqueness35bp_sorted.bedGraph is identical to the one provided in the data repo online? The latest version of the docker image will download the appropriate data repo for you, but you need to remove your local version first so that it is not re-used. That may be a good way to test for the issue.
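The traceback in this thread fails inside load_line while reading ll[1] and ll[2] from an entry of the mappability track, which is consistent with a line that has fewer than three fields (for example, after a truncated download). As a minimal sketch, and not part of AmpliconSuite-pipeline, the hypothetical helper find_short_lines below scans a bedGraph file and reports such lines. The helper name and the assumption that columns are whitespace-separated are illustrative, not taken from ref_util.py.

```python
import os
import tempfile


def find_short_lines(path, min_fields=3, max_report=10):
    """Return (line_number, text) pairs for lines with too few fields.

    Hypothetical helper: assumes whitespace-separated columns
    (chrom, start, end, ...), as in a bedGraph file.
    """
    problems = []
    with open(path, "r") as handle:
        for number, line in enumerate(handle, start=1):
            stripped = line.strip()
            # A line with fewer than three fields cannot supply start and end.
            if len(stripped.split()) < min_fields:
                problems.append((number, stripped))
                if len(problems) >= max_report:
                    break
    return problems


def demo():
    """Write a tiny file whose last line is cut off, then scan it."""
    content = "chr1\t10000\t10015\t0.5\nchr1\t10015\t10027\t1\nchr1\t100\n"
    with tempfile.NamedTemporaryFile("w", suffix=".bedGraph", delete=False) as tmp:
        tmp.write(content)
    try:
        return find_short_lines(tmp.name)
    finally:
        os.remove(tmp.name)


if __name__ == "__main__":
    print(demo())
```

To scan the real track, pass the path built from $AA_DATA_REPO, hg19, and wgEncodeDukeMapabilityUniqueness35bp_sorted.bedGraph. An empty result only shows that no line is short; it does not prove the file matches the online copy, so re-downloading or comparing checksums remains the stronger test.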
I think you have sent me the contents of PAA_stdout.log, but are you able to share the stdout printed to the terminal when running the run_paa_docker.py script?
Lastly, is your bam file coordinate sorted? Is it aligned to hg19 or is it aligned instead to GRCh37?
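Both questions can usually be answered from the BAM header, which samtools view -H prints as text. The sketch below is a hedged illustration with a hypothetical function name, summarize_sam_header. It reads the declared sort order from the @HD line and checks whether the @SQ sequence names carry the "chr" prefix used by UCSC hg19 (GRCh37 releases typically name chromosomes 1, 2, ..., X, Y, MT). The prefix check is only a heuristic: other UCSC-style builds also use "chr" names.

```python
def summarize_sam_header(header_text):
    """Extract the declared sort order and sequence naming from SAM header text."""
    sort_order = None
    names = []
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "@HD":
            # The SO tag declares the sort order, e.g. "coordinate".
            for field in fields[1:]:
                if field.startswith("SO:"):
                    sort_order = field[3:]
        elif fields[0] == "@SQ":
            # The SN tag holds the reference sequence name.
            for field in fields[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    chr_prefixed = bool(names) and all(name.startswith("chr") for name in names)
    return {
        "sort_order": sort_order,
        "chr_prefixed": chr_prefixed,
        "n_sequences": len(names),
    }


if __name__ == "__main__":
    # Illustrative header text, not taken from the BAM file in this thread.
    example = (
        "@HD\tVN:1.6\tSO:coordinate\n"
        "@SQ\tSN:chr1\tLN:249250621\n"
        "@SQ\tSN:chr2\tLN:243199373\n"
    )
    print(summarize_sam_header(example))
```

A sort_order of "coordinate" together with chr-prefixed names is what an hg19-aligned, coordinate-sorted BAM would be expected to show. The header only records what the aligner and sorter declared, so it is a quick sanity check, not a guarantee.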
Thanks, Jens
Thank you very much! I re-downloaded the data repo, and the pipeline finished successfully.
Best wishes, Shuang
Hi Jens,
When I run the command:
python /home/programs/AmpliconArchitect-master/src/amplified_intervals.py --ref hg19 --bed /home/output//048_D.clean.sorted.dupmark.merged_CNV_CALLS_pre_filtered.bed --bam /home/bam_dir/048_D.clean.sorted.dupmark.merged.bam --gain 4.5 --cnsize_min 50000 --out /home/output/048_AA_CNV_SEEDS
I don't get the file 048_AA_CNV_SEEDS, and instead get this error:
Traceback (most recent call last):
  File "/home/programs/AmpliconArchitect-master/src/amplified_intervals.py", line 176, in <module>
    if float(a.info[-1]) * a.segdup_uniqueness() > GAIN and a.rep_content() < 2.5:
  File "/home/programs/AmpliconArchitect-master/src/ref_util.py", line 396, in rep_content
    m = interval(duke35[p])
  File "/home/programs/AmpliconArchitect-master/src/ref_util.py", line 149, in __init__
    self.load_line(line, file_format, exclude_info_string=exclude_info_string)
  File "/home/programs/AmpliconArchitect-master/src/ref_util.py", line 201, in load_line
    self.start, self.end = sorted([int(float(ll[1])), int(float(ll[2]))])
IndexError: list index out of range
amplified_intervals.py returned a non-zero exit code. Exiting...
I don't know how to solve this error. Any assistance would be appreciated.
Best, Shuang