AmpliconSuite / AmpliconSuite-pipeline

A quickstart tool for AmpliconArchitect. Performs all preliminary steps (alignment, CNV calling, seed interval detection) required prior to running AmpliconArchitect. Previously called PrepareAA.
Other
56 stars 28 forks source link

ZeroDivisionError when finding amplified intervals #9

Closed auberginekenobi closed 4 years ago

auberginekenobi commented 4 years ago

Hi Jens, I'm getting a ZeroDivisionError while generating the [SAMPLE]_CNV_SEEDS.bed file from cnvkit results for a few of my samples. Could be a problem with the sample or the bam file but I wonder if you've seen it before.

Log: (stdout + stderr) (base) [ochapman@comet-ln3 AmpliconArchitect]$ cat [SAMPLE]/slurm-%x-33869064.out /home/ochapman/miniconda3/envs/cnvkit/lib/python3.7/site-packages/skgenome/intersect.py:15: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace. from pandas.core.index import Int64Index CNVkit 0.9.6 Wrote /oasis/projects/nsf/ddp360/collab/projects/ecDNA/AmpliconArchitect/[SAMPLE]/cnvkit_output/GRCh37_cnvkit_filtered_ref.target-tmp.bed with 565907 regions Wrote /oasis/projects/nsf/ddp360/collab/projects/ecDNA/AmpliconArchitect/[SAMPLE]/cnvkit_output/GRCh37_cnvkit_filtered_ref.antitarget-tmp.bed with 0 regions Running 1 samples in 16 processes (that's 16 processes per bam) Running the CNVkit pipeline on [SAMPLE]/[SAMPLE].bam ... Processing reads in [SAMPLE].bam Time: 365.348 seconds (3206112 reads/sec, 1549 bins/sec) Summary: #bins=565907, #reads=1171346894, mean=2069.8576, min=0.0, max=111391.54455445544 Percent reads in regions: 83.248 (of 1407057807 mapped) Wrote /oasis/projects/nsf/ddp360/collab/projects/ecDNA/AmpliconArchitect/[SAMPLE]/cnvkit_output/[SAMPLE].targetcoverage.cnn with 565907 regions Skip processing ICGC_MB92.bam with empty regions file /oasis/projects/nsf/ddp360/collab/projects/ecDNA/AmpliconArchitect/[SAMPLE]/cnvkit_output/GRCh37_cnvkit_filtered_ref.antitarget-tmp.bed Wrote /oasis/projects/nsf/ddp360/collab/projects/ecDNA/AmpliconArchitect/[SAMPLE]/cnvkit_output/[SAMPLE].antitargetcoverage.cnn with 0 regions Processing target: [SAMPLE] Keeping 565907 of 565907 bins Correcting for GC bias... Processing antitarget: [SAMPLE] Wrote /oasis/projects/nsf/ddp360/collab/projects/ecDNA/AmpliconArchitect/[SAMPLE]/cnvkit_output/[SAMPLE].cnr with 565907 regions Segmenting /oasis/projects/nsf/ddp360/collab/projects/ecDNA/AmpliconArchitect/[SAMPLE]/cnvkit_output/[SAMPLE].cnr ... Segmenting with method 'cbs', significance threshold 1e-06, in 16 processes Dropped 1 outlier bins: chromosome start end gene depth log2 weight 0 1 234757235 234762234 - 160.511102 1.965995 0.942141 Dropped 1 / 20650 bins on chromosome 1 Dropped 1 outlier bins: chromosome start end gene depth log2 weight 0 2 230338821 230343821 - 247.3222 2.550093 0.942329 Dropped 1 / 29186 bins on chromosome 2 Dropped 1 outlier bins: chromosome start end gene depth log2 weight 0 13 67325231 67330231 - 226.8016 2.296797 0.942329 Dropped 1 / 13448 bins on chromosome 13 /home/ochapman/miniconda3/envs/cnvkit/lib/python3.7/site-packages/skgenome/intersect.py:15: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace. from pandas.core.index import Int64Index Segmenting with method 'cbs', significance threshold 0.0001, in 16 processes Dropped 1 outlier bins: chromosome start end gene log2 depth weight 0 1 234757235 234762234 - 1.966 160.511 0.942141 Dropped 1 / 20650 bins on chromosome 1 Dropped 1 outlier bins: chromosome start end gene log2 depth weight 0 2 230338821 230343821 - 2.55009 247.322 0.942329 Dropped 1 / 29186 bins on chromosome 2 Dropped 1 outlier bins: chromosome start end gene log2 depth weight 0 13 67325231 67330231 - 2.2968 226.802 0.942329 Dropped 1 / 13448 bins on chromosome 13 Dropped 3 outlier bins: chromosome start end gene log2 depth weight 0 Y 977526 982534 - -12.2974 0.003794 0.943837 1 Y 1269227 1274220 - -12.2999 0.004006 0.941010 2 Y 1339135 1344128 - -12.2267 0.004006 0.941010 Dropped 3 / 1971 bins on chromosome Y Wrote /oasis/projects/nsf/ddp360/collab/projects/ecDNA/AmpliconArchitect/[SAMPLE]/cnvkit_output/[SAMPLE].cns with 476 regions GRCh37 Traceback (most recent call last): File "/oasis/projects/nsf/ddp360/collab/bin/AmpliconArchitect/AmpliconArchitect/src/amplified_intervals.py", line 91, in <module> rdList = hg.interval_list([r for r in rdList if float(r.info[1]) > GAIN + 2 * max(1.0, bamFileb2b.median_coverage(refi=r)[0] / bamFileb2b.median_coverage()[0]) - 2]) ZeroDivisionError: float division by zero Running PrepareAA on sample: [SAMPLE] /home/ochapman/miniconda3/envs/cnvkit/bin//python /home/ochapman/miniconda3/envs/cnvkit/bin//cnvkit.py batch -m wgs -y -r /oasis/projects/nsf/ddp360/collab/bin/AmpliconArchitect/data_repo/GRCh37/GRCh37_cnvkit_filtered_ref.cnn -p 16 -d /oasis/projects/nsf/ddp360/collab/projects/ecDNA/AmpliconArchitect/[SAMPLE]/cnvkit_output/ /oasis/projects/nsf/ddp360/collab/data/icgc//[SAMPLE]/[SAMPLE].bam Set Rscript flag: --rscript-path /home/ochapman/miniconda3/envs/cnvkit/bin/Rscript /home/ochapman/miniconda3/envs/cnvkit/bin//python /home/ochapman/miniconda3/envs/cnvkit/bin//cnvkit.py segment /oasis/projects/nsf/ddp360/collab/projects/ecDNA/AmpliconArchitect/[SAMPLE]/cnvkit_output/[SAMPLE].cnr --rscript-path /home/ochapman/miniconda3/envs/cnvkit/bin/Rscript -p 16 -o /oasis/projects/nsf/ddp360/collab/projects/ecDNA/AmpliconArchitect/[SAMPLE]/cnvkit_output/[SAMPLE].cns Running amplified_intervals python /oasis/projects/nsf/ddp360/collab/bin/AmpliconArchitect/AmpliconArchitect/src/amplified_intervals.py --ref GRCh37 --bed /oasis/projects/nsf/ddp360/collab/projects/ecDNA/AmpliconArchitect/[SAMPLE]/cnvkit_output/[SAMPLE]_CNV_GAIN.bed --bam /oasis/projects/nsf/ddp360/collab/data/icgc//[SAMPLE]/[SAMPLE].bam --gain 4.0 --cnsize_min 50000 --out /oasis/projects/nsf/ddp360/collab/projects/ecDNA/AmpliconArchitect/[SAMPLE]/[SAMPLE]_AA_CNV_SEEDS Completed

jluebeck commented 4 years ago

I believe we resolved this issue. Could you describe the issue/solution before I close this?

jluebeck commented 4 years ago

This error arose as a result of a collision in the name of two BAM files summarized in the coverage.stats file, located inside the user's AA_DATA_REPO folder. The fix is to remove the offending entry from the coverage.stats file, or clear the contents of the file entirely.