bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

CnvKit - 0.9.7b issues #3201

Closed: chatchawit closed this issue 4 years ago

chatchawit commented 4 years ago

This issue continues from the previous one. Please see https://github.com/bcbio/bcbio-nextgen/issues/3194.

bcbio: 1.2.0
OS: Ubuntu 18.04.4 LTS (Bionic Beaver)
YAML & logs: bcbio.zip

I've updated CNVkit to version 0.9.7.b1 and reran the command that causes the error. Here is the error message:

bcbio@bcbio-virtual-machine:~/nute/tmp$ /home/bcbio/install/stable/anaconda/bin/cnvkit.py segment -p 8 -o LU147-11-sort-LU147-11-germline.cns /home/bcbio/nute/work1/structural/LU147-11/cnvkit/raw/LU147-11-sort-LU147-11-germline.cnr --vcf /home/bcbio/nute/work1/gatk-haplotype/LU147-11-germline-effects-annotated-nomissingalt-filterSNP-filterINDEL.vcf.gz --sample-id LU147-11
/home/bcbio/cnvkit/skgenome/intersect.py:11: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.
  from pandas.core.index import Int64Index
Selected test sample LU147-11
Loaded 92385 records; skipped: 0 somatic, 6761 depth
Kept 56492 heterozygous of 92385 VCF records
Segmenting with method 'cbs', significance threshold 0.0001, in 8 processes
Smoothing overshot at 2 / 9385 indices: (-27.72153416745338, 3.414831611982385) vs. original (-26.471999999999998, 5.435)
Re-segmenting on variant allele frequency
Done, now finalizing
Re-segmenting on variant allele frequency
Re-segmenting on variant allele frequency
Done, now finalizing
Done, now finalizing
Re-segmenting on variant allele frequency
Re-segmenting on variant allele frequency
Segment chr1:142352977-152153906 on allele freqs for 13 additional breakpoints
Done, now finalizing
Done, now finalizing
Segment chr1:152357425-155210401 on allele freqs for 5 additional breakpoints
Re-segmenting on variant allele frequency
Segment chr2:38821-29052144 on allele freqs for 23 additional breakpoints
Done, now finalizing
Done, now finalizing
Re-segmenting on variant allele frequency
Done, now finalizing
Done, now finalizing
Segment chr1:155942703-161310049 on allele freqs for 23 additional breakpoints
Done, now finalizing
Done, now finalizing
Re-segmenting on variant allele frequency
Segment chr1:161675242-175127142 on allele freqs for 16 additional breakpoints
Done, now finalizing
Done, now finalizing
Segment chr1:175127977-184628662 on allele freqs for 9 additional breakpoints
Done, now finalizing
Segment chr3:500-44660689 on allele freqs for 45 additional breakpoints
Done, now finalizing
Segment chr1:184628662-200891643 on allele freqs for 7 additional breakpoints
Done, now finalizing
Segment chr3:44660694-46901253 on allele freqs for 5 additional breakpoints
Segment chr4:101023406-190060385 on allele freqs for 84 additional breakpoints
Done, now finalizing
Segment chr2:29058935-73448174 on allele freqs for 58 additional breakpoints
Done, now finalizing
Segment chr1:12080-125528213 on allele freqs for 198 additional breakpoints
Segment chr1:201211646-205449979 on allele freqs for 9 additional breakpoints
Segment chr3:46901308-64516246 on allele freqs for 7 additional breakpoints
Done, now finalizing
Done, now finalizing
Segment chr3:64516246-96667741 on allele freqs for 7 additional breakpoints
Segment chr3:96814781-126161074 on allele freqs for 42 additional breakpoints
Done, now finalizing
Segment chr2:73452094-86984871 on allele freqs for 17 additional breakpoints
Segment chr1:205456116-226602044 on allele freqs for 22 additional breakpoints
Done, now finalizing
Segment chr1:226602071-228182247 on allele freqs for 3 additional breakpoints
Done, now finalizing
Segment chr3:126179697-134603596 on allele freqs for 26 additional breakpoints
Done, now finalizing
Segment chr4:53319-100518111 on allele freqs for 120 additional breakpoints
Done, now finalizing
Segment chr1:228203499-247948961 on allele freqs for 30 additional breakpoints
Done, now finalizing
Segment chr1:247949130-248574814 on allele freqs for 5 additional breakpoints
Segment chr2:90234814-128868229 on allele freqs for 38 additional breakpoints
Segment chr3:134603596-195584145 on allele freqs for 65 additional breakpoints
Done, now finalizing
Segment chr3:195618640-195729736 on allele freqs for 1 additional breakpoints
Done, now finalizing
Segment chr2:129922841-242175633 on allele freqs for 178 additional breakpoints
Segment chr3:195788628-198295059 on allele freqs for 2 additional breakpoints
Re-segmenting on variant allele frequency
Done, now finalizing
Segment chr5:500-34182093 on allele freqs for 16 additional breakpoints
Re-segmenting on variant allele frequency
Done, now finalizing
Re-segmenting on variant allele frequency
Re-segmenting on variant allele frequency
Done, now finalizing
Segment chr5:34194113-47576677 on allele freqs for 14 additional breakpoints
Done, now finalizing
Done, now finalizing
Segment chr6:129049911-160611813 on allele freqs for 38 additional breakpoints
Done, now finalizing
Re-segmenting on variant allele frequency
Segment chr6:160650249-170745979 on allele freqs for 13 additional breakpoints
Re-segmenting on variant allele frequency
Done, now finalizing
Segment chr7:13744-32621106 on allele freqs for 47 additional breakpoints
Done, now finalizing
Segment chr8:500-6938388 on allele freqs for 6 additional breakpoints
Re-segmenting on variant allele frequency
Done, now finalizing
Done, now finalizing
Segment chr6:500-26017207 on allele freqs for 46 additional breakpoints
Done, now finalizing
Re-segmenting on variant allele frequency
Segment chr6:26017207-28080491 on allele freqs for 4 additional breakpoints
Done, now finalizing
Done, now finalizing
Segment chr6:28080991-31010784 on allele freqs for 9 additional breakpoints
Done, now finalizing
Segment chr8:65527362-85642999 on allele freqs for 15 additional breakpoints
Segment chr6:31032094-31815085 on allele freqs for 3 additional breakpoints
Segment chr5:49456651-181537759 on allele freqs for 169 additional breakpoints
Done, now finalizing
Done, now finalizing
Re-segmenting on variant allele frequency
Segment chr7:32732948-100949821 on allele freqs for 63 additional breakpoints
Done, now finalizing
Segment chr8:8017929-64798544 on allele freqs for 76 additional breakpoints
Done, now finalizing
Segment chr7:100951771-100994699 on allele freqs for 11 additional breakpoints
Segment chr6:32097630-32589854 on allele freqs for 9 additional breakpoints
Done, now finalizing
Segment chr7:101032065-101041885 on allele freqs for 1 additional breakpoints
Done, now finalizing
Done, now finalizing
Segment chr8:88039768-124556157 on allele freqs for 51 additional breakpoints
Segment chr9:12193-46085995 on allele freqs for 48 additional breakpoints
Re-segmenting on variant allele frequency
Done, now finalizing
Segment chr6:32661922-44155462 on allele freqs for 38 additional breakpoints
Done, now finalizing
Segment chr7:101047816-143621097 on allele freqs for 47 additional breakpoints
Done, now finalizing
Done, now finalizing
Segment chr6:44158701-52497904 on allele freqs for 8 additional breakpoints
Done, now finalizing
Re-segmenting on variant allele frequency
Segment chr8:124556159-143607664 on allele freqs for 29 additional breakpoints
Done, now finalizing
Done, now finalizing
Segment chr8:143607828-145138136 on allele freqs for 2 additional breakpoints
Done, now finalizing
Segment chr7:144379930-159345473 on allele freqs for 28 additional breakpoints
Segment chr9:68541476-81914459 on allele freqs for 12 additional breakpoints
Done, now finalizing
Segment chr6:52503150-128883462 on allele freqs for 60 additional breakpoints
Segment chr10:47062-64825825 on allele freqs for 78 additional breakpoints
Re-segmenting on variant allele frequency
Done, now finalizing
Segment chr10:65919968-70052282 on allele freqs for 3 additional breakpoints
Done, now finalizing
Segment chr10:70052782-72528823 on allele freqs for 5 additional breakpoints
Segment chr9:82997024-113164321 on allele freqs for 55 additional breakpoints
Done, now finalizing
Done, now finalizing
Re-segmenting on variant allele frequency
Re-segmenting on variant allele frequency
Done, now finalizing
Segment chr9:113164321-121192327 on allele freqs for 15 additional breakpoints
Segment chr10:72532934-96956507 on allele freqs for 22 additional breakpoints
Done, now finalizing
Done, now finalizing
Segment chr11:99325336-117287240 on allele freqs for 16 additional breakpoints
Done, now finalizing
Done, now finalizing
Segment chr11:117287240-118166276 on allele freqs for 6 additional breakpoints
Segment chr10:96981250-117197724 on allele freqs for 25 additional breakpoints
Done, now finalizing
Done, now finalizing
Segment chr10:118686435-122580685 on allele freqs for 3 additional breakpoints
Done, now finalizing
Segment chr11:118166276-123754554 on allele freqs for 12 additional breakpoints
Re-segmenting on variant allele frequency
Segment chr10:122601796-128103021 on allele freqs for 11 additional breakpoints
Done, now finalizing
Done, now finalizing
Segment chr9:122997166-138394217 on allele freqs for 48 additional breakpoints
Done, now finalizing
Segment chr11:123805374-124776225 on allele freqs for 3 additional breakpoints
Segment chr10:128109062-133659585 on allele freqs for 5 additional breakpoints
Done, now finalizing
Segment chr11:124801199-135086122 on allele freqs for 11 additional breakpoints
Segment chr13:15142502-57141366 on allele freqs for 41 additional breakpoints
Re-segmenting on variant allele frequency
Done, now finalizing
Segment chr12:12211-9443585 on allele freqs for 26 additional breakpoints
Segment chr11:500-99021328 on allele freqs for 203 additional breakpoints
Done, now finalizing
Segment chr12:9594487-133235380 on allele freqs for 262 additional breakpoints
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/bcbio/install/stable/anaconda/lib/python3.6/concurrent/futures/process.py", line 175, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/bcbio/install/stable/anaconda/lib/python3.6/concurrent/futures/process.py", line 153, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/home/bcbio/install/stable/anaconda/lib/python3.6/concurrent/futures/process.py", line 153, in <listcomp>
    return [fn(*args) for args in chunk]
  File "/home/bcbio/cnvkit/cnvlib/segmentation/__init__.py", line 89, in _ds
    return _do_segmentation(*args)
  File "/home/bcbio/cnvkit/cnvlib/segmentation/__init__.py", line 182, in _do_segmentation
    for segment, subvarr in variants.by_ranges(segarr)]
  File "/home/bcbio/cnvkit/cnvlib/segmentation/__init__.py", line 182, in <listcomp>
    for segment, subvarr in variants.by_ranges(segarr)]
  File "/home/bcbio/cnvkit/cnvlib/segmentation/hmm.py", line 231, in variants_in_segment
    dframe[bad_segs_idx]))
RuntimeError: Improper post-processing of segment Pandas(chromosome='chr4', start=53319, end=100518111, gene='-', log2=0.0351520811713932, probes=9335) -- 1 bins start >= end:
   chromosome     start       end gene      log2  probes
63       chr4  56394864  56394864    -  0.035152       1

"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/bcbio/install/stable/anaconda/bin/cnvkit.py", line 9, in <module>
    args.func(args)
  File "/home/bcbio/cnvkit/cnvlib/commands.py", line 660, in _cmd_segment
    smooth_cbs=args.smooth_cbs)
  File "/home/bcbio/cnvkit/cnvlib/segmentation/__init__.py", line 64, in do_segmentation
    for _, ca in cnarr.by_arm())))
  File "/home/bcbio/install/stable/anaconda/lib/python3.6/concurrent/futures/process.py", line 366, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/home/bcbio/install/stable/anaconda/lib/python3.6/concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "/home/bcbio/install/stable/anaconda/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/home/bcbio/install/stable/anaconda/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
RuntimeError: Improper post-processing of segment Pandas(chromosome='chr4', start=53319, end=100518111, gene='-', log2=0.0351520811713932, probes=9335) -- 1 bins start >= end:
   chromosome     start       end gene      log2  probes
63       chr4  56394864  56394864    -  0.035152       1
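The RuntimeError is a sanity check failing: while re-segmenting chr4:53319-100518111 on allele frequencies (that segment appears in the log with 120 additional breakpoints), one resulting bin ended up with start equal to end, i.e. a zero-length interval. The sketch below is a hypothetical illustration of that invariant, not CNVkit's actual code: the Bin and find_bad_bins names are made up, only the last toy row is copied from the error message, and the other rows are invented.

```python
from typing import List, NamedTuple


class Bin(NamedTuple):
    """One row of a CNVkit-style bin table (hypothetical stand-in for a pandas row)."""
    chromosome: str
    start: int
    end: int
    gene: str
    log2: float
    probes: int


def find_bad_bins(bins: List[Bin]) -> List[Bin]:
    """Return bins whose interval is empty or inverted, i.e. start >= end."""
    return [b for b in bins if b.start >= b.end]


# Toy data: only the last row comes from the error message; the rest are invented.
bins = [
    Bin("chr4", 53319, 1000000, "-", 0.02, 120),
    Bin("chr4", 1000000, 56394864, "-", 0.04, 80),
    Bin("chr4", 56394864, 56394864, "-", 0.035152, 1),
]

for bad in find_bad_bins(bins):
    print(f"{bad.chromosome}:{bad.start}-{bad.end} has start >= end")
```

A breakpoint placed exactly at a variant position can plausibly produce such a degenerate bin, which would fit the report that the failure depends on the sample rather than on the config, but that is an interpretation, not something confirmed in this thread.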

naumenko-sa commented 4 years ago

Hi @chatchawit! Thanks for posting. We see this issue on our data too. Tracking it here: https://github.com/etal/cnvkit/issues/513 SN

naumenko-sa commented 4 years ago

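Downgrading to cnvkit 0.9.5 helped:

which conda  # check which conda installation will be modified
conda install cnvkit=0.9.5 -c bioconda -c conda-forge
which cnvkit.py  # confirm which cnvkit.py is now first on PATH
cnvkit.py version  # should report 0.9.5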

Overall, we started experiencing issues with cnvkit 0.9.6; 0.9.7b fixed some of them, but not all.

I pinned cnvkit=0.9.5 in cloudbiolinux: https://github.com/chapmanb/cloudbiolinux/blob/master/contrib/flavor/ngs_pipeline_minimal/packages-conda.yaml#L43

naumenko-sa commented 4 years ago

cnvkit 0.9.5 fails on another sample, so I am reverting the pin.

chatchawit commented 4 years ago

Hi, I want to let you know that I successfully completed the somatic mutation calling (tumor vs. normal) using bcbio and cnvkit 0.9.7.b1. The tumor-only somatic run is nearly complete, with no errors so far.

Last time bcbio terminated, I reran the last command (cnvkit) and got the same error. So I hypothesized that cnvkit was the cause, tried other versions of cnvkit, and the command ran fine. However, when I reran the whole workflow, it failed again. So I switched to the latest version of cnvkit and modified the configuration (YAML file). Now the whole workflow runs successfully.

I'm sorry about my previous report. The error might be due to the config (YAML). As somebody mentioned before, it is difficult for a user to tell whether an error comes from the config or from an individual program.

Here is the config that works.

Template for paired (tumor/normal) variant calling


details:

This config is for tumor-only.

Template for paired (tumor only) variant calling


details:

naumenko-sa commented 4 years ago

What was different in the config that passed? I doubt the config was the cause, as we see similar cnvkit issues with the same config on different samples.

chatchawit commented 4 years ago

I changed the config several times, but the last time I only increased the memory.

Old config

default:
  cores: 8
  memory: 4G
  jvm_opts: ["-Xms750m", "-Xmx4000m"]

New config

default:
  cores: 8
  memory: 4G
  jvm_opts: ["-Xms1g", "-Xmx16g"]
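For comparison, the only difference between the two configs is the JVM heap options: -Xms sets the initial Java heap size and -Xmx the maximum, while cores and memory stay the same. The hypothetical helper below (heap_mib is not part of bcbio) converts those flags to MiB, assuming only the k/m/g size suffixes are used (plain byte values are not handled). It shows the maximum heap going from 4000 MiB to 16384 MiB (16 GiB); these flags only apply to JVM-based tools.

```python
import re

# Multipliers from the JVM size suffixes (k, m, g; case-insensitive) to MiB.
_UNITS = {"k": 1 / 1024, "m": 1, "g": 1024}


def heap_mib(jvm_opts, flag):
    """Return the size given to `flag` ("-Xms" or "-Xmx") in MiB, or None if absent."""
    for opt in jvm_opts:
        match = re.fullmatch(rf"{flag}(\d+)([kKmMgG])", opt)
        if match:
            return int(match.group(1)) * _UNITS[match.group(2).lower()]
    return None


old = ["-Xms750m", "-Xmx4000m"]
new = ["-Xms1g", "-Xmx16g"]
for name, opts in (("old", old), ("new", new)):
    print(name, "initial:", heap_mib(opts, "-Xms"), "MiB, max:", heap_mib(opts, "-Xmx"), "MiB")
```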

Another thing that confused me is the location of the "ensemble" line. Can it be placed anywhere, or must it go in a specific location? Is indentation required? Are both tabs and spaces OK?

Sometimes I commented out the variant calling (both somatic and germline) and ran only svcaller (only cnvkit) to try a different version of cnvkit. I did not comment out the ensemble section, and numpass was set to 3.

naumenko-sa commented 4 years ago

Thanks for reporting; still, it is hard for me to connect the dots here.

We saw a bug in CNVkit's logic, probably related to the transition to a newer pandas version. Here you are saying that increasing the JVM memory helped solve that issue, but CNVkit does not use Java.

Re YAML indentation, see https://en.wikipedia.org/wiki/YAML#Indented_delimiting. The ensemble section goes under algorithm:

details:
- algorithm:
    ensemble:
      numpass: 2
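
On the tabs-versus-spaces question: YAML does not allow tab characters in indentation, so nested keys such as ensemble and numpass must be indented with spaces. Below is a minimal, standard-library-only sketch of a pre-flight check; find_tab_indents is a hypothetical helper, not a bcbio tool. It reports the line numbers whose leading whitespace contains a tab.

```python
def find_tab_indents(yaml_text):
    """Return 1-based line numbers whose leading indentation contains a tab.

    YAML forbids tabs in indentation, so any line reported here will break parsing.
    """
    bad = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            bad.append(number)
    return bad


good = "details:\n- algorithm:\n    ensemble:\n      numpass: 2\n"
broken = "details:\n- algorithm:\n\tensemble:\n\t  numpass: 2\n"
print(find_tab_indents(good))    # []
print(find_tab_indents(broken))  # [3, 4]
```

This only catches tab indentation. It does not check that ensemble sits under algorithm, which would require actually parsing the YAML.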

chatchawit commented 4 years ago

@naumenko-sa Thank you for your reply. bcbio is a great tool, although there is a learning curve. I've tried many things to solve the problem: I fixed all the input files, modified the YAML config, upgraded bcbio, reinstalled cnvkit, and switched between recent versions. JVM memory is probably not involved. Today, a tumor-only run (1 sample) completed without an error.