etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org
Other
540 stars, 165 forks

CnvKit - RuntimeError: Improper post-processing of segment Pandas(chromosome='chr4', start=53319, end=100518111, gene='-', log2=0.0351520811713932, probes=9335) -- 1 bins start >= end #513

Closed: chatchawit closed this issue 3 years ago

chatchawit commented 4 years ago

bcbio: 1.2.0
os: Ubuntu 18.04.4 LTS (Bionic Beaver)
yaml & logs: bcbio.zip

I've updated CNVkit to version 0.9.7.b1 and reran the command that causes the error. Here is the error message:

bcbio@bcbio-virtual-machine:~/nute/tmp$ /home/bcbio/install/stable/anaconda/bin/cnvkit.py segment -p 8 -o LU147-11-sort-LU147-11-germline.cns /home/bcbio/nute/work1/structural/LU147-11/cnvkit/raw/LU147-11-sort-LU147-11-germline.cnr --vcf /home/bcbio/nute/work1/gatk-haplotype/LU147-11-germline-effects-annotated-nomissingalt-filterSNP-filterINDEL.vcf.gz --sample-id LU147-11
/home/bcbio/cnvkit/skgenome/intersect.py:11: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.
  from pandas.core.index import Int64Index
Selected test sample LU147-11
Loaded 92385 records; skipped: 0 somatic, 6761 depth
Kept 56492 heterozygous of 92385 VCF records
Segmenting with method 'cbs', significance threshold 0.0001, in 8 processes
Smoothing overshot at 2 / 9385 indices: (-27.72153416745338, 3.414831611982385) vs. original (-26.471999999999998, 5.435)
Re-segmenting on variant allele frequency
Done, now finalizing
Re-segmenting on variant allele frequency
Re-segmenting on variant allele frequency
Done, now finalizing
Done, now finalizing
Re-segmenting on variant allele frequency
Re-segmenting on variant allele frequency
Segment chr1:142352977-152153906 on allele freqs for 13 additional breakpoints
Done, now finalizing
Done, now finalizing
Segment chr1:152357425-155210401 on allele freqs for 5 additional breakpoints
Re-segmenting on variant allele frequency
Segment chr2:38821-29052144 on allele freqs for 23 additional breakpoints
Done, now finalizing
Done, now finalizing
Re-segmenting on variant allele frequency
Done, now finalizing
Done, now finalizing
Segment chr1:155942703-161310049 on allele freqs for 23 additional breakpoints
Done, now finalizing
Done, now finalizing
Re-segmenting on variant allele frequency
Segment chr1:161675242-175127142 on allele freqs for 16 additional breakpoints
Done, now finalizing
Done, now finalizing
Segment chr1:175127977-184628662 on allele freqs for 9 additional breakpoints
Done, now finalizing
Segment chr3:500-44660689 on allele freqs for 45 additional breakpoints
Done, now finalizing
Segment chr1:184628662-200891643 on allele freqs for 7 additional breakpoints
Done, now finalizing
Segment chr3:44660694-46901253 on allele freqs for 5 additional breakpoints
Segment chr4:101023406-190060385 on allele freqs for 84 additional breakpoints
Done, now finalizing
Segment chr2:29058935-73448174 on allele freqs for 58 additional breakpoints
Done, now finalizing
Segment chr1:12080-125528213 on allele freqs for 198 additional breakpoints
Segment chr1:201211646-205449979 on allele freqs for 9 additional breakpoints
Segment chr3:46901308-64516246 on allele freqs for 7 additional breakpoints
Done, now finalizing
Done, now finalizing
Segment chr3:64516246-96667741 on allele freqs for 7 additional breakpoints
Segment chr3:96814781-126161074 on allele freqs for 42 additional breakpoints
Done, now finalizing
Segment chr2:73452094-86984871 on allele freqs for 17 additional breakpoints
Segment chr1:205456116-226602044 on allele freqs for 22 additional breakpoints
Done, now finalizing
Segment chr1:226602071-228182247 on allele freqs for 3 additional breakpoints
Done, now finalizing
Segment chr3:126179697-134603596 on allele freqs for 26 additional breakpoints
Done, now finalizing
Segment chr4:53319-100518111 on allele freqs for 120 additional breakpoints
Done, now finalizing
Segment chr1:228203499-247948961 on allele freqs for 30 additional breakpoints
Done, now finalizing
Segment chr1:247949130-248574814 on allele freqs for 5 additional breakpoints
Segment chr2:90234814-128868229 on allele freqs for 38 additional breakpoints
Segment chr3:134603596-195584145 on allele freqs for 65 additional breakpoints
Done, now finalizing
Segment chr3:195618640-195729736 on allele freqs for 1 additional breakpoints
Done, now finalizing
Segment chr2:129922841-242175633 on allele freqs for 178 additional breakpoints
Segment chr3:195788628-198295059 on allele freqs for 2 additional breakpoints
Re-segmenting on variant allele frequency
Done, now finalizing
Segment chr5:500-34182093 on allele freqs for 16 additional breakpoints
Re-segmenting on variant allele frequency
Done, now finalizing
Re-segmenting on variant allele frequency
Re-segmenting on variant allele frequency
Done, now finalizing
Segment chr5:34194113-47576677 on allele freqs for 14 additional breakpoints
Done, now finalizing
Done, now finalizing
Segment chr6:129049911-160611813 on allele freqs for 38 additional breakpoints
Done, now finalizing
Re-segmenting on variant allele frequency
Segment chr6:160650249-170745979 on allele freqs for 13 additional breakpoints
Re-segmenting on variant allele frequency
Done, now finalizing
Segment chr7:13744-32621106 on allele freqs for 47 additional breakpoints
Done, now finalizing
Segment chr8:500-6938388 on allele freqs for 6 additional breakpoints
Re-segmenting on variant allele frequency
Done, now finalizing
Done, now finalizing
Segment chr6:500-26017207 on allele freqs for 46 additional breakpoints
Done, now finalizing
Re-segmenting on variant allele frequency
Segment chr6:26017207-28080491 on allele freqs for 4 additional breakpoints
Done, now finalizing
Done, now finalizing
Segment chr6:28080991-31010784 on allele freqs for 9 additional breakpoints
Done, now finalizing
Segment chr8:65527362-85642999 on allele freqs for 15 additional breakpoints
Segment chr6:31032094-31815085 on allele freqs for 3 additional breakpoints
Segment chr5:49456651-181537759 on allele freqs for 169 additional breakpoints
Done, now finalizing
Done, now finalizing
Re-segmenting on variant allele frequency
Segment chr7:32732948-100949821 on allele freqs for 63 additional breakpoints
Done, now finalizing
Segment chr8:8017929-64798544 on allele freqs for 76 additional breakpoints
Done, now finalizing
Segment chr7:100951771-100994699 on allele freqs for 11 additional breakpoints
Segment chr6:32097630-32589854 on allele freqs for 9 additional breakpoints
Done, now finalizing
Segment chr7:101032065-101041885 on allele freqs for 1 additional breakpoints
Done, now finalizing
Done, now finalizing
Segment chr8:88039768-124556157 on allele freqs for 51 additional breakpoints
Segment chr9:12193-46085995 on allele freqs for 48 additional breakpoints
Re-segmenting on variant allele frequency
Done, now finalizing
Segment chr6:32661922-44155462 on allele freqs for 38 additional breakpoints
Done, now finalizing
Segment chr7:101047816-143621097 on allele freqs for 47 additional breakpoints
Done, now finalizing
Done, now finalizing
Segment chr6:44158701-52497904 on allele freqs for 8 additional breakpoints
Done, now finalizing
Re-segmenting on variant allele frequency
Segment chr8:124556159-143607664 on allele freqs for 29 additional breakpoints
Done, now finalizing
Done, now finalizing
Segment chr8:143607828-145138136 on allele freqs for 2 additional breakpoints
Done, now finalizing
Segment chr7:144379930-159345473 on allele freqs for 28 additional breakpoints
Segment chr9:68541476-81914459 on allele freqs for 12 additional breakpoints
Done, now finalizing
Segment chr6:52503150-128883462 on allele freqs for 60 additional breakpoints
Segment chr10:47062-64825825 on allele freqs for 78 additional breakpoints
Re-segmenting on variant allele frequency
Done, now finalizing
Segment chr10:65919968-70052282 on allele freqs for 3 additional breakpoints
Done, now finalizing
Segment chr10:70052782-72528823 on allele freqs for 5 additional breakpoints
Segment chr9:82997024-113164321 on allele freqs for 55 additional breakpoints
Done, now finalizing
Done, now finalizing
Re-segmenting on variant allele frequency
Re-segmenting on variant allele frequency
Done, now finalizing
Segment chr9:113164321-121192327 on allele freqs for 15 additional breakpoints
Segment chr10:72532934-96956507 on allele freqs for 22 additional breakpoints
Done, now finalizing
Done, now finalizing
Segment chr11:99325336-117287240 on allele freqs for 16 additional breakpoints
Done, now finalizing
Done, now finalizing
Segment chr11:117287240-118166276 on allele freqs for 6 additional breakpoints
Segment chr10:96981250-117197724 on allele freqs for 25 additional breakpoints
Done, now finalizing
Done, now finalizing
Segment chr10:118686435-122580685 on allele freqs for 3 additional breakpoints
Done, now finalizing
Segment chr11:118166276-123754554 on allele freqs for 12 additional breakpoints
Re-segmenting on variant allele frequency
Segment chr10:122601796-128103021 on allele freqs for 11 additional breakpoints
Done, now finalizing
Done, now finalizing
Segment chr9:122997166-138394217 on allele freqs for 48 additional breakpoints
Done, now finalizing
Segment chr11:123805374-124776225 on allele freqs for 3 additional breakpoints
Segment chr10:128109062-133659585 on allele freqs for 5 additional breakpoints
Done, now finalizing
Segment chr11:124801199-135086122 on allele freqs for 11 additional breakpoints
Segment chr13:15142502-57141366 on allele freqs for 41 additional breakpoints
Re-segmenting on variant allele frequency
Done, now finalizing
Segment chr12:12211-9443585 on allele freqs for 26 additional breakpoints
Segment chr11:500-99021328 on allele freqs for 203 additional breakpoints
Done, now finalizing
Segment chr12:9594487-133235380 on allele freqs for 262 additional breakpoints
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/bcbio/install/stable/anaconda/lib/python3.6/concurrent/futures/process.py", line 175, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/bcbio/install/stable/anaconda/lib/python3.6/concurrent/futures/process.py", line 153, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/home/bcbio/install/stable/anaconda/lib/python3.6/concurrent/futures/process.py", line 153, in <listcomp>
    return [fn(*args) for args in chunk]
  File "/home/bcbio/cnvkit/cnvlib/segmentation/__init__.py", line 89, in _ds
    return _do_segmentation(*args)
  File "/home/bcbio/cnvkit/cnvlib/segmentation/__init__.py", line 182, in _do_segmentation
    for segment, subvarr in variants.by_ranges(segarr)]
  File "/home/bcbio/cnvkit/cnvlib/segmentation/__init__.py", line 182, in <listcomp>
    for segment, subvarr in variants.by_ranges(segarr)]
  File "/home/bcbio/cnvkit/cnvlib/segmentation/hmm.py", line 231, in variants_in_segment
    dframe[bad_segs_idx]))
RuntimeError: Improper post-processing of segment Pandas(chromosome='chr4', start=53319, end=100518111, gene='-', log2=0.0351520811713932, probes=9335) -- 1 bins start >= end:
   chromosome     start       end gene      log2  probes
63       chr4  56394864  56394864    -  0.035152       1
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/bcbio/install/stable/anaconda/bin/cnvkit.py", line 9, in <module>
    args.func(args)
  File "/home/bcbio/cnvkit/cnvlib/commands.py", line 660, in _cmd_segment
    smooth_cbs=args.smooth_cbs)
  File "/home/bcbio/cnvkit/cnvlib/segmentation/__init__.py", line 64, in do_segmentation
    for _, ca in cnarr.by_arm())))
  File "/home/bcbio/install/stable/anaconda/lib/python3.6/concurrent/futures/process.py", line 366, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/home/bcbio/install/stable/anaconda/lib/python3.6/concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "/home/bcbio/install/stable/anaconda/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/home/bcbio/install/stable/anaconda/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
RuntimeError: Improper post-processing of segment Pandas(chromosome='chr4', start=53319, end=100518111, gene='-', log2=0.0351520811713932, probes=9335) -- 1 bins start >= end:
   chromosome     start       end gene      log2  probes
63       chr4  56394864  56394864    -  0.035152       1

naumenko-sa commented 4 years ago

Hello, Eric @etal! +1, we see a similar issue in the other project.

SN

chatchawit commented 4 years ago

Dear @etal,

I need to run CNVkit for a job. Do you have any suggestions? I am going back through older versions to avoid this error, which I found in both v0.9.7b1 and v0.9.6.

Trying 0.9.5 -> 0.9.4 -> 0.9.3 -> ...

Best regards, Chat

etal commented 4 years ago

Maybe the behavior of numpy or pandas indexing changed. Does the same error appear when CNVkit is run from one of the pre-built Docker images? If it does, then pinning numpy or pandas to an earlier version might fix it for everyone.

In particular, downgrading pandas to an earlier version like 0.25 might be a quick fix, if pandas 1.0 was the source of a breaking change.
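
To compare a failing environment against a Docker image, it helps to record which library versions are actually imported in each. A minimal sketch using only the standard library; `report_versions` is a hypothetical helper name and is not part of CNVkit:

```python
import importlib


def report_versions(package_names):
    """Return {name: version string} for each requested package.

    Packages that cannot be imported are reported as "not installed";
    importable packages without a __version__ attribute as "unknown".
    """
    versions = {}
    for name in package_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = "not installed"
        else:
            versions[name] = str(getattr(module, "__version__", "unknown"))
    return versions
```

Calling `report_versions(["numpy", "pandas"])` in both environments and comparing the results would show whether the two libraries named above actually differ between a working and a failing setup.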

naumenko-sa commented 4 years ago

Hi Eric @etal !

Some additional info: for us, v0.9.7b1 runs fine with some samples and fails with others, so it does not look like a systematic pandas error (wouldn't it crash for every sample then?); rather, it seems to hit some edge cases.

The particular error seems to stem from here: https://github.com/etal/cnvkit/blob/master/cnvlib/segmentation/hmm.py#L227

The segmentation command:

cnvkit.py segment \
-p 1 \
-o  sample-germline.cns \
sample-germline.cnr \
--vcf sample-germline-effects-annotated-nomissingalt-filterSNP-filterINDEL.vcf.gz \
--sample-id sample

There is a warning at the beginning of segmentation:

cnvkit/skgenome/intersect.py:11: FutureWarning: pandas.core.index is deprecated and will be removed in a future version.  The public classes are available in the top-level namespace.
  from pandas.core.index import Int64Index

but it only means that the import should be simplified in the future; it does not indicate different indexing logic:

from pandas import Int64Index

The traceback:

Selected test sample SAMPLE
Loaded 69154 records; skipped: 0 somatic, 2079 depth
Kept 47403 heterozygous of 69154 VCF records
Segmenting with method 'cbs', significance threshold 0.0001, in 1 processes
Re-segmenting on variant allele frequency
Done, now finalizing
Segment chr1:925941-1439315 on allele freqs for 2 additional breakpoints
Done, now finalizing
Segment chr1:1734689-12138436 on allele freqs for 11 additional breakpoints
Done, now finalizing
Segment chr1:12142286-12920132 on allele freqs for 4 additional breakpoints
Done, now finalizing
Segment chr1:15733796-16458996 on allele freqs for 5 additional breakpoints
Done, now finalizing
Segment chr1:16889215-58414892 on allele freqs for 75 additional breakpoints
Done, now finalizing
Segment chr1:58576184-62275087 on allele freqs for 2 additional breakpoints
Done, now finalizing
Segment chr1:74571353-77947552 on allele freqs for 5 additional breakpoints
Done, now finalizing
Segment chr1:88949090-91266107 on allele freqs for 4 additional breakpoints
Done, now finalizing
Segment chr1:105957196-109508912 on allele freqs for 6 additional breakpoints
Done, now finalizing
Segment chr1:109620268-117870068 on allele freqs for 4 additional breakpoints
Re-segmenting on variant allele frequency
Done, now finalizing
Segment chr1:142736076-248918363 on allele freqs for 166 additional breakpoints
Traceback (most recent call last):
  File "/bcbio/anaconda/bin/cnvkit.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/root/cnvkit/cnvkit.py", line 9, in <module>
    args.func(args)
  File "/root/cnvkit/cnvlib/commands.py", line 660, in _cmd_segment
    smooth_cbs=args.smooth_cbs)
  File "/root/cnvkit/cnvlib/segmentation/__init__.py", line 64, in do_segmentation
    for _, ca in cnarr.by_arm())))
  File "/root/cnvkit/cnvlib/segmentation/__init__.py", line 89, in _ds
    return _do_segmentation(*args)
  File "/root/cnvkit/cnvlib/segmentation/__init__.py", line 182, in _do_segmentation
    for segment, subvarr in variants.by_ranges(segarr)]
  File "/root/cnvkit/cnvlib/segmentation/__init__.py", line 182, in <listcomp>
    for segment, subvarr in variants.by_ranges(segarr)]
  File "/root/cnvkit/cnvlib/segmentation/hmm.py", line 231, in variants_in_segment
    dframe[bad_segs_idx]))
RuntimeError: Improper post-processing of segment Pandas(chromosome='chr1', start=142736076, end=248918363, gene='-', log2=0.00443043212849374, probes=11127) -- 1 bins start >= end:
    chromosome      start        end gene     log2  probes
146       chr1  231366498  231366498    -  0.00443       1

For the reported bin, start == end == 231366498.

The same holds for @chatchawit's run: start == end (56394864) for the reported bin.
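
The condition named in the error message (start >= end) can be checked on any tab-separated table with `start` and `end` columns. The sketch below is not CNVkit's own check; it only reproduces the same comparison with the standard library. `find_degenerate_bins` is a hypothetical helper, and the first data row of the example is made up for illustration:

```python
import csv
import io


def find_degenerate_bins(table_text):
    """Return the rows of a tab-separated table where start >= end."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return [row for row in reader if int(row["start"]) >= int(row["end"])]


# Column names follow the table printed in the error message.
# The first data row is hypothetical; the second is the bin reported for chr1.
EXAMPLE = (
    "chromosome\tstart\tend\tgene\tlog2\tprobes\n"
    "chr1\t142736076\t231366498\t-\t0.00443\t11126\n"
    "chr1\t231366498\t231366498\t-\t0.00443\t1\n"
)

for row in find_degenerate_bins(EXAMPLE):
    print(row["chromosome"], row["start"], row["end"])
# prints: chr1 231366498 231366498
```

Running the same function over the text of an intermediate or final segment table would list every zero-length bin, which may help identify which samples hit the edge case.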

At this locus, the .cnr file has:

chr1    142,736,076 143,729,043 Antitarget  4.791   2.77    0.0001

Let us know if we can help more to chase this bug!

Sergey

chatchawit commented 4 years ago

@naumenko-sa @etal I went back to CNVkit 0.9.5 and ran a single sample (tumor only) using bcbio. It works. I am now running a tumor/normal pair and will let you know.

naumenko-sa commented 4 years ago

I downgraded CNVkit to 0.9.5, and that helped us finish a tumor-normal project. We pinned it to 0.9.5 in bcbio, as 0.9.6 and 0.9.7b are giving us issues. SN

naumenko-sa commented 4 years ago

Update: 0.9.5 is also giving us issues with some samples.

etal commented 4 years ago

It looks like this is a bug in the HMM segmentation method, which has changed at least once during the last few releases (hmmlearn -> pomegranate). Even with CBS as the copy number segmentation method, HMM is used for allele frequencies when given a VCF input, apparently.

mmterpstra commented 3 years ago

The combination pandas-0.24.2/CNVkit-0.9.7/Python-3.7.2/pomegranate-0.13.3 also causes this issue on some datasets. Judging from the input data, it looks like the pandas data uses BED-like (0-based, half-open) coordinates while CNVkit internally uses GFF/interval_list-like (1-based, closed) coordinates. After reading bcbio/bcbio-nextgen#3416, I will consider PureCN. By the way, I'm not using CNVkit as part of a bcbio pipeline.
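
To illustrate why the coordinate convention matters here: an interval with start == end is empty under BED-style (0-based, half-open) coordinates but spans one base under GFF-style (1-based, closed) coordinates. The sketch below only illustrates this hypothesis; the thread does not confirm it as the cause, and the function names are made up for this example:

```python
def length_half_open(start, end):
    """Length of a 0-based, half-open interval [start, end), as in BED."""
    return end - start


def length_closed(start, end):
    """Length of a 1-based, closed interval [start, end], as in GFF."""
    return end - start + 1


def closed_to_half_open(start, end):
    """Convert 1-based closed coordinates to 0-based half-open ones."""
    return start - 1, end


# The bin from the first error report has start == end == 56394864.
start, end = 56394864, 56394864
print(length_half_open(start, end))     # 0: empty if read as BED-style
print(length_closed(start, end))        # 1: a single base if read as GFF-style
print(closed_to_half_open(start, end))  # (56394863, 56394864)
```

If a single-base position were written with closed coordinates and then validated under half-open rules, it would trip exactly the "start >= end" check seen in the error.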

seedgeorge commented 2 years ago

I've run into this issue on some deep WGS data. Was it resolved? I tested using v0.9.9 with the following command:

cnvkit.py segment -p 8 sample.cnr -v sample.genotyped.vcf.gz -i samplename -o sample.baf.cns

xsvato01 commented 6 months ago

I got the same problem. Segmentation works if I do not supply a .vcf file with the -v parameter, though.
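
As a sketch of that workaround, the helper below builds the `cnvkit.py segment` argument list with or without the VCF options. It uses only flags that appear in the commands quoted in this thread; `build_segment_command` is a hypothetical name, and the file names are placeholders. Note that leaving out the VCF also skips the allele-frequency re-segmentation step shown in the logs, so the output is not equivalent:

```python
import shlex


def build_segment_command(cnr_path, output_path, vcf_path=None,
                          sample_id=None, processes=1):
    """Build an argument list for `cnvkit.py segment`.

    When vcf_path is None, the --vcf and --sample-id options are omitted,
    so no allele-frequency re-segmentation is requested.
    """
    cmd = ["cnvkit.py", "segment", "-p", str(processes),
           "-o", output_path, cnr_path]
    if vcf_path is not None:
        cmd += ["--vcf", vcf_path]
        if sample_id is not None:
            cmd += ["--sample-id", sample_id]
    return cmd


# Placeholder file names, taken from the example command earlier in the thread.
without_vcf = build_segment_command("sample-germline.cnr", "sample-germline.cns")
print(" ".join(shlex.quote(arg) for arg in without_vcf))
# prints: cnvkit.py segment -p 1 -o sample-germline.cns sample-germline.cnr
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to execute the command.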

GACGAMA commented 2 months ago

Has this been solved? I'm getting the same error for some whole exomes on the newest version.