etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

segment hmm / hmm-tumor / hmm-germline method error #624

Closed TuanTrain closed 3 years ago

TuanTrain commented 3 years ago

When running cnvkit.py segment <fix_output.cnr> -m hmm-tumor, I get the following error:

Smoothing overshot at 7 / 426 indices: (-1.6565140618376288, 0.45213036504297766) vs. original (-1.39743, 0.771204)
Smoothing overshot at 9 / 470 indices: (-4.150087021981, 0.9813228012758921) vs. original (-4.85538, 0.090281)
Traceback (most recent call last):
  File "/home/<user>/.conda/envs/cnvkit2/bin/cnvkit.py", line 9, in <module>
    args.func(args)
  File "/home/<user>/.conda/envs/cnvkit2/lib/python3.7/site-packages/cnvlib/commands.py", line 664, in _cmd_segment
    smooth_cbs=args.smooth_cbs)
  File "/home/<user>/.conda/envs/cnvkit2/lib/python3.7/site-packages/cnvlib/segmentation/__init__.py", line 54, in do_segmentation
    rscript_path)
  File "/home/<user>/.conda/envs/cnvkit2/lib/python3.7/site-packages/cnvlib/segmentation/__init__.py", line 139, in _do_segmentation
    segarr = hmm.segment_hmm(filtered_cn, method, threshold, variants)
  File "/home/<user>/.conda/envs/cnvkit2/lib/python3.7/site-packages/cnvlib/segmentation/hmm.py", line 42, in segment_hmm
    cnarr['log2'] = cnarr.smooth_log2()  # window)
  File "/home/<user>/.conda/envs/cnvkit2/lib/python3.7/site-packages/skgenome/gary.py", line 173, in __setitem__
    self.data[index] = value
  File "/home/<user>/.conda/envs/cnvkit2/lib/python3.7/site-packages/pandas/core/frame.py", line 3163, in __setitem__
    self._set_item(key, value)
  File "/home/<user>/.conda/envs/cnvkit2/lib/python3.7/site-packages/pandas/core/frame.py", line 3242, in _set_item
    value = self._sanitize_column(key, value)
  File "/home/<user>/.conda/envs/cnvkit2/lib/python3.7/site-packages/pandas/core/frame.py", line 3899, in _sanitize_column
    value = sanitize_index(value, self.index)
  File "/home/<user>/.conda/envs/cnvkit2/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 752, in sanitize_index
    "Length of values "
ValueError: Length of values (12883) does not match length of index (12880)

I am running on conda, with these versions: cnvkit 0.9.8, python 3.7, pandas 1.2.4.
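For readers unfamiliar with the final ValueError in the traceback: pandas raises it whenever an array assigned to a DataFrame column has a different length than the frame's index. The following is only a minimal sketch of that mechanism, not CNVkit's code; the tiny `bins` table and the oversized `smoothed` array are hypothetical stand-ins for the 12880-row bin table and the 12883-value smoothed signal.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the bin table (5 rows instead of 12880).
bins = pd.DataFrame({"log2": [0.1, -0.2, 0.0, 0.3, -0.1]})

# Hypothetical stand-in for a smoothed signal that came back 3 values too long.
smoothed = np.zeros(len(bins) + 3)

try:
    # Assigning a plain array requires an exact length match with the index.
    bins["log2"] = smoothed
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The assignment fails before the column is modified, so `bins` keeps its original values.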

This exact error also occurs with the flags -m hmm and -m hmm-germline, so it seems to be HMM-related. A different error occurs with -m haar (I can post it if needed), and there are no issues with -m none or -m cbs.

I am using the same cnvkit pipeline for other analyses: of 4 datasets, 2 hit this error and the other 2 do not.

Interestingly, when downgrading to cnvkit version 0.9.6, there is no error for the exact same command that gives this error in version 0.9.8.

Any ideas as to what could be the issue or how to resolve it?

My first workaround was to use version 0.9.8 for all the steps except the segment step, where I would use 0.9.6 instead. However, when testing with a different data set, I found that the two versions produce different outputs for the exact same input, so I assume some algorithm implementations or defaults changed, which makes this approach less than ideal for reproducibility. (Specifically, I found that reference, fix, and segment give different outputs for the same input in the two versions of cnvkit.)
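One way to quantify such differences is to diff the tab-separated outputs of the two versions. This is a rough sketch, assuming both files carry the usual chromosome, start, end and log2 columns; the helper name `compare_tables`, the tolerance, and the in-memory sample data are hypothetical and not part of CNVkit.

```python
import io

import pandas as pd

KEYS = ["chromosome", "start", "end"]


def compare_tables(old, new, column="log2", tol=1e-6):
    """Return rows present in only one table, or whose `column` differs by more than `tol`."""
    a = pd.read_csv(old, sep="\t")
    b = pd.read_csv(new, sep="\t")
    # An outer merge keeps rows whose coordinates exist in only one of the tables.
    merged = a.merge(b, on=KEYS, how="outer",
                     suffixes=("_old", "_new"), indicator=True)
    unmatched = merged["_merge"] != "both"
    shifted = (merged[f"{column}_old"] - merged[f"{column}_new"]).abs() > tol
    return merged[unmatched | shifted]


# Hypothetical two-row outputs standing in for files written by each version.
old_text = "chromosome\tstart\tend\tlog2\nchr1\t100\t200\t0.10\nchr1\t200\t300\t-0.50\n"
new_text = "chromosome\tstart\tend\tlog2\nchr1\t100\t200\t0.10\nchr1\t200\t300\t-0.42\n"

diff = compare_tables(io.StringIO(old_text), io.StringIO(new_text))
print(diff)
```

With real data, file paths can be passed in place of the `io.StringIO` buffers.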

Appreciate the help!

tetedange13 commented 3 years ago

Hi @TuanTrain, thanks for the precise issue report.

I'm not an author of CNVkit, but this could be related to issue #587, which has since been fixed. Could you please update to the latest version of CNVkit (v0.9.9) and tell us whether you still get the same error?

Hope this helps. Have a nice day. Felix.

TuanTrain commented 3 years ago

Hi Felix, thanks for the recommendation. I updated to the latest version 0.9.9 and the command now works! Closing the issue.