etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org
Other
548 stars 166 forks source link

[CNVkit v0.9.8] Errored smoothing with HAAR-segmentation ? #587

Closed tetedange13 closed 3 years ago

tetedange13 commented 3 years ago

Hi,
First, thanks for this very useful tool and your investment developping it !

Informatic context: Running CNVkit v0.9.8 on CentOS 6 virtual machine, installed through: 1) conda create -n my_env pip 2) conda activate my_env && pip install cnvkit (+ conda installation of 'r-base' for DNAcopy R package)

Data: Hybrid capture panel of genes and hotspots (capture probes coordinates used as manifest), somatic DNA (FFPE) with no normal sample I was previously using CNVkit v0.9.6 on the same panel but with an amplicon tech. Segmenting with HAAR used to give better results than CBS, so I wanted to give it a try on my new data and with an upgraded CNVkit. => I should precise that using v0.9.8 on my hybrid-capture data with a CBS-segmentation went fine (so no installation issues I guess ?)

Exact command and error: cnvkit.py batch my_patient.bam -p 1 -n --segment-method haar -f hg19.fa -t capture_probes.manifest.bed --drop-low-coverage Most patient when fine, except for a couple, that raised following error:

  File "/home/bioinfo/miniconda3/envs/cnvkit_repo/lib/python3.9/site-packages/cnvlib/segmentation/haar.py", line 328, in HaarConv
      highNonNormed += signal[highEnd] * weight[highEnd] - signal[k-1] * weight[k-1]
IndexError: index 2 is out of bounds for axis 0 with size 2

(put complete traceback here for clarity)

Possible cause: This error happens during segmentation of chrY and can be reproduced: cnvkit.py segment --drop-low-coverage -m haar my_patient.cnr (I think I can even share with you both a buggy and a fonctionning ".cnr" with equivalent chrY depth, if needed) My panel has a single probe on SRY used as a control, which is "binned" by CNVkit into 1 antitarget and 1 target region within each .cnr as examplified here:

Y  2507140  2654431  Antitarget                               0        -10.8612  0.0001
Y  2654931  2655203  NM_003140|SRY|3PUTR|1;NM_003140|SRY|1/1  1286.11  0.110645  0.932772


Possible fixes: My understanding of these "smoothing things" is too limited to understand the precise origin of this bug, but here are some ideas:

Thanks in advance for your help. Best regards. Felix.

tetedange13 commented 3 years ago

My current hot fix is to return "UNsmoothed"(original) signal when a shape issue is detected at the end of smoothing:savgol():

bad_idx = (y > x.max()) | (y < x.min())
    if bad_idx.any():
        logging.warning("Smoothing overshot at {} / {} indices: "
                        "({}, {}) vs. original ({}, {})"
                        .format(bad_idx.sum(), len(bad_idx),
                                y.min(), y.max(),
                                x.min(), x.max()))

# >> START MODIF
if y[total_wing:-total_wing].shape != x.shape: # FE ADD:
        print("[WARNING]: array shape mess up --> returnin UNsmoothed values")
        return x 
# << END MODIF

return y[total_wing:-total_wing]
tskir commented 3 years ago

Hi @tetedange13, thank you for this amazingly detailed bug report! I've investigated this and submitted a PR with what I think might be a permanent solution.