Closed tetedange13 closed 3 years ago
My current hot fix is to return "UNsmoothed"(original) signal when a shape issue is detected at the end of smoothing:savgol()
:
bad_idx = (y > x.max()) | (y < x.min())
if bad_idx.any():
logging.warning("Smoothing overshot at {} / {} indices: "
"({}, {}) vs. original ({}, {})"
.format(bad_idx.sum(), len(bad_idx),
y.min(), y.max(),
x.min(), x.max()))
# >> START MODIF
if y[total_wing:-total_wing].shape != x.shape: # FE ADD:
print("[WARNING]: array shape mess up --> returnin UNsmoothed values")
return x
# << END MODIF
return y[total_wing:-total_wing]
Hi @tetedange13, thank you for this amazingly detailed bug report! I've investigated this and submitted a PR with what I think might be a permanent solution.
Hi,
First, thanks for this very useful tool and your investment developping it !
Informatic context: Running CNVkit v0.9.8 on CentOS 6 virtual machine, installed through: 1)
conda create -n my_env pip
2)conda activate my_env && pip install cnvkit
(+ conda installation of 'r-base' for DNAcopy R package)Data: Hybrid capture panel of genes and hotspots (capture probes coordinates used as manifest), somatic DNA (FFPE) with no normal sample I was previously using CNVkit v0.9.6 on the same panel but with an amplicon tech. Segmenting with HAAR used to give better results than CBS, so I wanted to give it a try on my new data and with an upgraded CNVkit. => I should precise that using v0.9.8 on my hybrid-capture data with a CBS-segmentation went fine (so no installation issues I guess ?)
Exact command and error:
cnvkit.py batch my_patient.bam -p 1 -n --segment-method haar -f hg19.fa -t capture_probes.manifest.bed --drop-low-coverage
Most patient when fine, except for a couple, that raised following error:(put complete traceback here for clarity)
Possible cause: This error happens during segmentation of chrY and can be reproduced:
cnvkit.py segment --drop-low-coverage -m haar my_patient.cnr
(I think I can even share with you both a buggy and a fonctionning ".cnr" with equivalent chrY depth, if needed) My panel has a single probe on SRY used as a control, which is "binned" by CNVkit into 1 antitarget and 1 target region within each .cnr as examplified here:smoothing.savgol()
via cnarr.smooth_log() and after "winging", we obtain a 4-sized signal_arr => It is here that something's wrong: Going throughsmoothing.convolve_weighted()
, the array shape is messed up (7 for both signal_arr and weights_arr, when we expect them to be 4-shaped), leading to this "index out of range" error described aboveheadPossible fixes: My understanding of these "smoothing things" is too limited to understand the precise origin of this bug, but here are some ideas:
smoothing.savgol()
, you have the following condition https://github.com/etal/cnvkit/blob/b218280edb266788ea8326596d4283183ce425ea/cnvlib/smoothing.py#L174 => Including case wherelen(x)==2
should not have a huge impact on segmentation ?Thanks in advance for your help. Best regards. Felix.