Closed tkoomar closed 7 years ago
The weight column might be getting dropped when CNVkit segments the allele frequencies -- is there a way to disable that step in bcbio, i.e. not pass the VCF to the segment command?
I added a unit test for segmentation with a VCF, and it passes on my machine under the current CNVkit and pandas versions 0.18.1 and 0.19 on Python 3. That suggests the error may depend on the environment, system configuration, or versions of installed dependencies. In particular, pandas v0.19 was released earlier this week and may have changed how DataFrame columns are added or filled during some operations. Does any of this sound plausible?
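One way to compare environments is to print the interpreter and package versions on each machine and diff the results. The following is a minimal sketch, not part of CNVkit or bcbio: the helper name `report_versions` is hypothetical, the distribution names `pandas` and `CNVkit` are assumptions about how the packages were installed, and `importlib.metadata` requires Python 3.8 or later.

```python
import sys
from importlib import metadata


def report_versions(packages=("pandas", "CNVkit")):
    """Return a dict mapping 'python' and each package name to a version string.

    The default package names are assumptions; adjust them to match the
    distribution names used in your environment.
    """
    versions = {"python": ".".join(str(n) for n in sys.version_info[:3])}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence instead of failing, so the report is complete.
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}\t{version}")
```

Running this with the same interpreter that bcbio uses, and again on a machine where segmentation succeeds, should show whether the pandas versions differ.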
Yes, it seems more and more likely that the error is not in the cnvkit.py segment command itself:
- When I ran the cnvkit.py segment command on a sample which already completed the pipeline successfully, it produced the expected output with no error. This makes me wonder if the issue is created just a bit upstream of cnvkit.py segment.
- The sample51-sort.cnr file produced by cnvkit.py fix may be the source, though from a cursory examination it does not appear to be malformed (it has a weight column, as expected).
- I will also look at sample51-effects-filter-sample51.vcf.gz to ensure it is not somehow problematic.

Could you try running the original CNVkit command by itself, outside of bcbio but using the same cnvkit.py? It was:
set -o pipefail
unset R_HOME && \
export PATH=/Dedicated/jmichaelson-wdata/bcbio/anaconda/bin:$PATH && \
/Dedicated/jmichaelson-wdata/bcbio/anaconda/bin/cnvkit.py segment \
/Dedicated/jmichaelson-sdata/SLI_WGS/batch10/fq/sample_51/structural/sample51/cnvkit/raw/sample51-sort.cnr \
-v /Dedicated/jmichaelson-sdata/SLI_WGS/batch10/fq/sample_51/freebayes/sample51-effects-filter-sample51.vcf.gz \
-o /Dedicated/jmichaelson-sdata/SLI_WGS/batch10/fq/sample_51/structural/sample51/cnvkit/raw/tx/tmpLVW5Ul/sample51-sort.cns \
--threshold 0.00001
This might emit some more warning messages to indicate what happened.
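Before re-running, it may also help to confirm mechanically that the .cnr input still has the columns segmentation needs. This is a minimal sketch using only the standard library, assuming the .cnr file is tab-separated with a header row; the helper name `missing_columns` and the default list of required columns are illustrative assumptions, not taken from CNVkit's code.

```python
import csv


def missing_columns(path, required=("chromosome", "start", "end", "log2", "weight")):
    """Return the required column names that are absent from the file's header.

    Assumes a tab-separated file whose first line is the header. The default
    `required` tuple is an assumption for illustration.
    """
    with open(path, newline="") as handle:
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(handle, delimiter="\t"), [])
    return [name for name in required if name not in header]
```

For example, `missing_columns("sample51-sort.cnr")` returns an empty list when every required column is present, and `["weight"]` if only the weight column is missing.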
Here is a gist of output
Not a lot more detail than the bcbio debug log provides by itself, but hopefully there's something I'm just not picking up.
I think I've identified the problem and fixed it in the development version of CNVkit. Are you able to test that directly? If not, I'll roll another release soon so it can be included in bcbio.
For the time being, I have removed CNVkit from my bcbio pipeline, but I will try to get a standalone development version of CNVkit running to do a bit more testing.
I've released a new version of CNVkit with this putative fix. The conda build should be available in a few minutes or hours; care to update and try it out once it lands?
The conda build for CNVkit 0.8.1 on Linux should be available now.
Thanks Eric for the fix. It looks like 0.8.1 resolves the issues based on feedback in #1647 so I'll close and we can re-open if anyone runs into additional issues.
New error during creation of the sorted segmentation file, where CNVkit complains of KeyError: 'weight'. I initially thought this might be somewhat similar to issue #1441, but after more poking around I do not believe it is related to coverage at all. Unfortunately, my relative lack of Python experience is making it difficult to determine if the error is with CNVkit or bcbio. Traceback is below, gist of full debug log here.