The cn-filter of the call module crashes quite a number of my samples when processing chrY. I have not been able to figure out when exactly this happens. All I can say is that I have only observed it on chrY and it must have something to do with the coordinates, because if I play around with coordinates in the VCF file the problem vanishes. Specifically, moving the SNP into one of the segments solves the error, but I have seen situations where there were no SNPs outside segments and all segments had at least one SNP and yet the program crashed. Can you kindly look into this issue?
Thanks,
Sebastian
Here is a minimal artificial example to reproduce the error:
This is the command and the error:
$ cnvkit.py call test.cns -o test.cn_filtered.cns -v test.vcf -i tumor --filter cn --purity 0.570837228458
Selected test sample tumor
Using INFO DP
Loaded 1 records; skipped: 0 somatic, 0 depth
Kept 1 heterozygous of 1 VCF records
*WARNING* No X found in probes; check the input
Treating sample gender as male
Rescaling sample with purity 0.570837, ploidy 2
Calling copy number with thresholds: -1.1 => 0, -0.25 => 1, 0.2 => 2, 0.7 => 3
Applying filter 'cn'
Traceback (most recent call last):
File "/opt/cnvkit_virtualenv/bin/cnvkit.py", line 4, in <module>
__import__('pkg_resources').run_script('CNVkit==0.8.3.dev0', 'cnvkit.py')
File "/opt/cnvkit_virtualenv/lib/python2.7/site-packages/pkg_resources/__init__.py", line 743, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/opt/cnvkit_virtualenv/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1505, in run_script
exec(script_code, namespace, namespace)
File "/opt/cnvkit_virtualenv/lib/python2.7/site-packages/CNVkit-0.8.3.dev0-py2.7.egg/EGG-INFO/scripts/cnvkit.py", line 13, in <module>
File "build/bdist.linux-x86_64/egg/cnvlib/commands.py", line 848, in _cmd_call
File "build/bdist.linux-x86_64/egg/cnvlib/commands.py", line 912, in do_call
File "build/bdist.linux-x86_64/egg/cnvlib/segfilters.py", line 30, in wrapped_f
File "build/bdist.linux-x86_64/egg/cnvlib/segfilters.py", line 137, in cn
File "build/bdist.linux-x86_64/egg/cnvlib/segfilters.py", line 53, in squash_by_groups
File "/opt/cnvkit_virtualenv/lib/python2.7/site-packages/pandas/core/groupby.py", line 694, in apply
return self._python_apply_general(f)
File "/opt/cnvkit_virtualenv/lib/python2.7/site-packages/pandas/core/groupby.py", line 698, in _python_apply_general
self.axis)
File "/opt/cnvkit_virtualenv/lib/python2.7/site-packages/pandas/core/groupby.py", line 1601, in apply
res = f(group)
File "build/bdist.linux-x86_64/egg/cnvlib/segfilters.py", line 86, in squash_region
File "build/bdist.linux-x86_64/egg/cnvlib/descriptives.py", line 124, in weighted_median
IndexError: index 0 is out of bounds for axis 0 with size 0
Contents of test.cns:
chromosome start end gene log2 depth probes weight
Y 2655076 10104053 DUMMY1 0.0829199 31.8973 114 56.0337
Y 13105052 28657037 DUMMY2 0.0687064 33.4669 334 154.999
Contents of test.vcf:
##fileformat=VCFv4.1
##samtoolsVersion=0.1.19-44428cd
##contig=<ID=Y,length=59373566>
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele frequency (assuming HWE)">
##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele count (no HWE assumption)">
##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
##INFO=<ID=RPB,Number=1,Type=Float,Description="Read Position Bias">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias (v2) for filtering splice-site artefacts in RNA-seq data. Note: this version may be broken.">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="B-allele count">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT tumor
Y 57185053 . G T 220 . DP=24;VDB=1.587579e-01;RPB=-1.039402e+00;AF1=0.5;AC1=1;DP4=6,3,9,5;MQ=41;FQ=148;PV4=1,0.3,0.38,1 GT:PL:GQ:AD 0/1:250,0,175:99:14
Dear Eric,
The cn-filter of the call module crashes quite a number of my samples when processing chrY. I have not been able to figure out when exactly this happens. All I can say is that I have only observed it on chrY and it must have something to do with the coordinates, because if I play around with coordinates in the VCF file the problem vanishes. Specifically, moving the SNP into one of the segments solves the error, but I have seen situations where there were no SNPs outside segments and all segments had at least one SNP and yet the program crashed. Can you kindly look into this issue?
Thanks, Sebastian
Here is a minimal artificial example to reproduce the error:
This is the command and the error:
Contents of test.cns:
Contents of test.vcf: