Open JakeHagen opened 5 months ago
Hello Jake, thanks for pointing this out. Is it possible that you could share use one or two examples of SNP entries with that format so that we could catch those cases too. We only tested with bi-allelic sites. It is also worth noticing that multi allelic sites may not work during LoH calling, or at least the current version will bot be able to work. Best Luis
Hi Luis, thank you for your response
Attached are two test vcfs. They have only one identical variant. The only difference is the FORMAT VCF header line ie:
##FORMAT=<ID=VAF,Number=A,Type=Float,Description="Variant allele fractions.">
vs ##FORMAT=<ID=VAF,Number=1,Type=Float,Description="Variant allele fractions.">
Please note that the only variant in the vcfs is a bi-allelic variant, the variants I was feeding into spectre were already filtered to bi-allelic sites.
Also below is the error I get when using the non-modified vcf. Using the modified version allows spectre to run to completion.
Matplotlib created a temporary cache directory at /tmp/matplotlib-7lrdbiqb because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
spectre::2024-04-19 14:44:00,654::INFO::spectre.spectre> Spectre input: CNVCaller --coverage ../mosdepth/test.regions.bed.gz --sample-id test -r ./GRCh38.fasta --output-dir ./test --blacklist ./grch38_blacklist_spectre.bed --metadata ./metadata.mdr --snfj ../snf2json/test.snfj --snv dv16.vcf.gz --only-chr chr1
spectre::2024-04-19 14:44:00,655::INFO::spectre.spectre> Spectre version: 0.2.0
spectre::2024-04-19 14:44:00,655::INFO::spectre.spectre> Starting spectre
spectre::2024-04-19 14:44:00,655::INFO::spectre.spectre> Evaluate if a new .mdr file needs to be created
spectre::2024-04-19 14:44:00,655::INFO::spectre.spectre> Using existing metadata file ./metadata.mdr
spectre::2024-04-19 14:44:00,656::INFO::spectre.spectreCNV> Spectre calculating for: ../mosdepth/test.regions.bed.gz
spectre::2024-04-19 14:44:00,656::INFO::spectre.spectreCNV> Analysing CN neutral state from SNV data
spectre::2024-04-19 14:44:00,658::DEBUG::spectre.analysis.snv_analysis> Starting SNV Copy number state compute ...
spectre::2024-04-19 14:44:00,658::DEBUG::spectre.analysis.snv_analysis> Het sites count per chromosome
spectre::2024-04-19 14:44:00,658::DEBUG::spectre.analysis.snv_analysis> Use chromsomes based on proportion of het sites
spectre::2024-04-19 14:44:00,658::DEBUG::spectre.analysis.snv_analysis> n het sites used for normalization = 0
Traceback (most recent call last):
File "/usr/local/bin/spectre", line 8, in <module>
sys.exit(main())
^^^^^^
File "/usr/local/lib/python3.12/site-packages/spectre/spectre.py", line 637, in main
spectre_run.spectre_exe()
File "/usr/local/lib/python3.12/site-packages/spectre/spectre.py", line 356, in spectre_exe
spectre_main.cnv_call()
File "/usr/local/lib/python3.12/site-packages/spectre/spectreCNV.py", line 71, in cnv_call
self.snv_analysis.snv_copy_number_state()
File "/usr/local/lib/python3.12/site-packages/spectre/analysis/snv_analysis.py", line 231, in snv_copy_number_state
self.snv_multimodal_detect()
File "/usr/local/lib/python3.12/site-packages/spectre/analysis/snv_analysis.py", line 104, in snv_multimodal_detect
self.het_clean() if np.max(self.perfect_het_depth) > 10*np.nanmean(self.perfect_het_depth) else None
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/numpy/core/fromnumeric.py", line 2810, in max
return _wrapreduction(a, np.maximum, 'max', axis, None, out,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/numpy/core/fromnumeric.py", line 88, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: zero-size array to reduction operation maximum which has no identity
Thanks Jake
Hello Jake, thanks for the detailed information, I will check it out and reach back asap.
Luis
Hello and thank you for releasing this tool.
The code below (which I am assuming is there to filter multi-allelic sites) causes issues when using Deepvariant's VCFs. Deepvariant uses
Number=A
for the format VAF field which causes pysam to always parse it as a tuple, even if it is not a multi-allelic site.I was able to get spectre to run as expected by manually changing the VCF header to
Number=1
for VAFI wonder if you would be open to figuring out another way to filter multi-allelic sites. Maybe something like
Thanks Jake