brentp / somalier

fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs... "like damn that is one smart wine guy"
MIT License

Feature Request: Support gVCF #27

Closed holtjma closed 4 years ago

holtjma commented 4 years ago

I've been able to use somalier without issue on Dragen VCF files, but I've encountered an uninformative error when trying it on the Dragen gVCF file for the same sample. Here's the message I'm getting:

[XXX@XXX ~]$ somalier extract -s sites.hg38.vcf.gz -f hg38.fa -d ~ [MASKED].hard-filtered.gvcf.gz
somalier version: 0.2.3
SIGABRT: Abnormal termination.

Not sure if this is a bug or a feature request.

brentp commented 4 years ago

I don't suppose you'd be able to share an example gvcf? Meanwhile, I'll have a look as this is not the first time this has been requested and it seems a sensible input to somalier.

holtjma commented 4 years ago

So I think I found a (rather stupid) workaround. Simply renaming the file (via softlink) to [MASKED].hard-filtered.vcf.gz enabled somalier to run, and it appears to give the same answer as when I actually convert the gVCF to a VCF (through gvcftools-0.16/bin/extract_variants). That workaround unblocks me for testing on the gVCFs, though.
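
For anyone wanting to reproduce the softlink rename on several files, here is a minimal sketch. The helper name `link_gvcf_as_vcf` is hypothetical, and it assumes the input path ends in `.gvcf.gz`. It only creates the link and does not run somalier.

```python
import os
import tempfile


def link_gvcf_as_vcf(gvcf_path: str) -> str:
    """Create a symlink beside gvcf_path that ends in .vcf.gz instead of .gvcf.gz.

    Hypothetical helper for illustration; returns the path of the new link.
    """
    suffix = ".gvcf.gz"
    if not gvcf_path.endswith(suffix):
        raise ValueError(f"expected a path ending in {suffix}: {gvcf_path}")
    vcf_path = gvcf_path[: -len(suffix)] + ".vcf.gz"
    # Link to the absolute path so the link resolves from any working directory.
    os.symlink(os.path.abspath(gvcf_path), vcf_path)
    return vcf_path
```

The returned path can then be passed to `somalier extract` in place of the original gVCF path.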

If you still want to see a gVCF file, I might be able to get approval to share one. Let me know, and I can look into it.

brentp commented 4 years ago

I am surprised that works. It's probably treating any reference call block as unknown. I am working on this now, so next release will have proper gvcf support.
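
To illustrate why a reference call block can end up as "unknown": a reader that matches sites by exact position will not find a record for a site that falls inside a block. A minimal sketch, assuming GATK-style reference blocks that carry an `END=` key in INFO; the function names are hypothetical and this is not somalier's code.

```python
def parse_info_end(info: str):
    """Return the integer END value from a VCF INFO string, or None if absent."""
    for field in info.split(";"):
        if field.startswith("END="):
            return int(field[len("END="):])
    return None


def covers_site(vcf_line: str, chrom: str, pos: int) -> bool:
    """True if the record on vcf_line spans the 1-based position chrom:pos."""
    cols = vcf_line.rstrip("\n").split("\t")
    rec_chrom, rec_pos, ref, info = cols[0], int(cols[1]), cols[3], cols[7]
    end = parse_info_end(info)
    if end is None:
        # No END key: the record spans only its REF allele.
        end = rec_pos + len(ref) - 1
    return rec_chrom == chrom and rec_pos <= pos <= end


# A reference block starting at 1000 and ending at 1050 (made-up example record).
block = "chr1\t1000\t.\tA\t<NON_REF>\t.\tPASS\tEND=1050\tGT:DP\t0/0:30"
exact_match = int(block.split("\t")[1]) == 1025   # False: POS is 1000, not 1025
interval_match = covers_site(block, "chr1", 1025)  # True: 1025 lies inside the block
```

With exact matching the site at 1025 has no record, so it would be treated as unknown even though the block reports it as homozygous reference.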

brentp commented 4 years ago

hi, I have added this for the next release. if you have a chance, would be great if you could try it out. I get nearly identical results (<1% difference) between somalier from cram and somalier from GVCF.

Here is the updated binary: somalier.gz

thanks for bringing this up. I think it's a nice addition for somalier. I'll do more testing on my end as well, before next release.

holtjma commented 4 years ago

Below is the output I got from running it. I got that same error for the original gVCF and the soft-linked "VCF".

somalier version: 0.2.4
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

brentp commented 4 years ago

oh shoot. just made the release. can you run with this debug binary and let me know the output (along with the full command-line that you are running?)

somalier_debug.gz

thanks in advance.

holtjma commented 4 years ago

Yep, not a problem (obviously masked out some paths & such):

[XXX@XXX somalier-0.2.4]$ somalier_debug extract -s [MASK]/somalier-0.2.3/sites_files/sites.hg38.vcf.gz -f hg38.fa -d ~/interrupt [MASK].gvcf.gz
somalier version: 0.2.4
Traceback (most recent call last)
/home/brentp/src/somalier/src/somalier.nim(212) somalier
/home/brentp/src/somalier/src/somalier.nim(198) main
/home/brentp/src/somalier/src/somalier.nim(157) extract_main
/home/brentp/src/somalier/src/somalier.nim(61) get_ref_alt_counts
/home/brentp/src/somalier/src/somalier.nim(31) looks_like_gvcf_variant
/root/.nimble/pkgs/hts-#head/hts/vcf.nim(738) ALT
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

brentp commented 4 years ago

ok. you must have some variants with an empty alt field? can you share the content of a single variant without an alt?
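
As an illustration of the empty-ALT case: in VCF, a record with no alternate allele has `.` in the ALT column, so code that assumes at least one ALT entry can fail on it. A minimal sketch of a defensive check; the function names are hypothetical and this is not the logic in somalier or hts-nim.

```python
def alt_alleles(vcf_line: str) -> list:
    """Return the ALT alleles of a VCF record; an empty list when ALT is '.'."""
    alt = vcf_line.rstrip("\n").split("\t")[4]
    return [] if alt == "." else alt.split(",")


def has_real_alt(vcf_line: str) -> bool:
    """True if the record has at least one ALT that is not a gVCF placeholder."""
    # <NON_REF> (GATK-style) and <*> (VCF 4.3) stand for "any other allele".
    placeholders = {"<NON_REF>", "<*>"}
    return any(a not in placeholders for a in alt_alleles(vcf_line))
```

Checking the list length before indexing the first ALT avoids reading an allele that is not there.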

holtjma commented 4 years ago

just sent you an email with an example gVCF, hopefully that will help

brentp commented 4 years ago

I am able to run that gvcf with the changes added since last release. if you would try out this binary on a few samples, that would be a great help (in addition to the gvcf you already provided). thanks very much for following up!

somalier.gz

holtjma commented 4 years ago

Seems like that's working and generally finding more sites (~17k vs. ~10k) which I'm guessing should be expected. I'm guessing this will be rolled into 0.2.5 when it's released?

brentp commented 4 years ago

yes, what I sent will be next release. i'll release this week. waiting to see if any new edge-cases like this one shake out since 0.2.4

brentp commented 4 years ago

this is out here: https://github.com/brentp/somalier/releases/tag/v0.2.5

thanks again.