dnanexus-rnd / GLnexus

Scalable gVCF merging and joint variant calling for population sequencing projects
Apache License 2.0

Support for Octopus VCF files #235

Open DiDeoxy opened 3 years ago

DiDeoxy commented 3 years ago

Octopus has a unique way of reporting alleles when they start at the same position on the reference but end at different positions. It indicates this with a * character. You can read more about it here: https://github.com/luntergroup/octopus/wiki/VCF-format

Would it be possible to support this?

DiDeoxy commented 3 years ago

This is the correct way according to the VCF 4.3 spec.

Edit: Ignore this, I am wrong.

mlin commented 3 years ago

The star allele is a "hot topic" right now, at least as these things go. In fact we do have a medium-term plan to use it more. Further info: https://github.com/dnanexus-rnd/GLnexus/issues/210 https://github.com/samtools/hts-specs/issues/437 https://github.com/samtools/hts-specs/pull/464

DiDeoxy commented 3 years ago

Is there a workaround currently? If I wanted to reformat the VCF to comply with what GLnexus currently expects, how would I do that?

mlin commented 3 years ago

I'm not aware of any real attempts to adapt/validate GLnexus for merging Octopus [g]VCF, sorry. There are always a lot of details to get right; even some of the current configurations built into GLnexus are incomplete in various ways. I'd like it to happen for sure but it's a challenge to prioritize without a driver from the project's sponsors. I wonder if @dancooke has any thoughts or alternative plans wrt big cohort calling with Octopus. Maybe we'll get to work together on it one day.

BTW I realized the YAML indentation problem you hit was actually copied from our own configuration wiki page, which I fixed. Thanks for that.

DiDeoxy commented 3 years ago

Hey, I managed to reformat the Octopus gVCFs with pysam, using a script by @dancooke. I was then able to call using my current configuration file #234.
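For anyone following along, here is a rough sketch of the kind of rewrite involved. It is not the script by @dancooke and it does not use pysam; it is plain Python on a single tab-separated VCF record, and the function name `drop_star_allele` is made up for illustration. It assumes the goal is simply to remove `*` from ALT, renumber the remaining alleles, and set any genotype call that pointed at `*` to missing (`.`). Whether that is the right treatment for GLnexus is an open question, not something confirmed in this thread.

```python
def drop_star_allele(line: str) -> str:
    """Remove '*' from ALT and renumber GT indices (hypothetical helper).

    Assumption: genotype calls that referenced '*' become missing ('.').
    """
    fields = line.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    if "*" not in alts:
        return "\t".join(fields)

    # Map old allele index -> new index; 0 is REF, '.' stays missing.
    remap = {"0": "0", ".": "."}
    kept = []
    for old_idx, alt in enumerate(alts, start=1):
        if alt == "*":
            remap[str(old_idx)] = "."
        else:
            kept.append(alt)
            remap[str(old_idx)] = str(len(kept))
    fields[4] = ",".join(kept) if kept else "."

    # Rewrite GT only when it is the first FORMAT key, as the VCF spec requires.
    if len(fields) > 9 and fields[8].split(":")[0] == "GT":
        for i in range(9, len(fields)):
            sample = fields[i].split(":")
            sep = "|" if "|" in sample[0] else "/"
            sample[0] = sep.join(remap[a] for a in sample[0].split(sep))
            fields[i] = ":".join(sample)
    return "\t".join(fields)


record = "chr1\t100\t.\tAT\tA,*\t50\tPASS\t.\tGT:DP\t1|2:30\t0/1:20"
print(drop_star_allele(record))
```

Note that this leaves per-allele INFO and FORMAT values (AD, PL and similar) untouched, so they would no longer match the shortened ALT list. A real conversion has to handle those as well.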

Which details should I investigate to make sure GLnexus is calling correctly?