dnanexus-rnd / GLnexus

Scalable gVCF merging and joint variant calling for population sequencing projects
Apache License 2.0

Outputting homozygous reference #213

Open ghost opened 4 years ago

ghost commented 4 years ago

Hello,

Is there any way to output the homozygous reference bases in the pVCF? Can I have a pGVCF, with one line per base in my reference genome?

thanks

mlin commented 4 years ago

One line per base would lead to impractically large output files for GLnexus' main use cases. There is a proposal under discussion in GA4GH about standardizing a multi-sample GVCF format, which would summarize reference coverage in between variant sites. We are monitoring developments there but it will take some time yet to work its way through that process.

ghost commented 4 years ago

I was asking because it seems the GATK joint caller has an "all sites" option. I understand, however, that GLnexus has a strong emphasis on computational efficiency.

mlin commented 4 years ago

Yea it's not something GLnexus' main users have requested or would seem likely to use. If you were really dedicated, you could synthesize a GVCF exhibiting a fake variant with good quality metrics at every position and feed that in, causing GLnexus to generate a pVCF site for every position. (I'm not recommending this to be clear -- I think it would work in principle, but there are always unforeseen problems)

ghost commented 4 years ago

Okay. Just in case, that option might be useful for people calculating mutation rates. Nominally we divide by the genome size, but actually we divide by the number of "callable bases" in the genome, i.e. sites that are homozygous reference but that wouldn't have been filtered out had they carried a variant.
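
To make the point about the denominator concrete, here is a minimal Python sketch of a per-base, per-generation mutation rate estimate. The function name, the diploid factor of 2, and all the numbers are illustrative assumptions, not values from this thread or from GLnexus.

```python
def mutation_rate(n_de_novo, n_bases, ploidy=2, n_transmissions=1):
    """Estimate mutations per base per generation.

    n_bases is the denominator under discussion: either the full genome
    size or the number of callable bases. The ploidy factor assumes each
    site is observed on `ploidy` haplotypes (an assumption of this sketch).
    """
    if n_bases <= 0:
        raise ValueError("n_bases must be positive")
    return n_de_novo / (ploidy * n_bases * n_transmissions)


# Hypothetical numbers, for illustration only.
N_DE_NOVO = 60                  # de novo mutations found in one trio
GENOME_SIZE = 3_100_000_000     # every base in the reference
CALLABLE_BASES = 2_600_000_000  # bases where a variant could have been called

rate_genome = mutation_rate(N_DE_NOVO, GENOME_SIZE)
rate_callable = mutation_rate(N_DE_NOVO, CALLABLE_BASES)

# Dividing by the whole genome underestimates the rate, because
# mutations at uncallable sites could never have been observed.
print(f"genome-size denominator:    {rate_genome:.3e}")
print(f"callable-bases denominator: {rate_callable:.3e}")
```

Counting callable bases requires knowing which homozygous reference sites passed the same filters as the variant sites, which is why per-site output would help here.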

mlin commented 4 years ago

Thanks -- happy to leave this ticket open for others to +1 or comment

ZexuanZhao commented 1 year ago

This feature would be important for calculating population genomic statistics that are sensitive to the total number of base pairs mapped, such as nucleotide diversity (π).

See here: https://pixy.readthedocs.io/en/latest/generating_invar/generating_invar.html
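
As a rough illustration of why invariant sites matter for π, the Python sketch below computes per-site nucleotide diversity from alternate allele counts at biallelic sites. It is a simplified textbook calculation with made-up counts, not pixy's implementation, and it assumes no missing genotypes.

```python
def site_diversity(alt_count, n_chrom):
    """Average pairwise differences at one biallelic site.

    alt_count: chromosomes carrying the alternate allele
    n_chrom:   chromosomes sampled (assumed complete, no missing calls)
    """
    return 2 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def nucleotide_diversity(alt_counts, n_chrom, n_sites):
    """Per-site pi: summed site diversity divided by all sites considered.

    n_sites must include invariant sites that were callable, not just
    the variant sites that appear in a variants-only VCF.
    """
    total = sum(site_diversity(k, n_chrom) for k in alt_counts)
    return total / n_sites


# Hypothetical data: 3 variant sites among 10 sampled chromosomes.
alt_counts = [1, 2, 5]
n_chrom = 10

# Correct denominator: all callable sites in the window (assumed 1000 here).
pi_all_sites = nucleotide_diversity(alt_counts, n_chrom, n_sites=1000)

# Variants-only denominator: only the 3 sites present in the file.
pi_variants_only = nucleotide_diversity(alt_counts, n_chrom, n_sites=len(alt_counts))

print(pi_all_sites, pi_variants_only)
```

The numerator is identical in both cases; only the site count changes. Without invariant sites in the output, the number of callable sites has to be guessed or taken from another source.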