GWASpoly uses the DS field for posterior mean dosage, which is essentially the posterior mean allele frequency multiplied by ploidy. However, the (informal standard) DS field doesn't report a value for the reference allele. This is fine when DS is derived from the full posterior distribution, because the dose of the reference allele can be imputed as `ploidy - sum(alts)`. It is an issue for `mchap assemble`, which doesn't necessarily report all alleles (infrequent alleles are excluded), resulting in a truncated posterior distribution. This means the reference allele dosage can't be imputed. More importantly, the doses of the alternate alleles can't be normalized without the reference allele value. Using the results from `mchap assemble` without normalization may bias downstream analysis.
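A minimal sketch of the problem, using made-up allele frequencies for a tetraploid. The helper names and the numbers are hypothetical and are not taken from MCHap or GWASpoly; the only things assumed from the description above are `DS = allele frequency * ploidy` for alternate alleles and `ploidy - sum(alts)` for the reference allele.

```python
def dosage_from_afp(afp, ploidy):
    """Posterior mean dosage per allele: allele frequency * ploidy."""
    return [freq * ploidy for freq in afp]


def impute_ref_dosage(alt_dosages, ploidy):
    """Reference dosage implied by a DS field that lists only alternate alleles."""
    return ploidy - sum(alt_dosages)


ploidy = 4

# Hypothetical posterior mean allele frequencies: [ref, alt1, alt2, alt3].
full_afp = [0.5, 0.25, 0.125, 0.125]
true_ref_dosage = dosage_from_afp(full_afp, ploidy)[0]

# Full posterior: DS drops the reference allele, but it can be recovered.
full_ds = dosage_from_afp(full_afp, ploidy)[1:]
ref_from_full = impute_ref_dosage(full_ds, ploidy)

# Truncated posterior: alt3 was excluded as infrequent, so its dose is
# silently attributed to the reference allele when imputing.
truncated_ds = dosage_from_afp(full_afp[:3], ploidy)[1:]
ref_from_truncated = impute_ref_dosage(truncated_ds, ploidy)

print(true_ref_dosage, ref_from_full, ref_from_truncated)
```

With the full posterior the imputed reference dosage equals the true value; with the truncated set it is inflated by the dose of the excluded allele.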
There are a few options:
1. Make DS an option for `mchap call` but not `mchap assemble`, because `mchap call` always reports the full posterior.
2. Normalise dosage for the reported alleles in `mchap assemble` before discarding the reference allele: the resulting values will no longer match the AFP field.
3. Break the 'standard' and report the reference allele value: this may cause downstream issues.
4. Don't make DS an option: this may lose easy compatibility with downstream tools.
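As a rough illustration of the normalisation option, the sketch below rescales the reported allele frequencies so they sum to 1 before converting them to dosage and dropping the reference allele. This is one possible reading of that option, not MCHap's implementation; the function name and the frequencies are hypothetical.

```python
def normalised_dosage(reported_afp, ploidy):
    """Rescale reported allele frequencies to sum to 1, then convert to dosage."""
    total = sum(reported_afp)
    if total <= 0:
        raise ValueError("reported allele frequencies must sum to a positive value")
    return [freq / total * ploidy for freq in reported_afp]


ploidy = 4

# Hypothetical reported frequencies [ref, alt1, alt2]; an infrequent allele
# with frequency 0.125 was excluded, so these sum to 0.875 rather than 1.
reported_afp = [0.5, 0.25, 0.125]

dosage = normalised_dosage(reported_afp, ploidy)
ds_field = dosage[1:]  # DS omits the reference allele

# For comparison: alternate-allele dosage computed directly as AFP * ploidy.
unnormalised_ds = [freq * ploidy for freq in reported_afp[1:]]

print(ds_field, unnormalised_ds)
```

After normalisation the dosages sum to the ploidy again, so `ploidy - sum(alts)` recovers the normalised reference dosage, but each DS value is larger than AFP * ploidy, which is the mismatch with the AFP field noted in that option.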
Related to #103
The first option is probably the best option.