PlantandFoodResearch / MCHap

Polyploid micro-haplotype assembly using Markov chain Monte Carlo simulation.
MIT License

Sparse representation of genotype posterior #182

Open timothymillar opened 3 months ago

timothymillar commented 3 months ago

Investigate a sparse encoding of genotype posteriors, e.g. an equivalent of the PP field (phred-scaled probabilities) in which zero values are omitted. This can be represented as a map from genotype index to non-zero phred-scaled probability, which effectively removes genotypes with probabilities <= 0.1. An example might look like "0=10,2=3,7=1" and would have the String type in VCF.
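A minimal sketch of how such a map could be encoded and decoded, not MCHap's implementation. It assumes the phred score is `-10 * log10(1 - p)` rounded to an integer, which is the convention under which probabilities of roughly 0.1 or less round to a score of 0. The function names, the `MAX_PHRED` cap, and the use of "." for an empty map are all hypothetical.

```python
import math

MAX_PHRED = 60  # hypothetical cap so that p == 1.0 does not give an infinite score


def phred_scaled(prob: float) -> int:
    """Integer phred score of a posterior, assuming -10 * log10(1 - p)."""
    error = 1.0 - prob
    if error <= 0.0:
        return MAX_PHRED
    return min(MAX_PHRED, round(-10.0 * math.log10(error)))


def encode_sparse_pp(posteriors) -> str:
    """Map genotype index -> non-zero phred score, as 'index=score' pairs."""
    pairs = []
    for index, prob in enumerate(posteriors):
        score = phred_scaled(prob)
        if score > 0:  # scores that round to zero are omitted
            pairs.append(f"{index}={score}")
    return ",".join(pairs) if pairs else "."


def decode_sparse_pp(field: str) -> dict:
    """Parse 'index=score' pairs back into a dict; omitted genotypes are absent."""
    if field == ".":
        return {}
    return {int(k): int(v) for k, v in (item.split("=") for item in field.split(","))}


# Posteriors over five genotypes (illustrative values only).
posteriors = [0.6, 0.0, 0.3, 0.08, 0.02]
encoded = encode_sparse_pp(posteriors)
decoded = decode_sparse_pp(encoded)
```

Here the genotypes with posteriors 0.08 and 0.02 both round to a score of 0 and are dropped along with the genotype that has zero posterior, so only indices 0 and 2 remain in the encoded string.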

timothymillar commented 3 months ago

We could also use a sparse equivalent of GP if we specify a minimum posterior probability to report. E.g., a threshold of >= 0.01 would work well with MCMC approximations. Alternatively, we could report a phred score of 0 for genotypes with non-zero probabilities, but this would be confusing.
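The thresholded GP variant could be sketched as follows. This is only an illustration of the idea: the function name, the `min_prob` and `decimals` parameters, and the "." placeholder for an empty result are assumptions, not part of MCHap or the VCF specification.

```python
def encode_sparse_gp(posteriors, min_prob: float = 0.01, decimals: int = 2) -> str:
    """Keep only genotypes whose posterior is at least min_prob, as 'index=prob' pairs."""
    pairs = [
        f"{index}={prob:.{decimals}f}"
        for index, prob in enumerate(posteriors)
        if prob >= min_prob
    ]
    return ",".join(pairs) if pairs else "."


# Posteriors over five genotypes (illustrative values only).
posteriors = [0.6, 0.0, 0.3, 0.092, 0.008]
sparse_gp = encode_sparse_gp(posteriors)

# Probability mass that survives the threshold; the remainder is what gets dropped.
retained = sum(p for p in posteriors if p >= 0.01)
```

In this example the genotype with posterior 0.008 falls below the threshold and is omitted, so the reported values cover about 0.992 of the total probability rather than summing to 1.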