dcgerard / updog

Flexible Genotyping of Polyploids using Next Generation Sequencing Data
https://dcgerard.github.io/updog/

Have updog return phred-scaled genotype likelihoods #23

Open YannDussert opened 2 years ago

YannDussert commented 2 years ago

Hi,

First, thanks for developing updog!

I'd like to use the results from updog with another program, Entropy (https://bitbucket.org/buerklelab/mixedploidy-entropy/), which needs phred-scaled genotype likelihoods (PL), as computed by GATK. Do you have any suggestions on how to do that? Should I just multiply the genotype log-likelihood values from updog by -10?

Best regards, Yann

dcgerard commented 2 years ago

Hey @YannDussert,

Thanks for trying out {updog}!

The log genotype likelihoods from {updog} are actually using the natural log, so you also need to change the base if you want to get to the phred-scale.
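
For reference, the change of base can be written out explicitly. Here $L$ is a genotype likelihood and $\ell = \ln L$ is its natural log; both symbols are introduced only for illustration and are not updog notation:

$$
\mathrm{PL} = -10 \log_{10} L = -10 \, \frac{\ln L}{\ln 10} = -\frac{10}{\ln 10}\,\ell \approx -4.343\,\ell
$$

So multiplying the natural-log values by -10 alone would overstate the phred-scaled values by a factor of ln(10), roughly 2.303.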

Let’s demonstrate how to convert between log (base e) and phred-scaled data.

Generate some points between 0 and 1 for demonstration

p <- ppoints(5)

Calculate their phred-scaled values

phred_p <- -10 * log10(p)

Calculate their log (base e) values

log_p <- log(p)

Function to convert from log (base e) to phred

ln_to_phred <- function(x) {
  x * -10 / log(10)
}

Function to convert from phred to log (base e)

phred_to_ln <- function(x) {
  x / (-10 * log10(exp(1)))
}

Show that ln_to_phred() is correct:

ln_to_phred(log_p)
## [1] 9.2428 5.0931 3.0103 1.6085 0.5505
phred_p
## [1] 9.2428 5.0931 3.0103 1.6085 0.5505

Show that phred_to_ln() is correct:

phred_to_ln(phred_p)
## [1] -2.1282 -1.1727 -0.6931 -0.3704 -0.1268
log_p
## [1] -2.1282 -1.1727 -0.6931 -0.3704 -0.1268
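
To get something closer to the PL values GATK writes, you would probably also want to shift each individual's values so the most likely genotype has PL = 0, and round to integers. Here is a minimal sketch of that, assuming the genotype log-likelihoods sit in a matrix with one row per individual and one column per genotype. The matrix `loglike`, its values, and the helper `ll_to_pl()` are made up for illustration; they are not {updog} output or part of its API.

```r
# Hypothetical stand-in for genotype log-likelihoods (natural log):
# rows are individuals, columns are genotypes (dosage 0..4 of a tetraploid).
# These numbers are invented for illustration; they are not updog output.
loglike <- matrix(
  c( -0.5, -3.2, -9.0, -20.1, -45.3,
    -30.4, -8.7, -1.1,  -4.6, -18.2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ind1", "ind2"), paste0("g", 0:4))
)

# Same conversion as ln_to_phred() above, repeated so this block runs alone.
ln_to_phred <- function(x) {
  x * -10 / log(10)
}

# Convert to the phred scale, shift each row so that the most likely
# genotype has PL = 0, then round to whole numbers.
ll_to_pl <- function(ll) {
  pl <- ln_to_phred(ll)
  pl <- sweep(pl, 1, apply(pl, 1, min, na.rm = TRUE), "-")
  round(pl)
}

pl <- ll_to_pl(loglike)
pl
```

Whether Entropy expects the normalized, integer-rounded form is worth checking against its documentation before relying on this.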

Let me know if you have any more questions.

Best, David

dcgerard commented 2 years ago

It would be great if {updog} could return the phred-scaled values as an option.

YannDussert commented 2 years ago

Thanks for your quick and clear reply! I should probably have realized on my own that they are in natural logs, if I had looked closely at the values of the posterior probabilities.

Best regards, Yann