CoVaRR-NET / duotang

Scripts and data for the CoVaRR-Net Pillar 6 notebook
https://covarr-net.github.io/duotang/duotang.html
MIT License
1 stars 2 forks source link

Substitution rate error #257

Closed spotto closed 1 year ago

spotto commented 1 year ago

Github issue: There is something wrong with the scale of duotang’s “Molecular clock estimates”, which suggests ~1500 substitutions per genome per day. That’s way too high. It should be closer to 0.06 substitutions per genome per day (about one change every two weeks). Did this accidentally get multiplied by the genome size? <@U02NQPKG3UJ> <@U03VC9B7PC3>

Slack Message

ArtPoon commented 1 year ago

Ok I'll check it out

ArtPoon commented 1 year ago

I suspect the problem is that when I replaced the RTT plot with an interactive JS widget, the code for calculating clock rates did not get updated. It is probably pulling raw DOM coordinates instead of the divergence (y-axis) values.

ArtPoon commented 1 year ago

Ok this was a pretty easy fix. @spotto 's intuition was correct - for some reason rescaling the rate estimates by genome length became superfluous, so I am removing this from the dev branch:

diff --git a/scripts/fit-rtt.R b/scripts/fit-rtt.R
index a5e539f5..ec012382 100644
--- a/scripts/fit-rtt.R
+++ b/scripts/fit-rtt.R
@@ -80,9 +80,9 @@ get.ci <- function(fits) {
   ci <- lapply(fits, confint.rlm)
   est <- data.frame(
     n = sapply(fits, function(f) nrow(f$x)),                            
-    est = 29903*sapply(fits, function(f) f$coef[2]),
-    lower.95 = 29903*sapply(ci, function(f) f[2,1]),
-    upper.95 = 29903*sapply(ci, function(f) f[2,2])
+    est = sapply(fits, function(f) f$coef[2]),
+    lower.95 = sapply(ci, function(f) f[2,1]),
+    upper.95 = sapply(ci, function(f) f[2,2])
   )
   est$Lineage <- row.names(est)
   est
bfjia commented 1 year ago

Thank you Art