PoonLab / covizu

Rapid analysis and visualization of coronavirus genome variation
https://filogeneti.ca/CoVizu/
MIT License

Model is predicting low numbers for nearly all lineages #497

Closed ArtPoon closed 3 months ago

ArtPoon commented 8 months ago

Looks like the molecular clock model residuals are off as well.

ArtPoon commented 7 months ago

Default display has been switched back to divergence by @GopiGugan

ArtPoon commented 5 months ago

New branch iss497 (commit 56a0786d30e18aeda59d2ba4ba1d6da16e89b0a6)

ArtPoon commented 5 months ago

I'll try to wrap up some of this refactoring, and then I may need to hand this off to someone else.

ArtPoon commented 4 months ago

@GopiGugan can you please retrieve by_lineages from the database and run it through the make_beadplots function, and send me a CSV of the summary stats and predicted number of infections for each lineage?

ArtPoon commented 3 months ago

Obtained CSV from @GopiGugan, will analyze

ArtPoon commented 3 months ago

Summary stats seem reasonable:

Scatterplot of predicted number of infections (from HUNePi model) against sample size (number of sequences):

ArtPoon commented 3 months ago

The issue seems to be that there is a small number of outlier lineages with very high predicted numbers of infections. For the diagnostic summary_stats.csv data that @GopiGugan sent me, the maximum predicted number is about 4.4 million, but most of the predicted numbers fall within the range of 10 to 100,000. On a linear colour scale, the outliers stretch the range so far that most lineages land at the low end and are coloured purple/blue.

ArtPoon commented 3 months ago

A log-transform when mapping predicted numbers of infections to the colour scale should resolve this on the front end.
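
A rough sketch of why the log-transform helps, written in Python purely for illustration (the actual front end is not shown in this thread). The function names and the sample values are hypothetical; only the approximate range (10 to 100,000, with a maximum near 4.4 million) comes from the numbers quoted above. Each function returns a position in [0, 1] along a colour scale:

```python
import math


def linear_position(value, vmin, vmax):
    """Map value to [0, 1] on a linear scale."""
    return (value - vmin) / (vmax - vmin)


def log_position(value, vmin, vmax):
    """Map value to [0, 1] on a log10 scale.

    Assumes value, vmin and vmax are all strictly positive; zeros would
    need to be clamped or offset before calling this.
    """
    lo, hi = math.log10(vmin), math.log10(vmax)
    return (math.log10(value) - lo) / (hi - lo)


# Illustrative values only, NOT the real summary_stats.csv data.
predicted = [10, 100, 1_000, 10_000, 100_000, 4_400_000]
vmin, vmax = min(predicted), max(predicted)

for n in predicted:
    lin = linear_position(n, vmin, vmax)
    log = log_position(n, vmin, vmax)
    print(f"{n:>9,d}  linear={lin:.4f}  log={log:.4f}")
```

With these made-up values, a lineage with 100,000 predicted infections sits at roughly 2% of the linear scale but about 70% of the log scale, so the bulk of lineages no longer collapse into the lowest colour band.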

ArtPoon commented 3 months ago

OK, this is fixed; tree colours look much better.