subsample sequences by Pango lineage?

blab / ncov-escape

Nextstrain build for SARS-CoV-2 to calculate immune escape of circulating viruses

https://nextstrain.org/ncov

MIT License

subsample sequences by Pango lineage? #6

Closed by jbloom 2 years ago

jbloom commented 2 years ago

Right now sequences are sampled roughly in proportion to their abundance. Maybe this is the best way to do things, but it may mean it takes a long time to see new antigenic variants arising. Would sampling sequences by Pango classification be better?

trvrb commented 2 years ago

I like this idea. I've implemented subsampling by Pango lineage in 4d6d7b7500122cf24f1480403a4d1e2a158acfba. I think emphasizing diversity to start with will be helpful.
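
For illustration only (this is not the code from the commit above): a minimal Python sketch of the idea, assuming each sequence record carries a `pango_lineage` field. The function name `subsample_by_lineage`, the `per_lineage` cap, and the toy records are all hypothetical. Capping the number of sequences kept per lineage means a rare lineage gets as many slots as a common one, instead of being swamped under abundance-proportional sampling.

```python
import random
from collections import defaultdict


def subsample_by_lineage(records, per_lineage, seed=0):
    """Keep at most `per_lineage` randomly chosen records from each Pango lineage."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    # Group records by their lineage label (the field name is an assumption).
    groups = defaultdict(list)
    for rec in records:
        groups[rec["pango_lineage"]].append(rec)

    sampled = []
    for lineage in sorted(groups):  # sorted for a stable output order
        members = groups[lineage]
        k = min(per_lineage, len(members))
        sampled.extend(rng.sample(members, k))
    return sampled


# Toy metadata: one abundant lineage and one rare lineage.
records = (
    [{"strain": f"common_{i}", "pango_lineage": "BA.2"} for i in range(95)]
    + [{"strain": f"rare_{i}", "pango_lineage": "BA.2.75"} for i in range(5)]
)

picked = subsample_by_lineage(records, per_lineage=5)
counts = defaultdict(int)
for rec in picked:
    counts[rec["pango_lineage"]] += 1
print(dict(counts))
```

With this toy input the rare lineage makes up 5% of the records but half of the subsample, which is the "emphasize diversity" effect discussed above. The trade-off is that the subsample no longer reflects how common each lineage actually is.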