CDCgov / cfa-viral-lineage-model

Apache License 2.0

Filter which lineages are modeled #29

Closed: thanasibakis closed this issue 2 weeks ago

thanasibakis commented 4 weeks ago

One assumption that we've been hard-coding and need to make configurable is that we want to model all lineages. (... for the July data I'm playing with, the hierarchical model shows pretty clearly that most of the lineages have negligible proportions.)

Originally posted by @afmagee42 in https://github.com/CDCgov/cfa-viral-lineage-model/issues/25#issuecomment-2289042751
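For illustration only, a minimal sketch of what a configurable lineage filter could look like. Nothing here comes from the repository: the function name `select_lineages`, the `min_proportion` threshold, the `keep` override, and the example counts are all hypothetical. It assumes the input is a mapping from lineage name to sequence count.

```python
from collections import Counter


def select_lineages(lineage_counts, min_proportion=0.01, keep=None):
    """Return the sorted lineages whose share of all sequences is at least
    `min_proportion`, plus any lineage named in `keep` that is present.

    All names and defaults are hypothetical, not taken from the repository.
    """
    total = sum(lineage_counts.values())
    if total == 0:
        return []
    keep = set(keep or ())
    return sorted(
        lineage
        for lineage, n in lineage_counts.items()
        if lineage in keep or n / total >= min_proportion
    )


# Made-up counts, purely to exercise the function.
counts = Counter({"24A": 700, "24B": 250, "23I": 45, "22F": 5})
modeled = select_lineages(counts, min_proportion=0.01)
print(modeled)  # ['23I', '24A', '24B']
```

Exposing the threshold and an explicit keep-list as arguments (or config entries) is one way to stop hard-coding the "model everything" assumption; setting `min_proportion=0` recovers the current behavior.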

afmagee42 commented 3 weeks ago

While we're at it, we should probably also filter on whether the assigned lineage is valid. Right now I don't think we're doing any filtering. In the whole-US, all-time data, I'm seeing these values:

['23B', '20F', '21K', '20I', 'recombinant', '20E', '21E', '22C', '23G', '23C', '21I', '20H', '21H', '22D', '20G', '22E', '20B', '23H', '21G', '20A', '22F', '23D', None, '20D', '21J', '23A', '22A', '24A', '21B', '21M', '24B', '20J', '23E', '23F', '21L', '21C', '22B', '21D', '21F', '23I', '20C', '19B', '19A', '24C', '21A']

24C hasn't been put into the tree of clades yet, sadly.

None should be removed.

I'm still not entirely sure what we want to do with "recombinant", but it's probably best dealt with on a weekly basis.

(NB: added None-removal in #32)
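A rough sketch of the three cases above, not the code from #32 or #45. The function name `partition_clades` and the `known_clades` set are hypothetical; the set stands in for whatever clades are present in the tree of clades.

```python
def partition_clades(observed, known_clades):
    """Split observed clade labels into valid, recombinant, and unknown.

    None is dropped outright. "recombinant" is set aside for separate
    handling. Labels absent from `known_clades` are reported as unknown
    rather than silently modeled.
    """
    valid, recombinant, unknown = [], [], []
    for clade in observed:
        if clade is None:
            continue  # missing assignment: remove
        if clade == "recombinant":
            recombinant.append(clade)
        elif clade in known_clades:
            valid.append(clade)
        else:
            unknown.append(clade)
    return valid, recombinant, unknown


# Tiny made-up example; a real `known_clades` would come from the clade tree.
observed = ["23B", None, "recombinant", "24C", "24A"]
known_clades = {"23B", "24A"}
valid, recombinant, unknown = partition_clades(observed, known_clades)
print(valid, recombinant, unknown)  # ['23B', '24A'] ['recombinant'] ['24C']
```

Keeping unknown labels in their own bucket makes cases like 24C visible until the tree catches up, instead of either dropping them silently or failing downstream.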

afmagee42 commented 2 weeks ago

Fixed in #45