andersen-lab / Freyja

Depth-weighted De-Mixing
BSD 2-Clause "Simplified" License

Inconsistent curated lineages naming #246

Closed: gyebra-phs closed this issue 1 month ago

gyebra-phs commented 1 month ago

Hi all,

First of all, thanks for all your hard work in making this great tool available.

I wanted to mention that there are a couple of historical inconsistencies in the nomenclature of curated lineages within Freyja-data.

Specifically, the category that was labelled as BA.2.86* (BA.2.86X) in the historical curated lineages files is labelled BA.2.86* (BA.2.86.X) since June 2024. The opposite happens with XBB.2.3* Omicron (XBB.2.3.X), which is labelled as XBB.2.3* (XBB.2.3X) since October 2023.

I appreciate that these might be intentional updates, but when Freyja compares the historical data with the latest version of the curated lineages, I end up with all 4 categories (BA.2.86X, BA.2.86.X, XBB.2.3X, and XBB.2.3.X) in my final aggregated results.

Even though changing files retrospectively might not be advisable, is there any chance the naming could be made consistent, to avoid extra categories that should go together? If not, is there an option to deactivate the comparison with historical data, in case I want to keep only the currently relevant curated lineages?

Many thanks for your help!

Gonzalo

joshuailevy commented 1 month ago

Hey @gyebra-phs,

Ah, thanks for noticing that. That's just a typo; those are all supposed to be named with the ".X" ending. We pull that information from outbreak.info (https://github.com/outbreak-info/outbreak.info/blob/master/curated_reports_prep/curated_lineages.yaml), and it looks like there was a manual entry that didn't quite conform to specifications. I've now updated this, sorry for the confusion.

This is just the default behavior - if you want to customize this to a particular set of lineages of interest, you can do that with the prepLineageDict function. Here's an example: https://github.com/joshuailevy/sd_ww_processing/blob/main/polish_outputs.py
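For results that were already aggregated with the older labels, the duplicated categories can also be merged after the fact. The following is a minimal sketch in plain Python and does not use any Freyja API. The helper names `normalize_label` and `merge_categories` are hypothetical, and the abundance values are made up purely for illustration. It assumes the only inconsistency is a missing dot before the trailing "X".

```python
import re
from collections import defaultdict
from typing import Dict

# Matches a trailing "X" that directly follows a digit, e.g. "BA.2.86X".
# Labels that already end in ".X" do not match, so they are left unchanged.
_MISSING_DOT = re.compile(r"(?<=\d)X$")


def normalize_label(label: str) -> str:
    """Rewrite labels such as 'BA.2.86X' to the '.X' form 'BA.2.86.X'."""
    return _MISSING_DOT.sub(".X", label.strip())


def merge_categories(abundances: Dict[str, float]) -> Dict[str, float]:
    """Sum abundances of categories that share a label after normalization."""
    merged = defaultdict(float)
    for label, value in abundances.items():
        merged[normalize_label(label)] += value
    return dict(merged)


# Illustrative (invented) abundances for one aggregated sample.
sample = {
    "BA.2.86X": 0.10,
    "BA.2.86.X": 0.25,
    "XBB.2.3X": 0.05,
    "XBB.2.3.X": 0.15,
    "Other": 0.45,
}
merged = merge_categories(sample)
print(sorted(merged))
```

With the invented sample above, the four inconsistent categories collapse into two, and "Other" is left as it is.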

Best, Josh

gyebra-phs commented 1 month ago

Hi Josh,

Many thanks for your reply and for updating the naming, I appreciate it!

I'll give it a try and play around with the prepLineageDict function, thanks for the suggestion. However, just to point out that the link to the polish_outputs.py script doesn't work for me. Maybe the repo is set as private?

In any case, thanks again for your help!

Gonzalo

joshuailevy commented 1 month ago

Ah, sorry about that - forgot that it's a private repo... just made it public.

Josh