cov-lineages / pangolin-data

Repository for storing latest model, protobuf, designation hash and alias files for pangolin assignments
GNU General Public License v3.0

Updates on pango_designation still not included in pangolin_data #58

Closed juanledesma78 closed 1 month ago

juanledesma78 commented 1 month ago

Hi, I hope you are doing well. We have noticed some recent changes to lineage designations (https://github.com/cov-lineages/pango-designation) and were wondering when the next release of pangolin_data is planned, as the latest was on April 30th.

We have seen that GISAID already includes some of these changes (e.g. LD.1, LE.1, KP.1.1.2...) in the lineage list it uses, even though these new designations have not been released in pangolin_data yet, and I was wondering why these changes are already in place in GISAID.

We would be very interested in generating pangolin_data locally so that we can test lineage assignment on our data with the latest changes before the new pangolin_data is released. Are there any scripts or input files (e.g. global alignment, Newick tree and VCF) used to produce lineageTree.pb and lineages.hash.csv that you could share for us to try?

Many thanks in advance,
Juan

akifoss commented 1 month ago

Indeed, it would be really useful to have a pangolin-data update that includes recently designated lineages, such as KP.3.1.1, which currently seems to have the largest growth advantage over JN.1 (https://github.com/MurrellGroup/lineages/blob/main/plots/all_lineages_MCMC_lineage_growths.svg). Any help in this direction would be highly appreciated!

AngieHinrichs commented 1 month ago

Sorry about the delay, I will make a new release ASAP.

> We would be very interested in generating pangolin_data locally so that we can test lineage assignment on our data with the latest changes before the new pangolin_data is released. Are there any scripts or input files (e.g. global alignment, Newick tree and VCF) used to produce lineageTree.pb and lineages.hash.csv that you could share for us to try?

If you have only a few hundred sequences then you can try uploading them to usher.bio to get assignments (there is a downloadable .tsv file with pango_lineage column). But if you want to automate a flow and/or have thousands of sequences then you could try placing your sequences in the full UShER tree, which I can share privately with registered users of GISAID (email angie at soe dot ucsc dot edu).
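For anyone scripting around the downloaded results, here is a minimal Python sketch that tallies the `pango_lineage` column of such a .tsv file. Only the `pango_lineage` column name comes from the comment above; the `name` column and the sample rows are made up for illustration, and the real file may contain other columns.

```python
import csv
import io
from collections import Counter


def count_lineages(tsv_file, column="pango_lineage"):
    """Count how many sequences were assigned to each lineage."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"column {column!r} not found in TSV header")
    # Skip rows where the lineage cell is empty or missing.
    return Counter(row[column] for row in reader if row[column])


# Made-up rows standing in for a downloaded results file;
# the "name" column is a hypothetical placeholder.
example = io.StringIO(
    "name\tpango_lineage\n"
    "sample1\tKP.3.1.1\n"
    "sample2\tJN.1\n"
    "sample3\tKP.3.1.1\n"
)
counts = count_lineages(example)
for lineage, n in counts.most_common():
    print(lineage, n)
```

To use it on a real download, pass an open file handle instead of the `io.StringIO` example, for instance `open(path, newline="")`, where `path` is wherever you saved the file.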

Alternatively, nextclade has a command-line version and a nightly build dataset which @corneliusroemer can point to if you're interested; nextclade can't assign lineages from very early in the pandemic, but does a great job with Omicron lineages and is updated more often than pangolin-data (and maintained by the lineage designator himself :).

> these new designations have not been released in pangolin_data yet, and I was wondering why these changes are already in place in GISAID.

As far as I know, several people have asked GISAID how lineages are assigned, but have not received a response. There has been some speculation that GISAID may be using nextclade. If you ask and receive a response, please update here!

juanledesma78 commented 1 month ago

Hi Angie, many thanks for your comments. We are actually using both the pangolin and nextclade command-line versions integrated into our pipeline, so it would be very helpful for us to try the full UShER tree to test our sequences. I am registered on GISAID and will drop you an email. Cheers!
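When both tools run in one pipeline, a quick way to spot lineages that one tool already knows and the other does not is to list the sequences where their calls differ. The sketch below is only an illustration: the column names (`taxon` and `lineage` for the pangolin CSV report, `seqName` and `Nextclade_pango` for the nextclade TSV) are assumptions about the output files and should be checked against your own, and the sample rows are made up.

```python
import csv
import io


def read_calls(handle, id_col, lineage_col, delimiter):
    """Map sequence name -> lineage call from a delimited report."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return {row[id_col]: row[lineage_col] for row in reader}


def find_disagreements(pangolin_calls, nextclade_calls):
    """Return {sequence: (pangolin, nextclade)} for sequences called differently."""
    shared = pangolin_calls.keys() & nextclade_calls.keys()
    return {
        name: (pangolin_calls[name], nextclade_calls[name])
        for name in sorted(shared)
        if pangolin_calls[name] != nextclade_calls[name]
    }


# Made-up reports; column names are assumptions, not confirmed in this thread.
pangolin_report = io.StringIO("taxon,lineage\ns1,JN.1\ns2,KP.3\n")
nextclade_report = io.StringIO("seqName\tNextclade_pango\ns1\tJN.1\ns2\tKP.3.1.1\n")

pangolin_calls = read_calls(pangolin_report, "taxon", "lineage", ",")
nextclade_calls = read_calls(nextclade_report, "seqName", "Nextclade_pango", "\t")
differences = find_disagreements(pangolin_calls, nextclade_calls)
print(differences)
```

A call that differs only by being a more specific sublineage in one tool (as in the made-up `s2` row) is a typical sign that one dataset is newer than the other.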

AngieHinrichs commented 1 month ago

pangolin-data v1.28 is now available (pangolin --update-data). Looking forward to hearing from you @juanledesma78.
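For automated pipelines, the update command above can be wrapped in a small Python helper. This is a sketch, not part of pangolin itself: the function names are hypothetical, and the only pangolin flag used is `--update-data` as quoted above.

```python
import shutil
import subprocess


def update_command(executable="pangolin"):
    """Build the argument list for updating pangolin-data."""
    return [executable, "--update-data"]


def update_pangolin_data(executable="pangolin"):
    """Run the update if the executable is on PATH; return True on success."""
    if shutil.which(executable) is None:
        # pangolin is not installed or not on PATH in this environment.
        return False
    completed = subprocess.run(update_command(executable), check=False)
    return completed.returncode == 0


# Example use in a pipeline step (commented out so importing has no side effects):
# if not update_pangolin_data():
#     raise SystemExit("could not update pangolin-data")
```

Returning a boolean rather than raising lets the calling pipeline decide whether a failed update should stop the run or only log a warning.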

juanledesma78 commented 1 month ago

Brilliant, many thanks, I really appreciate it.