cov-lineages / pangolin-data

Repository for storing latest model, protobuf, designation hash and alias files for pangolin assignments
GNU General Public License v3.0

updates of UShER models/pangolin-data when new lineages are assigned by pango-designation #46

Closed: akifoss closed this issue 6 months ago

akifoss commented 9 months ago

Dear pangolin-data team,

there has been a new BA.2.86 sublineage available in pango-designation for a couple of weeks now: JN.1. This lineage, however, is not available for lineage assignment via pangolin-data (last update was 2 months ago). We see SARS-CoV-2 genomes assigned to lineage JN.1 on GISAID with the term "consensus call" in the pangolin_lineages_version field, while others have an actual version like "PANGO-v1.22". It would be great to have more frequent updates of pangolin-data, as we've had this problem many times over the last few years: for example, do a release whenever a new lineage is assigned, or every week if at least one new lineage has been assigned since the last release.

Many thanks!
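
The release rule proposed above could be sketched roughly as follows. This is only an illustration of the suggestion, not how pangolin-data releases are actually scheduled: the function name `release_due`, its parameters, and the reading of "every week" as "at least seven days since the last release" are all assumptions, and the dates in the usage note are made up.

```python
from datetime import date, timedelta


def release_due(last_release, designation_dates, today,
                max_wait=timedelta(days=7), immediate=False):
    """Return True if a new data release is due under the proposed policy.

    last_release      -- date of the most recent data release
    designation_dates -- dates on which lineages were designated upstream
    today             -- date on which the check is made
    max_wait          -- assumed weekly cadence (hypothetical default)
    immediate         -- if True, release as soon as any lineage is pending
    """
    # Lineages designated after the last release are not yet assignable.
    pending = [d for d in designation_dates if d > last_release]
    if not pending:
        return False  # nothing new, so no release is needed
    if immediate:
        return True   # variant 1: release whenever a new lineage appears
    # Variant 2: release once at least a week has passed since the last one.
    return today - last_release >= max_wait
```

With hypothetical dates, `release_due(date(2023, 9, 1), [date(2023, 10, 20)], date(2023, 11, 1))` is true, because a lineage has been pending for longer than the weekly window, while an empty list of designations never triggers a release.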

AngieHinrichs commented 9 months ago

Hi @akifoss, thanks for the reminder that we are overdue for an update. I will get going on that.

I don't know what GISAID's "consensus" refers to. Have you tried contacting GISAID to ask about it? If they reply, then please update us here! I have tried contacting GISAID regarding updates to pangolin and pangolin-data, but somehow my messages to them seem to vanish into the ether.

akifoss commented 9 months ago

Great, looking forward to the updates!

We have sent an email to GISAID but have little hope of getting a reply. Let's see; we'll let you know if we hear back from them.

ktmeaton commented 9 months ago

Hi @akifoss, I also noticed "JN.1" mysteriously appearing in GISAID before it was released. I wonder if the GISAID lineage assignment is using a mixture of released pangolin ("PUSHER-v1.22") and the nightly nextclade build for the "consensus" calls (https://nextstrain.org/staging/nextclade/sars-cov-2)? I'd love to know more about the process if you get a response!
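
To separate the kinds of calls discussed here, the values of the pangolin_lineages_version field can be split into an engine prefix and a version number. The sketch below is only an illustration: the helper name `parse_version_field` is hypothetical, and the prefix meanings in the comments reflect my reading of the pangolin output documentation (PLEARN is not mentioned in this thread). Values such as "consensus call" fall through as unrecognised.

```python
import re

# Assumed meaning of each prefix (from pangolin's documented "version" column):
#   PANGO  -> sequence matched a designated sequence / cached assignment
#   PUSHER -> lineage inferred by UShER placement
#   PLEARN -> lineage inferred by the pangoLEARN model
VERSION_RE = re.compile(r"^(PANGO|PUSHER|PLEARN)-v?(\d+(?:\.\d+)*)$")


def parse_version_field(value):
    """Split e.g. 'PUSHER-v1.22' into ('PUSHER', '1.22').

    Returns None for anything that does not look like a pangolin
    version string, such as GISAID's 'consensus call'.
    """
    match = VERSION_RE.match(value.strip())
    if match is None:
        return None
    prefix, number = match.groups()
    return prefix, number


# Usage: count how many records carry a recognised pangolin version.
fields = ["PANGO-v1.22", "PUSHER-v1.22", "consensus call"]
recognised = [f for f in fields if parse_version_field(f) is not None]
```

Records that return `None` are the ones whose origin is unclear and that would be worth asking GISAID about.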

AngieHinrichs commented 9 months ago

v1.23 pangolin-data and assignment cache are finally out:

https://github.com/cov-lineages/pangolin-data/releases/tag/v1.23 https://github.com/cov-lineages/pangolin-assignment/releases/tag/v1.23

I had a bit of work to do on the tree, and then I discovered a corner-case bug in usher-sampled that only affects some lineage A (early 2020) samples, causing them to be assigned to A.* sublineages, or to B, or even to some B.* sublineages. For some reason it affects more samples with the v1.23 tree than with previous trees, which is why I didn't notice the bug until now. Anyway, if you rerun with v1.23 on all samples ever collected and some might be lineage A, then I recommend using the assignment cache, because I computed it with a bugfixed version of usher(-sampled) that hopefully will be released soon as usher v0.6.3.

Thanks again @akifoss for the prod. And if GISAID replies (not holding my breath), I'd love to know what they say! @ktmeaton your guess sounds pretty plausible to me.

akifoss commented 7 months ago

Hi, it's me again :) It would be great to get another pangolin-data update, especially since many JN.* lineages are now designated in pango-designation. In general, it would be really useful to have this repo updated at regular intervals!

As for the GISAID response that we are expecting: no response so far, unfortunately.

akifoss commented 6 months ago

Hello and happy new year everyone! Is there any info/plan on having a pangolin-data update soon?

GLTOhorsman commented 6 months ago

@akifoss it might be worth raising this on the upstream pango-designation repository; the repository here follows the release pattern set there and trains on those releases.

AngieHinrichs commented 6 months ago

Sorry about the delay. I am back from a nice long winter vacation and will roll out a pango-designation release soon and then get to work on updating pangolin-data. One remaining issue to be resolved before the pango-designation release is the KD.5 retraction request cov-lineages/pango-designation#2405.

I believe @corneliusroemer maintains a nightly build update of the sars-cov-2 nextclade data package that can be used to call the latest designated lineages if those are what you are most interested in. (IIRC it does not call all of the oldest lineages... pre-Delta maybe?)

GLTOhorsman commented 6 months ago

For awareness, @AngieHinrichs: nextclade is likely being run offline, but for us it hasn't been baked into as many 'online' processes as pangolin, which is called at the sequencer. So we're supportive of sharing the roles for releases if the current team is receptive. Overall, knowing whether there is a regular release schedule, and whether any changes to it are anticipated, would help us rely on the tools!

AngieHinrichs commented 6 months ago

Sorry about the continued delays. I have found many lineages (mostly old, but still) that are annotated too deeply in the UShER tree so are not being assigned as broadly as they should, and am in the process of fixing those before tagging the next release. Soon, though!

Thanks @GLTOhorsman for pointing out the online/offline factor, that's good to know. If you're offering to help with the releases -- thanks! Currently there is a lot of fairly manual work for me in annotating each new lineage on the UCSC UShER tree, and making a minimized version of the full tree for pangolin to use, and testing before release to find if any major branches are broken or misplaced. Hopefully this year I will get to train up one of my UCSC colleagues to do that as well, and possibly even automate more of the steps and checks.

AngieHinrichs commented 6 months ago

pangolin-data v1.24 and pangolin-assignment v1.24 were released yesterday. I will try to release updates more frequently in the future.

akifoss commented 5 months ago

May I ask when to expect a new pangolin-data update, now that we have the new BA.2.87.1 lineage? Thank you!

AngieHinrichs commented 5 months ago

Thanks for the reminder @akifoss. I have tagged a new release on pango-designation and will start working on the data update.