Closed akifoss closed 6 months ago
Hi @akifoss, thanks for the reminder that we are overdue for an update. I will get going on that.
I don't know what GISAID's "consensus" refers to. Have you tried contacting GISAID to ask about it? If they reply, then please update us here! I have tried contacting GISAID regarding updates to pangolin and pangolin-data, but somehow my messages to them seem to vanish into the ether.
Great, looking forward to the updates!
We have sent an email to GISAID but have low hopes of getting a reply. Let's see; we'll let you know if we hear back from them.
Hi @akifoss, I also noticed "JN.1" mysteriously appearing in GISAID before it was released. I wonder if the GISAID lineage assignment is using a mixture of released pangolin ("PUSHER-v1.22") and the nightly nextclade build for the "consensus" calls (https://nextstrain.org/staging/nextclade/sars-cov-2)? I'd love to know more about the process if you get a response!
v1.23 pangolin-data and assignment cache are finally out:
https://github.com/cov-lineages/pangolin-data/releases/tag/v1.23 https://github.com/cov-lineages/pangolin-assignment/releases/tag/v1.23
I had a bit of work to do on the tree, and then I discovered a corner-case bug in usher-sampled that only affects some lineage A (early 2020) samples, causing them to be assigned to A.* sublineages or B or even some B.* sublineages. For some reason it affects more samples with the v1.23 tree than with the previous trees, which is why I didn't notice the bug before and only see it now. Anyway, if you rerun with v1.23 on all samples ever collected and some might be lineage A, then I recommend using the assignment cache, because I computed it with a bugfixed version of usher(-sampled) that hopefully will be released soon as usher v0.6.3.
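For anyone rerunning old samples, here is a minimal Python sketch of the cache-based rerun recommended above. It assumes pangolin v4 is installed and on the PATH and that its `--update-data`, `--add-assignment-cache`, `--use-assignment-cache` and `--outfile` options behave as documented for your installed version; the helper names and the default output filename are hypothetical, so please check `pangolin --help` before relying on it.

```python
import shutil
import subprocess


def pangolin_cache_commands(fasta_path, outfile="lineage_report.csv"):
    """Return the pangolin invocations, in order, as argv lists.

    Hypothetical helper: it only builds the command lines and runs nothing.
    """
    return [
        # Bring the installed pangolin-data up to the latest release.
        ["pangolin", "--update-data"],
        # Install the pangolin-assignment cache.
        ["pangolin", "--add-assignment-cache"],
        # Assign lineages, looking sequences up in the cache where possible.
        ["pangolin", "--use-assignment-cache", "--outfile", outfile, fasta_path],
    ]


def run_if_available(commands):
    """Run each command in order; return False if pangolin is not on PATH."""
    if shutil.which("pangolin") is None:
        return False
    for argv in commands:
        subprocess.run(argv, check=True)  # raises if a step fails
    return True
```

Calling `run_if_available(pangolin_cache_commands("samples.fasta"))` would then run the three steps in order, where `samples.fasta` is a placeholder for your own input file.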
Thanks again @akifoss for the prod. And if GISAID replies (not holding my breath), I'd love to know what they say! @ktmeaton your guess sounds pretty plausible to me.
Hi, it's me again :) It would be great to get another pangolin-data update, especially since many JN.* lineages are now designated in pango-designation. In general, it would be really useful to have this repo updated at regular intervals!
As for the GISAID response that we are expecting: no response so far, unfortunately.
Hello and happy new year everyone! Is there any info/plan on having a pangolin-data update soon?
@akifoss it might be worth raising this on the upstream pango-designation repository; the repository here follows the release pattern set there and trains on those releases.
Sorry about the delay. I am back from a nice long winter vacation and will roll out a pango-designation release soon and then get to work on updating pangolin-data. One remaining issue to be resolved before the pango-designation release is the KD.5 retraction request cov-lineages/pango-designation#2405 .
I believe @corneliusroemer maintains a nightly build update of the sars-cov-2 nextclade data package that can be used to call the latest designated lineages if those are what you are most interested in. (IIRC it does not call all of the oldest lineages... pre-Delta maybe?)
For awareness @AngieHinrichs nextclade is likely being run offline, but hasn't been baked into as many 'online' processes that call pangolin at the sequencer for us, so we're supportive of sharing the roles for releases if the current team is receptive. Overall, knowing if there is a regular schedule and anticipated changes to the release schedule can help us rely on the tools!
Sorry about the continued delays. I have found many lineages (mostly old, but still) that are annotated too deeply in the UShER tree so are not being assigned as broadly as they should, and am in the process of fixing those before tagging the next release. Soon, though!
Thanks @GLTOhorsman for pointing out the online/offline factor, that's good to know. If you're offering to help with the releases -- thanks! Currently there is a lot of fairly manual work for me in annotating each new lineage on the UCSC UShER tree, and making a minimized version of the full tree for pangolin to use, and testing before release to find if any major branches are broken or misplaced. Hopefully this year I will get to train up one of my UCSC colleagues to do that as well, and possibly even automate more of the steps and checks.
pangolin-data v1.24 and pangolin-assignment v1.24 were released yesterday. I will try to release updates more frequently in the future.
May I ask when to expect a new pangolin-data update, now that we have the new BA.2.87.1 lineage? Thank you!
Thanks for the reminder @akifoss. I have tagged a new release on pango-designation and will start working on the data update.
Dear pangolin-data team,
there is a new BA.2.86 sublineage that has been available in pango-designation for a couple of weeks now: JN.1. However, this lineage is not available for lineage assignment via pangolin-data (the last update was 2 months ago). We see SARS-CoV-2 genomes assigned to lineage JN.1 on GISAID with the term "consensus call" in the pangolin_lineages_version field, while others have an actual version like "PANGO-v1.22". It would be great to have more frequent updates of pangolin-data, as we've had this problem many times over the last few years: for example, do a release whenever a new lineage is designated, or every week if at least one new lineage has been designated since the last release. Many thanks!
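To make the suggested release rule concrete, here is a rough Python sketch. It is purely illustrative: the function name, its parameters and the 7-day interval encoding "every week" are assumptions, not part of pangolin-data or any existing release tooling.

```python
from datetime import date, timedelta


def release_due(last_release, today, new_lineages, weekly=True):
    """Decide whether a pangolin-data release is due (hypothetical rule).

    weekly=True  -> release once at least 7 days have passed since the last
                    release and at least one new lineage has been designated.
    weekly=False -> release as soon as any new lineage has been designated.
    """
    if new_lineages <= 0:
        return False  # nothing new to release
    if not weekly:
        return True
    return today - last_release >= timedelta(days=7)
```

For instance, `release_due(date(2023, 11, 1), date(2023, 11, 9), new_lineages=1)` is true under the weekly variant, while the same call with `new_lineages=0` is false; the dates are made up for the example.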