cov-lineages / pangoLEARN

Store of the trained model for pangolin to access.
GNU General Public License v3.0

Many false positives for AY.39 using usher mode #48

Closed: corneliusroemer closed this issue 2 years ago

corneliusroemer commented 2 years ago

Using the newest versions, pangolin's usher mode seems to label sequences as AY.39 that it shouldn't.

❯ pangolin --update
pangolin updated to v3.1.16
pangolearn updated to 2021-11-09
constellations updated to v0.0.21
scorpio updated to v0.3.14
pango-designation updated to v1.2.98

The real AY.39 should probably just be the few yellow sequences here: https://nextstrain.org/groups/neherlab/ncov/europe/2021-11-21?branchLabel=aa&c=gt-nuc_27604&f_pango_usher=AY.39

@AngieHinrichs

AngieHinrichs commented 2 years ago

Thanks for reporting, @corneliusroemer -- that is worse than I would expect. I know the pangoLEARN release 2021-11-04 had a problem with overcalling AY.39 because of a branch in AY.39 with a back-mutation at 27604, but I thought that should have been fixed in the 2021-11-09 release. Did you run pangolin --update before building https://nextstrain.org/groups/neherlab/ncov/europe/2021-11-21 or after?

Thanks for the link -- I will pick some sequences and take a closer look.

theosanderson commented 2 years ago

At Sanger we have an AY.111 call for BRBR-28F97E9 (unlike the Nextstrain build above), so there may be some issue, as Angie flags.

AngieHinrichs commented 2 years ago

Spot-checking a few of the sequences inappropriately assigned to AY.39 there (USA/MT-MTPHL-3924694/2021, env/Liechtenstein/CeMM15935/2021, Switzerland/VD-ETHZ-34604446/2021), it looks to me like they were assigned to AY.39 with the 2021-11-04 release but not with the 2021-11-09 release (those 3 examples were all assigned to B.1.617.2). So hopefully that will be better with the next build of your tree. Sorry about that and thanks again for reporting it!

(And yes, I also get England/BRBR-28F97E9/2021 assigned to AY.39 with 2021-11-04 but AY.111 with 2021-11-09, thanks Theo.)
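For anyone who wants to repeat this kind of spot check, here is a minimal sketch of comparing calls between two releases. It assumes the assignments have already been loaded into plain dictionaries keyed by sequence name; the function name `changed_calls` and the dictionary names are hypothetical, and the lineage values are only the four examples mentioned in this thread.

```python
def changed_calls(old, new):
    """Return {name: (old_lineage, new_lineage)} for sequences whose call differs.

    Only sequences present in both dictionaries are compared.
    """
    return {
        name: (old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if old[name] != new[name]
    }


# Calls reported in this thread for the 2021-11-04 release (illustrative subset).
calls_2021_11_04 = {
    "USA/MT-MTPHL-3924694/2021": "AY.39",
    "env/Liechtenstein/CeMM15935/2021": "AY.39",
    "Switzerland/VD-ETHZ-34604446/2021": "AY.39",
    "England/BRBR-28F97E9/2021": "AY.39",
}

# Calls reported in this thread for the 2021-11-09 release.
calls_2021_11_09 = {
    "USA/MT-MTPHL-3924694/2021": "B.1.617.2",
    "env/Liechtenstein/CeMM15935/2021": "B.1.617.2",
    "Switzerland/VD-ETHZ-34604446/2021": "B.1.617.2",
    "England/BRBR-28F97E9/2021": "AY.111",
}

changed = changed_calls(calls_2021_11_04, calls_2021_11_09)
for name, (before, after) in changed.items():
    print(f"{name}: {before} -> {after}")
```

In practice the two dictionaries would be filled from the reports of two separate pangolin runs; how those reports are read is left out here on purpose.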

corneliusroemer commented 2 years ago

You're right @AngieHinrichs, it looks like the update failed and I used version 2021-11-04. Strange. I'm running pangolin --update automatically in my pipeline but it must have gone wrong.

So I'll rerun the builds and see whether it improves.
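A silent update failure like this could be caught by checking the reported pangoLEARN version before running the build. The following is a rough sketch, not a tested pipeline step: it assumes the captured output of `pangolin --update` contains a line of the form `pangolearn updated to YYYY-MM-DD`, as in the output pasted at the top of this issue. Other pangolin versions, or runs where nothing needed updating, may word this differently, in which case the check fails loudly instead of guessing. The function names `pangolearn_version` and `check_pangolearn` are hypothetical.

```python
import re
from datetime import date


def pangolearn_version(update_output):
    """Extract the pangoLEARN release date from update output, or None if absent.

    Assumes a line like 'pangolearn updated to 2021-11-09' (format taken from
    the output shown in this issue; not guaranteed for other versions).
    """
    match = re.search(
        r"^pangolearn updated to (\d{4})-(\d{2})-(\d{2})\s*$",
        update_output,
        re.MULTILINE,
    )
    if match is None:
        return None
    year, month, day = map(int, match.groups())
    return date(year, month, day)


def check_pangolearn(update_output, minimum):
    """Return the detected release date, or raise if it is missing or too old."""
    found = pangolearn_version(update_output)
    if found is None:
        raise RuntimeError("no pangoLEARN version found in update output")
    if found < minimum:
        raise RuntimeError(f"pangoLEARN {found} is older than required {minimum}")
    return found


# Output copied from the first comment in this issue.
sample_output = """pangolin updated to v3.1.16
pangolearn updated to 2021-11-09
constellations updated to v0.0.21
scorpio updated to v0.3.14
pango-designation updated to v1.2.98
"""

print(check_pangolearn(sample_output, minimum=date(2021, 11, 9)))
```

Raising an exception makes the pipeline stop instead of quietly building with an outdated model, which is what seems to have happened here.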