cov-lineages / pangoLEARN

Store of the trained model for pangolin to access.
GNU General Public License v3.0

Release 2022-01-05 produces generic lineage for Omicron #63

Closed quaeler closed 2 years ago

quaeler commented 2 years ago

The lineage call on presumed Omicron variants is now being reported as only B as seen in:

RPBBDBKGSXKJSEQQAIF/ARTIC/nanopolish_MN908947.3,B,,,,,,PANGO-v1.2.121,3.1.17,2022-01-05,v1.2.121,passed_qc,Assigned from designation hash.

However, this is not sufficiently specific for the reporting requirements at PHE. The 2021-12-06 release produced sufficiently specific results. This wouldn't be an issue, since 2022-01-05 isn't "released" and doing a --update-data from Pangolin doesn't attempt to grab it; however, checking out the last release of Pangolin (v3.1.17) by tag from GitHub and then doing the conda-and-pip installation of it locally goes out and grabs 2022-01-05 for whatever reason.

quaeler commented 2 years ago

The output running with 2021-12-06 against the same consensus FASTA:

RPBBDBKGSXKJSEQQAIF/ARTIC/nanopolish_MN908947.3,B.1.1.529,,,Omicron (B.1.1.529-like),0.000000,1.000000,PANGO-v1.2.105,3.1.17,2021-12-06,v1.2.105,passed_qc,scorpio call: Alt alleles 0; Ref alleles 39; Amb alleles 0; Oth alleles 0; scorpio replaced lineage assignment B

corneliusroemer commented 2 years ago

Interestingly it says Assigned from designation hash which means that this sequence must be present in the designations as B.

Do you have the fasta so one can try to reproduce?

quaeler commented 2 years ago

That being said, there are strictly B being produced which do not have that note, for example:

RPB2EHQ65IBMM53SDXJ/ARTIC/nanopolish_MN908947.3,B,0.0,0.9435590969455512,,,,PLEARN-v1.2.121,3.1.17,2022-01-05,v1.2.121,passed_qc,

quaeler commented 2 years ago

... and that same result using 2021-12-06:

RPB2EHQ65IBMM53SDXJ/ARTIC/nanopolish_MN908947.3,B.1.1.529,0.0,0.9432558139534883,Omicron (B.1.1.529-like),0.025600,0.923100,PLEARN-v1.2.105,3.1.17,2021-12-06,v1.2.105,passed_qc,scorpio call: Alt alleles 1; Ref alleles 36; Amb alleles 2; Oth alleles 0; scorpio replaced lineage assignment B

rmcolq commented 2 years ago

First thing: I don't think this is an Omicron. If it has 1 alt (Omicron-defining) allele and 36 ref alleles, it should not be classified as Omicron. I'm still trying to work out why you got a B.1.1.529 call before with these counts.
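
For readers who want to check the allele counts themselves, here is a minimal Python sketch that pulls the counts out of the note column quoted earlier in this thread. The helper name `allele_counts` is hypothetical. Treating support as alt / total and conflict as ref / total is an assumption inferred from the numbers in this thread (it reproduces the reported 0.025600 and 0.923100); it is not taken from scorpio's documentation.

```python
import re

def allele_counts(note):
    """Extract 'Alt/Ref/Amb/Oth alleles N' counts from a pangolin note string."""
    pairs = re.findall(r"(Alt|Ref|Amb|Oth) alleles (\d+)", note)
    return {kind.lower(): int(n) for kind, n in pairs}

# Note text copied from the 2021-12-06 output row quoted in this thread.
note = ("scorpio call: Alt alleles 1; Ref alleles 36; Amb alleles 2; "
        "Oth alleles 0; scorpio replaced lineage assignment B")

counts = allele_counts(note)
total = sum(counts.values())

# Assumption: support = alt / total and conflict = ref / total.
support = counts["alt"] / total
conflict = counts["ref"] / total

print(counts)
print(round(support, 4), round(conflict, 4))
```

With 1 alt allele out of 39 sites, the support fraction is well under 3%, which is why a B.1.1.529 call from these counts looks surprising.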

rambaut commented 2 years ago

Could this be the result of using the new constellation files with an older scorpio (i.e., without the parent logic)?

rambaut commented 2 years ago

Incidentally - this fasta file is the reference genome MN908947.3 so pure B is the correct identification

rmcolq commented 2 years ago

I've not been able to replicate the scorpio call. For example, with the older pangoLEARN release I got this:

taxon,lineage,conflict,ambiguity_score,scorpio_call,scorpio_support,scorpio_conflict,version,pangolin_version,pangoLEARN_version,pango_version,status,note
RPBBDBKGSXKJSEQQAIF/ARTIC/nanopolish_MN908947.3,B,,,,,,PANGO-v1.2.105,3.1.17,2021-12-06,v1.2.105,passed_qc,Assigned from designation hash.

Are you sure that your sequence hasn't changed between runs or something?
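
One way to rule out differences between runs is to compare the output rows field by field. The sketch below is only an illustration: it uses Python's standard csv module on the header and the two "designation hash" rows quoted in this thread, and the helper names `parse_rows` and `changed_fields` are hypothetical.

```python
import csv
import io

HEADER = ("taxon,lineage,conflict,ambiguity_score,scorpio_call,scorpio_support,"
          "scorpio_conflict,version,pangolin_version,pangoLEARN_version,"
          "pango_version,status,note")

# Rows copied from this thread: the 2022-01-05 run and the 2021-12-06 run.
ROW_NEW = ("RPBBDBKGSXKJSEQQAIF/ARTIC/nanopolish_MN908947.3,B,,,,,,PANGO-v1.2.121,"
           "3.1.17,2022-01-05,v1.2.121,passed_qc,Assigned from designation hash.")
ROW_OLD = ("RPBBDBKGSXKJSEQQAIF/ARTIC/nanopolish_MN908947.3,B,,,,,,PANGO-v1.2.105,"
           "3.1.17,2021-12-06,v1.2.105,passed_qc,Assigned from designation hash.")

def parse_rows(text):
    """Parse pangolin-style CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def changed_fields(old, new):
    """Return the column names whose values differ between two parsed rows."""
    return [key for key in old if old[key] != new[key]]

old = parse_rows(HEADER + "\n" + ROW_OLD)[0]
new = parse_rows(HEADER + "\n" + ROW_NEW)[0]

print(old["lineage"], new["lineage"])
print(changed_fields(old, new))
```

For these two rows only the version columns differ; the lineage and note are identical, which fits a sequence that is assigned B in both releases.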

rmcolq commented 2 years ago

I'm going to close this issue as I don't think there is actually a problem (the sequence provided really is a B lineage sequence) and I am unable to replicate the problematic output. Feel free to comment if a problem persists on your side, with some more information about how to replicate it (and, importantly, FASTAs).

quaeler commented 2 years ago

Ok - thanks for the review. A clarifying question on what @rambaut wrote:

Could this be the result of using the new constellation files with an older scorpio (i.e., without the parent logic)?

Does this imply that if we run the executables associated with the 3.1.17 Pango release, as bundled around that date (for example, in a Docker container), but perform an --update-data with those binaries, incorrect results or broken functionality may result?

rmcolq commented 2 years ago

There have been some updates to the format of the constellation files and corresponding updates to scorpio over recent months, but I'm pretty sure getting odd combinations of them would just result in error messages or failed runs. I honestly can't think of a way to get the result you had.

quaeler commented 2 years ago

Just for reference, to functionally replicate the B.1.1.529 call: