cov-lineages / pangolin

Software package for assigning SARS-CoV-2 genome sequences to global lineages.
GNU General Public License v3.0

Regarding Scorpio Call Prioritisation #477

Closed: msinno01 closed this issue 2 years ago

msinno01 commented 2 years ago

I wanted to clarify the role of Scorpio in lineage designation using the default ("accurate") mode of pangolin.

My understanding was that the Scorpio call would override the UShER placement call where there is a discrepancy. However, I can see from some examples that this is not universally the case. For instance, I have a few samples which pangolin calls as BA.4 with a scorpio call of BA.5, and the note output for them reads: "scorpio lineage BA.5 conflicts with inference lineage BA.4".

Therefore would you be able to provide some clarification on when the Scorpio call is prioritised over the UShER placement and vice versa?
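For anyone wanting to pull the affected samples out of a results file, here is a minimal sketch. It assumes the pangolin CSV has `taxon`, `lineage`, `scorpio_call` and `note` columns and that the note contains the wording quoted above; the function name and the example rows are made up for illustration, and real `scorpio_call` values may be formatted differently.

```python
import csv
import io

# Wording taken from the note quoted above; treated here as a plain substring.
CONFLICT_MARKER = "conflicts with inference lineage"


def find_scorpio_conflicts(csv_file):
    """Return (taxon, lineage, scorpio_call) for rows whose note flags a conflict.

    Column names are assumptions about the pangolin output format.
    """
    reader = csv.DictReader(csv_file)
    return [
        (row["taxon"], row["lineage"], row["scorpio_call"])
        for row in reader
        if CONFLICT_MARKER in (row.get("note") or "")
    ]


# Made-up rows standing in for a real lineage report.
example = io.StringIO(
    "taxon,lineage,scorpio_call,note\n"
    "sample1,BA.4,BA.5,scorpio lineage BA.5 conflicts with inference lineage BA.4\n"
    "sample2,BA.5,BA.5,\n"
)
conflicts = find_scorpio_conflicts(example)
print(conflicts)
```

With a real report, the `io.StringIO` object would be replaced by an opened file handle.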

Many thanks in advance.

msinno01 commented 2 years ago

I think I may have found an answer. Am I right in saying that this was updated in June with 4b9dc82?

aineniamh commented 2 years ago

Hey @msinno01, you're right that that used to be the case and that it's just recently been updated. We've switched to overwriting only pangoLEARN calls, not UShER calls, when scorpio conflicts with them (we believe the vast majority of UShER calls are giving accurate enough results that they don't need to be overwritten by scorpio). PangoLEARN still gets overwritten though, and in cases of conflict will likely output an UNASSIGNED call.
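As a rough illustration of the rule described above (not pangolin's actual code), the decision could be sketched like this. The function names, the mode labels `"usher"` and `"pangolearn"`, and the prefix-based conflict check are all assumptions made for the example; the real conflict check is more involved and handles lineage aliases, and the sketch always returns UNASSIGNED for a pangoLEARN conflict even though that is only the likely outcome.

```python
def is_conflict(inference_lineage, scorpio_lineage):
    """Simplified, assumed check: no conflict if the inference lineage is the
    scorpio lineage itself or a sublineage of it by name (e.g. BA.5.1 under BA.5).
    Lineage aliases are ignored in this sketch."""
    return not (
        inference_lineage == scorpio_lineage
        or inference_lineage.startswith(scorpio_lineage + ".")
    )


def resolve_lineage(inference_lineage, scorpio_lineage, analysis_mode):
    """Return (final_lineage, note) following the rule described in this thread."""
    if scorpio_lineage is None or not is_conflict(inference_lineage, scorpio_lineage):
        return inference_lineage, ""

    note = (
        f"scorpio lineage {scorpio_lineage} conflicts with "
        f"inference lineage {inference_lineage}"
    )
    if analysis_mode == "usher":
        # UShER calls are kept; the conflict is only reported in the note.
        return inference_lineage, note
    # pangoLEARN calls are overwritten when scorpio disagrees.
    return "UNASSIGNED", note


usher_result = resolve_lineage("BA.4", "BA.5", "usher")
pangolearn_result = resolve_lineage("BA.4", "BA.5", "pangolearn")
print(usher_result)
print(pangolearn_result)
```

Under these assumptions, the BA.4/BA.5 example from the original question keeps its BA.4 call in UShER mode and only gains the conflict note.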

msinno01 commented 2 years ago

Hi @aineniamh, that's great thank you very much for the explanation, much appreciated!