cov-lineages / pangolin

Software package for assigning SARS-CoV-2 genome sequences to global lineages.
GNU General Public License v3.0

pangolin vs nextclade #516

Closed fancybeakers closed 9 months ago

fancybeakers commented 1 year ago

Hi admins, I am wondering what the difference is between nextclade and pangolin variant calling. I uploaded one sequence to both web applications, but the Pango lineage they report is different. Nextclade seems to be more up to date, showing a sub-variant, while pangolin seems outdated and still shows the parent lineage. I would appreciate your help in clarifying the matter.

aretchless commented 1 year ago

Hello. I am not involved with either project, but I do pay attention to both. The short answer is that each project has its own update schedule and assignment algorithm, so they are not necessarily in sync. In detail:

  1. cov-lineages: lineage designations start in the 'pango-designation' repository (https://github.com/cov-lineages/pango-designation); these are bundled into releases every once in a while. Each release then propagates to the 'pangolin-data' repository, which provides the datasets needed by the pangolin software. pangolin assigns lineages using the UShER algorithm by default.
  2. nextclade also uses the designations from 'pango-designation', but does not rely on the releases, so it may incorporate new designations before or after pangolin-data does. It uses a different assignment algorithm, described here: https://docs.nextstrain.org/projects/nextclade/en/stable/user/algorithm/nextclade-pango.html
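
Because the two tools update on different schedules, a common pattern is that one reports a parent lineage while the other reports a newer descendant of it. Here is a minimal Python sketch of checking that relationship between two calls. It assumes a tiny hand-written alias table (`ILLUSTRATIVE_ALIASES` is a hypothetical name holding just two entries for illustration; the full alias table is maintained in the pango-designation repository), and the helper names are mine, not part of either tool.

```python
# Illustrative subset of Pango aliases (assumption: only two entries shown;
# the complete table lives in the pango-designation repository).
ILLUSTRATIVE_ALIASES = {
    "BA": "B.1.1.529",
    "AY": "B.1.617.2",
}


def unalias(lineage, aliases=ILLUSTRATIVE_ALIASES):
    """Expand an aliased prefix, e.g. 'BA.2' becomes 'B.1.1.529.2'."""
    prefix, _, rest = lineage.partition(".")
    full_prefix = aliases.get(prefix, prefix)
    return f"{full_prefix}.{rest}" if rest else full_prefix


def is_descendant(child, parent, aliases=ILLUSTRATIVE_ALIASES):
    """True if `child` is a strict descendant of `parent` by name."""
    child_full = unalias(child, aliases)
    parent_full = unalias(parent, aliases)
    # The trailing dot stops 'BA.20' from matching as a child of 'BA.2'.
    return child_full.startswith(parent_full + ".")


def describe(pangolin_call, nextclade_call):
    """Summarise how two lineage calls for the same sequence relate."""
    if unalias(pangolin_call) == unalias(nextclade_call):
        return "same lineage"
    if is_descendant(nextclade_call, pangolin_call):
        return "nextclade reports a descendant of the pangolin call"
    if is_descendant(pangolin_call, nextclade_call):
        return "pangolin reports a descendant of the nextclade call"
    return "calls are on different branches"
```

If `describe` reports that one call is a descendant of the other, the difference is most likely a data-version lag rather than a real disagreement about where the sequence sits.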

AngieHinrichs commented 1 year ago

Yes, thank you @aretchless, and sorry @fancybeakers about the delayed reply. nextclade does tend to be updated much more frequently. Last week there was a new release (v1.19) of pangolin-data, so after you run

pangolin --update-data

pangolin will be able to assign lineages included in pango-designation v1.19 (lineages designated through 30 March), if you are using the default usher analysis mode. (Due to server issues, the pangoLEARN model files have not been updated since pango-designation v1.18.)

AngieHinrichs commented 1 year ago

Also, if you continue to see unexpected differences between nextclade and pangolin after updating, feel free to send specific examples and we can look into the differences. The full UShER tree at UCSC usually has the latest Pango lineages annotated within a few days of designation, so another way to check is to upload sequences to the UShER web interface and view the resulting subtrees to see what lineage is indicated by each sequence's placement in the full tree.
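
To collect specific examples like this, one option is to run the same sequences through both tools and list the names where the calls differ. The sketch below is a rough illustration, not part of either project. It assumes pangolin's report is comma-separated with `taxon` and `lineage` columns and that nextclade's tabular output is tab-separated with `seqName` and `Nextclade_pango` columns; check the headers of your own files, since column names can change between versions. The sample rows are invented for the demonstration.

```python
import csv
import io


def read_calls(handle, id_column, lineage_column, delimiter):
    """Map sequence name to lineage from an open tabular report."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return {row[id_column]: row[lineage_column] for row in reader}


def find_discordant(pangolin_calls, nextclade_calls):
    """Return (name, pangolin lineage, nextclade lineage) where calls differ."""
    shared = sorted(set(pangolin_calls) & set(nextclade_calls))
    return [
        (name, pangolin_calls[name], nextclade_calls[name])
        for name in shared
        if pangolin_calls[name] != nextclade_calls[name]
    ]


# Invented in-memory stand-ins for the two report files (assumed layouts).
pangolin_report = io.StringIO("taxon,lineage\nsample_1,BA.2\nsample_2,BA.5\n")
nextclade_report = io.StringIO(
    "seqName\tNextclade_pango\nsample_1\tBA.2.75\nsample_2\tBA.5\n"
)

pangolin_calls = read_calls(pangolin_report, "taxon", "lineage", ",")
nextclade_calls = read_calls(nextclade_report, "seqName", "Nextclade_pango", "\t")

for name, pango, nextclade in find_discordant(pangolin_calls, nextclade_calls):
    print(f"{name}: pangolin={pango} nextclade={nextclade}")
```

With real files, replace the `io.StringIO` objects with `open(path, newline="")` handles. Sequence names may need normalising first if the two tools treat FASTA headers differently.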

aineniamh commented 9 months ago

Going to close this issue now, as I believe the query has been answered.