GenSpectrum / cov-spectrum-website

A web platform to detect and analyze variants of SARS-CoV-2
https://cov-spectrum.org
GNU General Public License v3.0

ENH: Collections: Displaying when the last nextclade version has been updated. #670

Open FedeGueli opened 1 year ago

FedeGueli commented 1 year ago

A suggestion: it would be nice to display the "nextclade version" so that we know how long ago it was last updated. For collections, that would make it easier to set the right query (manual from the pango page or automatic from nextclade).

corneliusroemer commented 1 year ago

Could be displayed at the bottom next to "The sequence data was updated: Last Thursday at 7:59 AM"

Right now, the GISAID page seems to have the dataset from December, while open has the dataset from January.

chaoran-chen commented 1 year ago

Thanks for the good suggestion!

For open, we take both pangoLineage and nextcladePangoLineage from Nextstrain. @corneliusroemer How can I find out the dataset version?

For GISAID, something indeed went wrong with updating the Nextclade dataset. I fixed the issue and will reprocess the data.

chaoran-chen commented 1 year ago

@FedeGueli, the GISAID version is now displayed in the footer:

image

@corneliusroemer, please let me know how I can get the info for the open version as well :)

corneliusroemer commented 1 year ago

Sorry for the delay @chaoran-chen. This is actually not trivial because we currently don't keep that version in the output (we really should though).

A quick and dirty solution would be to simply use the version returned by:

❯ nextclade dataset list --name sars-cov-2 --json
[
  {
    "enabled": true,
    "attributes": {
      "name": {
        "isDefault": true,
        "value": "sars-cov-2",
        "valueFriendly": "SARS-CoV-2"
      },
      "reference": {
        "isDefault": true,
        "value": "MN908947",
        "valueFriendly": "Wuhan-Hu-1/2019"
      },
      "tag": {
        "isDefault": true,
        "value": "2023-02-01T12:00:00Z",
        "valueFriendly": null
      }
    },
    ...

So [0].attributes.tag.value

This would be incorrect only for a few hours, namely if you run an ingest after a new version has been released but before we have done the full rerun. So it's possibly good enough as a start: during your ingest of metadata.tsv, you could just look up the current value and save it with your ingest.
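
A minimal sketch of that lookup in Python, assuming the nextclade CLI is on the PATH and that its JSON output has the shape shown above (other Nextclade versions may structure it differently). The function names and the trimmed SAMPLE_OUTPUT are illustrative only and are not part of any existing ingest code:

```python
import json
import subprocess


def extract_dataset_tag(dataset_list_json: str) -> str:
    """Return [0].attributes.tag.value from the JSON printed by the command above."""
    datasets = json.loads(dataset_list_json)
    if not datasets:
        raise ValueError("nextclade returned no datasets")
    return datasets[0]["attributes"]["tag"]["value"]


def current_dataset_tag() -> str:
    """Run the nextclade CLI and return the tag of the first listed dataset."""
    result = subprocess.run(
        ["nextclade", "dataset", "list", "--name", "sars-cov-2", "--json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return extract_dataset_tag(result.stdout)


# Trimmed copy of the output shown above, used here only for illustration.
SAMPLE_OUTPUT = """
[
  {
    "enabled": true,
    "attributes": {
      "name": {"isDefault": true, "value": "sars-cov-2", "valueFriendly": "SARS-CoV-2"},
      "reference": {"isDefault": true, "value": "MN908947", "valueFriendly": "Wuhan-Hu-1/2019"},
      "tag": {"isDefault": true, "value": "2023-02-01T12:00:00Z", "valueFriendly": null}
    }
  }
]
"""

if __name__ == "__main__":
    print(extract_dataset_tag(SAMPLE_OUTPUT))
```

During an ingest, the value returned by current_dataset_tag() could be stored next to the ingested metadata. The caveat above still applies: for a few hours after a new release, the stored tag may be newer than the dataset actually used for the data.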

I'm having a look at tracking the dataset within ingest and posting it to a public file for you to use instead. But that may require some reviews from Nextstrain team.

chaoran-chen commented 1 year ago

Thanks, Cornelius! For the GISAID instance, we take the version exactly from the Nextclade dataset's tag.value, as we run Nextclade ourselves. For open, I'd prefer to be accurate as well, and I think it's fine to wait for it to be added to nextstrain/ncov-ingest.