Open FedeGueli opened 1 year ago
Could be displayed at the bottom next to "The sequence data was updated: Last Thursday at 7:59 AM"
Right now, the GISAID page seems to have the dataset from December, while open has the dataset from January.
Thanks for the good suggestion!
For open, we take both pangoLineage and nextcladePangoLineage from Nextstrain. @corneliusroemer How can I find out the dataset version?
For GISAID, something indeed went wrong with updating the Nextclade dataset. I fixed the issue and will reprocess the data.
@FedeGueli, the GISAID version is now displayed in the footer:
@corneliusroemer, please let me know how I can get the info for the open version as well :)
Sorry for the delay @chaoran-chen. This is actually not trivial because we currently don't keep that version in the output (we really should though).
Quick and dirty solution would be to simply use the version returned by:
❯ nextclade dataset list --name sars-cov-2 --json
[
  {
    "enabled": true,
    "attributes": {
      "name": {
        "isDefault": true,
        "value": "sars-cov-2",
        "valueFriendly": "SARS-CoV-2"
      },
      "reference": {
        "isDefault": true,
        "value": "MN908947",
        "valueFriendly": "Wuhan-Hu-1/2019"
      },
      "tag": {
        "isDefault": true,
        "value": "2023-02-01T12:00:00Z",
        "valueFriendly": null
      }
    },
    ...
So the version would be [0].attributes.tag.value
This would be incorrect for only a few hours if you run an ingest after release of a new version before we have done the full rerun. So it's possibly good enough as a start. So during your ingest of metadata.tsv, you could just look up the current value and save that with your ingest.
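As a rough illustration of that lookup, here is a minimal Python sketch. It assumes the `nextclade` CLI is on the PATH and that its JSON output keeps the shape shown above; other Nextclade versions may format it differently. The function names `extract_dataset_tag` and `current_dataset_tag` are made up for this example and are not part of Nextclade or ncov-ingest.

```python
import json
import subprocess


def extract_dataset_tag(listing_json: str) -> str:
    """Return [0].attributes.tag.value from `nextclade dataset list --json` output."""
    datasets = json.loads(listing_json)
    if not datasets:
        raise ValueError("nextclade returned no datasets")
    return datasets[0]["attributes"]["tag"]["value"]


def current_dataset_tag(name: str = "sars-cov-2") -> str:
    """Run the Nextclade CLI (assumed to be installed) and return the current tag."""
    result = subprocess.run(
        ["nextclade", "dataset", "list", "--name", name, "--json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI exits non-zero
    )
    return extract_dataset_tag(result.stdout)
```

During ingest of metadata.tsv, the value returned by `current_dataset_tag()` could be stored next to the ingest timestamp. The caveat above still applies: for a few hours after a new dataset release, it may not match the version the data was actually processed with.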
I'm having a look at tracking the dataset within ingest and posting it to a public file for you to use instead. But that may require some reviews from Nextstrain team.
Thanks, Cornelius! For the GISAID instance, we take the version directly from the Nextclade dataset's tag.value, as we run Nextclade ourselves. For open, I'd prefer to be accurate as well, and I think it's fine to wait for it to be added to nextstrain/ncov-ingest.
A suggestion: it would be nice to display a "Nextclade version" so that users can tell how long ago it was last updated. For collections, that would make it easier to choose the right query (manual from the Pango page or automatic from Nextclade).
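To show how "how long ago" could be derived, here is a small Python sketch. It assumes the dataset tag is a UTC timestamp in the format of the `tag.value` quoted in this thread (`2023-02-01T12:00:00Z`); `dataset_age_days` is a hypothetical helper, not an existing function of the site or of Nextclade.

```python
from datetime import datetime, timezone
from typing import Optional


def dataset_age_days(tag: str, now: Optional[datetime] = None) -> int:
    """Whole days elapsed between a dataset tag like 2023-02-01T12:00:00Z and `now`."""
    # The trailing "Z" means UTC; parse it literally and attach the UTC timezone.
    released = datetime.strptime(tag, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - released).days


# Example with a fixed reference time so the result is reproducible:
reference = datetime(2023, 2, 8, 12, 0, tzinfo=timezone.utc)
print(dataset_age_days("2023-02-01T12:00:00Z", reference))  # → 7
```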