Open yroskov opened 1 month ago
@thomasstjerne & @mdoering, could you please fix this long-standing bug? GSD metadata in the CoL should reflect the version that was synced into the project, not the version currently imported into the CLB.
(just in case, WFerns GSD was synced 2024-07-09; WPlants - 2024-07-08)
The September edition was released on 2024-09-25.
Ferns were last imported 18th September and in July before that:
The fern sectors were synced last on the 30th September:
https://api.checklistbank.org/dataset/3/sector/sync?datasetKey=1140
datasetAttempt: 66 # this is the version of the dataset import: https://api.checklistbank.org/dataset/1140/66.json
Before that on the 9th of July:
datasetAttempt: 65 # https://api.checklistbank.org/dataset/1140/65.json
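To make the expected behaviour concrete, here is a minimal sketch of how the attempt belonging to a release could be picked from the sync history. The sync dates and attempt numbers are the ones quoted above, and the release date is the one given for the September edition; the function name `attempt_for_release` and the data layout are hypothetical and not part of the ChecklistBank code base.

```python
from datetime import date

# Sync records taken from this thread: (sync date, datasetAttempt).
SYNCS = [
    (date(2024, 7, 9), 65),
    (date(2024, 9, 30), 66),
]


def attempt_for_release(syncs, release_date):
    """Return the datasetAttempt of the latest sync on or before release_date.

    Returns None if the dataset was never synced before the release.
    Hypothetical helper, for illustration only.
    """
    eligible = [(synced, attempt) for synced, attempt in syncs if synced <= release_date]
    if not eligible:
        return None
    # Tuples compare by date first, so max() picks the most recent sync.
    return max(eligible)[1]


# The September edition was released on 2024-09-25, before the sync of 30th September.
print(attempt_for_release(SYNCS, date(2024, 9, 25)))  # 65
```

Under these assumptions the September release should show the metadata that belongs to attempt 65, not to attempt 66.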
The metadata for import 65 indeed looks odd:
"attempt":65,
"issued":"2024-09-18",
"version":"24.9, Sep 2024",
"created":"2024-09-18T14:01:09.739245",
"imported":"2024-07-08T14:10:37.195155"
@yroskov this problem was never mentioned to me before, and I am very surprised to see it now. It has been working for more than 2 years.
> The fern sectors were synced last on the 30th September

Yes, that is my work today for the October CoL.
> this problem was never mentioned to me before, and I am very surprised to see it now. It has been working for more than 2 years.

I raised this many times during our stand-ups... (especially in relation to IRMNG)
I believe I know what's going on. If you download the latest archives, they all lack metadata! That must be linked to the wrong archival of metadata versions. I will look into this more tomorrow.
> I raised this many times during our stand-ups... (especially in relation to IRMNG)
Can you point me to an old issue please?
Dataset metadata is only archived during imports, i.e. when no metadata is included in the archive there won't be any archival. And as the dataset metadata version is tied to the import attempt, it requires considerable refactoring to change that. The idea was that we do not want to archive every manual edit that is being done on a dataset, but instead allow manual changes via the UI or API to happen and only write a final version to the archive when a new one, through an import, shows up.
It seems we now rather need an independent metadata versioning system that has its own version number and will be triggered to archive a version when:
Every import and sync would then refer to a specific metadata version which can be retrieved from the archive.
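A rough sketch of what such independent versioning could look like, assuming an append-only archive per dataset. All names here (`MetadataArchive`, `SyncRecord`, `metadata_version`) are hypothetical and do not exist in ChecklistBank, and skipping unchanged metadata is a design choice made for this illustration only. When exactly `archive()` gets called is deliberately left open.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataArchive:
    """Append-only store of metadata versions for one dataset (hypothetical)."""

    versions: list = field(default_factory=list)

    def archive(self, metadata: dict) -> int:
        """Store a copy of the metadata and return its 1-based version number.

        If the metadata is identical to the latest archived version, no new
        version is written and the existing number is returned.
        """
        if self.versions and self.versions[-1] == metadata:
            return len(self.versions)
        self.versions.append(dict(metadata))
        return len(self.versions)

    def get(self, version: int) -> dict:
        """Return a copy of the metadata archived under the given version."""
        return dict(self.versions[version - 1])


@dataclass
class SyncRecord:
    """A sync that remembers which metadata version it used (hypothetical)."""

    dataset_attempt: int
    metadata_version: int


archive = MetadataArchive()

# Metadata values as quoted in this thread for the June and September versions.
v1 = archive.archive({"version": "19.4, Jun 2024", "issued": "2024-06-30"})
sync = SyncRecord(dataset_attempt=65, metadata_version=v1)

# A later import archives new metadata without touching what the sync refers to.
v2 = archive.archive({"version": "24.9, Sep 2024", "issued": "2024-09-18"})

print(archive.get(sync.metadata_version)["version"])  # 19.4, Jun 2024
```

Because the sync stores its own `metadata_version`, a later import cannot change what a release built from that sync reports.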
@yroskov @gdower a quick fix from my side is not possible, this will take longer.
Maybe we can add metadata.yaml files to these sources?
> Maybe we can add metadata.yaml files to these sources?
Unfortunately, this can happen with any source. For example, quite often we get a notification about a new ITIS and do an import a few days before the release, without including that update in the release.
...and this happens to almost all GSDs imported by "third parties" outside our control, e.g. WCVP, WFO, Bryonames, all Lepidoptera, etc.
but datasets with metadata in imports are versioned fine, they are not a problem!
Describe the bug
The new CoL release of September 2024 contains wrong metadata for World Plants and World Ferns.
The real versions of both GSDs are 19.4, Jun 2024 / 2024-06-30. (New data versions were indeed imported into CLB in September 2024, but I did not sync them into the September CoL!) However, the incorrect versions (24.9, Sep 2024) are shown in the GSD metadata in the September release:
https://www.catalogueoflife.org/data/dataset/1140
https://www.catalogueoflife.org/data/dataset/1141