SciCrunch / sparc-curation

code and files for SPARC curation workflows
MIT License

stale doi issue -> sparcron needs to add logic to check discover for publication events #91

Open tgbugs opened 1 year ago

tgbugs commented 1 year ago

I currently do not embed non-resolving DOIs in the export. However, because the dataset modified date does not get bumped when a dataset is published (as it should), we never detect that there was a change, so a DOI that did not resolve at export time is never picked up after publication. This means we probably need to start embedding non-resolving DOIs. The catch is that some old logic, scattered across a number of places, assumes that the presence of a DOI means it has resolved at least once and that the dataset has actually been published.
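A minimal sketch of the detection gap, assuming a change check that compares only the dataset modified date against the time of the last export. `DatasetState` and `needs_export_modified_only` are hypothetical names for illustration, not sparc-curation code, and the dataset id is a placeholder.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class DatasetState:
    # Hypothetical record; field names are illustrative, not from sparc-curation.
    dataset_id: str
    modified: datetime             # bumped only when the data itself changes
    published: Optional[datetime]  # most recent publication time, if any


def needs_export_modified_only(state: DatasetState, last_exported: datetime) -> bool:
    """Change detection keyed only on the dataset modified date."""
    return state.modified > last_exported


t0 = datetime(2023, 1, 1, tzinfo=timezone.utc)
t1 = datetime(2023, 2, 1, tzinfo=timezone.utc)
t2 = datetime(2023, 3, 1, tzinfo=timezone.utc)

# Data last changed at t0, exported at t1, then published at t2 with no data change.
state = DatasetState("N:dataset:placeholder", modified=t0, published=t2)
print(needs_export_modified_only(state, last_exported=t1))  # False: the publication is missed
```

Because `published` never enters the comparison, a publication that happens after the last export is invisible to this check.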

This also means we need an extra check in our combination step to confirm that a dataset has actually been published before we place it in the curation-export-published.ttl pile.
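One way the extra check in the combination step could look, as a sketch: exports are partitioned using an externally supplied set of published dataset ids (assumed here to come from Discover) rather than by the presence of a DOI. `split_published` and the record layout are hypothetical, and the DOI strings are placeholders.

```python
from typing import Dict, Set, Tuple


def split_published(
    exports: Dict[str, dict],
    published_ids: Set[str],
) -> Tuple[Dict[str, dict], Dict[str, dict]]:
    """Partition per-dataset exports into (published, unpublished).

    published_ids is assumed to come from an authoritative source such as
    Discover; a DOI in the export is deliberately NOT treated as proof.
    """
    published: Dict[str, dict] = {}
    unpublished: Dict[str, dict] = {}
    for dataset_id, export in exports.items():
        target = published if dataset_id in published_ids else unpublished
        target[dataset_id] = export
    return published, unpublished


# "b" carries a DOI but is not confirmed as published, so it stays out of the published pile.
exports = {"a": {"doi": "placeholder-a"}, "b": {"doi": "placeholder-b"}, "c": {}}
pub, unpub = split_published(exports, published_ids={"a"})
print(sorted(pub), sorted(unpub))  # ['a'] ['b', 'c']
```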

Further, it means we will no longer be able to reliably determine publication status from the JSON export.

I think this means the Discover database is actually the source of truth for this information. I can set up a way to rerun a dataset export without fetching any new data, so that we can run the export again whenever a publication happens. Then we wouldn't need to change the way we currently handle DOIs. This seems the safest option.
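A sketch of how sparcron might decide which datasets to re-export, assuming it can obtain, per dataset, the most recent publication time recorded in Discover. The fetch itself is not shown; `datasets_to_reexport`, `rerun_exports`, and the `export_from_cache` callback are hypothetical names, with the callback standing in for "rerun the export without fetching new data".

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List


def datasets_to_reexport(
    last_exported: Dict[str, datetime],
    discover_published: Dict[str, datetime],
) -> List[str]:
    """Return ids whose latest Discover publication postdates their last export."""
    stale = []
    for dataset_id, published_at in discover_published.items():
        exported_at = last_exported.get(dataset_id)
        if exported_at is None:
            # Never exported: left to the normal export path, nothing cached to rerun.
            continue
        if published_at > exported_at:
            stale.append(dataset_id)
    return sorted(stale)


def rerun_exports(stale_ids: List[str], export_from_cache: Callable[[str], None]) -> None:
    # export_from_cache is hypothetical: it re-runs the export on already-fetched data.
    for dataset_id in stale_ids:
        export_from_cache(dataset_id)


def month(m: int) -> datetime:
    return datetime(2023, m, 1, tzinfo=timezone.utc)


last = {"a": month(1), "b": month(3)}
pub = {"a": month(2), "b": month(2), "c": month(2)}
stale = datasets_to_reexport(last, pub)
print(stale)  # ['a']
```

Keying the trigger on Discover's publication time leaves the dataset modified date untouched, which matches the goal of not changing how DOIs are handled today.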

jgrethe commented 1 year ago

The dataset modified date tracks changes that have happened to the data itself, and I am not sure a publication event should count there. For example, a dataset's publication status may change many times with no changes to the data. If each of those changes bumped the modified date, there would be no reliable timestamp for the last update to the data itself.

tgbugs commented 1 year ago

A related issue here is that the -published files do not update as expected.