Garden-AI / garden

https://garden-ai.readthedocs.io
MIT License

re-publish and/or modify an entrypoint without committing DataCite Sins #396

Open OwenPriceSkelly opened 7 months ago

OwenPriceSkelly commented 7 months ago

DOI commandments

Here are two things about DOIs DataCite says we should be able to assume:

Here are some easy ways for users to break these assumptions with entrypoints:

Same entrypoint, different DOIs

Suppose I'm a user who a) has already published an entrypoint to my garden and b) wants to publish the same entrypoint to another garden. To do this, I leave everything in my notebook exactly the same, except I change @garden_entrypoint(..., garden_doi="10.23677/my-garden-doi") to garden_doi="10.23677/my-other-garden-doi". I naively run notebook publish, and a new, unrelated DOI is minted for an identical entrypoint.
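One way to picture a fix: key the DOI on the entrypoint's content rather than on the publish event. This is only a rough sketch, not how garden works today. `entrypoint_fingerprint`, `doi_for_publish`, `known_dois`, and `mint_doi` are hypothetical names, and hashing the source text is an assumption about what "the same entrypoint" should mean.

```python
import hashlib
import textwrap
from typing import Callable, Dict


def entrypoint_fingerprint(source: str) -> str:
    """Hash the entrypoint's source text (hypothetical notion of identity).

    Indentation and trailing whitespace are normalized so that purely
    cosmetic differences do not change the fingerprint.
    """
    dedented = textwrap.dedent(source).strip()
    normalized = "\n".join(line.rstrip() for line in dedented.splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def doi_for_publish(
    source: str,
    known_dois: Dict[str, str],
    mint_doi: Callable[[], str],
) -> str:
    """Reuse the DOI for an already-published fingerprint, else mint one.

    `known_dois` maps fingerprint -> DOI; `mint_doi` stands in for
    whatever actually talks to DataCite.
    """
    fingerprint = entrypoint_fingerprint(source)
    if fingerprint not in known_dois:
        known_dois[fingerprint] = mint_doi()
    return known_dois[fingerprint]
```

With something like this, publishing the unchanged notebook to a second garden would look up the existing DOI instead of minting a fresh one, no matter which garden_doi the decorator points at.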

Updated entrypoint, unrelated DOIs (the "versioning problem")

A related problem crops up if I want to modify my entrypoint but still publish to the same garden -- naively running notebook publish after making my changes fails with an error telling me that an entrypoint with that name already exists in that garden. So I manually remove the old version from my garden using the CLI. Now I can republish without any errors, but again this mints an unrelated DOI for the updated entrypoint, as if it were new.

This is a problem because the old DOI/entrypoint still exists as an "orphan": no garden refers to it anymore. Best practices here leave some wiggle room. We can either a) reuse the old DOI or b) mint a new DOI with a "versionOf" or "obsoletes" metadata field referencing the original, so it isn't orphaned.
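For option b), here is a rough sketch of what the linking metadata might look like. As I understand the DataCite metadata schema, this kind of link lives in a relatedIdentifiers entry whose relationType is something like "IsNewVersionOf" or "Obsoletes"; treat the exact field names as something to check against the schema docs. `new_version_metadata` is a hypothetical helper, not an existing garden function.

```python
from typing import Any, Dict

# Relation types this sketch allows (assumed to match the DataCite
# controlled vocabulary; verify against the schema before relying on it).
ALLOWED_RELATIONS = {"IsNewVersionOf", "Obsoletes"}


def new_version_metadata(
    new_doi: str,
    old_doi: str,
    relation: str = "IsNewVersionOf",
) -> Dict[str, Any]:
    """Build metadata for a new DOI that points back at the one it replaces."""
    if relation not in ALLOWED_RELATIONS:
        raise ValueError(f"unsupported relation type: {relation!r}")
    return {
        "doi": new_doi,
        "relatedIdentifiers": [
            {
                "relatedIdentifier": old_doi,
                "relatedIdentifierType": "DOI",
                "relationType": relation,
            }
        ],
    }
```

The old DOI would still resolve, but anyone landing on the new record could follow the link back to the original, so it isn't orphaned.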

Different entrypoints, same DOI (uh oh)

In the current system, I might try to work around the versioning problem by manually pinning my DOI so that it gets reused. Not only is this clunky UX -- I need to publish the notebook just to get the DOI, then I need to edit the notebook to pin it in my EntrypointMetadata -- but I might accidentally commit a cardinal DOI sin, like so:

  1. I publish an entrypoint to garden A
  2. I pin the DOI "10.23677/my-only-doi" in good faith
  3. I realize my entrypoint would make a good fit for garden B, if I tweak it just a little (say by changing the signature of the function). It does not occur to me to un-pin my DOI.
  4. I edit the entrypoint and publish it to garden B

Now garden A and garden B both have an entrypoint under "10.23677/my-only-doi", but that DOI corresponds to a totally different function in garden A vs garden B. Drat.
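A publish-time guard could catch this before it happens. The sketch below is hypothetical (`DoiRegistry` and `DoiConflictError` are made-up names, and comparing a hash of the source text is an assumption about how to detect "a different function"): it remembers what each DOI was first published with and refuses to reuse the DOI for different content.

```python
import hashlib
from typing import Dict


class DoiConflictError(Exception):
    """Raised when a pinned DOI is reused for different entrypoint content."""


class DoiRegistry:
    """In-memory stand-in for wherever published DOIs are recorded."""

    def __init__(self) -> None:
        # Maps DOI -> sha256 of the entrypoint source first published under it.
        self._fingerprints: Dict[str, str] = {}

    def register(self, doi: str, source: str) -> None:
        """Record a publish, rejecting a DOI that already names other content."""
        fingerprint = hashlib.sha256(source.encode("utf-8")).hexdigest()
        existing = self._fingerprints.setdefault(doi, fingerprint)
        if existing != fingerprint:
            raise DoiConflictError(
                f"{doi} already refers to a different entrypoint; "
                "un-pin it or publish the change as a new version"
            )
```

In the scenario above, step 4 would fail with a DoiConflictError instead of silently putting two different functions under "10.23677/my-only-doi", while republishing the unchanged entrypoint to garden B would still go through.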