DOI commandments
Here are two things about DOIs DataCite says we should be able to assume:
equivalent digital objects should not have different/unrelated DOIs
digital objects with the same DOI should not be meaningfully different
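To make the two assumptions concrete, here is a minimal sketch of an audit over published records. Everything in it is hypothetical and for illustration only: the `audit_dois` function, the idea of a content "fingerprint" per entrypoint, and the `related` set of DOI pairs that are linked by version metadata. It is not how any existing tool checks this.

```python
from collections import defaultdict


def audit_dois(records, related=frozenset()):
    """Report violations of the two DOI assumptions.

    records: iterable of (doi, fingerprint) pairs, where the fingerprint
             is some hypothetical stable hash of the published object.
    related: set of frozenset({doi_a, doi_b}) pairs that are explicitly
             linked by metadata (assumed to exist for this sketch).
    """
    fingerprints_by_doi = defaultdict(set)
    dois_by_fingerprint = defaultdict(set)
    for doi, fingerprint in records:
        fingerprints_by_doi[doi].add(fingerprint)
        dois_by_fingerprint[fingerprint].add(doi)

    problems = []

    # Assumption 2: one DOI should not point at meaningfully different objects.
    for doi, fingerprints in sorted(fingerprints_by_doi.items()):
        if len(fingerprints) > 1:
            problems.append(("same DOI, different objects", doi))

    # Assumption 1: equivalent objects should not have unrelated DOIs.
    for _, dois in sorted(dois_by_fingerprint.items()):
        ordered = sorted(dois)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                if frozenset((first, second)) not in related:
                    problems.append(("same object, unrelated DOIs", (first, second)))

    return problems
```

Each of the scenarios in this issue would show up as one of these two problem kinds if the published records were audited this way.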
Here are some easy ways for users to break these assumptions with entrypoints:
Same entrypoint, different DOIs
Suppose I have already published an entrypoint to my garden and now want to publish the same entrypoint to another garden. To do this, I leave everything in my notebook exactly the same, except that I change @garden_entrypoint(..., garden_doi="10.23677/my-garden-doi") to use garden_doi="10.23677/my-other-garden-doi". I naively run notebook publish, and a new, unrelated DOI is minted for an identical entrypoint.
Updated entrypoint, unrelated DOIs (the "versioning problem")
A related problem crops up if I want to modify my entrypoint but still publish to the same garden -- naively running notebook publish after making my changes fails with an error telling me that an entrypoint with that name already exists in that garden. So I manually remove the old version from my garden using the CLI. Now I can republish without any errors, but again this mints an unrelated DOI for the updated entrypoint, as if it were new.
This is a problem because the old DOI/entrypoint still exists as an "orphan": no gardens refer to it anymore. Best practices here have some wiggle room. We can either a) reuse the old DOI or b) mint a new DOI with some "versionOf" or "obsoletes" metadata field referencing the original, so that it isn't orphaned.
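For option b), a rough sketch of what the linking metadata could look like is below. The `new_version_metadata` helper is hypothetical. The field names (`relatedIdentifiers`, `relatedIdentifierType`, `relationType`) and the relation types `IsNewVersionOf` and `Obsoletes` are based on my understanding of the DataCite metadata schema, so please check them against the schema documentation before relying on them.

```python
def new_version_metadata(new_doi, old_doi, relation="IsNewVersionOf"):
    """Build a metadata fragment linking a newly minted DOI to the one it replaces.

    Hypothetical helper for illustration; the dict layout is an assumption
    modeled on DataCite's relatedIdentifiers property.
    """
    allowed = {"IsNewVersionOf", "Obsoletes"}
    if relation not in allowed:
        raise ValueError(f"unsupported relation for this sketch: {relation!r}")
    return {
        "doi": new_doi,
        "relatedIdentifiers": [
            {
                "relatedIdentifier": old_doi,
                "relatedIdentifierType": "DOI",
                "relationType": relation,
            }
        ],
    }
```

With something like this attached to the new DOI, the old DOI is no longer an orphan, since the new record points back at it even though no garden does.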
Different entrypoints, same DOI (uh oh)
In the current system, I might manually reuse my DOI in an attempt to work around the versioning problem by pinning the DOI. Not only is this clunky UX -- I need to publish the notebook just to get the DOI, then I need to edit the notebook to pin it in my EntrypointMetadata -- but I might accidentally commit a cardinal DOI sin, like so:
1. I publish an entrypoint to garden A.
2. I pin the DOI "10.23677/my-only-doi" in good faith.
3. I realize my entrypoint would make a good fit for garden B, if I tweak it just a little (say by changing the signature of the function). It does not occur to me to un-pin my DOI.
4. I edit the entrypoint and publish it to garden B.
Now garden A and garden B both have an entrypoint under "10.23677/my-only-doi", but that DOI corresponds to a totally different function in garden A vs garden B. Drat.
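One way to catch this at publish time is sketched below, under loud assumptions: `fingerprint`, `check_pinned_doi`, `DoiConflictError`, and the `published` dict are all hypothetical names, not part of the current system, and the fingerprint is deliberately crude (function name, signature, and compiled bytecode only).

```python
import hashlib
import inspect


class DoiConflictError(Exception):
    """Raised when a pinned DOI already refers to a different function."""


def fingerprint(func):
    # Crude identity for a function: its name, signature, and bytecode.
    # A real check would need something sturdier (e.g. constants, dependencies).
    digest = hashlib.sha256()
    digest.update(func.__name__.encode())
    digest.update(str(inspect.signature(func)).encode())
    digest.update(func.__code__.co_code)
    return digest.hexdigest()


def check_pinned_doi(published, doi, func):
    """Refuse to reuse a pinned DOI for a function that has changed.

    published: dict mapping DOI -> fingerprint of what was published under it
               (a stand-in for whatever the backend would actually record).
    """
    current = fingerprint(func)
    known = published.get(doi)
    if known is not None and known != current:
        raise DoiConflictError(
            f"{doi} is already pinned to a different version of {func.__name__}"
        )
    published[doi] = current
    return current
```

In the scenario above, publishing to garden A would record the fingerprint under "10.23677/my-only-doi", and the edited entrypoint bound for garden B would fail the check instead of silently sharing the DOI.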