dandi / dandi-archive

DANDI API server and Web app
https://dandiarchive.org

Stop injecting "fake" DOIs into draft dandisets #1709

Open mvandenburgh opened 1 year ago

mvandenburgh commented 1 year ago

Our current publish/DOI creation workflow needs improvement. Currently, when publishing a dandiset, we inject a "fake" DOI into the dandiset metadata in order to allow it to be validated against the PublishedDandiset schema; this is because we cannot validate a dandiset without putting a valid DOI in its metadata, but we don't want to hit the DOI server for a real DOI without first validating our dandiset.
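A minimal sketch of the workflow described above, for illustration only. The validator, the `publish` helper, and the `mint_doi` callable are hypothetical stand-ins, not the actual dandi-archive code; the placeholder value is the "fake" DOI quoted later in this thread.

```python
import re

# Placeholder DOI; this is the "fake" value quoted later in this thread.
FAKE_DOI = "10.80507/dandi.123456/0.123456.1234"

# Hypothetical, heavily simplified stand-in for the PublishedDandiset check.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def validate_published(metadata: dict) -> list[str]:
    """Return a list of validation errors (an empty list means valid)."""
    errors = []
    if not metadata.get("name"):
        errors.append("name is required")
    if not DOI_PATTERN.match(metadata.get("doi", "")):
        errors.append("doi is required and must look like a DOI")
    return errors


def publish(draft_metadata: dict, mint_doi) -> dict:
    """Validate with a placeholder DOI, then swap in a real one."""
    candidate = {**draft_metadata, "doi": FAKE_DOI}
    errors = validate_published(candidate)
    if errors:
        raise ValueError(errors)
    # Only now contact the DOI provider (mint_doi is supplied by the caller).
    candidate["doi"] = mint_doi(candidate)
    return candidate
```

The weak point this illustrates is that the placeholder is a syntactically valid DOI, so nothing downstream can tell it apart from a real one if it is ever stored or displayed.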

Instead of using a "fake" DOI, we can create a "draft DOI" for all draft dandisets. Then, we can promote those to a "Findable" DOI when a publish actually happens.
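As a rough sketch of the draft-to-Findable promotion, the following models a DataCite-style DOI lifecycle (states `draft`, `registered`, `findable`; events `register`, `publish`, `hide`). That DANDI's DOI provider follows this model is an assumption here, and the helper names are hypothetical. No network calls are made; the functions only build request bodies.

```python
# DataCite-style DOI states and the events that move between them.
# Assumption: the DOI provider used for dandisets follows this model.
TRANSITIONS = {
    ("draft", "register"): "registered",
    ("draft", "publish"): "findable",
    ("registered", "publish"): "findable",
    ("findable", "hide"): "registered",
}


def next_state(state: str, event: str) -> str:
    """Return the state reached by applying `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"cannot apply {event!r} to a {state} DOI") from None


def draft_payload(doi: str, url: str) -> dict:
    """Request body with no event, which leaves a new DOI in the draft state."""
    return {"data": {"type": "dois", "attributes": {"doi": doi, "url": url}}}


def promote_payload() -> dict:
    """Request body that asks for the draft DOI to become findable."""
    return {"data": {"type": "dois", "attributes": {"event": "publish"}}}
```

Note that in this model there is no transition back to `draft`, so promotion at publish time is a one-way step.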

waxlamp commented 10 months ago

There's a confluence here with #1710. Perhaps they can be folded into a single issue.

waxlamp commented 10 months ago

The other idea here is to split the concepts up, so that our current validation process serves as a clearinghouse for publishing, and a separate validation process applies to draft dandisets specifically. The latter might be similar to the former, but with certain values being optional (including, crucially, DOIs).
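One way to picture the split, as a sketch only: two validation profiles that share a field list, with the draft profile treating some fields as optional. The field names below are hypothetical and far fewer than the real schemas define.

```python
# Hypothetical field names; the real schemas define many more.
PUBLISH_REQUIRED = {"name", "description", "license", "doi"}
DRAFT_OPTIONAL = {"doi"}
DRAFT_REQUIRED = PUBLISH_REQUIRED - DRAFT_OPTIONAL


def missing_fields(metadata: dict, required: set) -> list:
    """Return the required fields that are absent or empty, sorted by name."""
    return sorted(f for f in required if not metadata.get(f))


def validate_draft(metadata: dict) -> list:
    """Draft profile: the DOI may be absent."""
    return missing_fields(metadata, DRAFT_REQUIRED)


def validate_for_publish(metadata: dict) -> list:
    """Publish profile: every field, including the DOI, must be present."""
    return missing_fields(metadata, PUBLISH_REQUIRED)
```

With this shape, a draft can pass its own validation without any DOI being injected, while the publish check stays as strict as before.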

satra commented 10 months ago

Instead of using a "fake" DOI, we can create a "draft DOI" for all draft dandisets. Then, we can promote those to a "Findable" DOI when a publish actually happens.

This seems very reasonable to me and would also help users who want to insert a DOI in their publication for review without it being set in stone if reviewers want changes to the dandiset. And yes, it aligns with thoughts in #1710. We can also garbage collect drafts as needed if we see a dandiset abandoned.

CodyCBakerPhD commented 10 months ago

This seems very reasonable to me and would also help users who want to insert a DOI in their publication for review without it being set in stone if reviewers want changes to the dandiset.

+1 to the 'draft DOI' idea; we get this question/request from users very frequently.

yarikoptic commented 10 months ago

I think we could do even better. Similar to how Zenodo does it, we can have a "dataset DOI" which would first point to the draft and then to the most recent released version. For that reason we do not even need to make it mutable, since our DLP shows the most recent released one IIRC. The only concern is the metadata, which would be absent upon creation and then improved. So we could create the DOI with minimal/fake metadata and then keep updating it upon every metadata edit. I bet there is a type of mutable DOI we could use here, couldn't we?
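A small sketch of what such a dataset-level DOI would effectively resolve to, assuming versions are listed oldest to newest. The function name and the `example.org` URL layout are hypothetical, not the archive's actual routes.

```python
def dataset_doi_target(dandiset_id: str, released_versions: list) -> str:
    """Return the page a dataset-level DOI would effectively lead to."""
    # Hypothetical URL layout, for illustration only.
    base = f"https://example.org/dandiset/{dandiset_id}"
    if not released_versions:
        # Nothing published yet: the dataset DOI leads to the draft.
        return f"{base}/draft"
    # Assumption: versions are ordered oldest to newest.
    return f"{base}/{released_versions[-1]}"
```

If the landing page itself already performs this selection, the DOI's registered URL never has to change; only its metadata would need updating.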

rly commented 4 months ago

Just pinging this as I found a fake DOI in the wild in https://arxiv.org/pdf/2406.19492

29. Mazzamuto, G. et al. Human brain cell census for ba 44/45 (version draft). DANDI archive https://doi.org/10.80507/dandi.123456/0.123456.1234 (2004).

Here are a few more: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C21&q=https%3A%2F%2Fdoi.org%2F10.80507%2Fdandi.+123456%2F0.123456.1234&btnG=

yarikoptic commented 4 months ago

I strongly believe we should address it via

description of which I just updated with more detail.

djarecka commented 4 months ago

So are you thinking of creating a new class PublishedDraftDandiset whose schema has the same fields as PublishedDandiset, but with more of them optional?

yarikoptic commented 4 months ago

I didn't look into how it could/should be implemented, but I wouldn't call a "Draft" "Published". The point is rather that even a Draft one should be able to acquire a valid DOI if it doesn't have one yet.