When registering for a DOI, Zenodo requires 1) a file (up to 50 GB) and optionally accepts 2) a URL. We should provide both: the file can be the metadata dump, and the URL can point to the project page. Zenodo also supports 3) versioning, so we can occasionally update the registered DOI (say, when we upload a batch of new data for a project, or at the end of the quarter when we want to tag an overall DCP release).
If a user wants to refer to the project overall, they use the base DOI, e.g. DOI:10.5281/zenodo.596994, which points to all versions.
If a user wants to refer to a moment in time for that project, they use the full DOI for a particular version, e.g. DOI:10.5281/zenodo.2602233; this version is associated with a particular metadata TSV dump that was published on Zenodo.
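The two citation styles above (base DOI for the whole project, full DOI for a snapshot) can be sketched as a small data model. This is purely illustrative: the class and method names are my own, not part of any Zenodo API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectDoi:
    """Illustrative model of Zenodo-style versioned DOIs (not a real API)."""
    concept: str      # base DOI: always points at the record covering all versions
    versions: tuple   # version-specific DOIs, oldest first

    def cite_project(self) -> str:
        """Cite the project as a whole, across all versions."""
        return f"DOI:{self.concept}"

    def cite_version(self, index: int = -1) -> str:
        """Cite one published version (default: the latest)."""
        return f"DOI:{self.versions[index]}"


# The two example DOIs from the discussion above:
hca = ProjectDoi(concept="10.5281/zenodo.596994",
                 versions=("10.5281/zenodo.2602233",))
print(hca.cite_project())  # → DOI:10.5281/zenodo.596994
print(hca.cite_version())  # → DOI:10.5281/zenodo.2602233
```

Publishing a new version would append to `versions` while `concept` stays stable, which is what lets a base-DOI citation keep resolving as the data evolves.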
@lauraclarke commented on Apr 14, 2019:
It is also worth noting that data under a DOI isn't totally immutable and can change. It is possible to support versions.
From the DOI FAQ:

> If I have assigned a DOI name and I make a change to my material, should I assign a new DOI?
>
> The IDF does not have any rules on this. Individual RAs adopt appropriate rules for their community and application. As a general rule, if the change is substantial and/or it is necessary to identify both the original and the changed material, assign a new DOI name.
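The FAQ's general rule could be encoded as a tiny decision helper. The function name and parameters are my own invention, not from any RA's tooling; the IDF explicitly leaves the actual policy to each registration agency.

```python
def should_assign_new_doi(change_is_substantial: bool,
                          must_identify_both: bool) -> bool:
    """Encode the DOI FAQ's general rule: assign a new DOI name if the
    change is substantial and/or both the original and the changed
    material need to remain independently identifiable."""
    return change_is_substantial or must_identify_both


# A minor typo fix keeps the existing DOI; a new data release gets a new one:
print(should_assign_new_doi(False, False))  # → False
print(should_assign_new_doi(True, True))    # → True
```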
Reading figshare's advice to submitters on updating data is a reasonable guide to the sort of process we might want to adopt once we have a more fully featured system.
@lauraclarke commented on Apr 17, 2019:
I think this highlights the need for a conversation about what types of updates genuinely change the dataset from a reproducibility perspective, and what types of edits are more improvements/error corrections that don't affect how reproducible the data referred to by the URL is. I really like Crossref's video about their Crossmark service as a summary of the second type of edits: https://www.crossref.org/services/crossmark/
We also need to think about which registration agency we are going to use (http://www.doi.org/registration_agencies.html). Do we go with a service provided by one of our institutions (EBI is already registered with Crossref)? How are they paid for? How rapidly should we give things DOIs, balancing the immediate need for people to cite publicly accessible data against the fact that in the early stages of submission there may be more frequent updates?
@lauraclarke commented on Apr 23, 2019:
I did a bit of digging and found the DOI assignment process we used for a former project, Blueprint, which used the EBI service to generate DOIs. It is a very simple process, though I suspect there are subtleties to discuss.
Very happy to start conversations with our literature services team about our plans to see if this would be a suitable solution. If DOI assignment is in scope for this quarter, it feels better to use a service which is much closer to one of the collaborating institutions than an entirely third-party service.
Thinking about this more, it would seem a good idea to discuss this at PM/Tech Arch level and decide if we want to use someone else's service for this at all, or if it would be better for the HCA to become an authority that can assign DOIs ourselves.
I haven't read the Crossref membership terms in detail, but this should be discussed.
Collecting together the various discussions as to which DOI issuing authority should be used by the DCP.
Also see "Implementation notes for DOI support" in https://github.com/HumanCellAtlas/dcp-community/blob/master/rfcs/text/0014-data-citation-plan.md
@briandoconnor commented on Apr 17, 2019:
The DOI FAQ (quoted above) addresses this: https://www.doi.org/faq.html
https://knowledge.figshare.com/articles/item/can-i-edit-or-delete-my-research-after-it-has-been-made-public
DOI_instructs.pdf (the Blueprint DOI assignment process mentioned above)
Crossref membership terms: https://www.crossref.org/membership/
Doc from @gabsie: https://docs.google.com/document/d/1eM80EGe3T4VTU5hyBUKCGdN17k_MMxsRcWwHBB-54n4/edit#heading=h.wgzkwbvrtz50