OpenScienceMOOC / Module-5-Open-Research-Software-and-Open-Source

Module 5: Open Research Software and Open Source
https://eliademy.com/catalog/oer/module-5-open-research-software-and-open-source.html
MIT License
Repo metadata #42

Open mrchristian opened 6 years ago

mrchristian commented 6 years ago

The issue has been raised by @danielskatz on Twitter https://twitter.com/danielskatz/status/1036992161508667392 about the need to 'declare the metadata for the repository'.

I will review our current coverage of this issue and look at how to proceed.

I will document the issue in full below.

mrchristian commented 6 years ago

The current position on recording repository metadata has been a 'lite' approach, mainly to keep the amount of ground covered in the instructions to a minimum.

Here is what is currently described for recording metadata:

https://github.com/OpenScienceMOOC/Module-5-Open-Research-Software-and-Open-Source/blob/master/content_development/Task_2.md#getting-a-doi-

To summarise the process:

  1. Zenodo captures author names from the GitHub repository
  2. Admins of the Zenodo deposit can edit metadata
  3. Zenodo generates version numbers
  4. Zenodo assigns DOIs
  5. Zenodo has a variety of metadata fields that can be filled in

Protohedgehog commented 6 years ago

Brilliant, thanks @mrchristian. Is there a way we can use the communities function of Zenodo to make things a little easier here? I'm not sure exactly what sort of things this allows just yet https://zenodo.org/communities/open-science-mooc/?page=1&size=20

mrchristian commented 6 years ago

Communities - I think it just functions as a collection of sorts. I'll work back through Zenodo's metadata editing and generation process; I have a bunch of repos on Zenodo I can try this out on. Then I'll have a think about how best to wade through the swamp :-)

mrchristian commented 6 years ago

Looking at Zenodo's metadata representation of a deposit, it would seem sensible to use Zenodo as the place where the metadata is edited, and then put a file back into the GitHub repository at some point, if needed, in whatever format is preferred (BibLaTeX, etc.).

You can also see the fields listed here http://developers.zenodo.org/#depositions

The owner of a Zenodo deposit can edit the metadata via the web interface, not sure if there is group access.

The reason I suggest using Zenodo as the key location for maintaining metadata is that Zenodo will do the job of distributing the metadata.

As an idea for later, it would be nice to use the Zenodo API to write the metadata back to your repo in whatever flavor of markup is preferred.
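A minimal sketch of the "write it back in another flavor" idea, assuming the metadata has already been fetched as a dictionary with `title`, `publication_date` and `creators` keys (the key names follow the deposit fields linked above; the function name, the DOI and the sample values are hypothetical). It renders a BibTeX `@misc` entry and does not call the Zenodo API itself.

```python
def zenodo_to_bibtex(metadata: dict, doi: str, key: str) -> str:
    """Render a Zenodo-style metadata dict as a BibTeX @misc entry."""
    # BibTeX separates multiple authors with the word "and".
    authors = " and ".join(c["name"] for c in metadata.get("creators", []))
    fields = {
        "author": authors,
        "title": metadata.get("title", ""),
        # Assumes an ISO date such as "2018-09-10"; only the year is kept.
        "year": metadata.get("publication_date", "")[:4],
        "publisher": "Zenodo",
        "doi": doi,
    }
    # Skip empty fields so the entry stays valid when metadata is sparse.
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items() if v)
    return f"@misc{{{key},\n{body}\n}}"


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    sample = {
        "title": "Example repository",
        "publication_date": "2018-09-10",
        "creators": [{"name": "Rabbit, Roger"}],
    }
    print(zenodo_to_bibtex(sample, doi="10.5281/zenodo.0000000", key="example"))
```

The resulting string could be committed to the repository as a `.bib` file; other target formats would need their own small renderer.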

I'll give it a spin on a dummy repo

Protohedgehog commented 6 years ago

OK, awesome, thanks @mrchristian! Will be interesting to see how this can ultimately feed back in either to how we index the MOOC content, or as part of the learning content.

mrchristian commented 6 years ago

Back on the case now, will get this sorted this week. First is to consult @Zenodo support and get a usable representation of their metadata schema, then consult the #softwarecitation community about the dilemma of which route to take: Zenodo output, CFF, CodeMeta, BibTeX.

It's so annoying that these things are not clear and worked out already. If only all that money being wasted on research service companies' profits was actually used to fix basic plumbing problems in academia, Jeez :-) The prisoner emerges from the cave.

tosteiner commented 6 years ago

@mrchristian not sure if this helps, but Chris Gorgolewski has written a neat run-down on how this might work automatically:

and I guess sticking to a minimal content scheme for author names of

{
      "name": "Rabbit, Roger",
      "orcid": "0000-0002-468-1234"
}

would easily suffice, don't you think?
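Since the `orcid` value is what links a name to a person, it may be worth checking it before committing the file. The sketch below is one way to do that in Python: it checks the four-groups-of-four layout and the ISO 7064 MOD 11-2 check character that ORCID iDs use. The function names are hypothetical. Note that the placeholder iD in the snippet above has only three digits in its third group, so it would be rejected on format alone.

```python
import re

# Four groups of four characters; only the last character may be "X".
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the string has the ORCID layout and a matching check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


if __name__ == "__main__":
    # 0000-0002-1825-0097 is the example iD used in ORCID's own documentation.
    print(is_valid_orcid("0000-0002-1825-0097"))
    print(is_valid_orcid("0000-0002-468-1234"))
```

This only confirms that an iD is well formed; it does not confirm that the iD belongs to the named person.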

(I think we've had the same issue over at https://github.com/Open-Scholarship-Strategy/site/issues/30 hence I'm just copying it here 😉 - sadly, my personal skills at proper metadata coding are rather limited, it was rather a copy&paste try 'n' error thing :) )

mrchristian commented 6 years ago

Hey, thank you, brilliant. Do you think this approach enables the contributor information to get incorporated into the Zenodo and DataCite records for the repository?

That's one of the goals I'm trying to achieve, as that's the information others are harvesting.

Thanks again :-)

tosteiner commented 6 years ago

As far as I understood it, it adds the possibility to push author info to the Zenodo repo, so yes, it's incorporated with Zenodo... and DataCite then picks that up and uses it for its own purposes :)

mrchristian commented 6 years ago

AOK, that's super.

Zenodo outputs the 'deposit' metadata in a variety of formats so others can use it.

I can see on the example repo they have extensive metadata, I'll try out the process on a test repo, or on Zenodo's sandbox and see if the creator names get picked up into the system.

https://zenodo.org/record/581704/export/dcite4

mrchristian commented 6 years ago

Hi,

Glacially slow reply, must be on some low frequency packet radio system.

But I'm finally back on it and I've got it cracked. Well, at least what's going on. More to do to really sort out the full situation, a bit out of my scope, but at least I can now recommend a better solution than we started with.

So, what's the 'craic' as they say.

Zenodo picks up a file called .zenodo.json to read metadata. Of course no one makes this clear; instead it's hidden in a tab, deep in the Zenodo repository area.

"JSON Export: Zenodo automatically extracts metadata about your repository from GitHub APIs. For example, the authors are determined from the repository's contributor statistics. The automatic extraction is solely a best guess. Add a .zenodo.json file the root of your repository to explicit define the metadata. The format of file is the same as for our REST API (use e.g. below JSON to get started)."

The result of doing this is what @tosteiner pointed me to, thank you. But I then needed to understand what's going on.

I did a test in Zenodo's sandbox site.

https://sandbox.zenodo.org/record/246036

from repo

https://github.com/hybrid-publishing-group/book-coding/tree/master

You can actually write lots of the metadata here (see the example), but not things like UIDs.

https://github.com/hybrid-publishing-group/book-coding/blob/master/.zenodo.json

This is more like what we would need, just names, although even in this case there can be 'contributors' and 'creators', also with types such as 'editor', 'researcher', etc.

Soooooo.... In a nutshell my recommendation is as follows.

A key objective is to get rich metadata about persons (creators and contributors) into the DOI information ecology and into the repository.

So using the .zenodo.json file is a vast improvement over the GitHub user name.
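As a rough illustration of the recommendation, the Python sketch below assembles a small metadata dictionary and writes it to `.zenodo.json` in the repository root. The field names (`title`, `upload_type`, `creators`, `contributors`, `type`) are taken from a reading of Zenodo's deposit metadata and should be checked against the developer documentation; the helper names and sample values are hypothetical.

```python
import json
from pathlib import Path


def build_zenodo_metadata(title, creators, contributors=()):
    """Assemble a minimal .zenodo.json payload as a plain dict."""
    metadata = {
        "title": title,
        "upload_type": "software",  # assumed value for a code repository
        "creators": list(creators),
    }
    # Only include contributors when there are any, to keep the file minimal.
    if contributors:
        metadata["contributors"] = list(contributors)
    return metadata


def write_zenodo_json(repo_root, metadata):
    """Write the metadata to <repo_root>/.zenodo.json and return the path."""
    path = Path(repo_root) / ".zenodo.json"
    text = json.dumps(metadata, indent=2, ensure_ascii=False) + "\n"
    path.write_text(text, encoding="utf-8")
    return path


if __name__ == "__main__":
    # Hypothetical people and title, for illustration only.
    meta = build_zenodo_metadata(
        title="Example repository",
        creators=[{"name": "Rabbit, Roger", "affiliation": "Example University"}],
        contributors=[{"name": "Hare, Harriet", "type": "Editor"}],
    )
    print(write_zenodo_json(".", meta))
```

Keeping the file to names, affiliations and roles matches the 'lite' approach while still giving Zenodo explicit creators instead of a guess from GitHub contributor statistics.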

NEXT

I need to refine the process and workflow, give exact instructions with an example, and find out from Zenodo's API documentation and support which person fields can be added. http://developers.zenodo.org/#metadata-formats

Consult with Zenodo support and the software citation community, as I have heard that CodeMeta files can also be read; maybe others can too, like BibTeX?

My aim would be a write-up for tomorrow, then consult and then wrap it up. I'll also write a blog post on this, as it needs more profile; currently I couldn't find any documentation on the process.

Cheers

Simon

mfenner commented 6 years ago

Adding support for codemeta is on the Zenodo roadmap and should make this much easier.

danielskatz commented 6 years ago

I don't know if the CodeMeta part is working yet, but it certainly will be. Caltech Data can do this now, and they use the same underlying software as Zenodo. see https://twitter.com/CaltechData/status/972163704585269248

mrchristian commented 6 years ago

Thanks for the CodeMeta pointers. The CalTech example also helps make the picture clearer; it's just a choice of what file the Zenodo instance is instructed to pick up, in CalTech's case like so https://github.com/caltechlibrary/dataset/blob/master/codemeta.json
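If both files end up being wanted, the person information could be kept in one place and converted. The Python sketch below maps `.zenodo.json`-style creators onto a CodeMeta-style `author` list. It assumes names are written "Family, Given" and uses the CodeMeta 2.0 context URL and schema.org `Person` properties as I understand them; the function name is hypothetical and the output should be checked against the CodeMeta documentation before use.

```python
CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"


def creators_to_codemeta(name, creators):
    """Map .zenodo.json-style creators to a minimal CodeMeta-style dict."""
    authors = []
    for creator in creators:
        # Assumes "Family, Given"; a name without a comma becomes familyName only.
        family, _, given = creator["name"].partition(",")
        person = {"@type": "Person", "familyName": family.strip()}
        if given.strip():
            person["givenName"] = given.strip()
        if creator.get("orcid"):
            person["@id"] = "https://orcid.org/" + creator["orcid"]
        authors.append(person)
    return {
        "@context": CODEMETA_CONTEXT,
        "@type": "SoftwareSourceCode",
        "name": name,
        "author": authors,
    }


if __name__ == "__main__":
    import json

    # Hypothetical sample record, for illustration only.
    sample = [{"name": "Rabbit, Roger", "orcid": "0000-0002-1825-0097"}]
    print(json.dumps(creators_to_codemeta("Example repository", sample), indent=2))
```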

danielskatz commented 6 years ago

Caltech, please :)

tosteiner commented 5 years ago

@mrchristian sorry for nagging on about this... any news on the creation and layout for an OSMOOC-specific .zenodo.json? Or can we adapt the one you mentioned earlier, from the sandbox example?

I guess starting with the built-in option would be great to get things going, and then evolve from that to future implementations such as the CalTech / codemeta.json - would that make sense?