IQSS / dataverse

Open source research data repository software
http://dataverse.org

Suggestion: Update the CodeMeta Metadata Block to add some more structure for machine actionability #10859

Open doigl opened 1 month ago

doigl commented 1 month ago

Overview of the Suggestion Currently, the fields memoryRequirements, processorRequirements and storageRequirements are free-text fields, which makes it difficult to use them in an automated process that provisions the right resources for running a Jupyter notebook or a container. Adding subfields with controlled vocabularies to these fields would make it easier to differentiate between resource types and to identify the required amount of each resource, such as memory.

Also, as @poikilotherm mentioned, the CodeMeta schema is now available in version 3, so it could be worth checking whether we also want to add some of the new fields (such as code reviews) to the metadata block.

What kind of user is the suggestion intended for? (Example users roles: API User, Curator, Depositor, Guest, Superuser, Sysadmin) User, Sysadmin

What inspired this idea? Two different things:

What existing behavior do you want changed? Add structured subfields and controlled vocabularies, at least for the fields memoryRequirements, processorRequirements and storageRequirements. Make the memoryRequirements field multiple to allow for different types of memory. We are also open to discussing changes to other fields and to adding new version 3 fields to the block (do we need software reviews?).
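To illustrate why structured subfields help automation, here is a minimal Python sketch of what a multi-valued, structured memoryRequirements entry could look like to a consumer. The subfield names (memory_type, amount, unit) and the vocabulary values are hypothetical and not part of the current block or of CodeMeta itself.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryType(Enum):
    # Hypothetical controlled vocabulary for the kind of memory.
    RAM = "RAM"
    GPU = "GPU memory"


class Unit(Enum):
    # Hypothetical controlled vocabulary for units, mapped to bytes.
    MB = 10**6
    GB = 10**9
    GIB = 2**30


@dataclass(frozen=True)
class MemoryRequirement:
    memory_type: MemoryType
    amount: float
    unit: Unit

    def to_bytes(self) -> int:
        # Convert the amount into bytes so a scheduler can compare values.
        return int(self.amount * self.unit.value)


def required_bytes(reqs, memory_type):
    # Largest stated requirement for one memory type, or 0 if none is given.
    return max(
        (r.to_bytes() for r in reqs if r.memory_type is memory_type),
        default=0,
    )


# A multi-valued field: one entry per memory type.
requirements = [
    MemoryRequirement(MemoryType.RAM, 16, Unit.GIB),
    MemoryRequirement(MemoryType.GPU, 8, Unit.GIB),
]

print(required_bytes(requirements, MemoryType.RAM))
```

With a free-text value such as "at least 16 GB, more for large inputs", the same lookup would need fragile text parsing, which is the problem described above.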

Any brand new behavior do you want to add to Dataverse? A CodeMeta export that reassembles the structured fields into values compatible with the CodeMeta standard would also be interesting. We would also have to adjust our GitHub Action that imports information from codemeta files in Git repositories into Dataverse datasets.
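As a rough sketch of such an export step, the following Python snippet joins structured entries back into a single text value for memoryRequirements. The input layout (type, amount, unit keys) and the "; " separator are assumptions made for illustration only; an actual exporter would follow whatever subfields the updated block defines.

```python
import json


def join_requirements(entries):
    # Turn each structured entry into "<amount> <unit> <type>" and join them
    # into one text value, as a plain-text export property would expect.
    parts = []
    for entry in entries:
        # The ":g" format drops a trailing ".0", so 16.0 becomes "16".
        parts.append(f"{entry['amount']:g} {entry['unit']} {entry['type']}")
    return "; ".join(parts)


# Hypothetical structured values as they might be stored in a dataset.
structured = {
    "memoryRequirements": [
        {"type": "RAM", "amount": 16, "unit": "GiB"},
        {"type": "GPU memory", "amount": 8, "unit": "GiB"},
    ]
}

codemeta = {
    "@type": "SoftwareSourceCode",
    "memoryRequirements": join_requirements(structured["memoryRequirements"]),
}

print(json.dumps(codemeta, indent=2))
```

The import side (for example the GitHub Action mentioned above) would need the reverse mapping, which is harder because existing codemeta files contain arbitrary free text.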

Any open or closed issues related to this suggestion?

Are you thinking about creating a pull request for this issue? Help is always welcome, is this idea something you or your organization plan to implement? We would be happy to provide a suggestion for an updated tsv of the codemeta block, but would also be very interested in the opinion and the requirements of the community, and perhaps especially from @jggautier and @pdurbin

pdurbin commented 1 month ago

We would be happy to provide a suggestion for an updated tsv of the codemeta block

If you're willing to produce an updated tsv, I'd be happy to look at it!

On a related note, as of Dataverse 6.4, you'll be able to designate the "type" of a dataset as software. Please see:

pdurbin commented 2 weeks ago

There's a task under https://github.com/IQSS/dataverse-pm/issues/174 to support CodeMeta and I just added a subtask to look at this issue and consider upgrading to v3 of CodeMeta first. Pull requests welcome, of course! 😄 ❤️

pdurbin commented 1 week ago

@doigl @poikilotherm and others, as I work on this issue...

... I'm wondering if I should promote codemeta.tsv as it exists now, in 6.4, in tests and explanations of the feature, or if I should use computational_workflow.tsv, which, as far as I know, doesn't have any planned updates.

Basically, I'll pick one or the other to explain the feature of associating a dataset type such as "software" with a metadata block such as CodeMeta or Computational Workflow.

I'm a little nervous about promoting CodeMeta much in its current form, since it sounds like it's likely to change. So maybe I'll go with Computational Workflow. 🤷

doigl commented 1 week ago

@pdurbin: sorry for the late answer and the missing pull request so far (too many other things on my plate). Wouldn't Computational Workflow be a good metadata block for workflows and CodeMeta a good one for software? But I have to admit that I do not really have a clear understanding of the difference between the two types, workflow and software.

The main changes in version 3 are, as far as I know, the review/reviewBody/reviewAspect fields, a start and end date, the hasSourceCode/isSourceCodeOf relations, and the renaming of continuousIntegration and embargoEndDate.

While it would be really great to be able to link to external reviews for software and for data (perhaps in the form of badges), I do not think this feature belongs in the software metadata/CodeMeta block, because it is important for datasets (and workflows?) as well.

The relations between source code and application could be implemented in the "Related Materials" in Citation, if relation types were also available there.
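For the two renamed properties, an upgrade could be as small as a key mapping. The sketch below assumes that the version 2 names were contIntegration and embargoDate; that is my reading of the CodeMeta 3 changes and should be checked against the official release notes before relying on it. The example record and URL are made up.

```python
# Assumed v2 -> v3 property renames (verify against the CodeMeta release notes).
V2_TO_V3_RENAMES = {
    "contIntegration": "continuousIntegration",
    "embargoDate": "embargoEndDate",
}


def upgrade_keys(record):
    # Return a copy of the record with renamed keys; other keys are untouched.
    return {V2_TO_V3_RENAMES.get(key, key): value for key, value in record.items()}


# Made-up v2-style record for illustration.
v2_record = {
    "name": "example-tool",
    "contIntegration": "https://ci.example.org/example-tool",
    "embargoDate": "2025-01-01",
}

print(upgrade_keys(v2_record))
```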

And so far, we do not have a use case for a start and end date for software.

What do you think, @pdurbin, @poikilotherm?

pdurbin commented 6 days ago

@doigl thanks, yes, CodeMeta for software makes sense, of course. I'm playing with the codeMeta20 block right now. One thing I observe about both codeMeta20 and computationalworkflow is that both have fields with displayoncreate set to TRUE, which other metadatablocks don't have.
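One way to check which fields in a block set displayoncreate to TRUE is to scan the #datasetField section of its tsv file. The following Python sketch assumes the usual layout of a metadata block tsv (section header rows starting with "#", data rows with an empty first column); the sample content is an invented, cut-down fragment and not the real codemeta.tsv.

```python
import csv
import io

# Invented, shortened sample; real block files have many more columns and rows.
SAMPLE_TSV = (
    "#metadataBlock\tname\tdataverseAlias\tdisplayName\n"
    "\texampleBlock\t\tExample Block\n"
    "#datasetField\tname\ttitle\tfieldType\tdisplayoncreate\n"
    "\tcodeVersion\tSoftware Version\ttext\tTRUE\n"
    "\tmemoryRequirements\tMemory Requirements\ttext\tFALSE\n"
    "#controlledVocabulary\tDatasetField\tValue\n"
)


def fields_shown_on_create(tsv_text):
    header = None
    shown = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        if row[0].startswith("#"):
            # A new section starts; only the #datasetField section is of interest.
            header = row if row[0] == "#datasetField" else None
            continue
        if header is None:
            continue
        # Pair each cell with its column name from the section header.
        record = dict(zip(header, row))
        if record.get("displayoncreate", "").strip().upper() == "TRUE":
            shown.append(record["name"])
    return shown


print(fields_shown_on_create(SAMPLE_TSV))
```

To run it against a real block, read the file contents into a string and pass that to fields_shown_on_create instead of SAMPLE_TSV.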