Open paulmillar opened 5 days ago
@paulmillar thanks for opening the issue.
Given that PublishedData can contain one or more datasets, what would you do if multiple datasets with different techniques are present?
Would you add a list of techniques to publishedData and then propagate all of them to DataCite?
Hi @nitrosx,
Yes, this is certainly a valid question. I've spent a little time thinking about this, but haven't come to a strong opinion.
One could argue that each technique listed for the publishedData indicates that at least some of the data within the publishedData was taken with that technique. Under that interpretation, the publishedData techniques would be the union of all techniques in its member datasets.
Alternatively, one could argue the publishedData techniques should describe all the datasets being published, since the publishedData is describing all those datasets. With this interpretation, the publishedData techniques are the intersection of all techniques in the member datasets.
Yet a third option is that the selection is context-driven. Why is a DOI being generated? The answer might suggest that some techniques (from the union) be included and others be ignored. This would be a more nuanced approach, one that would likely require human input.
In practical terms, I would suggest taking the first option (use the union of techniques from member datasets) as an initial version.
A subsequent update could be to present the list of techniques in the web UI, to allow the user to choose/veto techniques, as appropriate.
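To make the options concrete, here is a minimal sketch of the union and intersection interpretations, plus the later "veto" step. The `Technique` and `Dataset` shapes and all function names are hypothetical simplifications for illustration, not the actual SciCat types; I'm assuming each technique carries a `pid` that identifies it uniquely.

```typescript
// Hypothetical, simplified shapes; real SciCat documents carry many more fields.
interface Technique {
  pid: string; // assumed to identify the technique uniquely (e.g. a PaNET term)
  name: string;
}

interface Dataset {
  pid: string;
  techniques?: Technique[];
}

// Option 1: a technique is listed if at least one member dataset uses it.
function unionOfTechniques(datasets: Dataset[]): Technique[] {
  const byPid = new Map<string, Technique>();
  for (const dataset of datasets) {
    for (const technique of dataset.techniques ?? []) {
      // Keep the first occurrence so the result has no duplicates.
      if (!byPid.has(technique.pid)) {
        byPid.set(technique.pid, technique);
      }
    }
  }
  return Array.from(byPid.values());
}

// Option 2: a technique is listed only if every member dataset uses it.
function intersectionOfTechniques(datasets: Dataset[]): Technique[] {
  return unionOfTechniques(datasets).filter((technique) =>
    datasets.every((dataset) =>
      (dataset.techniques ?? []).some((t) => t.pid === technique.pid),
    ),
  );
}

// Possible follow-up: the UI lets the user veto techniques from the union.
function applyVeto(techniques: Technique[], vetoedPids: Set<string>): Technique[] {
  return techniques.filter((technique) => !vetoedPids.has(technique.pid));
}
```

Note that with the intersection, a single member dataset without any techniques empties the whole list, which is one more practical reason to start with the union.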
Summary
The DataCite metadata standard is able to record the experimental technique used to establish the dataset. However, SciCat doesn't do this, so the DataCite metadata it generates lacks this information.
Note that, although SciCat can store the experimental technique information as dataset metadata, this information is not propagated to publishedDataset.
Steps to Reproduce
Current Behaviour
The DataCite metadata contains no subject elements.
Expected Behaviour
The DataCite metadata should contain subject element(s) that describe the techniques.
Details
The document ETN-1: Embedding PaNET in DataCite metadata describes how to include PaNET terms within the metadata associated with a DOI.
The document ETN-2: Working with PaNET terms in SciCat describes how to format PaNET terms within SciCat.
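As a rough illustration of the expected output, the sketch below maps a list of techniques to DataCite subject elements. The attribute names (`subjectScheme`, `schemeURI`, `valueURI`) come from the DataCite schema; everything else is an assumption made for this example: the `Technique` shape, the function names, the scheme URI constant, and the idea that the technique `pid` holds the PaNET term IRI. ETN-1 and ETN-2 remain the authoritative description of the exact values.

```typescript
// Hypothetical, simplified technique shape (assumption: pid is the PaNET term IRI).
interface Technique {
  pid: string;
  name: string;
}

// Subset of the fields DataCite allows on a subject.
interface DataCiteSubject {
  subject: string;
  subjectScheme: string;
  schemeURI: string;
  valueURI: string;
}

// Assumed values for illustration; check ETN-1 for the ones to use.
const PANET_SCHEME = "PaNET";
const PANET_SCHEME_URI = "http://purl.org/pan-science/PaNET/";

function techniquesToSubjects(techniques: Technique[]): DataCiteSubject[] {
  return techniques.map((technique) => ({
    subject: technique.name,
    subjectScheme: PANET_SCHEME,
    schemeURI: PANET_SCHEME_URI,
    valueURI: technique.pid,
  }));
}

// Escape characters that are not allowed verbatim in XML text or attributes.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render one subject as a DataCite XML element.
function subjectToXml(s: DataCiteSubject): string {
  return (
    `<subject subjectScheme="${escapeXml(s.subjectScheme)}"` +
    ` schemeURI="${escapeXml(s.schemeURI)}"` +
    ` valueURI="${escapeXml(s.valueURI)}">` +
    `${escapeXml(s.subject)}</subject>`
  );
}
```

Keeping the mapping (`techniquesToSubjects`) separate from the rendering (`subjectToXml`) would let both code paths that currently build DataCite metadata share the same mapping, whichever serialisation they emit.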
Note that (as described in #1192) the DataCite metadata is calculated in two places: scicat-backend-next's published-data.controller.ts and oai-provider-service's openaire-mapper.ts.
Arguably, there should be a single place (within SciCat code) that provides DataCite metadata (as described in #1192). While removing this duplicate code (i.e., closing #1192) would benefit this issue, I don't consider #1192 to block this issue.