Open richard-jones opened 2 months ago
@LalithaKambhammettu @npapantonis will identify the stakeholders to be involved in this.
@richard-jones to write a short summary of what we need to achieve by the end
Wayne Peters w.peters@imperial.ac.uk to potentially lead the data modelling conversation, given extensive experience in supporting this operationally. There has been some discussion around a "generic" baseline set / schema for rich metadata. We also need to keep abreast of topical extensions and work out how to accommodate them (e.g. Chemistry), as some topics may require different entities and attributes, or extended sets. Do we 'call in' separate schemas? How do we map and represent this on the front end? Would there be an initial selection step to determine what is being deposited, which then provides the relevant set(s)? We may need to reach out to departments internally to understand the topic variants, but we should initially focus on a generic rich baseline.
Attendees: Wayne Peters w.peters@imperial.ac.uk, Nicholas E M Wood nicholas.wood@imperial.ac.uk, Ian J McArdle i.mcardle@imperial.ac.uk, David J Colling d.colling@imperial.ac.uk, Lalitha K S Kambhammettu l.kambhammettu@imperial.ac.uk, Trevor C Newbury t.newbury@imperial.ac.uk, Christopher I Cave-Ayland c.cave-ayland@imperial.ac.uk and Noel Papantonis n.papantonis@imperial.ac.uk
These are the points I would want to address during the workshop:
My notes from today:
@Steven-Eardley and @richard-jones to have a look at the metadata (md) profile and check whether there are any special requirements for implementation
@npapantonis to check with Wayne about using the shared document: https://imperiallondon.sharepoint.com/:x:/s/Project/FAIR%20Data%20Working%20Group%20Initiative/EUBqFB0mXBRIo6_EFsOvJ-MBpj3ALBv4_Xil9K8qPJASQg?e=lbhSR0
Speculative date of 23rd September for on-site
I have done some slightly deeper analysis of the metadata profile, and the mappings across datacite and inveniordm:
A couple of questions/points to raise:
Hi Richard. My responses:
2 The main information for authors will include their name, name identifier (ORCiD), and affiliation (ROR). DataCite and InvenioRDM include Name type (i.e. to differentiate between person or organisation), but not sure we need this on the submission form. If we don't capture "first name" and "family names" will this affect mapping to DataCite?
3 Main identifiers will be DOI, ORCiD and ROR (will we also assign unique system IDs to records for internal use?). Additional identifiers may be needed for Alternate identifiers/Related resources - the InvenioRDM Metadata Reference provides a list of supported identifier schemes.
Also - just to make things more complicated - we would like to allow users to publish metadata records for externally hosted datasets. Ideally, there should be an additional field called “Existing DOI”. When populated, this field would prevent a new DOI from being created for the record. However, not all repositories use DOIs, so we might also need to consider how to enable non-DOI identifiers, such as accession numbers, to be added to the repository record.
4 We don't have specific use cases for it. I'm not sure we need to support full capabilities in the first release, but I might need to raise this with other members of the breakout group. We could just have a free text box that allows depositors to enter geographic region(s) or named place associated with the dataset. This would map to DataCite 18.3 geoLocationPlace and presumably 'place' in the InvenioRDM metadata reference model. Many institutional data repositories (let's call them IDRs) do include an additional field for coordinate values, but not all do.
6 Depositors should have the option to select access level on the submission form (most likely "Open", "Embargoed", "Restricted"). But this would not be displayed on the public metadata record (not sure if this answers your question!). Although obviously the public record will need to inform end users that access is restricted (see comments for 10. below).
8 Some IDRs make depositor information public, but most do not. Therefore, it probably doesn’t need to be on the public metadata record. This information is mainly for cases where the depositor is not one of the listed data creators. In some IDRs, the depositor field is automatically populated from the login details. Is this something we could implement?
9 Again, some IDRs do include contact information on the public record but most don't. The alternative, for restricted datasets, is to publish an admin email address to the public record (see below).
10 One repository added a text prompt so that if ‘restricted’ access is selected, depositors are asked to add details in the description field. Is this something we could do? I haven’t found any examples of submission forms with a separate field for data access statements in the IDRs I reviewed (and I reviewed many). I suspect this is because most of them only accept open access or embargoed datasets. I did find a few repositories that provide managed access to sensitive data. Rather than asking depositors to provide details of access restrictions, if “restricted” is selected, pre-defined text is added to the public record informing the end user that access is restricted, with details of who to contact to request access (e.g., the library/repository admin staff). Is this something we could implement? Here are some examples:
https://researchdata.uwe.ac.uk/id/eprint/703/ https://data.bris.ac.uk/data/dataset/1cq4ulhrjdmpf240uhjb2o6jov https://researchdata.bath.ac.uk/1328/
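On point 4 (free-text place names), here is a rough sketch of how a single text box could be mapped to both models. It assumes depositors separate multiple places with a semicolon, which is my own convention and not something agreed in the group. The function name is hypothetical. The target keys (`geoLocationPlace` in DataCite, `place` under `locations.features` in InvenioRDM) are as I understand the two schemas and should be checked against the metadata references.

```python
def place_to_locations(free_text: str) -> dict:
    """Map a free-text place field to DataCite and InvenioRDM structures.

    Assumption: multiple places are separated by ';' in the text box.
    """
    # Split on ';', trim whitespace, and drop empty entries.
    places = [part.strip() for part in free_text.split(";") if part.strip()]
    return {
        # DataCite JSON: one geoLocation per named place.
        "datacite": {"geoLocations": [{"geoLocationPlace": p} for p in places]},
        # InvenioRDM metadata: one feature per named place.
        "inveniordm": {"locations": {"features": [{"place": p} for p in places]}},
    }


if __name__ == "__main__":
    print(place_to_locations("London; River Thames"))
```

Coordinates are deliberately left out, in line with not supporting the full geolocation capabilities in the first release.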
If we don't capture "first name" and "family names" will this affect mapping to DataCite?
No, these fields are optional in DataCite. My recommendation against breaking the name down in this way is because it's slightly artificial for a lot of names, and it's easier not to open that door, just let people tell us their name as they see it.
DataCite does recommend a "family, given" ordering even in the general name field, but this is advisory only, I believe.
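To illustrate, a minimal sketch of building a DataCite creator entry from a single name field plus ORCiD and ROR, without `givenName` or `familyName`. The function and parameter names are hypothetical, and the key names follow the DataCite JSON representation as I understand it, so they are worth verifying against the schema before we rely on them. The ORCiD and ROR values in the usage lines are placeholders for illustration only.

```python
from typing import Optional


def to_datacite_creator(
    name: str,
    orcid: Optional[str] = None,
    affiliation_name: Optional[str] = None,
    ror_id: Optional[str] = None,
) -> dict:
    """Build a creator entry using only the name as the depositor typed it."""
    creator = {"name": name.strip()}
    if orcid:
        creator["nameIdentifiers"] = [
            {
                "nameIdentifier": f"https://orcid.org/{orcid}",
                "nameIdentifierScheme": "ORCID",
                "schemeUri": "https://orcid.org",
            }
        ]
    if affiliation_name:
        affiliation = {"name": affiliation_name}
        if ror_id:
            affiliation["affiliationIdentifier"] = f"https://ror.org/{ror_id}"
            affiliation["affiliationIdentifierScheme"] = "ROR"
        creator["affiliation"] = [affiliation]
    return creator


if __name__ == "__main__":
    # Placeholder identifiers, not real records.
    print(to_datacite_creator("Example Researcher", orcid="0000-0000-0000-0000",
                              affiliation_name="Example University", ror_id="00example0"))
```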
Also - just to make things more complicated - we would like to allow users to publish metadata records for externally hosted datasets. Ideally, there should be an additional field called “Existing DOI”. When populated, this field would prevent a new DOI from being created for the record. However, not all repositories use DOIs, so we might also need to consider how to enable non-DOI identifiers, such as accession numbers, to be added to the repository record.
I believe this is possible using the `external` type on the `pid` record, but I will review with @Steven-Eardley and @J4bbi
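As a starting point for that review, a rough sketch of how an "Existing DOI" form field might translate into the record. It assumes the record carries a `pids` block in which a DOI entry can be marked with `"provider": "external"`, and that leaving `pids` empty lets the repository mint a DOI at publish time. Both are my reading of InvenioRDM and still need confirming. The function names are hypothetical, and the `scheme` for non-DOI identifiers has to be one of the schemes listed in the InvenioRDM Metadata Reference.

```python
from typing import Optional

# Common ways depositors paste a DOI; stripped so only the bare DOI is stored.
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def normalise_doi(value: str) -> str:
    """Trim whitespace and remove a leading resolver URL or 'doi:' prefix."""
    doi = value.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            return doi[len(prefix):]
    return doi


def build_pids(existing_doi: Optional[str] = None) -> dict:
    """Return the assumed `pids` block for a draft record.

    If the depositor supplied an existing DOI, register it as external so
    no new DOI is created. Otherwise return an empty block.
    """
    if existing_doi and existing_doi.strip():
        return {"doi": {"identifier": normalise_doi(existing_doi), "provider": "external"}}
    return {}


def alternate_identifier(value: str, scheme: str) -> dict:
    """Build one alternate-identifier entry, e.g. for an accession number."""
    return {"identifier": value.strip(), "scheme": scheme}


if __name__ == "__main__":
    print(build_pids("https://doi.org/10.1234/example"))
    print(build_pids(None))
```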
Some IDRs make depositor information public, but most do not. Therefore, it probably doesn’t need to be on the public metadata record. This information is mainly for cases where the depositor is not one of the listed data creators. In some IDRs, the depositor field is automatically populated from the login details. Is this something we could implement?
Automatically generating this makes sense; I will add it to the implementation requirements
One repository added a text prompt so that if ‘restricted’ access is selected, depositors are asked to add details in the description field. Is this something we could do?
Yes, that's a good idea. After discussion with the team here, we think that if we're going to capture this, then a custom field for this information is best, so we can ask users to populate that field and then use it to display the information in the relevant places. Otherwise, we can automatically populate the custom field with some default text if the access restrictions are set.
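A minimal sketch of that fallback logic, to make the behaviour concrete. The default wording, the contact placeholder, and the function name are all hypothetical, and the access levels are the three suggested earlier in the thread ("Open", "Embargoed", "Restricted"). This is plain Python, not tied to any InvenioRDM custom field API.

```python
from typing import Optional

# Hypothetical default wording; the real text would be agreed with the library team.
DEFAULT_RESTRICTED_STATEMENT = (
    "Access to this dataset is restricted. To request access, contact {contact}."
)


def access_statement(
    access_level: str,
    depositor_text: Optional[str] = None,
    contact: str = "the repository team",
) -> Optional[str]:
    """Decide what goes in the access statement custom field.

    Depositor-supplied text always wins. If there is none and the record
    is restricted, fall back to the default text. Otherwise store nothing.
    """
    text = (depositor_text or "").strip()
    if text:
        return text
    if access_level.strip().lower() == "restricted":
        return DEFAULT_RESTRICTED_STATEMENT.format(contact=contact)
    return None


if __name__ == "__main__":
    print(access_statement("Restricted"))
    print(access_statement("Restricted", "Contains patient data; apply via the data access committee."))
    print(access_statement("Open"))
```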
I've added implementation notes to our spreadsheet here https://docs.google.com/spreadsheets/d/106fzB6EiVJmnd3kRmTc1HEkx2h8zTR2S8PFlr-_qMHY/edit?gid=1013395959#gid=1013395959
Acceptance Criteria
List the criteria which must be met for the issue to be considered complete