galterlibrary / InvenioRDM-at-NU

Next generation repository for health science
MIT License

Metadata Schema Reference sheet #272

Open fenekku opened 5 years ago

fenekku commented 5 years ago

[EDIT] For metadata reference, see now:

Core metadata model: https://github.com/inveniosoftware/invenio-rdm-records/issues/1

Our extension: https://github.com/inveniosoftware/invenio-rdm-records/issues/2

--

This is a reference sheet for what metadata the record must store or be able to derive:

| Field or equivalent | Notes | Why | Task | Implemented |
| --- | --- | --- | --- | --- |
| Title | Required input | Citation, DOI minting | #10 | :heavy_check_mark: |
| Authors (Creator) | Required input: first name (req), middle name, last name (req). For unknown authors: enter Unknown in required fields | Citation, DOI minting | #10 | :heavy_check_mark: |
| Description | Required free text input | To describe record | -- | :heavy_check_mark: |
| Resource Type | Required input | | #10 | :heavy_check_mark: |
| DOI | Generated by DataCite | Want a persistent, unique identifier, embedded in a larger id collection | #10 | :heavy_check_mark: |
| Contributor | Can be placed in imagined People section as a role | Attribution, Optional DOI entry | TODO | :x: |
| Date Created | Autogenerated | | -- | :heavy_check_mark: |
| Grants and Funding | | For impact assessment, funding sources | TODO | :x: |
| Keywords | Subjects are from controlled lists. Keywords are from crowdsourcing | | | |
| Language | Language of content. Optional input | General metadata | TODO | :x: |
| Location | Location of presentation (location used in citation, this should be the one) | | | :x: |
| Original Bibliographic Citation | Can this be: "Would like to be cited as"? | | | :x: |
| Original Identifier | Pre-existing DOI if any; merge with DOI | | | :x: |
| Page Number | Optional. Only for Book, Text Resources and Articles | | | :x: |
| Private Note | SUPER_USER, librarian, owner, proxy can see it | | | :x: |
| Publisher | Auto-generate menRva, but allow override. Multiple publishers should be allowed, but it remains to be seen how realistic this is | | | :x: |
| Publication Year | Auto-generate | | | |
| Related URL | Optional input | | | :x: |
| License (Rights) | Required input | | | :heavy_check_mark: |
| Subject: Geographic Name | Location of subject matter. Feed from MeSH | | | :x: |
| Subject: MeSH, Subject: LCSH | Optional input | | | :heavy_check_mark: |
| Subject: Name | Optional. Name of person/organization referred to in content (e.g. book about someone) | | | :x: |
| Visibility | Who can access the record: Public, Restricted, Private (+shared with) | | | missing shared with |
| Acknowledgements | | Attribution | | :x: |
| Abstract | Optional. Actual abstract of document if any | | | :x: |
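
To make the "Required input" rows concrete, here is a minimal sketch of a check for them. It is not the project's implementation: the key names (`title`, `authors`, `first_name`, and so on) and the function `missing_required` are hypothetical, chosen only to mirror the rows marked as required above.

```python
# Hypothetical key names mirroring the rows marked "Required input" in the table.
REQUIRED_FIELDS = ("title", "authors", "description", "resource_type", "license")


def missing_required(record: dict) -> list:
    """Return the names of required inputs that are absent or empty."""
    problems = [name for name in REQUIRED_FIELDS if not record.get(name)]
    # Each author needs a first and a last name; middle name is optional.
    for i, author in enumerate(record.get("authors") or []):
        for part in ("first_name", "last_name"):
            if not author.get(part):
                problems.append(f"authors[{i}].{part}")
    return problems


draft = {
    "title": "Sleep study dataset",
    # For unknown authors the table says to enter "Unknown" in required fields.
    "authors": [{"first_name": "Unknown", "last_name": "Unknown"}],
    "description": "Example description",
    "resource_type": "Dataset",
}
print(missing_required(draft))  # ['license']
```

Fields that are auto-generated (DOI, Date Created) are left out of the check on purpose, since the depositor does not supply them.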


saragon02 commented 5 years ago

Working menRva metadata schema, based on DigitalHub and removing ARK:

- Abstract
- Acknowledgements
- Contributor
- Creator
- Date Created
- Description
- DOI
- Grants and Funding
- Keywords
- Language
- Location
- Original Bibliographic Citation
- Original Identifier
- Page Number
- Private Note
- Publisher
- Related URL
- Resource Type
- Rights
- Subject: Geographic Name
- Subject: MeSH, Subject: LCSH
- Subject: Name
- Title
- Visibility

Extra fields available for resource type groups Dataset, Articles, Study Documentation, Theses & Dissertations and Text Resources:

- Data Access
- Data Collection Method
- Tools & Measures
- Study Type
- Research Design
- Sample Size
- Subject of Study
- Population Gender
- Population Age
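
One way to picture the split between the core fields and the extra fields is a lookup keyed on the resource type group. This is only an illustrative sketch: the names `CORE_FIELDS`, `EXTRA_FIELDS`, `EXTRA_FIELD_GROUPS` and `fields_for` are made up here, and the `"Image"` group in the usage line is a hypothetical example of a group that does not get the extra fields.

```python
# Field labels copied from the working schema above.
CORE_FIELDS = [
    "Abstract", "Acknowledgements", "Contributor", "Creator", "Date Created",
    "Description", "DOI", "Grants and Funding", "Keywords", "Language",
    "Location", "Original Bibliographic Citation", "Original Identifier",
    "Page Number", "Private Note", "Publisher", "Related URL", "Resource Type",
    "Rights", "Subject: Geographic Name", "Subject: MeSH", "Subject: LCSH",
    "Subject: Name", "Title", "Visibility",
]

EXTRA_FIELDS = [
    "Data Access", "Data Collection Method", "Tools & Measures", "Study Type",
    "Research Design", "Sample Size", "Subject of Study", "Population Gender",
    "Population Age",
]

# Resource type groups that get the extra fields.
EXTRA_FIELD_GROUPS = {
    "Dataset", "Articles", "Study Documentation",
    "Theses & Dissertations", "Text Resources",
}


def fields_for(resource_type_group: str) -> list:
    """Return the field labels offered for a resource type group."""
    if resource_type_group in EXTRA_FIELD_GROUPS:
        return CORE_FIELDS + EXTRA_FIELDS
    return list(CORE_FIELDS)


print("Sample Size" in fields_for("Dataset"))  # True
print("Sample Size" in fields_for("Image"))    # False ("Image" is a hypothetical group)
```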

saragon02 commented 5 years ago

Some questions from the table may need to be answered based on further metadata subcommittee meetings. My recommendations to date are below:

- Contributor: Great idea for a People section, and to have Contributor as a role.
- Keywords: Can ‘live’ near the Subject section, but these are the user-generated keywords that might eventually be saved into menRva’s own crowdsourced dictionary.
- Location: This is the location of the publisher or where the work first went public (e.g., city where a conference presentation was first given). Can be auto-generated if that information is easily harvestable.
- Page Number: Make this field available for resource type categories: Book, Text Resources and Articles.
- Private Note: Only visible for the record owner.
- Publisher: Allow multiple to display, but menRva always displays first.
- Subject Geographic Name: Different from Location, more about a distinct space as the subject matter of the thing deposited. Fed from: http://id.loc.gov/authorities/subjects/sh85089606.html
- Subject Name: Refers to a person being the subject matter of a deposit (e.g., a book written about someone). Fed from: http://id.loc.gov/authorities/names.html. Format of display can be in LOC Name Authority format: Last Name, First Name, Middle Name or Initial.
- Abstract: Different from Description and still needed. People have used them interchangeably in DH, but one use case is for archival items, where a description of the physical thing is often entered in Description, and other descriptive information is entered in Abstract. Some also use Abstract as it would be used in a scholarly publication.
- Original Bibliographic Citation: Will follow up with format preference.
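
The Publisher recommendation (multiple publishers may display, with menRva always first) could be sketched as below. The function name `display_publishers` is hypothetical, and "Galter Library" is just a placeholder for a user-supplied publisher.

```python
DEFAULT_PUBLISHER = "menRva"


def display_publishers(supplied: list) -> list:
    """Order publishers for display: menRva first, then the supplied ones.

    Duplicates are dropped while preserving the order in which they were entered.
    """
    others = [p for p in supplied if p != DEFAULT_PUBLISHER]
    return [DEFAULT_PUBLISHER] + list(dict.fromkeys(others))


print(display_publishers(["Galter Library", "menRva"]))  # ['menRva', 'Galter Library']
print(display_publishers([]))                            # ['menRva']
```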

fenekku commented 5 years ago

Raw notes from @LisaOKeefe1 on the metadata discussion with @lnielsen:

(More editing needed.)

- Resource types: Galter’s COAR-based resource types were generated hierarchically and this hierarchy is mapped and indexed on the back-end. If this is complicated for indexing, we can take it out.
- Contributor:
  - The controlled list of contributor types should be customizable.
  - Zenodo’s pulls from DataCite.
  - Galter’s could pull from the CRO.
  - In Galter’s UI we are thinking of a “people” group of fields, which can be tagged with roles. Only those tagged with “Author” populate the citation.
  - Kristi shared CRO on the Ontology Lookup Service website.
  - Lars wondered if roles are done. Yes, just getting input.
  - KH: we can take this, sit down with Martin, and map their contributor roles to ours.
  - Lars: to add it we need to determine if it is a new thing to add. If so, they add it.
- Grants funding:
  - Optimization to make it easier. Registry of grant numbers.
  - OpenAIRE has the biggest database, with some US funders. Crossref has an open funder registry.
  - Kristi: we can get the federal stuff. There’s an API to NSF, DARPA, a bunch of them.
  - Sara: we wonder if we need three fields: grant name, number and link (provided in the DataCite XML schema).
  - Lars: the problem comes down to the grant number if the funder doesn’t have a grant identifier or if it’s not persistent. In the EU there’s an acronym, too.
  - Can leverage FundRef for the direct link to grant pages. Would run into problems for any grants that don’t have a webpage.
- Keywords: free text.
- Language:
  - Will we support multiple? Internationalization support when?
  - Recommend asking the user to select a primary language of the deposit/record.
- Location of presentation:
  - About where the original material was presented.
  - DataCite doesn’t have fields for this. In Galter’s instance Location would need to be a custom field. See if others have the same requirements.
  - Also, see: exhibit. Where does it take place?
  - Different from GeoLocation, which refers to a place as part of the subject matter of the deposit/record.
- Original identifier: can’t accept those.
- Page number: will be updated by Galter to something like number in sequence.
- Publisher: required by DataCite. Will auto-generate “Invenio RDM”. Can supply others as in DH. We’re thinking of restricting to only one.
- Publication: publication date v submission date. Submission is auto-generated and refers to publication of the record. Publication can be a supplied publication date, which can be much earlier.
- Related URL:
  - We’re going to end up with a collection of 3 fields to describe this. We will most likely customize choices to our needs.
  - For relationType, Galter wants to keep: Related, Cites, Part, an Alternate Identifier of, Is Related To (catchall when no other relationType applies).
- License: Could really use a wizard or some sort of definitions/guidance for users as in CC-Australia.
- Subjects (GeoLocation, MeSH, LCSH, etc.): Show all the different types in a Subjects group; leverage and make explicit the controlled vocabs we’re using.
- Acknowledgements:
  - Acknowledge some on-campus institute.
  - This is a type of “Description”: application of the “Additional Notes” to denote the field for Acknowledgements. Could hold a text blurb.
  - Otherwise, if it’s about acknowledging people who contributed in different roles to the deposit, wouldn’t Contributors with a fully deployed CRO cover this? Almost, but Kristi mentioned there will always be some who need attribution who can’t be linked to any other way.
  - “Parking lot” for now. Could leverage JATS and ICMJE in the future.
- Metadata: Zenodo
  - References: Free text box.
  - Journal fields (for recording journal information for citations. Comes from an earlier version of Zenodo).
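
The “people” group idea from these notes, where each person is tagged with roles and only those tagged “Author” populate the citation, can be sketched roughly as follows. The `Person` class, its attribute names and `citation_authors` are hypothetical illustrations, not part of Invenio RDM; the role "Data Curator" is only an example of a non-author role.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """A person attached to a record, tagged with zero or more roles."""
    first_name: str
    last_name: str
    roles: set = field(default_factory=set)


def citation_authors(people: list) -> list:
    """Return 'Last, First' for people tagged with the 'Author' role only."""
    return [
        f"{p.last_name}, {p.first_name}"
        for p in people
        if "Author" in p.roles
    ]


people = [
    Person("Ada", "Lovelace", {"Author"}),
    Person("Charles", "Babbage", {"Data Curator"}),
    Person("Mary", "Somerville", {"Author", "Data Curator"}),
]
print(citation_authors(people))  # ['Lovelace, Ada', 'Somerville, Mary']
```

Keeping roles as a set on each person means a contributor role list pulled from DataCite or the CRO could be swapped in without changing how the citation is built.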

saragon02 commented 5 years ago

LinkedDataResources_MeSH_FAST_Others.pdf

If it is helpful, here is a PDF with links to linked data resources for various controlled vocabularies.