Closed sync-by-unito[bot] closed 9 months ago
This issue represents a deliverable funded by the NIH. This deliverable supports the NIH Initiative to Improve Access to NIH-funded Data.
Aim 2: Increase support for biomedical and cross-domain metadata standards and controlled vocabularies
One of the useful characteristics of the Dataverse open-source software is its extensive support for metadata standards and additional custom metadata.
The standards currently supported include:
In particular, DDI makes a Dataverse repository interoperable even at the variable/attribute level since it supports variable descriptive and statistical metadata. This allows data exploration and analysis tools to integrate easily with the repository and discovery engines to find variable information.
In this project, we propose to
Links:
Related Deliverables:
2 | 1.2.2 | Define use cases for DDI-CDI support | 5
2 | 2.2.1 | Design and implement support for DDI-CDI
2 | 2.2.2 | Define use cases for supporting biomedical metadata standards
2 | 3.2.1 | Design and implement biomedical metadata standards, and add funding related metadata
2 | 4.2.1 | Assess and improve metadata support
who:
September Update: (1.2.1) A spike for discovery (Dataverse GitHub Issue IQSS/dataverse#8681) is in progress to determine what changes need to be made to support biomedical vocabularies, including the UMLS, CEDAR, and MeSH vocabularies. Progress on this Aim continues to be stalled while the team focuses on the re-architecture project (1.7.1).
October Update: (1.2.1) We are moving past the initial spike. Progress on this Aim continues to be stalled while the team focuses on the re-architecture project (1.7.1).
Updating Description - Replacing this text. The text below reflected our earliest understanding of this deliverable. It is wrong.
The deliverable is: Code and documentation for controlled vocabulary support
1) Research existing implementations of controlled vocabularies, 2) Design and implement code to extend metadata fields to use controlled vocabularies, 3) Test and document controlled vocabularies.
Three in particular have been discussed.
Initial Understanding
For the first year, we believe that the work was to implement controlled vocabularies.
Dataverse has support for controlled vocabularies. They can be locally stored or can be dynamically pulled via an API. This was implemented by the community.
We believe this initial work has been done by the community. So we may be able to argue that the first step, the initial implementation of controlled vocabularies, is complete.
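As a rough illustration of the two modes described above (terms stored locally versus terms pulled dynamically from an external service), here is a minimal Python sketch. It is not Dataverse code: the names `VocabularyField`, `local_terms`, and `resolver` are hypothetical, and the resolver is a stand-in callable backed by a fake in-memory registry, not a real API call.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class VocabularyField:
    """Hypothetical metadata field restricted to a controlled vocabulary."""

    name: str
    # Terms stored locally with the field definition.
    local_terms: set[str] = field(default_factory=set)
    # Optional stand-in for an external lookup (assumption: a real deployment
    # would query the vocabulary provider's API here).
    resolver: Optional[Callable[[str], bool]] = None

    def accepts(self, value: str) -> bool:
        """Return True if the value is a local term or resolves externally."""
        if value in self.local_terms:
            return True
        if self.resolver is not None:
            return self.resolver(value)
        return False


# Locally stored vocabulary.
subject = VocabularyField("subject", local_terms={"Medicine", "Chemistry"})

# Dynamically resolved vocabulary; the dictionary fakes an external registry.
fake_registry = {"https://example.org/org/1": "Example University"}
funder = VocabularyField("funder", resolver=lambda v: v in fake_registry)
```

With this sketch, `subject.accepts("Medicine")` is checked against the local list, while `funder.accepts(...)` defers to the external lookup.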
Links:
Related Deliverables:
2 | 1.2.2 | Define use cases for DDI-CDI support | 5
2 | 2.2.1 | Design and implement support for DDI-CDI
2 | 2.2.2 | Define use cases for supporting biomedical metadata standards
2 | 3.2.1 | Design and implement biomedical metadata standards, and add funding related metadata
2 | 4.2.1 | Assess and improve metadata support
Cleaned things up. Moved the day-to-day notes to the sidecar issue.
The whitepaper has been completed.
Last updated: Mon Dec 5 2022
(1.2.1) Individual GitHub Issues have been created for the various tasks needed to support useful controlled vocabularies for the NIH GREI program. A whitepaper describing the steps needed to support a particular external controlled vocabulary has been completed. A general proof of concept of the steps described in the whitepaper has also been completed. The next tasks are to apply these steps to support FundRef and ROR.
81%
Last updated: Thu Dec 15 2022, before I left for the holiday
Report: Dec 2022
There is a completed whitepaper describing the steps needed to support a particular external controlled vocabulary. Issues for support of FundRef and ROR are queued.
81%
priority discussion with Stefano: Left:
Monthly report
(1.2.1) Our initial discovery work led to a slightly expanded scope, and we are continuing to work on issues for the support of FundRef and ROR in the current sprint.
84%
Feb report
(1.2.1) We reviewed and confirmed the list of metadata fields that the GREI metadata and search WG recommends. We also commented on the vocabularies that Harvard Dataverse uses, thinks should be used, or plans to use for particular fields, including CrossRef's registry of research funders (FundRef) and ROR for organization names. We are continuing to work on issues for the support of FundRef (see GitHub issue) and ROR (see GitHub issue) in the current sprint.
XX%
March Report
(1.2.1) This activity was 90% complete at the end of year 1 and was transferred to year 2.
Draft year one summary: FY1 Annual Summary
This activity was 90% complete at the end of year 1. The team did an inventory of existing controlled vocabulary functionality and researched the changes needed to support biomedical vocabularies. This includes a proof of concept for supporting FundRef and ROR and a whitepaper on how to use the existing framework to support other controlled vocabularies such as UMLS, CEDAR, and MeSH. Additionally, the GREI metadata and search working group confirmed a list of recommended metadata fields, which includes using FundRef and ROR for organization names. Year 2 work toward completion will be tracked as yr:2 aim:2 task:1a (2.2.1A), starting at 90% complete.
90% complete
2024/01/03
References:
Problem Statement
The marker for what's possible has moved since the proposal was submitted and granted. At the time, there was no support for externally controlled vocabularies in the wild. Since then, community effort has created a JavaScript-based approach to the problem. Solutions for ORCID and FundRef exist. There is also support for creating a solution for ROR.
Proposed Solution
We can add support for a less complex controlled vocabulary that the community needs, as a path to mastering both the problem space and the new JavaScript-based approach to the solution.
Acceptance Criteria
Links: