IQSS / dataverse-pm

Project management issue tracker for the Dataverse Project. Note: Related links and documents may not be public.
https://dataverse.org

2 | 1.2.1 | Design and implement integration with controlled vocabularies | 5 | #7

Closed sync-by-unito[bot] closed 9 months ago

sync-by-unito[bot] commented 2 years ago

References:

Problem Statement

The marker for what's possible has moved since the proposal was submitted and granted. At the time, there was no support for externally controlled vocabularies in the wild. Since then, a community effort has produced a JavaScript-based approach to the problem. Solutions for ORCID and FundRef exist, and there is also support for building a solution for ROR.

Proposed Solution

We can add support for a less complex controlled vocabulary that the community needs, as a path to mastering both the problem space and the new JavaScript-based approach to the solution.

Acceptance Criteria

Links:


mreekie commented 2 years ago

This issue represents a deliverable funded by the NIH. This deliverable supports the NIH Initiative to Improve Access to NIH-funded Data.

Aim 2: Increase support for biomedical and cross-domain metadata standards and controlled vocabularies

One of the useful characteristics of the Dataverse open-source software is its extensive support for metadata standards and additional custom metadata.

The standards currently supported include:

In particular, DDI makes a Dataverse repository interoperable even at the variable/attribute level, since it supports descriptive and statistical metadata for variables. This allows data exploration and analysis tools to integrate easily with the repository, and allows discovery engines to find variable-level information.

In this project, we propose to

  1. expand DDI support to include the recently released DDI-Cross-Domain Integration (DDI-CDI) schema
  2. build on existing support for biomedical-related standards relevant to NIH-funded research cases, following the recommendations from https://fairsharing.org/
  3. expand descriptive and citation metadata to support funding information and related fields, and
  4. integrate with external services to enable support for controlled vocabularies in any metadata field, based on standardized, widely used data dictionaries. The HMS Research Data Management group will participate in the development of these standards and vocabularies for biomedical datasets, working directly with research laboratories.

Links:

Related Deliverables:

  - 2 | 1.2.2 | Define use cases for DDI-CDI support | 5
  - 2 | 2.2.1 | Design and implement support for DDI-CDI
  - 2 | 2.2.2 | Define use cases for supporting biomedical metadata standards
  - 2 | 3.2.1 | Design and implement biomedical metadata standards, and add funding related metadata
  - 2 | 4.2.1 | Assess and improve metadata support

mreekie commented 2 years ago

who:

mreekie commented 1 year ago

September Update: (1.2.1) A spike for discovery (Dataverse GitHub Issue IQSS/dataverse#8681) is in progress to determine what changes need to be made to support biomedical vocabularies, including the UMLS, CEDAR, and MeSH vocabularies. Progress on this Aim continues to be stalled while the team focuses on the re-architecture project (1.7.1).

October Update: (1.2.1) We are moving past the initial spike. Progress on this Aim continues to be stalled while the team focuses on the re-architecture project (1.7.1).

mreekie commented 1 year ago

Updating description: replacing the text below. It reflected our earliest understanding of this deliverable and is wrong.

The deliverable is code and documentation for controlled vocabulary support:

  1. Research existing implementations of controlled vocabularies.
  2. Design and implement code to extend metadata fields to use controlled vocabularies.
  3. Test and document controlled vocabularies.

Three in particular have been discussed.

Initial Understanding

For the first year, we believe that the work was to implement controlled vocabularies.

Dataverse has support for controlled vocabularies. They can be stored locally or pulled dynamically via an API. This was implemented by the community.

We believe the community has already done this initial work, so we may be able to argue that the first step, the initial implementation of controlled vocabularies, is complete.

Links:

Related Deliverables:

  - 2 | 1.2.2 | Define use cases for DDI-CDI support | 5
  - 2 | 2.2.1 | Design and implement support for DDI-CDI
  - 2 | 2.2.2 | Define use cases for supporting biomedical metadata standards
  - 2 | 3.2.1 | Design and implement biomedical metadata standards, and add funding related metadata
  - 2 | 4.2.1 | Assess and improve metadata support

mreekie commented 1 year ago

Cleaned things up. Moved the day-to-day notes to the sidecar issue.

mreekie commented 1 year ago

The whitepaper has been completed.

mreekie commented 1 year ago

Last updated: Mon Dec 5 2022

(1.2.1) Individual GitHub issues have been created for the various tasks needed to support useful controlled vocabularies for the NIH GREI program. A whitepaper describing the steps needed to support a particular external controlled vocabulary has been completed, along with a general proof of concept of those steps. The next tasks are to apply these steps to support FundRef and ROR.

81%

mreekie commented 1 year ago

Last updated: Thu Dec 15 2022, before I left for the holiday.

Report: Dec 2022

There is a completed whitepaper describing the steps needed to support a particular external controlled vocabulary. Issues for support of FundRef and ROR are queued.

81%

mreekie commented 1 year ago

priority discussion with Stefano: Left:

mreekie commented 1 year ago

Monthly report


(1.2.1) Our initial discovery work led to a slightly expanded scope and we are continuing to work on issues for the support of FundRef and ROR in the current sprint.

84%

mreekie commented 1 year ago

Feb Report

(1.2.1) We reviewed and confirmed a list of metadata fields recommended by the GREI metadata and search WG, and we commented on the vocabularies that Harvard Dataverse uses, thinks should be used, or plans to use for particular fields, including Crossref's registry of research funders (FundRef) and ROR for organization names. We are continuing to work on issues for the support of FundRef (see GitHub issue) and ROR (see GitHub issue) in the current sprint.

XX%

mreekie commented 1 year ago

March Report

(1.2.1) This activity was 90% complete at the end of year 1 and was transferred to year 2.

mreekie commented 1 year ago

Draft year one summary: FY1 Annual Summary

This activity was 90% complete at the end of year 1. The team inventoried existing controlled vocabulary functionality and researched the changes needed to support biomedical vocabularies. This work includes a proof of concept for supporting FundRef and ROR, and a whitepaper on how to use the existing framework to support other controlled vocabularies such as UMLS, CEDAR, and MeSH. Additionally, the GREI metadata and search working group confirmed a list of recommended metadata fields, which includes using FundRef for funders and ROR for organization names. Year 2 work toward completion will be tracked as yr:2 aim:2 task:1a (2.2.1A), starting at 90% complete.

90% complete

cmbz commented 9 months ago

2024/01/03