IQSS / dataverse-pm

Project management issue tracker for the Dataverse Project. Note: Related links and documents may not be public.
https://dataverse.org

GREI 4: Task 6 - Assemble and provide metrics about HDV data collection for therapeutic areas #229

cmbz closed 4 months ago

cmbz commented 7 months ago

Overview

Tasks

Resources

cmbz commented 6 months ago

Status: May 2024

jggautier commented 6 months ago

@sbarbosadataverse, I searched for datasets whose metadata contains:

Could you review the datasets from that search and let me know whether many of them seem irrelevant?

This will help me evaluate the way I'm finding these datasets before we consider using more search terms, as we discussed, such as the "therapeutic areas" at https://www.cdisc.org/standards/therapeutic-areas/disease-area and the names of NIH centers and institutes.

We also spoke about looking at the keywords and topic classifications in the metadata of datasets from NIH-funded research in the Harvard Dataverse Repository (https://github.com/IQSS/dataverse-pm/issues/217), and using those as search terms, too.

I put those keywords and topic classifications in tabs in the spreadsheet at https://docs.google.com/spreadsheets/d/1OAQiSkgyeb_YdM4rFhl439FUeNadvmg5R5r0d4PN4us. Could you take a look?

My impression is that someone with domain knowledge would need to review these before we can use them for searching. It feels like many of the keywords, and especially the topic classifications, wouldn't be that helpful, but I'm not sure. Maybe we could use only the keywords that come from a relevant vocabulary, like MeSH, SNOMED-CT, and NCIT.
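A minimal sketch of that filtering idea, assuming each keyword has been exported as a record with a hypothetical "value" field and a hypothetical "vocabulary" field (these names are illustrative and are not taken from the spreadsheet). The vocabulary names are lower-cased and spaces are turned into hyphens so that variants like "SNOMED CT" and "SNOMED-CT" are treated the same:

```python
# Assumption: the vocabularies treated as relevant, written in normalized form.
RELEVANT_VOCABULARIES = {"mesh", "snomed-ct", "ncit"}


def normalize(name):
    """Lower-case a vocabulary name and replace spaces with hyphens."""
    return name.strip().lower().replace(" ", "-")


def keywords_from_relevant_vocabularies(keywords, relevant=RELEVANT_VOCABULARIES):
    """Return the keyword values whose vocabulary is in the relevant set.

    `keywords` is a list of dicts with hypothetical "value" and
    "vocabulary" keys; a missing or empty vocabulary is skipped.
    """
    return [
        kw["value"]
        for kw in keywords
        if normalize(kw.get("vocabulary") or "") in relevant
    ]


# Made-up sample records, for illustration only.
sample = [
    {"value": "Neoplasms", "vocabulary": "MeSH"},
    {"value": "survey", "vocabulary": ""},
    {"value": "Diabetes", "vocabulary": "SNOMED CT"},
    {"value": "misc", "vocabulary": None},
]
search_terms = keywords_from_relevant_vocabularies(sample)
```

Keywords that pass this filter would still need a review by someone with domain knowledge before being used as search terms.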

cmbz commented 6 months ago

Status: June 2024

jggautier commented 5 months ago

@sbarbosadataverse and @cmbz, I'm going to close this GitHub issue. I'm curious how these counts will be used, and I plan to ask about them during the GREI-Monthly CWG Meeting on July 10 (unless someone else brings them up).

jggautier commented 4 months ago

Re-opening this issue. @sbarbosadataverse asked that I include counts of datasets that include the term "covid19" in their metadata. I'm getting that count now and will update the Harvard Dataverse tab of the Top Biomedical Research Categories in GREI Repository Holdings spreadsheet today.
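For anyone who wants to get a similar count, here is a rough sketch that uses the Dataverse Search API, assuming the public Harvard Dataverse base URL and a plain search on the term. The function names are made up for illustration, and a plain query like this may not match exactly how the counts in the spreadsheet were produced:

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Harvard Dataverse installation.
BASE_URL = "https://dataverse.harvard.edu"


def build_search_url(term, base_url=BASE_URL):
    """Build a Search API URL that asks only for datasets matching `term`."""
    # per_page=1 keeps the response small; only the total count is needed.
    params = {"q": term, "type": "dataset", "per_page": 1}
    return base_url + "/api/search?" + urllib.parse.urlencode(params)


def extract_total_count(response_body):
    """Read the total number of matches from a Search API JSON response."""
    payload = json.loads(response_body)
    return payload["data"]["total_count"]


def count_datasets(term, base_url=BASE_URL):
    """Query the installation and return how many datasets match `term`."""
    with urllib.request.urlopen(build_search_url(term, base_url)) as resp:
        return extract_total_count(resp.read().decode("utf-8"))
```

Calling count_datasets("covid19") sends one request and returns the number of matching datasets at the time of the call, so the number will change as datasets are published.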

jggautier commented 4 months ago

I updated the "Harvard Dataverse" tab and the "Aggregate" tab of the Top Biomedical Research Categories in GREI Repository Holdings spreadsheet.

sbarbosadataverse commented 4 months ago

Checked off the last remaining checkbox. In a new issue, we'll consider how to make use of the information we collected for this request from the GREI planning unit.