NIAID-Data-Ecosystem / nde-crawlers

Harvesting infrastructure to collect and standardize dataset and computational tool metadata
Apache License 2.0

[Metadata Improvement]: Generate Benchmark measurementTechniques data for standardization #128

Open gtsueng opened 3 months ago

gtsueng commented 3 months ago

Issue Name

Generate Benchmark measurementTechniques data for standardization

Issue Description

To readily benchmark measurementTechnique standardization approaches, it would be good to have a benchmark dataset of terms.

Aside from manually mapping existing terms, this benchmark could be generated with a few approaches:

Issue Discussion

This issue is expected to be discussed only in the context of the results of the measurementTechnique standardization.


Meta URL

No response

Related WBS task

https://github.com/NIAID-Data-Ecosystem/nde-roadmap/issues/13


Status Description

No response


gtsueng commented 3 months ago

@ZubairQazi can you generate the data described in Method 2 and dump it into a spreadsheet with the columns: Repository, identifier, name, description, measurement terms?
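A minimal sketch of how such a dump could be produced as CSV (which Google Sheets can import), using the column order from the request above. It assumes each record is a dict with the hypothetical keys `repository`, `identifier`, `name`, `description` and `measurement_terms` (a list), and that multiple terms are joined with `"; "`. Both the record shape and the separator are illustrative choices, not the format used in the actual spreadsheet.

```python
import csv
import io

# Column order requested in the comment above.
COLUMNS = ["Repository", "identifier", "name", "description", "measurement terms"]


def records_to_csv(records):
    """Serialize dataset records to CSV text, one row per record.

    Assumes (hypothetically) that each record is a dict with the keys
    'repository', 'identifier', 'name', 'description' and
    'measurement_terms' (a list of strings). Multiple measurement terms
    are joined with '; ', which is an assumed convention.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for rec in records:
        writer.writerow({
            "Repository": rec["repository"],
            "identifier": rec["identifier"],
            "name": rec["name"],
            "description": rec["description"],
            "measurement terms": "; ".join(rec["measurement_terms"]),
        })
    return buffer.getvalue()
```

Using `csv.DictWriter` rather than joining strings by hand means descriptions containing commas or quotes are escaped correctly.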

ZubairQazi commented 2 months ago

Dumped into this spreadsheet (updated link): https://docs.google.com/spreadsheets/d/1lG4hS-PQJ_IRxCc02W3Oz2OFMdYDMURRPbpm3Jg2kk0/edit#gid=1373622361

gtsueng commented 2 months ago

Great! Thanks, @ZubairQazi, that was useful as a sanity check. After the improvements, I think we can use this info. Please take all the GPT predictions and generate a frequency table.
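A frequency table like the one requested can be sketched with `collections.Counter`. This assumes the GPT predictions are available as one list of term strings per dataset; the lower-casing and whitespace stripping are an assumed normalization (so that `ELISA` and ` elisa` are counted together) and may merge terms that should stay distinct.

```python
from collections import Counter


def frequency_table(predictions):
    """Count how often each predicted term appears across all records.

    `predictions` is assumed to be an iterable of lists of term strings,
    one list per dataset. Terms are stripped and lower-cased (an assumed
    normalization) and empty strings are skipped. Returns (term, count)
    pairs sorted by descending count, then alphabetically by term.
    """
    counts = Counter(
        term.strip().lower()
        for terms in predictions
        for term in terms
        if term.strip()
    )
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Sorting ties alphabetically keeps the table stable between runs, which makes it easier to diff against an earlier version of the spreadsheet.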

ZubairQazi commented 2 months ago

Frequency Table: https://docs.google.com/spreadsheets/d/1lG4hS-PQJ_IRxCc02W3Oz2OFMdYDMURRPbpm3Jg2kk0/edit#gid=1081886756

gtsueng commented 2 months ago

@DylanWelzel Once you figure out how to get and parse the mapping from NCBO BioPortal, you can use that to generate test data (Method 1) for evaluating your measurementTechnique standardization approach.
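Once such a mapping has been fetched and parsed (that step is not shown here), the test data could be reduced to pairs of raw term and expected standardized term, and a standardization approach scored against them. The sketch below assumes a hypothetical `benchmark` dict (raw term to expected term) and a `standardize` callable; exact string match as the scoring metric is also an assumption, and ontology IRIs could be compared instead of labels.

```python
from typing import Callable, Dict, List, Tuple


def evaluate_standardizer(
    benchmark: Dict[str, str],
    standardize: Callable[[str], str],
) -> Tuple[float, List[Tuple[str, str, str]]]:
    """Score a standardization function against (raw term -> expected term) pairs.

    `benchmark` and `standardize` are hypothetical inputs: the benchmark
    would be derived from a term mapping, and `standardize` is the approach
    under test. Uses exact string match. Returns the fraction of matches and
    a list of (raw term, expected, predicted) triples for the mismatches.
    """
    if not benchmark:
        raise ValueError("benchmark is empty")
    misses = []
    for raw, expected in benchmark.items():
        predicted = standardize(raw)
        if predicted != expected:
            misses.append((raw, expected, predicted))
    accuracy = 1 - len(misses) / len(benchmark)
    return accuracy, misses
```

Returning the mismatches alongside the score makes it easy to review which terms an approach gets wrong, not just how many.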