NIAID-Data-Ecosystem / nde-crawlers

Harvesting infrastructure to collect and standardize dataset and computational tool metadata
Apache License 2.0

[Metadata Improvement]: Generate Repository-level Metadata Compatibility Scores #125

Open gtsueng opened 4 months ago


Issue Name

Generate Metadata Compatibility Scores

Issue Description

According to the NIAID team, a repository owner was excited about the Metadata Completeness badge but said it was not informative enough. This prompted the NIAID team to request improvements to the badge's legend, such as listing the fields that are missing. Because missing fields are often limited by what a repository itself collects, this information makes more sense at the repository level. To that end, we will create a repository-level Metadata Compatibility badge that shows how compatible a repository's metadata is with the metadata ingested by the NIAID Data Ecosystem.

The score will be calculated similarly to the metadata completeness score. However, instead of summing the binary presence/absence of each field, it will use, for each field, the ratio of records in the repository that contain that field to the total number of records in the repository.
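A minimal sketch of the per-field coverage calculation, assuming each record is a Python dict of top-level properties and that the repository score is the sum of per-field coverage ratios (mirroring how the completeness score sums 0/1 presence). The `PROPERTIES` list, the "non-empty value" rule, and the function names are hypothetical placeholders, not the crawler's actual implementation; the real property list and any weighting would come from the completeness score definition.

```python
from typing import Any, Iterable

# Hypothetical subset of top-level properties; the real list (and any
# weights) would come from the metadata completeness score definition.
PROPERTIES = ["name", "description", "author", "funding", "species"]


def is_present(record: dict[str, Any], prop: str) -> bool:
    """Assumed rule: a property counts as present if it exists and is non-empty."""
    value = record.get(prop)
    return value not in (None, "", [], {})


def coverage_ratios(records: list[dict[str, Any]], properties: Iterable[str]) -> dict[str, float]:
    """For each property, the fraction of records in the repository that contain it."""
    total = len(records)
    if total == 0:
        return {prop: 0.0 for prop in properties}
    return {prop: sum(is_present(r, prop) for r in records) / total for prop in properties}


def compatibility_score(records: list[dict[str, Any]], properties: Iterable[str] = PROPERTIES) -> float:
    """Sum of per-property coverage ratios, replacing the sum of 0/1 presence."""
    return sum(coverage_ratios(records, properties).values())


records = [
    {"name": "A", "description": "x", "funding": [{"identifier": "R01"}]},
    {"name": "B", "species": [{"name": "Homo sapiens"}]},
]
print(coverage_ratios(records, PROPERTIES))
# {'name': 1.0, 'description': 0.5, 'author': 0.0, 'funding': 0.5, 'species': 0.5}
print(compatibility_score(records))  # 2.5
```

Dividing the sum by `len(PROPERTIES)` would give a 0 to 1 score instead; which form the badge uses is not settled by this issue.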

See the spreadsheet for an example calculation of the percent coverage of each property: https://docs.google.com/spreadsheets/d/1N_MyQcbCRFtyPIpEy2sP9JDPWJSGCf7zp_NlOtd5URE/edit#gid=955429340

Note that the calculation should consider only top-level properties, not nested properties. For example, count measurementTechnique, NOT measurementTechnique.name.
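To illustrate the top-level rule, here is a small sketch assuming records are dicts whose keys are the top-level properties. A property such as `measurementTechnique` counts as present as soon as it has any non-empty value, regardless of which sub-fields its items carry. The function name and the placeholder identifier below are hypothetical.

```python
from typing import Any


def top_level_properties(record: dict[str, Any]) -> set[str]:
    """Return the top-level keys that carry a non-empty value.

    Nested keys (e.g. measurementTechnique.name) are deliberately not
    inspected: measurementTechnique counts as present whether or not its
    items have a 'name' sub-field.
    """
    return {key for key, value in record.items() if value not in (None, "", [], {})}


record = {
    "name": "Example dataset",
    # Hypothetical placeholder identifier; note there is no 'name' sub-field.
    "measurementTechnique": [{"identifier": "example-technique-id"}],
    "species": [],  # empty, so not counted
}
print(sorted(top_level_properties(record)))  # ['measurementTechnique', 'name']
```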

For fields where augmentation may occur (species, infectiousAgent, funding, healthCondition, topicCategory, measurementTechnique, variableMeasured), the ratio of augmented coverage to ingested coverage should also be considered in the calculation.
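One possible way to compute the augmented-to-ingested ratio for a single field is sketched below. It assumes, hypothetically, that each augmented value is a dict carrying an `"augmented": True` flag; the actual provenance marker used in NDE records may differ. Ingested coverage is the share of records with at least one non-augmented value, and augmented coverage is the share with at least one augmented value. How the ratio is then folded into the repository score is left open by this issue, so the sketch only reports it.

```python
from typing import Any, Optional

# Properties the issue lists as subject to augmentation.
AUGMENTABLE = [
    "species", "infectiousAgent", "funding", "healthCondition",
    "topicCategory", "measurementTechnique", "variableMeasured",
]


def _values(record: dict[str, Any], prop: str) -> list[Any]:
    """Normalize a property to a list of non-empty values."""
    value = record.get(prop)
    items = value if isinstance(value, list) else [value]
    return [v for v in items if v not in (None, "", [], {})]


def is_augmented(value: Any) -> bool:
    # Hypothetical provenance flag; NDE's real marker may be different.
    return isinstance(value, dict) and value.get("augmented") is True


def augmentation_summary(records: list[dict[str, Any]], prop: str) -> dict[str, Optional[float]]:
    total = len(records)
    ingested = sum(any(not is_augmented(v) for v in _values(r, prop)) for r in records)
    augmented = sum(any(is_augmented(v) for v in _values(r, prop)) for r in records)
    return {
        "ingested_coverage": ingested / total if total else None,
        "augmented_coverage": augmented / total if total else None,
        # Undefined when no record has an ingested value for this field.
        "ratio": augmented / ingested if ingested else None,
    }


records = [
    {"species": [{"name": "Homo sapiens"}]},                         # ingested only
    {"species": [{"name": "Mus musculus", "augmented": True}]},      # augmented only
    {"species": [{"name": "Homo sapiens"},
                 {"name": "Mus musculus", "augmented": True}]},      # both
    {"species": {"name": "Rattus norvegicus", "augmented": True}},   # augmented only
    {},                                                              # missing
]
print(augmentation_summary(records, "species"))
# {'ingested_coverage': 0.4, 'augmented_coverage': 0.6, 'ratio': 1.5}
```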

Issue Discussion

The discussion of this issue was started in this related issue: https://github.com/orgs/NIAID-Data-Ecosystem/projects/6/views/5?pane=issue&itemId=50899235

Please select the type of metadata improvement

Meta URL

https://docs.google.com/spreadsheets/d/1N_MyQcbCRFtyPIpEy2sP9JDPWJSGCf7zp_NlOtd5URE/edit#gid=955429340

Related WBS task

https://github.com/NIAID-Data-Ecosystem/nde-roadmap/issues/19
https://github.com/NIAID-Data-Ecosystem/nde-roadmap/issues/2

For internal use only. Assignee, please select the status of this issue

Status Description

No response

Request status check list