Issue Name
Generate Metadata Compatibility Scores
Issue Description
According to the NIAID team, a repository owner was excited about the Metadata Completeness badge but lamented that it was not informative enough. This prompted the NIAID team to request improvements to the badge's legend, suggesting that it list the missing fields. Since missing fields can stem from limitations of the repository itself, it makes more sense to provide this information at the repository level. To that end, we will create a repository-level metadata compatibility badge that helps visualize how compatible each repository is with the metadata ingested by the NIAID Data Ecosystem.
The score will be calculated similarly to the metadata completeness score, but instead of summing the binary presence/absence of each field, it will use the ratio of records that have that field to the total number of records in the repository.
See the spreadsheet for an example calculation of the percent coverage for each property: https://docs.google.com/spreadsheets/d/1N_MyQcbCRFtyPIpEy2sP9JDPWJSGCf7zp_NlOtd5URE/edit#gid=955429340
Note that the calculation should focus on top-level properties, not nested properties, e.g. measurementTechnique, NOT measurementTechnique.name
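The per-property coverage ratio described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the FIELDS list, the is_present rule (empty values count as missing), and the sample records are all hypothetical and are not taken from the NDE codebase or the spreadsheet.

```python
from collections import Counter
from typing import Any, Iterable

# Hypothetical subset of top-level properties; the real list is not given here.
FIELDS = ["species", "funding", "measurementTechnique"]


def is_present(value: Any) -> bool:
    """Treat None and empty strings/lists/dicts as missing (an assumption)."""
    return value is not None and value != "" and value != [] and value != {}


def field_coverage(records: Iterable[dict], fields: list[str]) -> dict[str, float]:
    """Return, per top-level field, the fraction of records that contain it."""
    counts = Counter()
    total = 0
    for record in records:
        total += 1
        for field in fields:
            # Only the top-level key is checked, never nested keys
            # such as measurementTechnique.name.
            if is_present(record.get(field)):
                counts[field] += 1
    if total == 0:
        return {field: 0.0 for field in fields}
    return {field: counts[field] / total for field in fields}


# Made-up records standing in for one repository's ingested metadata.
records = [
    {"species": [{"name": "Homo sapiens"}], "funding": []},
    {"species": [{"name": "Mus musculus"}], "measurementTechnique": {"name": "RNA-seq"}},
    {"funding": [{"identifier": "example-grant"}]},
    {"species": "Homo sapiens"},
]
coverage = field_coverage(records, FIELDS)
print(coverage)
```

How the per-field ratios are combined into a single badge score (a plain sum, a weighted sum, or something else) is not specified here, so the sketch stops at the per-field coverage.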
For fields where augmentation may occur (species, infectiousAgent, funding, healthCondition, topicCategory, measurementTechnique, variableMeasured), the ratio of augmented coverage to ingested coverage should also be considered in the calculation.
Issue Discussion
The discussion of this issue was started in this related issue: https://github.com/orgs/NIAID-Data-Ecosystem/projects/6/views/5?pane=issue&itemId=50899235
Meta URL
https://docs.google.com/spreadsheets/d/1N_MyQcbCRFtyPIpEy2sP9JDPWJSGCf7zp_NlOtd5URE/edit#gid=955429340
Related WBS task
https://github.com/NIAID-Data-Ecosystem/nde-roadmap/issues/19
https://github.com/NIAID-Data-Ecosystem/nde-roadmap/issues/2