bmir-radx / radx-project

This repo serves as a primary location for tracking issues that don't quite fit into our other dedicated repositories

Define Harmonization Metrics #40

Open marcosmro opened 11 months ago

marcosmro commented 11 months ago

The objective of this task is to define a comprehensive set of harmonization metrics for the RADx Data Hub. The scope of the task includes:

  1. Identifying the metrics that are relevant to the harmonization of data within the Data Hub.
  2. Developing a methodology for calculating these identified metrics.
  3. Planning the implementation of tooling to calculate these metrics.

Working document here.

jkyu commented 10 months ago

I made a third revision of the harmonization metrics document. Old versions are saved as drafts in the same document. This third revision outlines the implementation of the harmonization metrics library. The library itself is in progress.

A major detail that needs attention is the implementation of the harmonization rules. I am using a hash map based on the Global Codebook for testing/exploration, but the Global Codebook does not capture all of the data elements that should be harmonized due to synonyms or spelling variations.

This inflates the harmonization metrics, because any data element not matched to an entry in the Global Codebook is considered trivially harmonized.
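To illustrate the inflation described above, here is a minimal sketch, not the library's actual code. It assumes the harmonization rules are a plain dictionary from a source element name to its harmonized name, and that the metric is the fraction of elements counted as harmonized. All names (`harmonized_fraction`, `codebook_rules`, the element names) are hypothetical.

```python
def harmonized_fraction(elements, rules):
    """Fraction of data elements counted as harmonized.

    `rules` maps a source element name to its harmonized name
    (a hypothetical stand-in for a hash map built from the Global Codebook).
    An element counts as unharmonized only if a rule matches it and would
    rename it; anything with no matching rule is trivially harmonized.
    """
    if not elements:
        return 1.0
    harmonized = sum(
        1 for e in elements if e not in rules or rules[e] == e
    )
    return harmonized / len(elements)


# Hypothetical element names found in one data file.
elements = ["std_sex", "age", "gender", "Age_yrs"]

# Rules derived from the codebook alone: no synonyms or spelling variants.
codebook_rules = {"sex": "std_sex", "age": "std_age"}

# Assumed extension that also covers a synonym and a spelling variation.
extended_rules = {**codebook_rules, "gender": "std_sex", "Age_yrs": "std_age"}

inflated = harmonized_fraction(elements, codebook_rules)
corrected = harmonized_fraction(elements, extended_rules)
print(inflated, corrected)  # 0.75 0.25
```

With only the codebook rules, `gender` and `Age_yrs` match nothing and are counted as harmonized, so the score is 0.75. Once the assumed synonym rules are added, the same file scores 0.25.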

jkyu commented 10 months ago

I worked on some revisions as recommended by Matthew during our code review earlier today. I'm just posting a checklist of reminders and items I want to finish (preferably before the metrics meeting next week).

jkyu commented 10 months ago

I guess the development of the metrics library doesn't fit under "Define Harmonization Metrics." This work should go under a follow-up task titled "Develop Harmonization Metrics Library." The third revision of the metrics document (unchanged since yesterday's post) has a set of metrics that I'm comfortable with, so if it gets an OK during the Metrics TT meeting, we can close out this task.

marcosmro commented 10 months ago

Sounds good, @jkyu. I see you've created the new task. I'll tag it with the right milestones. Thanks.

jkyu commented 10 months ago

Working on another revision of the metrics document to explicitly and (hopefully) unambiguously define all harmonization terminology required to communicate what the proposed metrics measure. This was a blocker today, as the metrics meeting stalled on terminology and did not get to a discussion of the metrics themselves.

jkyu commented 8 months ago

We finalized the metrics that we will report for the Data Hub. Those discussions were part of the harmonization metrics library integration.