CBIIT / R-cometsAnalytics

R package development for COMETS Analytics

COMETS 1.5. Accommodate new field for data harmonization #33

Closed steven-moore closed 1 year ago

steven-moore commented 6 years ago

For COMETS 1.4, I would like to focus on three things: 1) Harmonization; 2) Error handling; and 3) Queue management/troubleshooting. This issue applies to the first of these.

Currently, we are doing all the harmonization on the backend at IMS. For each cohort, they start with our attempt to auto-harmonize but then revise and edit it substantially until all entries are logically consistent. Nathan pointed out that, once this has been done for each study, the most sensible approach is to send our UID back to the cohort as a column to add to their datafile, so that the file is permanently harmonized from then on. Ella, Ewy and I should meet with Nathan to discuss, but on a preliminary basis, I agree.

If we go this route, we will need to accommodate a new column for each datafile in our harmonization algorithm. It may also change the (non-software) workflow for each cohort. For example, we have each study run the Integrity Check and one or two tables that they send to IMS for pre-harmonization. Then we feed back the harmonized metabolite UID, and the cohort analyst adds it to their file and runs one or two tables again. Finally, if IMS is able to harmonize these easily, the cohort runs the whole analysis.
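To make the idea concrete, here is a rough sketch of how a harmonization step might use a cohort-supplied UID column when it is present and fall back to name matching otherwise. It is written in Python purely for illustration (the package itself is R) and is not the existing COMETS code. The column names `uid_01` and `metabolite_name`, the `master_lookup` dictionary, and the function `harmonize` are all hypothetical.

```python
def harmonize(rows, master_lookup, uid_column="uid_01", name_column="metabolite_name"):
    """Attach a UID to each metabolite row (hypothetical sketch).

    rows: list of dicts, one per metabolite in the cohort's datafile.
    master_lookup: dict mapping lower-cased metabolite names to UIDs.
    """
    results = []
    for row in rows:
        # Prefer the UID the cohort added to its file after IMS harmonization.
        provided = (row.get(uid_column) or "").strip()
        if provided:
            uid, source = provided, "cohort_column"
        else:
            # Fall back to matching on the normalized metabolite name.
            key = (row.get(name_column) or "").strip().lower()
            uid = master_lookup.get(key)
            source = "name_match" if uid else "unmatched"
        results.append({**row, "uid": uid, "match_source": source})
    return results


# Made-up example data, for illustration only.
master = {"glucose": "UID0001", "lactate": "UID0002"}
cohort_rows = [
    {"metabolite_name": "Glucose", "uid_01": "UID0001"},  # UID already supplied
    {"metabolite_name": "lactate ", "uid_01": ""},        # blank, so match by name
    {"metabolite_name": "unknown-123"},                   # no column, no name match
]
harmonized = harmonize(cohort_rows, master)
```

Recording `match_source` alongside the UID would let IMS see at a glance which rows still depend on auto-harmonization and which are already fixed by the cohort's own column.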

Let's discuss once 1.3 is complete.

steven-moore commented 6 years ago

By updating the UID file on our backend and just using the existing harmonization functionality, this issue should be resolved. I will test and verify. COMETS-Analytics does not need to explicitly do anything with the new field, as long as it passes it on.

steven-moore commented 6 years ago

Further note: The above is only true if the algorithm knows which column of the UID file to look at, but I'm not entirely sure that it does at present. Nevertheless, it is true that a more up-to-date UID file will improve the accuracy of the match. I propose that we focus for 1.4 on perfecting the UID update process and table this issue of perfecting the harmonization algorithm (using the new column of information) for 1.5.
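As an illustration of the "which column" concern, the sketch below builds a lookup from a UID file using an explicitly named column and fails loudly if that column is missing. It is Python for illustration only, not the current algorithm. The file layout (`uid`, `common_name`, `cohort_a_name`) and the function `build_lookup` are assumptions, not the real UID file format.

```python
import csv
import io


def build_lookup(uid_csv_text, match_column, uid_column="uid"):
    """Map values in `match_column` of a UID file to their UID (hypothetical sketch)."""
    reader = csv.DictReader(io.StringIO(uid_csv_text))
    # Fail early if the requested column is not in the UID file at all.
    if match_column not in (reader.fieldnames or []):
        raise KeyError(f"UID file has no column named {match_column!r}")
    lookup = {}
    for row in reader:
        key = (row[match_column] or "").strip().lower()
        if key:  # skip rows where this column is blank
            lookup[key] = row[uid_column]
    return lookup


# Made-up UID file contents, for illustration only.
UID_FILE = """uid,common_name,cohort_a_name
UID0001,glucose,GLC
UID0002,lactate,
"""

by_common = build_lookup(UID_FILE, "common_name")
by_cohort_a = build_lookup(UID_FILE, "cohort_a_name")
```

Passing the column name explicitly, rather than inferring it, means a newly added column in the UID file is only used when the caller asks for it.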