CBIIT / R-cometsAnalytics

R package development for COMETS Analytics

COMETS 1.3. "Harmonization" file #41

Closed by steven-moore 6 years ago

steven-moore commented 6 years ago

For our rollout, we have proposed a two-step process for the cohorts:

1) Prepare the data file, test its integrity, download the "harmonization" file, and run one simple analysis (age.2). Send the harmonization file and the results file to IMS so that they can begin harmonization.

2) IMS sends back a "Metabolites" tab that is identical to the original, except with an additional UID_01 column. With this new tab, the cohort then goes back to COMETS-Analytics and runs "All models". These models are now "pre-harmonized".
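Step 2 depends on the returned tab matching the original apart from the new column, so a quick check before running "All models" might be worthwhile. The following is only a minimal sketch of that check, assuming each tab has already been read into a list of row dictionaries. The function name `check_returned_tab`, the `metabolite_name` column, and the UID values are hypothetical and used for illustration only; this is not part of COMETS Analytics.

```python
def check_returned_tab(original_rows, returned_rows, new_column="UID_01"):
    """Return a list of problems; an empty list means the returned tab
    equals the original plus the one added column."""
    problems = []
    if len(original_rows) != len(returned_rows):
        problems.append(
            f"row count differs: {len(original_rows)} vs {len(returned_rows)}"
        )
        return problems
    for i, (orig, ret) in enumerate(zip(original_rows, returned_rows)):
        if new_column not in ret:
            problems.append(f"row {i}: missing {new_column}")
        # Compare everything except the added column, including letter case.
        without_new = {k: v for k, v in ret.items() if k != new_column}
        if without_new != orig:
            problems.append(f"row {i}: original columns were changed")
    return problems


# Made-up example rows (not real cohort data).
original = [
    {"metabid": "Glucose", "metabolite_name": "glucose"},
    {"metabid": "X-1234", "metabolite_name": "unknown"},
]
returned = [
    {"metabid": "Glucose", "metabolite_name": "glucose", "UID_01": "uid_a"},
    {"metabid": "X-1234", "metabolite_name": "unknown", "UID_01": "uid_b"},
]
problems = check_returned_tab(original, returned)
print(problems or "returned tab matches the original")
```

Because the comparison is exact, a change in the capitalization of metabid would also be reported as a changed column.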

To accommodate this process change, I have two minor edits to the harmonization file, per discussion with Nathan Appel and David Ruggieri of IMS.

  1. The variable that is currently called "UID_01" should be renamed to make room for the IMS UID_01 variable. My suggested name is "UID_01.comets_analytics", which reflects the fact that this UID_01 is based on the COMETS-Analytics algorithm. Keeping both columns will also give us data to track our algorithm's performance over time (% match between the algorithm's UID and the final IMS UID).

  2. The harmonization file changes the case (lower case vs. upper case) of the metabid variable relative to the original input. To ensure that IMS can fully replicate the original input, the harmonization file should provide metabid in its original case. Remember that if the cohort is not proceeding with the "All models" analysis initially, then IMS has only the "Harmonization" file to work with and not the "Input" file.
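To make the two edits concrete, here is a rough sketch of the rename, a case-insensitive join that leaves metabid in its original capitalization, and the % match calculation. It assumes the harmonization file and the IMS file are each available as a list of row dictionaries keyed by metabid. All three function names and the UID values are hypothetical; this illustrates the idea and is not the COMETS Analytics implementation.

```python
def rename_algorithm_uid(rows, old="UID_01", new="UID_01.comets_analytics"):
    """Rename the algorithm's UID column; values and their case are untouched."""
    return [
        {(new if key == old else key): value for key, value in row.items()}
        for row in rows
    ]


def attach_ims_uid(harmonization_rows, ims_rows, ims_col="UID_01"):
    """Add the IMS UID to each row, matching metabid case-insensitively."""
    ims_by_id = {row["metabid"].lower(): row[ims_col] for row in ims_rows}
    merged = []
    for row in harmonization_rows:
        out = dict(row)  # metabid keeps its original capitalization
        out[ims_col] = ims_by_id.get(row["metabid"].lower())
        merged.append(out)
    return merged


def percent_match(rows, algo_col="UID_01.comets_analytics", ims_col="UID_01"):
    """Percent of rows with an IMS UID where the algorithm chose the same UID."""
    compared = [row for row in rows if row.get(ims_col) is not None]
    if not compared:
        return None
    agree = sum(1 for row in compared if row.get(algo_col) == row[ims_col])
    return 100.0 * agree / len(compared)


# Made-up example values (not real UIDs).
harmonization = [
    {"metabid": "Glucose", "UID_01": "uid_a"},
    {"metabid": "X-1234", "UID_01": "uid_b"},
    {"metabid": "Citrate", "UID_01": "uid_c"},
    {"metabid": "Lactate", "UID_01": None},
]
ims = [
    {"metabid": "glucose", "UID_01": "uid_a"},
    {"metabid": "x-1234", "UID_01": "uid_z"},
    {"metabid": "citrate", "UID_01": "uid_c"},
    {"metabid": "lactate", "UID_01": "uid_d"},
]

renamed = rename_algorithm_uid(harmonization)
merged = attach_ims_uid(renamed, ims)
match_rate = percent_match(merged)
print(f"algorithm vs. IMS match: {match_rate:.1f}%")
```

In this sketch, rows where the algorithm found no UID count as mismatches; whether they should be excluded from the denominator instead is a choice for the group to make.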

steven-moore commented 6 years ago

Regarding minor edit 2, the "input" file preserves the original capitalization. I will communicate this to Nathan Appel at IMS as we begin preparing our first merged file.

steven-moore commented 6 years ago

All that's left here is minor edit 1 above: change the UID_01 column name to UID_01.comets_analytics or another name.

steven-moore commented 6 years ago

Until COMETS 1.4, the solution is for IMS to call their UID "UID_IMS" and send that back to the cohorts.