Closed steven-moore closed 6 years ago
A related issue is that the IMS_UID should be used more fully in the harmonization scheme. After IMS harmonizes the metabolites, they typically send a file back to each group with a new field indicating the final harmonization. I don't believe we have posted any datasets like this before, so I have pasted one below that includes the final IMS UID. Please let me know if you have any questions about this.
We determined that it is too difficult to "guess" how to fix these issues retrospectively. Instead, if COMP-ID or CHEMICAL-ID is used, it must be supplied as a number-only field in the exact format received from Metabolon. So we will keep this requirement; the current functionality is correct. We will also add instructions to our tutorials and e-mails to notify people. Issue closed.
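For anyone preparing a file, here is a minimal sketch of a pre-upload check for the number-only requirement. It is not part of the harmonization tool: the function name `non_numeric_ids` and the sample IDs are made up for illustration, and it assumes the IDs have already been read in as strings.

```python
import re


def non_numeric_ids(ids):
    """Return the IDs that are not made up solely of the digits 0-9.

    Hypothetical helper: surrounding whitespace is ignored, anything
    else (letters, punctuation, an empty cell) is reported.
    """
    return [i for i in ids if not re.fullmatch(r"[0-9]+", str(i).strip())]


# Invented example values, not real Metabolon IDs.
bad = non_numeric_ids(["35", "X1002", " 204 "])
print(bad)
```

An empty result means every ID in the column is number-only; any value listed needs to be restored to the format received from Metabolon before upload.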
The new harmonization algorithm checks for chemical-id and comp-id columns and uses them to match against the UID file. However, this assumes that no characters have been added to the comp-id or chemical-id. People who use R for data analysis will commonly use the comp-id or chemical-id as their metabolite names and add a character to the front to comply with R's column name requirements. The match then returns an error, per the screenshot below. Could we add a step that strips out any non-numeric characters in the comp-id or chemical-id? Sample files for testing are also included below.
Scrambled.CPSII.data (2).xlsx
Scrambled.CPSII.data_comp_id_removed.xlsx
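The stripping step asked about above could be sketched roughly as follows, either inside the tool or run by users on their own files. This is only an illustration under stated assumptions: `strip_id_prefix` is a hypothetical name, the IDs are invented, and it assumes the only modification is non-digit characters added to the front (for example the `X` that R's `make.names` prepends to names starting with a digit). Anything else is rejected rather than guessed at.

```python
import re


def strip_id_prefix(value):
    """Remove non-digit characters from the front of an ID.

    Hypothetical helper. Raises ValueError if what remains is not
    digits-only, so unexpected edits are surfaced instead of being
    silently "repaired".
    """
    text = str(value).strip()
    cleaned = re.sub(r"^[^0-9]+", "", text)
    if not re.fullmatch(r"[0-9]+", cleaned):
        raise ValueError(f"cannot recover a numeric ID from {value!r}")
    return cleaned


# Invented example values, not real Metabolon IDs.
print([strip_id_prefix(v) for v in ["X1002", "35"]])
```

Failing loudly on anything other than a simple prefix keeps the step from matching a metabolite to the wrong UID.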