Closed oliveralka closed 7 months ago
@andrewrobertjones: Could you have a look at whether this is what you had in mind? I am not sure where to place such an example in the document.
Multiple identification methods are referenced from one specific adduct. If multiple adducts exist, or adduct grouping was performed, a feature for each adduct should still exist and be referenced as shown in the example below. Here, identification has been performed using the ChEBI database and "no database" (e.g. de novo identification).
I am not sure about the "SME_ID_REF_Ambiguity_Code": is that still 1?
SFH SMF_ID SME_ID_REFS SME_ID_REF_Ambiguity_Code adduct_ion
SMF 5 1|2|3|4|5|6 1 [M+H]+
SMF 6 7|8 1 [M+Na]+
SEH SME_ID evidence_input_id database_identifier chemical_formula smiles inchi chemical_name uri derivatized_form adduct_ion ...
SME 1 ms_run[1]:index=274 CHEBI:16737 C4H7N3O null null Creatinine null null [M+H]+ ...
SME 2 ms_run[1]:index=284 CHEBI:16737 C4H7N3O null null Creatinine null null [M+H]+ ...
SME 3 ms_run[1]:index=290 CHEBI:16737 C4H7N3O null null Creatinine null null [M+H]+ ...
SME 4 ms_run[1]:index=274 null C4H7N3O null null Creatinine null null [M+H]+ ...
SME 5 ms_run[1]:index=284 null C4H7N3O null null Creatinine null null [M+H]+ ...
SME 6 ms_run[1]:index=290 null C4H7N3O null null Creatinine null null [M+H]+ ...
SME 7 ms_run[1]:index=384 null C4H7N3O null null Creatinine null null [M+Na]+ ...
SME 8 ms_run[1]:index=390 null C4H7N3O null null Creatinine null null [M+Na]+ ...
Smaller example:
SFH SMF_ID SME_ID_REFS SME_ID_REF_Ambiguity_Code adduct_ion
SMF 5 1|2|3|4|5|6 1 [M+H]+
SMF 6 7|8 1 [M+Na]+
SEH SME_ID evidence_input_id database_identifier ... adduct_ion ...
SME 1 ms_run[1]:index=274 CHEBI:16737 ... [M+H]+ ...
SME 2 ms_run[1]:index=284 CHEBI:16737 ... [M+H]+ ...
SME 3 ms_run[1]:index=290 CHEBI:16737 ... [M+H]+ ...
SME 4 ms_run[1]:index=274 null ... [M+H]+ ...
SME 5 ms_run[1]:index=284 null ... [M+H]+ ...
SME 6 ms_run[1]:index=290 null ... [M+H]+ ...
SME 7 ms_run[1]:index=384 null ... [M+Na]+ ...
SME 8 ms_run[1]:index=390 null ... [M+Na]+ ...
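The referencing in the example can be checked mechanically. The following is a minimal sketch, not a real mzTab-M parser: the dictionaries `smf_rows` and `sme_rows` and the helpers `parse_refs` and `mismatched_refs` are hypothetical names that simply mirror the rows above. It splits each SME_ID_REFS value on "|" and reports any feature that references evidence with a different adduct_ion.

```python
# Hypothetical in-memory copies of the example rows (only the columns needed here).
smf_rows = {
    5: {"SME_ID_REFS": "1|2|3|4|5|6", "adduct_ion": "[M+H]+"},
    6: {"SME_ID_REFS": "7|8", "adduct_ion": "[M+Na]+"},
}
sme_rows = {
    1: {"database_identifier": "CHEBI:16737", "adduct_ion": "[M+H]+"},
    2: {"database_identifier": "CHEBI:16737", "adduct_ion": "[M+H]+"},
    3: {"database_identifier": "CHEBI:16737", "adduct_ion": "[M+H]+"},
    4: {"database_identifier": None, "adduct_ion": "[M+H]+"},
    5: {"database_identifier": None, "adduct_ion": "[M+H]+"},
    6: {"database_identifier": None, "adduct_ion": "[M+H]+"},
    7: {"database_identifier": None, "adduct_ion": "[M+Na]+"},
    8: {"database_identifier": None, "adduct_ion": "[M+Na]+"},
}


def parse_refs(refs):
    """Split a '|'-separated SME_ID_REFS value into integer IDs ('null' -> no refs)."""
    if refs is None or refs.strip() == "null":
        return []
    return [int(part) for part in refs.split("|") if part.strip()]


def mismatched_refs(smf_rows, sme_rows):
    """Return (SMF_ID, SME_ID) pairs where feature and evidence adducts disagree."""
    problems = []
    for smf_id, smf in smf_rows.items():
        for sme_id in parse_refs(smf["SME_ID_REFS"]):
            if sme_rows[sme_id]["adduct_ion"] != smf["adduct_ion"]:
                problems.append((smf_id, sme_id))
    return problems


print(mismatched_refs(smf_rows, sme_rows))  # prints [] for the example above
```

An empty result means every feature only references evidence from its own adduct, which is the layout proposed in the example.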
I will take a look at this one
This is related to #133.
The same is true for the database_identifier.
As far as I understand, the most reliable identification (e.g. from spectral library search) is presented in the SML. Additionally, I can provide an optional column listing further databases/identifications used (accurate mass search and de novo identification).
What should be done if the identification outcomes differ?
Metabolite1: de novo -> Galactose
Metabolite1: spectral library search -> Glucose
Metabolite1: accurate mass search -> Glucose
Of course, you may trust the spectral library search the most, since it is often a custom library (for your instrument and column).
To sum it up: the most trusted identification is referenced in SME_ID_REFS and, via the feature, in the SML; the others can be documented in additional opt columns along the way. Is that correct?
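If that reading is right, the selection could look roughly like the sketch below. Everything in it is an assumption for illustration: the trust order in `METHOD_RANK`, the `candidates` records, the helper `split_by_trust`, and the way the remaining identifications are packed into one string for an opt column are not taken from the specification.

```python
# Assumed trust order, most trusted first (hypothetical, adjust per workflow).
METHOD_RANK = ["spectral library search", "accurate mass search", "de novo"]

# The three conflicting outcomes for Metabolite1 from the question above.
candidates = [
    {"method": "de novo", "name": "Galactose"},
    {"method": "spectral library search", "name": "Glucose"},
    {"method": "accurate mass search", "name": "Glucose"},
]


def split_by_trust(candidates, rank=METHOD_RANK):
    """Return (most trusted candidate, remaining candidates in trust order)."""
    ordered = sorted(candidates, key=lambda c: rank.index(c["method"]))
    return ordered[0], ordered[1:]


primary, others = split_by_trust(candidates)

# Hypothetical encoding of the less trusted results for an optional column.
opt_value = "|".join(f'{c["method"]}:{c["name"]}' for c in others)

print(primary["name"])
print(opt_value)
```

Here `primary` would be the identification referenced from the feature, while `opt_value` is one possible way to keep the other outcomes alongside it.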
@oliveralka Please reopen if this is still relevant.
It is not clear to me how the references should be set if multiple identification methods are used and multiple adducts exist.
Maybe they should only refer to the adduct they came from? But include different identification approaches? What should be done if adduct grouping was performed beforehand?
References to the identification evidence (SME elements) via referencing SME_ID values. Multiple values MAY be provided as a “|” separated list to indicate ambiguity in the identification. For the case of a consensus approach where multiple adduct forms are used to infer the SML ID, different features should just reference the same SME_ID value(s).
e.g. (sorry about the incomplete structure)