HUPO-PSI / mzTab

mzTab Reporting MS-based Proteomics and Metabolomics Results
https://hupo-psi.github.io/mzTab

Question: Application of non-unique evidence_ID (with multiple IDs/adducts) #110

Closed: oliveralka closed this issue 7 months ago.

oliveralka commented 6 years ago

It is not clear to me how to reference identification evidence when multiple identification methods are used and when multiple adducts exist.

Should each feature only refer to the evidence for the adduct it came from, while still including the different identification approaches? And what should be done if adduct grouping was performed beforehand?

References to the identification evidence (SME elements) via referencing SME_ID values. Multiple values MAY be provided as a “|” separated list to indicate ambiguity in the identification. For the case of a consensus approach where multiple adduct forms are used to infer the SML ID, different features should just reference the same SME_ID value(s).
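One way a reader might split such a reference list is shown in the minimal Python sketch below. It assumes the cell holds integer SME_ID values joined by "|", treats "null" or an empty cell as "no reference", and reads "more than one value" as the ambiguity signal described in the quoted text. The function names are hypothetical and do not come from any mzTab library.

```python
from typing import List


def parse_sme_id_refs(cell: str) -> List[int]:
    """Split an SME_ID_REFS cell such as "1|2|3" into integer SME_ID values.

    Assumption: "null" or an empty cell means no evidence is referenced.
    """
    cell = cell.strip()
    if cell in ("", "null"):
        return []
    # Tolerate stray whitespace around the "|" separators (e.g. "7 | 8").
    return [int(part.strip()) for part in cell.split("|")]


def is_ambiguous(cell: str) -> bool:
    """More than one referenced SME_ID indicates ambiguity in the identification."""
    return len(parse_sme_id_refs(cell)) > 1


print(parse_sme_id_refs("1|2|3|4|5|6"))  # [1, 2, 3, 4, 5, 6]
print(parse_sme_id_refs("7 | 8"))        # [7, 8]
print(is_ambiguous("7|8"))               # True
```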

For example (sorry about the incomplete structure):

SMF
SMF_ID  SME_ID_REFS
1       1,2,3,4,5,6   [M+H]+
2                     [M+Na]+
SME_ID  evidence_ID
1       index = 123 database
1       index = 124 database
1       index = 156 database
1       index = 123 nondatabase
1       index = 124 nondatabase
1       index = 156 nondatabase
oliveralka commented 6 years ago

@andrewrobertjones: Could you have a look and check whether this is what you had in mind? I am not sure where to place such an example in the document.

Evidence from multiple identification methods is referenced by the feature of one specific adduct. If multiple adducts exist, or adduct grouping was performed, a feature should still exist for each adduct and be referenced as shown in the example below. Here, identification has been performed using the ChEBI database and "no database" (e.g. de novo identification).

I am not sure about the "SME_ID_REF_Ambiguity_Code": is it still 1?

SFH SMF_ID  SME_ID_REFS  SME_ID_REF_Ambiguity_Code  adduct_ion
SMF 5       1|2|3|4|5|6  1                          [M+H]+
SMF 6       7|8          1                          [M+Na]+
SEH SME_ID  evidence_input_id   database_identifier chemical_formula    smiles  inchi   chemical_name   uri derivatized_form    adduct_ion  ...
SME 1   ms_run[1]:index=274 CHEBI:16737 C4H7N3O null    null    Creatinine  null    null    [M+H]+  ...
SME 2   ms_run[1]:index=284 CHEBI:16737 C4H7N3O null    null    Creatinine  null    null    [M+H]+  ...
SME 3   ms_run[1]:index=290 CHEBI:16737 C4H7N3O null    null    Creatinine  null    null    [M+H]+  ...
SME 4   ms_run[1]:index=274 null    C4H7N3O null    null    Creatinine  null    null    [M+H]+  ...
SME 5   ms_run[1]:index=284 null    C4H7N3O null    null    Creatinine  null    null    [M+H]+  ...
SME 6   ms_run[1]:index=290 null    C4H7N3O null    null    Creatinine  null    null    [M+H]+  ...
SME 7   ms_run[1]:index=384 null    C4H7N3O null    null    Creatinine  null    null    [M+Na]+ ...
SME 8   ms_run[1]:index=390 null    C4H7N3O null    null    Creatinine  null    null    [M+Na]+ ...

Smaller example:

SFH SMF_ID  SME_ID_REFS  SME_ID_REF_Ambiguity_Code  adduct_ion
SMF 5       1|2|3|4|5|6  1                          [M+H]+
SMF 6       7|8          1                          [M+Na]+
SEH SME_ID  evidence_input_id   database_identifier ... adduct_ion  ...
SME 1   ms_run[1]:index=274 CHEBI:16737 ... [M+H]+  ...
SME 2   ms_run[1]:index=284 CHEBI:16737 ... [M+H]+  ...
SME 3   ms_run[1]:index=290 CHEBI:16737 ... [M+H]+  ...
SME 4   ms_run[1]:index=274 null    ... [M+H]+  ...
SME 5   ms_run[1]:index=284 null    ... [M+H]+  ...
SME 6   ms_run[1]:index=290 null    ... [M+H]+  ...
SME 7   ms_run[1]:index=384 null    ... [M+Na]+ ...
SME 8   ms_run[1]:index=390 null    ... [M+Na]+ ...
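To check that an example like this is internally consistent, a reader could resolve each feature's SME_ID_REFS and confirm that every referenced evidence row carries the feature's own adduct. The minimal Python sketch below does that. It assumes tab-separated lines with the SFH/SMF and SEH/SME prefixes and keeps only a few columns of the example. The helper names are hypothetical, the adduct check encodes the convention proposed in this thread (one feature per adduct) rather than a rule quoted from the specification, and a database_identifier of "null" is treated as the no-database (e.g. de novo) route.

```python
from typing import Dict, List, Tuple

# The example above, reduced to the columns used here and re-joined with tabs.
ROWS = [
    ("SFH", "SMF_ID", "SME_ID_REFS", "adduct_ion"),
    ("SMF", "5", "1|2|3|4|5|6", "[M+H]+"),
    ("SMF", "6", "7|8", "[M+Na]+"),
    ("SEH", "SME_ID", "evidence_input_id", "database_identifier", "adduct_ion"),
    ("SME", "1", "ms_run[1]:index=274", "CHEBI:16737", "[M+H]+"),
    ("SME", "2", "ms_run[1]:index=284", "CHEBI:16737", "[M+H]+"),
    ("SME", "3", "ms_run[1]:index=290", "CHEBI:16737", "[M+H]+"),
    ("SME", "4", "ms_run[1]:index=274", "null", "[M+H]+"),
    ("SME", "5", "ms_run[1]:index=284", "null", "[M+H]+"),
    ("SME", "6", "ms_run[1]:index=290", "null", "[M+H]+"),
    ("SME", "7", "ms_run[1]:index=384", "null", "[M+Na]+"),
    ("SME", "8", "ms_run[1]:index=390", "null", "[M+Na]+"),
]
EXAMPLE = "\n".join("\t".join(row) for row in ROWS)


def read_section(text: str, header_prefix: str, row_prefix: str) -> List[Dict[str, str]]:
    """Turn rows starting with `row_prefix` into dicts keyed by the matching header line."""
    header: List[str] = []
    rows: List[Dict[str, str]] = []
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == header_prefix:
            header = fields[1:]
        elif fields[0] == row_prefix:
            rows.append(dict(zip(header, fields[1:])))
    return rows


features = read_section(EXAMPLE, "SFH", "SMF")
evidence = {int(row["SME_ID"]): row for row in read_section(EXAMPLE, "SEH", "SME")}


def resolve(feature: Dict[str, str]) -> List[Dict[str, str]]:
    """Look up the SME rows a feature references and check they share its adduct."""
    linked = [evidence[int(ref)] for ref in feature["SME_ID_REFS"].split("|")]
    mismatched = [e["SME_ID"] for e in linked if e["adduct_ion"] != feature["adduct_ion"]]
    if mismatched:
        raise ValueError(f"SMF {feature['SMF_ID']} references other adducts: {mismatched}")
    return linked


def summarize() -> List[Tuple[str, str, int, int]]:
    """Per feature: (SMF_ID, adduct_ion, database-backed evidence, no-database evidence)."""
    summary = []
    for feature in features:
        linked = resolve(feature)
        with_db = sum(1 for e in linked if e["database_identifier"] != "null")
        summary.append((feature["SMF_ID"], feature["adduct_ion"], with_db, len(linked) - with_db))
    return summary


for entry in summarize():
    print(entry)
# ('5', '[M+H]+', 3, 3)
# ('6', '[M+Na]+', 0, 2)
```

A feature that mixed evidence from two adducts (for instance one referencing "1|7") would raise a `ValueError` here. That is the situation the per-adduct convention is meant to avoid.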
andrewrobertjones commented 6 years ago

I will take a look at this one

nilshoffmann commented 6 years ago

This is related to #133.

oliveralka commented 6 years ago

The same question applies to the database_identifier. As far as I understand, the SML row should present the most reliable identification (e.g. from a spectral library search). Additionally, I can provide an optional column listing the further databases/identification methods used (accurate mass search and de novo identification).

What should be done if the identification methods give different outcomes? For example:

Metabolite1: de novo -> Galactose
Metabolite1: spectral library search -> Glucose
Metabolite1: accurate mass search -> Glucose

Of course, you may trust the spectral library search the most, since it is often a custom library (built for your instrument and column).

To sum up: the most trusted identification is referenced in SME_ID_REFS, and that feature is in turn referenced from the SML row; the other identifications can be documented in additional optional (opt_) columns along the way. Is that correct?
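The approach summarized above can be sketched in Python. This is a minimal illustration that assumes each identification route yields one (method, chemical_name) pair and that trust follows a fixed, user-chosen ranking (here spectral library search first, as suggested in this thread). `METHOD_RANK`, `choose_identification`, `has_conflict` and the column name `opt_global_alternative_identifications` are hypothetical and are not defined by mzTab.

```python
from typing import Dict, List, Tuple

# Hypothetical trust ranking (lower = more trusted), following the preference above.
METHOD_RANK: Dict[str, int] = {
    "spectral library search": 0,
    "accurate mass search": 1,
    "de novo": 2,
}


def choose_identification(results: List[Tuple[str, str]]) -> Tuple[str, str, List[str]]:
    """Return (best_method, best_name, alternatives) for one metabolite.

    The best-ranked result would be reported in the SML row; the remaining ones
    are formatted as "method:name" strings for a hypothetical opt_ column such as
    opt_global_alternative_identifications.
    """
    ranked = sorted(results, key=lambda r: METHOD_RANK[r[0]])
    best_method, best_name = ranked[0]
    alternatives = [f"{method}:{name}" for method, name in ranked[1:]]
    return best_method, best_name, alternatives


def has_conflict(results: List[Tuple[str, str]]) -> bool:
    """True if the identification methods disagree on the chemical name."""
    return len({name for _, name in results}) > 1


results = [
    ("de novo", "Galactose"),
    ("spectral library search", "Glucose"),
    ("accurate mass search", "Glucose"),
]
method, name, alternatives = choose_identification(results)
print(name)                    # Glucose
print("|".join(alternatives))  # accurate mass search:Glucose|de novo:Galactose
print(has_conflict(results))   # True
```

Keeping the losing identifications in a separate column, rather than dropping them, preserves the disagreement (Galactose vs. Glucose) for anyone re-evaluating the result.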

nilshoffmann commented 7 months ago

@oliveralka Please reopen if this is still relevant.