MassBank / MassBank-data

Official repository of open data MassBank records

Gulde et al. batch 2 #167

Closed by lauperbe 3 years ago

lauperbe commented 3 years ago

Here are the remaining spectra from Gulde et al., for which only a chemical formula could be identified and multiple structures are possible. For example, ET404001 is the product of the hydroxylation of N-Desmethyltramadol (https://pubchem.ncbi.nlm.nih.gov/compound/N-Desmethyltramadol), but the exact position of the hydroxylation could not be determined.

I was not sure how to handle the CH$SMILES / CH$IUPAC and CH$LINK rows so I used:

CH$SMILES: No SMILES given since multiple possible
CH$IUPAC: No InChI given since multiple possible
CH$LINK: No INCHIKEY given since multiple possible

I am open to changes to the format if you think it should be handled differently. Thanks in advance, Benedikt

tsufz commented 3 years ago

See my comment in #166

tsufz commented 3 years ago

@lauperbe, regarding the SMILES: we can now process Markush structures. See, for example, https://massbank.eu/MassBank/RecordDisplay?id=HB002857&dsn=HBM4EU.

This requires a SMARTS annotation: https://www.daylight.com/dayhtml_tutorials/languages/smarts/index.html

You can check and learn at https://simolecule.com/cdkdepict/depict.html

Have a nice weekend, Tobias

lauperbe commented 3 years ago

Hello, I fixed the problem with the pull request. Now only the SMILES codes etc. are the problem. The authors (Rebekka Gulde: Rebekka.Gulde@eawag.ch and Baptiste Clerc: baptiste.clerc@eawag.ch) ask whether it is possible to upload the records without any SMILES/Markush structure, since not enough information is known to generate them for all compounds. If yes, how would the record file have to be structured to support that?

Thanks in advance, Benedikt

schymane commented 3 years ago

We've done this previously as

CH$SMILES: N/A
CH$IUPAC: N/A

I'm not sure whether this is still the accepted option, but it appears in current public records:

https://massbank.eu/MassBank/RecordDisplay?id=ETS00106&dsn=Eawag_Additional_Specs

...so that would be my backwards-compatible suggestion ...
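For anyone who wants to check their own files against this convention, here is a minimal sketch in Python. It assumes each record line has the form `TAG: value` and treats the literal string `N/A` as "no structure given". The helper names `parse_fields` and `has_structure` are hypothetical and are not part of any MassBank tooling.

```python
def parse_fields(record_text):
    """Collect 'TAG: value' lines into a dict mapping each tag to a list of values."""
    fields = {}
    for line in record_text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip lines that do not contain a 'TAG: value' separator
            fields.setdefault(key, []).append(value.strip())
    return fields


def has_structure(fields):
    """Return True if CH$SMILES is present and is not the 'N/A' placeholder (assumed convention)."""
    smiles = fields.get("CH$SMILES", ["N/A"])[0]
    return smiles != "N/A"


example = """CH$SMILES: N/A
CH$IUPAC: N/A"""

fields = parse_fields(example)
print(has_structure(fields))
```

A tag is stored as a list because some tags can occur more than once in a record.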

tsufz commented 3 years ago

Good morning @lauperbe and @schymane, N/A is still an option. A SMARTS-annotated record differs in its metadata:

ACCESSION: HB002857
RECORD_TITLE: Dimethenamid-OH-desmethyl-N-Dealkylation (TENTATIVE); LC-ESI-QFT; MS2; CE: 30%; R=70000; [M+H]+
DATE: 2021.02.23
AUTHORS: Carolin Huber, Tobias Schulze, Martin Krauss, Department of Effect-Directed Analysis, Helmholtz Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
LICENSE: CC0
COPYRIGHT: Copyright (C) 2021 Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
PUBLICATION: Huber et al., submitted
COMMENT: HBM4EU - science and policy for a healthy future (https://www.hbm4eu.eu)
COMMENT: COMMENT: CONFIDENCE: Tentative structure, with evidences on substitutes (Level 3b)
COMMENT: COMMENT: generated by human liver S9 incubation
COMMENT: Dimethenamid_OH-desmethyl-N-Dealkylation_30eV.txt
CH$NAME: Dimethenamid-OH-desmethyl-N-Dealkylation
CH$COMPOUND_CLASS: N/A; Biotransformation Product
CH$FORMULA: C9H15NO2S
CH$EXACT_MASS: 201.0814
CH$SMILES: C(*)C1=C(*)SC(=C1N(C(*)(C(*))C(*)O))C(*) *=[OH (n=1) & H (n=5)]
CH$IUPAC: N/A

See https://massbank.eu/MassBank/RecordDisplay?id=HB002857&dsn=HBM4EU as an example. The record is https://github.com/MassBank/MassBank-data/blob/main/HBM4EU/HB002857.txt.

Of course, this is only possible if a tentative structure can be annotated. This is level 3 of Emma's approach.
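To illustrate how the CH$SMILES value above is put together, the following Python sketch splits it into the pattern with `*` placeholders and the substituent list. The layout `*=[X (n=a) & Y (n=b)]` is inferred only from this one record and is an assumption, not a documented grammar. The function name `split_markush` is hypothetical.

```python
import re


def split_markush(value):
    """Split a CH$SMILES value into (pattern, {substituent: count}).

    Assumes the annotation looks like ' *=[OH (n=1) & H (n=5)]', as in HB002857.
    """
    pattern, sep, annotation = value.partition(" *=[")
    if not sep:  # no annotation present, so return the value unchanged
        return value.strip(), {}
    annotation = annotation.rstrip().rstrip("]")
    counts = {}
    for part in annotation.split("&"):
        match = re.fullmatch(r"\s*(\S+)\s*\(n=(\d+)\)\s*", part)
        if match is None:
            raise ValueError(f"unrecognised substituent entry: {part!r}")
        counts[match.group(1)] = int(match.group(2))
    return pattern.strip(), counts


value = "C(*)C1=C(*)SC(=C1N(C(*)(C(*))C(*)O))C(*) *=[OH (n=1) & H (n=5)]"
pattern, counts = split_markush(value)

# Sanity check: the number of '*' placeholders should equal the total substituent count.
print(pattern.count("*"), sum(counts.values()))
```

In this example the pattern has six `*` placeholders, which matches one OH plus five H.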

A record with an unequivocal formula (level 4) looks like:

ACCESSION: HB002891
RECORD_TITLE: Metolachlor-desmethyl (Unequivocal molecular formula); LC-ESI-QFT; MS2; CE: 30%; R=70000; [M+H]+
DATE: 2021.02.23
AUTHORS: Carolin Huber, Tobias Schulze, Martin Krauss, Department of Effect-Directed Analysis, Helmholtz Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
LICENSE: CC0
COPYRIGHT: Copyright (C) 2021 Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
PUBLICATION: Huber et al., submitted
COMMENT: HBM4EU - science and policy for a healthy future (https://www.hbm4eu.eu)
COMMENT: COMMENT: CONFIDENCE: Unequivocal molecular formula (Level 4)
COMMENT: COMMENT: generated by human liver S9 incubation
COMMENT: Metolachlor_desmethyl_30eV.txt
CH$NAME: Metolachlor-desmethyl
CH$COMPOUND_CLASS: N/A; Biotransformation Product
CH$FORMULA: C14H16ClNO2
CH$EXACT_MASS: 265.0859
CH$SMILES: N/A
CH$IUPAC: N/A

See https://massbank.eu/MassBank/RecordDisplay?id=HB002891&dsn=HBM4EU and https://github.com/MassBank/MassBank-data/blob/main/HBM4EU/HB002891.txt.

Best wishes, Tobias

lauperbe commented 3 years ago

This pull request has been replaced by #169, so I closed this one.