christabone closed this issue 1 year ago
It looks like there are several similar instances of synonyms in the file. Is this a recent change? Curious as to why we haven't had parsing issues with these entries in the past...
grep 'synonym: "\\"' chebi.obo
synonym: "\"(2S,3R)-2-[[(2S)-2-amino-5-(diaminomethylideneamino)pentanoyl]amino]-3-hydroxybutanoic acid\"" EXACT IUPAC_NAME [SUBMITTER]
synonym: "\"(2S)-2,6-diaminononanedioic acid\"" EXACT IUPAC_NAME [SUBMITTER]
synonym: "\"dimethyl (2R)-pyrrolidine-1,2-dicarboxylate\"" EXACT IUPAC_NAME [SUBMITTER]
synonym: "\"phosphono (2S)-2,6-diaminohexanoate\"" EXACT IUPAC_NAME [SUBMITTER]
synonym: "\"(5R)-5-amino-4,8-dioxo-1,3,2-dioxazocane-2-carboxamide\"" EXACT IUPAC_NAME [SUBMITTER]
synonym: "\"pyridine-2,3-diamine\"" EXACT IUPAC_NAME [SUBMITTER]
synonym: "\"(3S)-3,17-dihydroxy-3-[(trimethylazaniumyl)methyl]heptadeca-4,6-dienoate\"" EXACT IUPAC_NAME [SUBMITTER]
synonym: "\"13-(3,4-dimethyl-5-pentylfuran-2-yl)tridecanoic acid\"" EXACT IUPAC_NAME [SUBMITTER]
synonym: "\"(9R,10S)-9,10,16-trihydroxyhexadecanoic acid\"" EXACT IUPAC_NAME [SUBMITTER]
synonym: "\"(Z)-2-cyano-3-(3,4-dihydroxy-5-nitrophenyl)-N,N-diethylprop-2-enamide\"" EXACT IUPAC_NAME [SUBMITTER]
synonym: "\"[(2R,3R,4S,5S,6R)-2-[(1R,2Z,3S,4R,5S)-2-(cyanomethylidene)-3-hydroxy-4,5-dimethoxycyclohexyl]oxy-4,5-dihydroxy-6-(hydroxymethyl)oxan-3-yl] (Z)-3-(4-hydroxy-3-methoxyphenyl)prop-2-enoate\"" EXACT IUPAC_NAME [SUBMITTER]
synonym: "\"N-[(E,2S,3R)-1,3-dihydroxyoctadec-4-en-2-yl]formamide\"" EXACT IUPAC_NAME [SUBMITTER]
synonym: "\"(4aS,5aS,12aR)-7-chloro-4-(dimethylamino)-1,6,10,11,12a-pentahydroxy-3,12-dioxo-4a,5,5a,6-tetrahydro-4H-tetracene-2-carboxamide\"" EXACT IUPAC_NAME [SUBMITTER]
synonym: "\"2-(5-hydroxy-4a-methyl-8-methylidene-1,2,3,4,5,8a-hexahydronaphthalen-2-yl)prop-2-enoic acid\"" EXACT IUPAC_NAME [SUBMITTER]
synonym: "\"[2-[2-(3,4-dihydroxyphenyl)-5,7-dihydroxy-4-oxochromen-8-yl]-4,5-dihydroxy-6-(hydroxymethyl)oxan-3-yl] acetate\"" EXACT IUPAC_NAME [SUBMITTER]
synonym: "\"5-(2-chloroethyl)-1-[(2R,4S,5R)-4-hydroxy-5-(hydroxymethyl)oxolan-2-yl]pyrimidine-2,4-dione\"" EXACT IUPAC_NAME [SUBMITTER]
synonym: "\"4-amino-5-chloro-1-[(2R,4S,5R)-4-fluoro-5-(hydroxymethyl)oxolan-2-yl]pyrimidin-2-one\"" EXACT IUPAC_NAME [SUBMITTER]
synonym: "\"2-(ethylamino)-9-[(2R,4S,5R)-4-hydroxy-5-(hydroxymethyl)oxolan-2-yl]-1H-purin-6-one\"" EXACT IUPAC_NAME [SUBMITTER]
synonym: "\"(2R,3S,5R)-5-(6-aminopurin-9-yl)-2-methyloxolan-3-ol\"" EXACT IUPAC_NAME [SUBMITTER]
synonym: "\"2-amino-7-[(2R,4S,5R)-4-hydroxy-5-(hydroxymethyl)oxolan-2-yl]-3H-pyrrolo[2,3-d]pyrimidin-4-one\"" EXACT IUPAC_NAME [SUBMITTER]
synonym: "\"4-amino-1-[(2R,3R,4S,5R)-5-azido-3,4-dihydroxy-5-(hydroxymethyl)oxolan-2-yl]pyrimidin-2-one\"" EXACT IUPAC_NAME [SUBMITTER]
All of these entries were deposited into ChEBI by the MetaboLights database. Something must have gone wrong when they submitted these entries. The escaped quotation marks (\") should not be present in any of the synonyms. I will try to fix these entries so the issue does not arise in next month's release.
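For anyone who needs a stopgap before the corrected release, here is a minimal Python sketch of stripping the wrapping quotes from an affected line. The function name `strip_wrapping_quotes` and the pattern are illustrative assumptions, not part of any ChEBI or Alliance tooling, and it only touches synonyms whose text both starts and ends with `\"`.

```python
import re

# Matches a synonym line whose quoted text itself begins and ends with an
# escaped quote (\"), as in the grep output above. Hypothetical helper pattern.
WRAPPED = re.compile(r'^synonym: "\\"(?P<text>.*)\\"" (?P<rest>.*)$')


def strip_wrapping_quotes(line: str) -> str:
    """Return the line with the wrapping \\" pair removed, or unchanged if absent."""
    m = WRAPPED.match(line)
    if m is None:
        return line
    return f'synonym: "{m.group("text")}" {m.group("rest")}'


sample = r'synonym: "\"pyridine-2,3-diamine\"" EXACT IUPAC_NAME [SUBMITTER]'
fixed = strip_wrapping_quotes(sample)
print(fixed)
```

Lines with an escaped quote only in the middle of the text do not match the pattern and are returned as-is.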
Thanks @amalik01 !
I also came across another instance of an escaped quote in a synonym. Not sure if this is intentional but just wanted to let you know. I'm not sure how many of these might exist:
[Term]
id: CHEBI:76100
name: 1-O-[6-O-(4-pyridylcarbamoyl)-alpha-D-galactopyranosyl]-N-hexacosanoylphytosphingosine
subset: 3_STAR
def: "A glycophytoceramide having a 6-O-(4-pyridylcarbamoyl)-alpha-D-galactopyranosyl residue at the O-1 position and an hexacosanoyl group attached to the nitrogen." []
synonym: "N-{(2S,3S,4R)-3,4-dihydroxy-1-[6-O-(pyridin-4-ylcarbamoyl)-alpha-D-galactopyranosyloxy]octadecan-2-yl}hexacosanamide" EXACT IUPAC_NAME [IUPAC]
synonym: "alpha-GalCer-6\"-(4-pyridyl)carbamate" RELATED [ChEBI]
synonym: "alpha-GalCer-6\"-(pyridin-4-yl)carbamate" RELATED [ChEBI]
synonym: "PyrC-alpha-GalCer" RELATED [ChEBI]
property_value: http://purl.obolibrary.org/obo/chebi/mass "978.43110" xsd:string
property_value: http://purl.obolibrary.org/obo/chebi/formula "C56H103N3O10" xsd:string
property_value: http://purl.obolibrary.org/obo/chebi/monoisotopicmass "977.76435" xsd:string
property_value: http://purl.obolibrary.org/obo/chebi/charge "0" xsd:string
property_value: http://purl.obolibrary.org/obo/chebi/inchikey "GONJMTFPPNECAU-VEDNRHISSA-N" xsd:string
property_value: http://purl.obolibrary.org/obo/chebi/smiles "CCCCCCCCCCCCCCCCCCCCCCCCCC(=O)N[C@@H](CO[C@H]1O[C@H](COC(=O)Nc2ccncc2)[C@H](O)[C@H](O)[C@H]1O)[C@H](O)[C@H](O)CCCCCCCCCCCCCC" xsd:string
property_value: http://purl.obolibrary.org/obo/chebi/inchi "InChI=1S/C56H103N3O10/c1-3-5-7-9-11-13-15-17-18-19-20-21-22-23-24-25-26-27-29-31-33-35-37-39-50(61)59-47(51(62)48(60)38-36-34-32-30-28-16-14-12-10-8-6-4-2)44-67-55-54(65)53(64)52(63)49(69-55)45-68-56(66)58-46-40-42-57-43-41-46/h40-43,47-49,51-55,60,62-65H,3-39,44-45H2,1-2H3,(H,59,61)(H,57,58,66)/t47-,48+,49+,51-,52-,53-,54+,55-/m0/s1" xsd:string
xref: PMID:23960235 {source="Europe PMC"}
xref: PDBeChem:1LA
is_a: CHEBI:59389
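To get a rough idea of how many such synonyms exist, something like the following Python sketch could tally them. The names `classify` and `tally` are made up for illustration, and the "wrapped" versus "interior" split is only a heuristic based on whether the synonym text starts with `\"`.

```python
from collections import Counter
from typing import Iterable, Optional


def classify(line: str) -> Optional[str]:
    """Return 'wrapped', 'interior', or None for a single OBO line."""
    if not line.startswith('synonym: "') or '\\"' not in line:
        return None  # not a synonym line, or no escaped quote in it
    if line.startswith('synonym: "\\"'):
        return "wrapped"  # text begins with \" like the MetaboLights entries
    return "interior"  # escaped quote somewhere inside the text


def tally(lines: Iterable[str]) -> Counter:
    """Count synonym lines containing escaped quotes, grouped by kind."""
    return Counter(kind for kind in map(classify, lines) if kind is not None)


sample = [
    r'synonym: "\"pyridine-2,3-diamine\"" EXACT IUPAC_NAME [SUBMITTER]',
    r'synonym: "alpha-GalCer-6\"-(4-pyridyl)carbamate" RELATED [ChEBI]',
    r'synonym: "alpha-GalCer-6\"-(pyridin-4-yl)carbamate" RELATED [ChEBI]',
    'synonym: "PyrC-alpha-GalCer" RELATED [ChEBI]',
]
counts = tally(sample)
print(counts)
```

Passing an open file handle for `chebi.obo` to `tally` should work the same way, since it only iterates over lines.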
Hi @christabone
I have now fixed most of these issues. The changes will be visible in next month's release.
Regarding the synonym (α-GalCer-6"-(4-pyridyl)carbamate) in CHEBI:76100: unfortunately, at the moment we do not have a special character for the double prime symbol (https://en.wikipedia.org/wiki/Prime_(symbol)) in ChEBI and therefore have to use quotation marks to represent it in the synonyms. This will need fixing at some point in the near future.
Hi ChEBI folks,
One of our ETL pipelines at alliancegenome.org just started failing recently and we traced it down to a weird synonym format in the Sept 1st release of the ChEBI ontology. More specifically, this term:
The synonym starts with a strange quote-backslash-quote-thing which we think might be an error? Would anyone at ChEBI have a moment to check if this is the case?
If it's a legitimate synonym we will work on our end to fix our parser.
Thanks for your time!
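On the parser side, one way to tolerate escaped quotes is to let the quoted text contain backslash escapes. The Python sketch below is only an assumption about what such a fix could look like, not the actual Alliance ETL code. `parse_synonym` is a hypothetical helper, and it only unescapes `\"` and `\\`, ignoring other OBO escapes and any trailing `{...}` modifiers.

```python
import re

# The quoted text may contain backslash escapes such as \" ; scope and
# synonym type are treated as optional, and the xref list sits in [...].
SYNONYM = re.compile(
    r'^synonym:\s*"(?P<text>(?:[^"\\]|\\.)*)"'
    r'\s*(?P<scope>\w+)?\s*(?P<type>\w+)?\s*\[(?P<xrefs>.*)\]'
)


def parse_synonym(line: str) -> dict:
    """Split one OBO synonym line into text, scope, type, and xrefs."""
    m = SYNONYM.match(line)
    if m is None:
        raise ValueError(f"not a parsable synonym line: {line!r}")
    # Simplification: only \" and \\ are unescaped here.
    text = re.sub(r'\\(["\\])', r'\1', m.group("text"))
    return {
        "text": text,
        "scope": m.group("scope"),
        "type": m.group("type"),
        "xrefs": m.group("xrefs"),
    }


line = r'synonym: "alpha-GalCer-6\"-(4-pyridyl)carbamate" RELATED [ChEBI]'
parsed = parse_synonym(line)
print(parsed["text"])
```

With this approach the wrapped MetaboLights-style entries also parse, but their text keeps the literal surrounding quotes, so they would still need to be corrected at the source.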