PHI-base / curation

Add SMILES strings to chemistry annotations #197

Open jseager7 opened 11 months ago

jseager7 commented 11 months ago

It was recently requested that we include SMILES strings for chemistry annotations in PHI-base.

For example, the SMILES string for deoxynivalenol (CHEBI:10022) is:

[H][C@@]12O[C@]3([H])C=C(C)C(=O)[C@@H](O)[C@]3(CO)[C@@](C)(C[C@H]1O)[C@]21CO1

We can retrieve these strings from the ChEBI ontology (from the 'smiles' annotation property), so there should be no need to curate them manually, but we do need to decide how they should be displayed.
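
A minimal sketch of that retrieval, assuming the ontology is read from an OBO-format file in which the SMILES string appears on a `property_value` line whose property IRI ends in `smiles`. That line layout is an assumption and should be checked against the ChEBI release actually used. The function name `extract_smiles`, the inline sample, and the second term ID are hypothetical and only there for illustration.

```python
import re

# Illustrative OBO-style excerpt. The property_value layout is an assumption,
# and CHEBI:0000000 is a made-up placeholder ID, not a real term.
SAMPLE_OBO = """\
[Term]
id: CHEBI:10022
name: deoxynivalenol
property_value: http://purl.obolibrary.org/obo/chebi/smiles "[H][C@@]12O[C@]3([H])C=C(C)C(=O)[C@@H](O)[C@]3(CO)[C@@](C)(C[C@H]1O)[C@]21CO1" xsd:string

[Term]
id: CHEBI:0000000
name: placeholder term without a structure
"""

# Matches a property_value line whose property name ends in "smiles"
# and captures the quoted value.
SMILES_LINE = re.compile(r'^property_value:\s+\S*smiles\s+"([^"]+)"')


def extract_smiles(obo_text):
    """Map each term ID to its SMILES string, skipping terms without one."""
    smiles_by_id = {}
    current_id = None
    for raw_line in obo_text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            # A new stanza ([Term], [Typedef], ...) starts: forget the old ID.
            current_id = None
        elif line.startswith("id: "):
            current_id = line[len("id: "):]
        else:
            match = SMILES_LINE.match(line)
            if match and current_id:
                smiles_by_id[current_id] = match.group(1)
    return smiles_by_id


smiles_by_id = extract_smiles(SAMPLE_OBO)
```

Terms without a structure are simply left out of the mapping, so a later step can decide whether a missing SMILES string should be reported or ignored.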

The easiest solution is to add an annotation extension for the SMILES string which will be automatically generated for every chemistry annotation (resistance / sensitivity / normal) and added to the JSON export. We decided to do something similar for adding FRAC codes. The main benefit of this is that it doesn't (or shouldn't) require any changes to the logic of the PHI-base 5 website.
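
As a rough sketch of what the automatic generation could look like: the code below appends a SMILES extension to one annotation record. The record layout is entirely hypothetical (the `chebi_id` and `extension` keys, and the `relation` / `rangeValue` fields, are assumptions rather than the actual export schema), as is the function name `add_smiles_extension`.

```python
import copy
import json


def add_smiles_extension(annotation, smiles_by_id):
    """Return a copy of the annotation with a SMILES extension appended.

    The annotation is returned unchanged (as a copy) when its ChEBI ID
    has no known SMILES string.
    """
    updated = copy.deepcopy(annotation)
    smiles = smiles_by_id.get(updated.get("chebi_id"))
    if smiles is not None:
        # Hypothetical extension shape; the real export may differ.
        updated.setdefault("extension", []).append(
            {"relation": "SMILES", "rangeValue": smiles}
        )
    return updated


smiles_by_id = {
    "CHEBI:10022": "[H][C@@]12O[C@]3([H])C=C(C)C(=O)[C@@H](O)[C@]3(CO)[C@@](C)(C[C@H]1O)[C@]21CO1",
}
annotation = {"chebi_id": "CHEBI:10022", "extension": []}
print(json.dumps(add_smiles_extension(annotation, smiles_by_id), indent=2))
```

Working on a copy keeps the curated record untouched, so the extension only ever exists in the generated export.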

We may also want to add the SMILES strings to the anti-infective list on the PHI-base/data repository.

My first task is figuring out how to automatically extract the SMILES strings from the ChEBI ontology. I can probably make use of the existing anti-infectives list to get the full list of ChEBI terms that we need to map.
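
For the second part, a minimal sketch of collecting the ChEBI IDs, assuming the anti-infectives list can be read as a CSV file with one column holding the ChEBI ID. The column names `chemical_name` and `chebi_id`, the sample rows, and the function name `collect_chebi_ids` are all hypothetical.

```python
import csv
import io

# Illustrative sample only; the real list may use different column names.
SAMPLE_LIST = """\
chemical_name,chebi_id
deoxynivalenol,CHEBI:10022
compound without a ChEBI ID,
deoxynivalenol,CHEBI:10022
"""


def collect_chebi_ids(csv_text, id_column="chebi_id"):
    """Return the unique, non-empty ChEBI IDs in first-seen order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    ids = []
    for row in reader:
        chebi_id = (row.get(id_column) or "").strip()
        if chebi_id and chebi_id not in seen:
            seen.add(chebi_id)
            ids.append(chebi_id)
    return ids


chebi_ids = collect_chebi_ids(SAMPLE_LIST)
```

Rows with an empty ID are skipped rather than raising an error, since some entries in the list may not have a ChEBI term yet.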

@CuzickA Does this plan sound okay to you?

CuzickA commented 11 months ago

Hi @jseager7, this sounds like a good plan.

I can email you the latest anti-infective list, as it is not currently available on GitHub. Once I have finished checking and approving the block of chemistry-curated papers, and have updated the anti-infective list with any further information, we can load the updated copy to GitHub (including a new column for the SMILES strings).

I was also thinking about the ChEBI IDs. At the moment they are not displayed in PHI-base 5, so they may also need to be added via an annotation extension (AE).

The chemistry-specific AEs would then be: alteration_in_archetype, ChEBI_id, FRAC_code, SMILES.

Also see the earlier comment in the older ticket: https://github.com/PHI-base/curation/issues/20

jseager7 commented 11 months ago

Thanks. I can wait until the latest anti-infectives list is uploaded to GitHub.

It will be easy enough to include the ChEBI term ID as an annotation extension, especially because I'm already going to be querying ChEBI to get the SMILES strings.