AlexanderKroll / ESP

MIT License

Request for SMILES #5

Closed wangpeilin closed 1 year ago

wangpeilin commented 1 year ago

I would like to represent the substrate molecules with a method other than molecular fingerprints. Could you please provide the SMILES strings of the substrate molecules used in the paper? Having only the ChEBI IDs is not convenient. Thanks.

AlexanderKroll commented 1 year ago

The folder "ESP\data\substrate_data" contains the file "chebiID_to_inchi.tsv". You can read this file in Python using the following command:

import pandas as pd
df_chebi_to_inchi = pd.read_csv("chebiID_to_inchi.tsv", sep="\t")

Once you have obtained an InChI string for every ChEBI ID, you can convert them into SMILES strings using the following lines in Python:

# For example, given an InChI string such as:
inchi = "InChI=1S/C6H12O6/c7-1-2-3(8)4(9)5(10)6(11)12-2/h2-11H,1H2/t2-,3-,4+,5-,6?/m1/s1"

from rdkit import Chem
#convert inchi to mol:
mol = Chem.inchi.MolFromInchi(inchi)
# convert mol to SMILES string:
smiles = Chem.MolToSmiles(mol)
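
The two steps above can be combined into a single pass over the file. The following is only a minimal sketch, not code from the ESP repository: the column name "Inchi" is an assumption (check the header of "chebiID_to_inchi.tsv"), and the helper names read_tsv, inchi_to_smiles and add_smiles are hypothetical. It uses Python's csv module instead of pandas to keep the example small, and it stores None for entries that RDKit cannot parse rather than raising an error.

```python
import csv


def inchi_to_smiles(inchi):
    """Convert one InChI string to SMILES with RDKit; return None on failure."""
    # Imported here so the other helpers can be used without RDKit installed.
    from rdkit import Chem

    mol = Chem.MolFromInchi(inchi)  # None if the InChI cannot be parsed
    if mol is None:
        return None
    return Chem.MolToSmiles(mol)


def read_tsv(lines):
    """Read tab-separated lines (with a header row) into a list of dicts."""
    return list(csv.DictReader(lines, delimiter="\t"))


def add_smiles(rows, inchi_column="Inchi", to_smiles=inchi_to_smiles):
    """Return copies of the rows with an extra "SMILES" entry.

    inchi_column is an ASSUMED column name, not taken from the real file.
    to_smiles can be swapped for a converter from another toolkit.
    """
    result = []
    for row in rows:
        inchi = row.get(inchi_column)
        new_row = dict(row)  # copy so the input rows are left unchanged
        # Missing or empty InChI entries get None instead of a SMILES string.
        new_row["SMILES"] = to_smiles(inchi) if inchi else None
        result.append(new_row)
    return result


# Example usage (requires RDKit and the file from ESP\data\substrate_data):
# with open("chebiID_to_inchi.tsv", newline="") as handle:
#     rows = add_smiles(read_tsv(handle))
```

Rows whose "SMILES" entry is None are worth inspecting by hand, since they correspond to a missing InChI or to an InChI that RDKit could not convert.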