BorgwardtLab / proteinshake

Protein structure datasets for machine learning.
https://proteinshake.ai
BSD 3-Clause "New" or "Revised" License

Substrate/Product data for Enzymes #265

Closed by jozhang97 8 months ago

jozhang97 commented 8 months ago

Hi,

Thanks for the useful repo and nice presentation at NeurIPS!

I'm looking for more information about the ligand (e.g. PDB ID, PubChem ID, SMILES) for the enzyme commission task. I downloaded the annotation file but could not find this information in it: https://github.com/BorgwardtLab/proteinshake/blob/6542d3c51f4b20b6b071275a7b8fe9055709e8b3/proteinshake/datasets/enzyme_commission.py#L52

Perhaps this is because the enzyme was not crystallized with the substrate. But the enzymes are quite well documented (there are Gene Ontology and EC numbers), so I suspect that the ligand information is available somewhere. Do you have any recommendations on how to find it?

timkucera commented 8 months ago

Thanks for the issue! We're working hard on the next release, in which the different databases (including ligand information) will be integrated more accessibly. I would suggest waiting for that release. If you need the data urgently, you can retrieve it from BRENDA (https://www.brenda-enzymes.org/ecexplorer.php?browser=1) or Rhea (https://www.rhea-db.org/).
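
For readers who need the data before that release, here is a minimal sketch of the Rhea route. It is not part of proteinshake. It assumes that Rhea's search endpoint at `https://www.rhea-db.org/rhea` accepts `query`, `columns`, `format` and `limit` parameters, and that `ec:<number>` is a valid query and `rhea-id`, `equation`, `chebi-id` are valid column names; please check these against the current Rhea documentation before relying on them. The helper names are made up for illustration.

```python
import csv
import io
import urllib.parse
import urllib.request

# Assumption: endpoint and parameter names follow Rhea's REST documentation
# as understood at the time of writing; verify before use.
RHEA_SEARCH_URL = "https://www.rhea-db.org/rhea"


def build_rhea_query_url(ec_number, columns=("rhea-id", "equation", "chebi-id"), limit=50):
    """Build a search URL for all reactions annotated with one EC number (hypothetical helper)."""
    params = {
        "query": f"ec:{ec_number}",
        "columns": ",".join(columns),
        "format": "tsv",
        "limit": str(limit),
    }
    return RHEA_SEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_tsv(text):
    """Turn a tab-separated response (header row first) into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


def fetch_reactions(ec_number):
    """Download and parse the reactions for one EC number (requires network access)."""
    url = build_rhea_query_url(ec_number)
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_tsv(response.read().decode("utf-8"))
```

Calling `fetch_reactions("1.1.1.1")` should then give one dict per reaction. If the ChEBI identifiers come back as expected, they can be looked up in ChEBI to obtain SMILES strings for the substrates and products.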

jozhang97 commented 8 months ago

Thank you for your contribution!