Rose-STL-Lab / LIMO

generative model for drug discovery

About substructures #20

Closed tszslovewanpu closed 4 months ago

tszslovewanpu commented 4 months ago

Hello~ In LIMO's paper, you mentioned that

We chose two molecules from ZINC250k to act as starting molecules and defined the substructures of these starting molecules to be fixed

  1. How do you define the substructures in a string representation (SMILES or SELFIES)? These string representations are continuous, and the substructure can appear at different positions in the string, so I am confused about how to get the substructure.
  2. I know Recap can produce some substructures. Is that the approach LIMO uses? Or do the substructures come from a website that lists important substructures?
  3. How do you check whether a generated molecule contains the whole substructure? I guess we can convert the molecule to a Morgan fingerprint and use the HasSubstructMatch function in RDKit.

Thank you! ^ v ^

PeterEckmann1 commented 4 months ago

Hi,

  1. We did not use any automated procedure to define the substructure. Instead, we chose it essentially by eye and then selected the characters in the SMILES/SELFIES string that correspond to that substructure.
  2. I don't think I'm familiar with Recap, but no, it's not from that or any other substructure-generating website. The substructure-constrained task was a proof of concept, so we just used substructures chosen by eye, but if one wanted to expand this work then using an automated substructure-generating algorithm would make sense.
  3. Yes, the HasSubstructMatch function in RDKit was what we used to make sure the generated molecule has the whole substructure.
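
For readers who want to try the check from point 3, here is a minimal sketch. It is not LIMO's actual code: the helper name `contains_substructure` and the example molecules are hypothetical, and writing the substructure as a SMARTS pattern is an assumption. Note that `HasSubstructMatch` is called on RDKit `Mol` objects, so no fingerprint conversion is needed.

```python
from rdkit import Chem


def contains_substructure(smiles: str, substructure_smarts: str) -> bool:
    """Return True if the molecule given by `smiles` contains the substructure.

    Hypothetical helper for illustration; the substructure is assumed to be
    written as a SMARTS pattern.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # RDKit returns None for SMILES it cannot parse; treat as "no match".
        return False

    pattern = Chem.MolFromSmarts(substructure_smarts)
    if pattern is None:
        raise ValueError(f"Invalid substructure pattern: {substructure_smarts!r}")

    # HasSubstructMatch operates on Mol objects, not on fingerprints.
    return mol.HasSubstructMatch(pattern)


if __name__ == "__main__":
    benzene_ring = "c1ccccc1"  # example substructure (assumption, not from the paper)
    print(contains_substructure("c1ccccc1O", benzene_ring))  # phenol
    print(contains_substructure("CCO", benzene_ring))        # ethanol
```

Returning `False` for unparsable SMILES is a design choice made here so that invalid generated molecules simply count as failing the constraint; a stricter pipeline might raise an error instead.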

tszslovewanpu commented 4 months ago

Thank you very much! Happy May Day~