In the LIMO paper, you wrote: "We chose two molecules from ZINC250k to act as starting molecules and defined the substructures of these starting molecules to be fixed."
i. How do you define the substructures in a string representation (SMILES or SELFIES)? These representations are linear strings, and a substructure can end up at different positions in the string, so I am unsure how to pick it out.
ii. I know Recap can produce substructures. Is that what LIMO uses? Or do the substructures come from a website that lists important substructures?
iii. How do you check that a generated molecule contains the whole substructure? My guess is to convert the molecule to a Morgan fingerprint and use the HasSubstructMatch function in RDKit.
Thank you!
^ v ^
We did not use any automated procedure for defining the substructure. Instead, we chose it essentially by eye and then selected the characters in the SMILES/SELFIES string that correspond to that substructure.
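As a rough illustration of what selecting the characters by hand could look like, here is a minimal sketch. It is not LIMO's actual code: the helper names `split_tokens` and `fixed_mask`, the example string, and the chosen positions are all hypothetical, and it assumes a SELFIES string made only of bracketed tokens.

```python
import re


def split_tokens(selfies: str) -> list[str]:
    """Split a SELFIES string into its bracketed tokens (hypothetical helper).

    Assumes the string contains only bracketed tokens, with no '.' separators.
    """
    return re.findall(r"\[[^\]]*\]", selfies)


def fixed_mask(n_tokens: int, fixed_positions) -> list[bool]:
    """Return a mask that is True at the hand-picked positions to keep fixed."""
    fixed = set(fixed_positions)
    return [i in fixed for i in range(n_tokens)]


# Illustrative string only; the positions 0..2 are picked by eye, not computed.
tokens = split_tokens("[C][C][N][C][C][O]")
mask = fixed_mask(len(tokens), range(0, 3))

for token, is_fixed in zip(tokens, mask):
    print(token, "fixed" if is_fixed else "free")
```

A mask like this could then mark which positions an optimizer is allowed to change, though how the fixed positions are actually enforced in LIMO is not described in this thread.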
I don't think I'm familiar with Recap, but no, it's not from that or any other substructure-generating website. The substructure-constrained task was a proof of concept, so we just used substructures chosen by eye, but if one wanted to expand this work then using an automated substructure-generating algorithm would make sense.
Yes, the HasSubstructMatch function in RDKit was what we used to make sure the generated molecule has the whole substructure.
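For reference, a minimal sketch of such a check is below. It is an illustration rather than the exact code used for the paper: the helper name `contains_substructure` and the example molecules are assumptions. Note that `HasSubstructMatch` is called on RDKit `Mol` objects directly, so no Morgan fingerprint conversion is needed.

```python
from rdkit import Chem


def contains_substructure(smiles: str, pattern_smarts: str) -> bool:
    """Return True if the molecule contains the given substructure pattern.

    Hypothetical helper: returns False when either string fails to parse.
    """
    mol = Chem.MolFromSmiles(smiles)
    pattern = Chem.MolFromSmarts(pattern_smarts)
    if mol is None or pattern is None:
        return False
    return mol.HasSubstructMatch(pattern)


# Example molecules chosen only for illustration.
print(contains_substructure("CC(=O)Oc1ccccc1C(=O)O", "c1ccccc1"))  # aspirin has a benzene ring
print(contains_substructure("CCO", "c1ccccc1"))  # ethanol does not
```

A generated SELFIES string would first need to be decoded to SMILES (for example with the `selfies` package) before being passed to a check like this.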