MolecularAI / REINVENT4

AI molecular design tool for de novo design, scaffold hopping, R-group replacement, linker design and molecule optimization.
Apache License 2.0

Regarding how to use REINVENT 4.3 to train Prior, Agent, and Sample #87

Closed: kingljy0818 closed this 4 months ago

kingljy0818 commented 5 months ago

Hello,

I am an experienced user of REINVENT 3.2, which has very detailed Jupyter notebooks that make it easy to train Prior, Agent, and Sample. However, after installing REINVENT 4.3 and following the instructions to create the REINVENT 4.3 notebook (Reinvent_demo.ipynb), I cannot work out from this notebook how to train Prior, Agent, and Sample as easily as I could with the REINVENT 3.2 notebooks. I would greatly appreciate your guidance on how to do this with REINVENT 4.3. Thank you very much!

Best regards,

Jiyuan

halx commented 5 months ago

Hi,

many thanks for your interest in REINVENT4 and welcome to the community!

You would need to be more specific about what you mean by "training Prior, Agent, and Sample". It is unclear to me what you are trying to achieve.

Many thanks, Hannes.

kingljy0818 commented 5 months ago

Hi,

Using the REINVENT 3.2 notebooks as an example: the Transfer_Learning_Demo.ipynb notebook is used to generate a prior model by learning from the ChEMBL33 compound library. Transfer_Learning_Demo.ipynb is then used again to build an Agent model from the previously trained prior, this time learning from an existing covalent compound library (Enamine). Finally, sampling_Demo.ipynb is used to create my own covalent compound library. I'm unsure how to implement this process in REINVENT 4.3. Could you please guide me on this? Thank you very much.

Best regards,

Jiyuan

halx commented 5 months ago

Right, so you would start with an empty model (network and vocabulary setup) created with reinvent/runmodes/create_model/create_reinvent.py. Then run transfer learning (TL) to create a new prior (or base) model. Focusing is just another instance of TL, using a small dataset. How to do TL is described in the notebooks, and there are also config examples in configs/toml, including one for sampling. The question, though, is whether you really gain much from a newly trained ChEMBL prior: reportedly, chemical diversity hasn't changed much over time. I'm also not sure how well it would work to get a "focused" model by retraining the current Reinvent prior on a newer ChEMBL release.
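To make the "empty model, then TL, then focusing, then sampling" sequence concrete, here is a toy sketch in plain Python. It is an analogy only: a bigram count model stands in for REINVENT's recurrent network, and the tokenizer regex, the function names (`create_empty_model`, `transfer_learn`, `sample`) and the `weight` parameter are hypothetical, not REINVENT4 APIs. What it tries to show is that the vocabulary is fixed when the empty model is created, and that building the prior and focusing it are the same training step, applied first to a large dataset and then to a small one.

```python
import random
import re
from collections import defaultdict

# Toy tokenizer (assumption, not REINVENT's): bracket atoms like [nH],
# two-letter halogens, %nn ring closures, otherwise single characters.
TOKEN_RE = re.compile(r"(\[[^\]]*\]|Br|Cl|%\d{2}|.)")
BOS, EOS = "^", "$"  # hypothetical start/end tokens


def tokenize(smiles):
    return [BOS] + TOKEN_RE.findall(smiles) + [EOS]


def create_empty_model(smiles_list):
    """'Empty' model: the vocabulary is fixed here, no statistics are learned yet."""
    vocab = sorted({tok for s in smiles_list for tok in tokenize(s)})
    return {"vocab": vocab, "counts": {tok: defaultdict(float) for tok in vocab}}


def transfer_learn(model, smiles_list, weight=1.0):
    """One toy 'training' pass: add weighted bigram counts.

    Pretraining on a large set and focusing on a small set call this same function.
    """
    vocab = set(model["vocab"])
    for smiles in smiles_list:
        toks = tokenize(smiles)
        if not set(toks) <= vocab:
            continue  # the vocabulary is fixed: skip SMILES with unknown tokens
        for a, b in zip(toks, toks[1:]):
            model["counts"][a][b] += weight
    return model


def sample(model, n, rng, max_len=60):
    """Draw n strings token by token, proportional to the bigram counts."""
    out = []
    for _ in range(n):
        tok, toks = BOS, []
        while len(toks) < max_len:
            nxt = model["counts"][tok]
            if not nxt:
                break
            choices, weights = zip(*nxt.items())
            tok = rng.choices(choices, weights=weights)[0]
            if tok == EOS:
                break
            toks.append(tok)
        out.append("".join(toks))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    large_set = ["CCO", "c1ccccc1O", "CC(=O)Nc1ccccc1", "ClCCBr", "C=CC(=O)N"]
    small_set = ["C=CC(=O)NC", "C=CC(=O)N1CCCC1"]  # e.g. acrylamides
    model = create_empty_model(large_set)          # network + vocabulary
    model = transfer_learn(model, large_set)       # "prior" / base model
    model = transfer_learn(model, small_set, 5.0)  # "focused" model
    print(sample(model, 5, rng))
```

Strings sampled from a bigram model are not guaranteed to be valid SMILES; in REINVENT4 the real steps are driven by the TOML config files mentioned above, not by code like this.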

With TL, however, you only have (some) structural control over the generation process, not control over the properties. For that you would need to carry out reinforcement learning (RL). It only really makes sense to speak of an agent model in the context of RL: there, the agent is a network that is optimized towards rewards given in the form of a score.
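As an illustration of "optimized towards rewards in the form of a score": the original REINVENT paper (Olivecrona et al., 2017) trains the agent against an augmented likelihood, i.e. the prior's log-likelihood of a sampled SMILES plus sigma times its score, using a squared-error loss between that target and the agent's log-likelihood. The sketch below computes this loss for a batch in plain Python. We assume it corresponds to REINVENT4's `dap` learning strategy and treat sigma as a tunable constant, but both are assumptions; the RL config examples in configs/toml are the authority. In a real run the negative log-likelihoods (NLLs) come from the prior and agent networks; here they are plain numbers.

```python
def dap_loss(prior_nll, agent_nll, scores, sigma=128.0):
    """Augmented-likelihood loss, averaged over a batch of sampled SMILES.

    prior_nll, agent_nll: negative log-likelihoods of each sampled SMILES
    under the (frozen) prior and under the agent; scores: rewards in [0, 1].
    sigma: assumed scaling factor for the score.
    """
    if not (len(prior_nll) == len(agent_nll) == len(scores)) or not scores:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p_nll, a_nll, score in zip(prior_nll, agent_nll, scores):
        augmented_ll = -p_nll + sigma * score  # target log-likelihood
        agent_ll = -a_nll
        total += (augmented_ll - agent_ll) ** 2
    return total / len(scores)


if __name__ == "__main__":
    # Same prior/agent NLL, different scores: the high-scoring molecule
    # contributes a much larger loss, so minimizing it (in a real network)
    # raises the agent's likelihood of that molecule more.
    print(dap_loss([20.0, 20.0], [20.0, 20.0], [0.9, 0.1], sigma=10.0))  # 41.0
```

Because the target is anchored to the prior's likelihood, the agent is pulled towards high-scoring molecules while staying close to what the prior considers plausible chemistry.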