datamol-io / safe

A single model for all your molecular design tasks
https://safe-docs.datamol.io/
Apache License 2.0

Goal-directed generative capabilities #31

Closed RishikeshMagar closed 8 months ago

RishikeshMagar commented 8 months ago

Hi,

Firstly, thank you for making this excellent repository. The SAFE paper has some really interesting insights on molecular design.

As I was going over the codebase, I could not find the code for the reinforcement learning part. Could you please let me know if I am missing something? Also, how do you get the advantage estimates in the PPO loss? Is there an additional value network?

Apologies if the questions are too obvious. I would really appreciate your insights on this.

maclandrol commented 8 months ago

Hi @RishikeshMagar

Thanks for your interest in SAFE. Please see my answer in this discussion for context: https://github.com/datamol-io/safe/issues/20.

We use the TRL library since it's convenient for Hugging Face models. The code is in the history of the repo (before we did some cleaning for the release): https://github.com/datamol-io/safe/blob/2dd3d5394cb4e748f5a9beec2446df28c2666386/scripts/mol_design.py
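(A rough idea of what that script does in TRL terms, for future readers: the policy is wrapped with a value head, and `PPOTrainer.step` takes query/response tensors plus a scalar reward per generated molecule; internally TRL shapes that reward by subtracting a per-token KL penalty against a frozen reference model. A minimal plain-Python sketch of that reward shaping — the function name and coefficient here are illustrative, not the exact ones from the script:)

```python
# Sketch of TRL-style per-token reward shaping for PPO (illustrative only).
# The scalar task reward (e.g. a docking or QED score for the generated
# molecule) is placed on the final token; every token also pays a KL
# penalty against a frozen reference model to keep the policy close to it.

def shape_rewards(task_reward, policy_logprobs, ref_logprobs, kl_coef=0.1):
    """Per-token rewards: -kl_coef * log-ratio at each token, task reward on last."""
    assert len(policy_logprobs) == len(ref_logprobs)
    rewards = []
    for lp, ref_lp in zip(policy_logprobs, ref_logprobs):
        kl = lp - ref_lp  # per-token KL estimate (log-ratio)
        rewards.append(-kl_coef * kl)
    rewards[-1] += task_reward  # scalar score only on the final token
    return rewards

# Example: identical policy/reference logprobs (zero KL), so the shaped
# reward is just the task reward on the last token.
print(shape_rewards(1.0, [-1.0, -2.0, -0.5], [-1.0, -2.0, -0.5], kl_coef=0.1))
```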

I recommend experimenting with the hyperparameters, and also potentially finetuning the pretrained models on your specific chemical space first.

There are alternative algorithms to PPO that you might want to consider, as they often give more consistently good performance.
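(On the advantage question: in TRL the value estimates come from the value head that `AutoModelForCausalLMWithValueHead` attaches to the policy, while `create_reference_model` returns a frozen copy of the policy used only for the KL penalty. Advantages are then computed from those value estimates with Generalized Advantage Estimation. A plain-Python sketch of GAE, assuming per-token rewards and value estimates are already available — this is illustrative, not TRL's exact code:)

```python
# Sketch of Generalized Advantage Estimation (GAE), as used by PPO
# implementations such as TRL's PPOTrainer (illustrative only).
# `values` holds one value estimate per token position plus a final
# bootstrap value (0.0 at the end of generation).

def gae_advantages(rewards, values, gamma=1.0, lam=0.95):
    """Compute per-token advantages from rewards and value estimates."""
    assert len(values) == len(rewards) + 1
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        # TD residual: how much better this step was than the value predicted
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

With `gamma=1.0` and `lam=1.0` this reduces to return-minus-value (Monte Carlo advantages); `lam=0.0` gives one-step TD residuals.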

RishikeshMagar commented 8 months ago

Thanks for the prompt response. I suppose the create_reference_model creates the value network. I will study the code more carefully and will reopen the issue if needed. Thanks a lot again!