JanoschMenke / metis

Python-based GUI to collect feedback from chemists on molecules
MIT License
de-novo-drug-design drug-discovery generative-ai human-in-the-loop machine-learning preference-learning

Metis - A Python-Based User Interface to Collect Expert Feedback for Generative Chemistry Models


Metis is a GUI that enables the collection of accurate and detailed feedback on small molecules. At its core, it is built around Esben Bjerrum's rdEditor using PySide2.

You can find the preprint at ChemRxiv.



Set up

Installation

Download the repository and navigate to the download location. You can install Metis by running pip install . from that folder. Make sure the environment you want to install into is activated and has Python >= 3.9, < 3.11 installed.
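
As a quick sanity check before installing, the Python version constraint can be tested from inside the activated environment. This is a minimal sketch using only the standard library; the helper name python_version_supported is illustrative and not part of Metis.

```python
import sys


def python_version_supported(version_info=None):
    """Return True if the version satisfies python >= 3.9, < 3.11."""
    # Hypothetical helper, not part of Metis; defaults to the running interpreter.
    major, minor = tuple(version_info or sys.version_info)[:2]
    return (3, 9) <= (major, minor) < (3, 11)


if __name__ == "__main__":
    status = "supported" if python_version_supported() else "not supported"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```

Running the script in the target environment prints whether the active interpreter falls inside the supported range.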

If you wish to use REINVENT 3 in the backend, also install REINVENT 3 on a remote machine.

Dependencies

Some notes on the dependencies.

scikit-learn

The scikit-learn version constraints are only set to make sure that the examples given here work. In theory, you could use any scikit-learn version. If you want to use Reinvent in the backend, make sure that the scikit-learn version used by Reinvent on the remote machine matches the version in your local Metis installation.

cairosvg

Depending on your OS, installing cairosvg through pip can cause issues because cairo is not found. On macOS you can solve this by installing cairo using Homebrew, or you can install cairosvg using conda.

SSH

It is assumed that you have a working version of Reinvent on a server that is running Slurm and SSH.

  1. Change the ssh settings in the example_project/de_novo_files/ssh_settings.yml file.

    • ssh_login: your SSH login, e.g. username@remote_server. You should be able to access your remote server without a password, for example, using an RSA key.
    • path_remote_folder: path on the remote machine, from where Reinvent files will be loaded and stored.
    • de_novo_json: specify which default reinvent.json file to use
    • default_slurm: specify which default Slurm job to use
  2. Copy and unzip metis_reinvent.zip to the remote machine. Make sure that path_remote_folder in the ssh_settings.yml file matches the folder location on the remote machine, and that initial_reinvent.json uses the same path.
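
The checks implied by the steps above can be sketched in Python. This is an illustrative sketch, not code from Metis: it assumes the contents of ssh_settings.yml have already been parsed into a dict, the function names are hypothetical, and the login test relies on the standard OpenSSH client option BatchMode=yes, which makes ssh fail instead of prompting for a password.

```python
import subprocess

# The four keys described in the SSH settings list above.
REQUIRED_KEYS = ("ssh_login", "path_remote_folder", "de_novo_json", "default_slurm")


def missing_ssh_keys(settings):
    """Return the required keys that are absent or empty in the settings mapping."""
    return [key for key in REQUIRED_KEYS if not settings.get(key)]


def passwordless_check_command(ssh_login):
    """Build an ssh command that fails instead of prompting for a password."""
    # "true" is a no-op command executed on the remote side.
    return ["ssh", "-o", "BatchMode=yes", ssh_login, "true"]


def can_login_without_password(ssh_login, timeout=15):
    """Run the check; True means key-based login worked."""
    try:
        result = subprocess.run(
            passwordless_check_command(ssh_login),
            capture_output=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh is not installed, or the server did not answer in time.
        return False
    return result.returncode == 0
```

Calling missing_ssh_keys on the parsed settings before starting Metis gives an early hint if an entry was forgotten, and can_login_without_password confirms that key-based access is in place.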

Usage

After installation simply run:

metis -f path/to/settings.yml --output /path/where/to/save/

This will start the GUI. Examples can be found below.

Examples

UI Only

In the simplest example, only the GUI is started to collect feedback. No models are trained and no de novo run is started.

Note: If you show the atom contributions to the predictions/model explanation (show_atom_contributions: render: True), you will experience heavy slowdowns when switching to a new molecule. The only solution at the moment is not to show them. Setting show_atom_contributions: render: False yields a much smoother experience.

cd example_project
metis -f settings_ui.yml --output results/
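
To make the location of the render switch concrete, here is a small sketch that turns atom contributions off in an already-parsed settings dict. It assumes the nesting shown in the Settings overview (ui, then show_atom_contributions, then render); the function name is hypothetical and not part of Metis.

```python
import copy


def without_atom_contributions(settings):
    """Return a copy of the settings with ui.show_atom_contributions.render set to False."""
    # Work on a deep copy so the caller's settings stay untouched.
    updated = copy.deepcopy(settings)
    window = updated.setdefault("ui", {}).setdefault("show_atom_contributions", {})
    window["render"] = False
    return updated
```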

Reward Model

Here, in addition to collecting feedback, a reward model is trained on that feedback. For this, we provide a QSAR model and an oracle model for JNK3 activity. With the setting use_oracle_score: False, the human feedback is used as the target variable to be predicted. If the setting is True, the molecules liked by the chemist are scored by the oracle, and these scores are then used as the target variable for the reward model. This can be thought of as an active learning setting, where the chemist decides which molecules are "biologically validated".
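
The two modes can be illustrated with a short sketch. This is not the Metis implementation: it assumes binary feedback (1 for liked, 0 otherwise), treats the oracle as a plain callable, and uses a hypothetical function name.

```python
def reward_model_targets(smiles, human_feedback, oracle, use_oracle_score):
    """Pick training molecules and targets for the reward model.

    smiles: list of molecule identifiers
    human_feedback: list of 0/1 ratings, 1 meaning the chemist liked the molecule
    oracle: callable mapping a molecule to a score
    """
    if not use_oracle_score:
        # The human feedback itself is the quantity to predict.
        return list(smiles), [float(rating) for rating in human_feedback]
    # Only molecules the chemist liked are sent to the oracle for "validation".
    liked = [s for s, rating in zip(smiles, human_feedback) if rating == 1]
    return liked, [float(oracle(s)) for s in liked]
```

With use_oracle_score set to True, the training set shrinks to the liked molecules, which is what makes the setup resemble active learning: the chemist selects the samples, and the oracle supplies the labels.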

cd example_project
metis -f settings_reward_model.yml --output results/

De Novo Design

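With these settings, a REINVENT de novo run can be started directly from Metis on a remote machine. The remote machine needs to be set up as described in the SSH section: a working Reinvent installation, Slurm, passwordless SSH access, and the contents of metis_reinvent.zip.

Once copied and unzipped, the paths and settings in the de_novo_files folder need to be adapted to match your paths on the remote machine.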

cd example_project
metis -f settings_denovo.yml --output results/

Settings

Here is a brief overview of all settings:

| Name | Type | Required | Default |
|---|---|---|---|
| seed | Union[int, None] | False | |
| tutorial | bool | False | False |
| debug | bool | False | False |
| max_iterations | int | True | ... |
| innerloop_iterations | Union[int, None] | False | None |
| activity_label | str | True | ... |
| introText | str | True | ... |
| propertyLabels | Dict | True | ... |
| data | DataConfig | True | ... |
| ui | UIConfig | True | ... |
| de_novo_model | Union[DeNovoConfig, None] | False | None |
| reward_model | Union[RewardModelConfig, None] | False | None |

DataConfig

| Name | Type | Required | Default |
|---|---|---|---|
| initial_path | str | True | ... |
| path | str | True | ... |
| selection_strategy | str | True | ... |
| num_molecules | int | True | ... |
| run_name | str | True | ... |

UIConfig

| Name | Type | Required | Default |
|---|---|---|---|
| show_atom_contributions | AdditionalWindowsConfig | False | {'render': False, 'path': None, 'ECFP': None} |
| show_reference_molecules | AdditionalWindowsConfig | False | {'render': False, 'path': None, 'ECFP': None} |
| tab | TabConfig | True | ... |
| navigationbar | NavigationbarConfig | True | ... |
| general | GeneralConfig | True | ... |
| substructures | SubstructureConfig | True | ... |
| global_properties | GlobalPropertiesConfig | True | ... |

AdditionalWindowsConfig

| Name | Type | Required | Default |
|---|---|---|---|
| render | bool | False | False |
| path | Union[str, None] | False | |
| ECFP | Union[ECFPConfig, None] | False | |

ECFPConfig

| Name | Type | Required | Default |
|---|---|---|---|
| bitSize | int | True | ... |
| radius | int | True | ... |
| useCounts | bool | False | False |

TabConfig

| Name | Type | Required | Default |
|---|---|---|---|
| render | bool | True | ... |
| tab_names | List | True | ... |

NavigationbarConfig

| Name | Type | Required | Default |
|---|---|---|---|
| sendButton | NavButtonConfig | True | ... |
| editButton | NavButtonConfig | True | ... |

NavButtonConfig

| Name | Type | Required | Default |
|---|---|---|---|
| render | bool | False | False |

GeneralConfig

| Name | Type | Required | Default |
|---|---|---|---|
| render | bool | False | True |
| slider | bool | False | False |

SubstructureConfig

| Name | Type | Required | Default |
|---|---|---|---|
| render | bool | False | False |
| liabilities | Dict | True | ... |

Liabilities control which properties you can select substructures for. Keys such as ugly or tox are only used internally by the script. name defines the button label, and color defines the color of both the button and the atom highlight.

liabilities:
      ugly:
        name: "Mutagenicity"
        color: "#ff7f7f"
      tox:
        name: "Toxicity" 
        color: "#51d67e"
      stability:
        name: "Stability"
        color: "#eed358"
      like:
        name: "Good"
        color: "#9542f5"
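
A liabilities block like the one above can be checked before launching the GUI. The following is a minimal sketch, assuming the block has been parsed into a dict and that colors are written as six-digit hex codes as in the example; whether Metis accepts other color formats is not stated here, and the function name is hypothetical.

```python
import re

# Matches colors written like "#ff7f7f" (assumed format, taken from the example above).
HEX_COLOR = re.compile(r"^#[0-9a-fA-F]{6}$")


def liability_problems(liabilities):
    """Return a list of human-readable problems found in a liabilities mapping."""
    problems = []
    for key, entry in liabilities.items():
        if not entry.get("name"):
            problems.append(f"{key}: missing name")
        if not HEX_COLOR.match(str(entry.get("color", ""))):
            problems.append(f"{key}: color must look like #rrggbb")
    return problems
```

An empty result means every entry has a button label and a well-formed color.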

GlobalPropertiesConfig

| Name | Type | Required | Default |
|---|---|---|---|
| render | bool | False | False |
| liabilities | List | True | ... |

DeNovoConfig

| Name | Type | Required | Default |
|---|---|---|---|
| ssh_settings | str | True | ... |
| use_human_scoring_func | bool | False | False |
| use_reward_model | bool | False | False |

RewardModelConfig

| Name | Type | Required | Default |
|---|---|---|---|
| use_oracle_score | bool | False | True |
| weight | Union[str, None] | False | None |
| oracle_path | Union[str, None] | False | None |
| qsar_model_path | str | True | ... |
| training_data_path | str | True | ... |
| ECFP | ECFPConfig | True | ... |
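
To show how the Required and Default columns translate into code, the sketch below mirrors ECFPConfig and RewardModelConfig with standard-library dataclasses. It only illustrates the table semantics and is not how Metis defines its settings classes; the helper reward_model_from_dict is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ECFPConfig:
    bitSize: int  # required
    radius: int  # required
    useCounts: bool = False


@dataclass
class RewardModelConfig:
    # Required fields come first because they have no default.
    qsar_model_path: str
    training_data_path: str
    ECFP: ECFPConfig
    use_oracle_score: bool = True
    weight: Optional[str] = None
    oracle_path: Optional[str] = None


def reward_model_from_dict(raw):
    """Build a RewardModelConfig from a plain dict, e.g. one parsed from YAML."""
    values = dict(raw)
    # The nested ECFP section becomes its own config object.
    values["ECFP"] = ECFPConfig(**values["ECFP"])
    return RewardModelConfig(**values)
```

Omitting a required entry such as qsar_model_path raises a TypeError, while optional entries fall back to the defaults listed in the table.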