brsynth / RetroPathRL

Reinforcement Learning based bioretrosynthesis tool
MIT License

Monte Carlo Tree Search presentation

The aim of this project is to run a Monte Carlo Tree Search to perform bio-retrosynthesis, compatible with mono-component reaction rules from RetroRules (https://retrorules.org). The role of each script is detailed below. Scripts can generally be run from the command line, and have detailed comments for each function.

Reaction rules are available at https://retrorules.org/dl.

Detailed docs are in document_all_options.md.

Chemoinformatics choices are detailed in chemistry_choices.md.

Installation

Compatibility notice: RetroPathRL has been developed and tested on Linux and macOS. It is not expected to work properly on Windows.

Setting conda environment

conda create --name mcts python=3.9
conda activate mcts
conda install --channel conda-forge rdkit=2021
conda install pytest pyyaml

After cloning this Git repo, please run from the root directory:

pip install -e .

Visualization of results

Results can be visualised using the stand-alone Scope Viewer, available on GitHub:

git clone https://github.com/brsynth/scope-viewer.git

Toxicity calculator

To use the toxicity calculator:

conda install scikit-learn=0.19.1

DB cache

To cache results in a database, install pymongo and clone the DB cache package from GitHub:

conda install pymongo
git clone https://github.com/brsynth/rp3_dcache.git

Then run pip install -e . at the root of the downloaded package. See the DB cache repository for detailed instructions on how to set up and run the cache database.

Set-up data files

Run the following commands:

python calculate_rule_sets_similarity.py --rule_address_with_H your_rule_address --rule_address_without_H your_rule_address
python calculate_organisms.py

Notice: predefined sets of reaction rules are available on the RetroRules website.

Testing

Important: Tests must be run from the root folder of the program, which contains the tests folder.

python change_config.py --use_cache True --add_Hs True
pytest -v

Command line examples

python change_config.py  \
    --use_cache True
python Tree.py \
    --log_file tree.log \
    --itermax 1000 \
    --expansion_width 10 \
    --time_budget 7200 \
    --max_depth 7 \
    --UCT_policy Biochemical_UCT_1 \
    --UCTK 20 \
    --bias_k 0 \
    --k_rave 0 \
    --Rollout_policy Rollout_policy_random_uniform_on_biochemical_multiplication_score \
    --max_rollout 3 \
    --chemical_scoring SubandprodChemicalScorer \
    --virtual_visits 0 \
    --progressive_bias_strategy 0 \
    --diameter 10 12 14 16 \
    --c_name deoxiviolacein \
    --c_inchi "InChI=1S/C20H13N3O2/c24-19-13(18-12-6-2-4-8-16(12)22-20(18)25)9-17(23-19)14-10-21-15-7-3-1-5-11(14)15/h1-10,21H,(H,22,25)(H,23,24)/b18-13+" \
    --folder_to_save deoxi_07_no_H \
    --biological_score_cut_off 0.1 \
    --substrate_only_score_cut_off 0.7 \
    --chemical_score_cut_off 0.7 \
    --minimal_visit_counts 1

The expected result of this command is a folder named 'expected_results' containing:

Example for extension and 'normal' search

We expect no result from this search:

python change_config.py --DB_CACHE True --DB_time 0  --use_cache True
python Tree.py \
    --log_file tree.log \
    --itermax 1000 \
    --expansion_width 10 \
    --time_budget 7200 \
    --max_depth 7 \
    --UCT_policy Biochemical_UCT_1 \
    --UCTK 20 \
    --bias_k 0 \
    --k_rave 0 \
    --Rollout_policy Rollout_policy_random_uniform_on_biochemical_multiplication_score \
    --max_rollout 3 \
    --chemical_scoring SubandprodChemicalScorer \
    --virtual_visits 0 \
    --progressive_bias_strategy 0 \
    --diameter 10 12 14 16 \
    --c_name deoxiviolacein \
    --c_inchi "InChI=1S/C20H13N3O2/c24-19-13(18-12-6-2-4-8-16(12)22-20(18)25)9-17(23-19)14-10-21-15-7-3-1-5-11(14)15/h1-10,21H,(H,22,25)(H,23,24)/b18-13+" \
    --folder_to_save test_tree_extension/deoxi_09 \
    --biological_score_cut_off 0.9  \
    --substrate_only_score_cut_off 0.9 \
    --chemical_score_cut_off 0.9 \
    --minimal_visit_counts 1

To rerun from the same Tree with a more tolerant score

The following command extends the tree by 10 children per node. A node that already had 10 children can receive up to 10 more. A node that had only 5 can receive up to 15 more (original width of 10 plus extension of 10, minus the 5 already present). Moreover, all node scores (visits and values) are reinitialised, since allowing new rules can change them drastically. Only the tree structure is conserved, which allows a much faster descent through already expanded nodes. We expect 1 pathway from this search.
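The arithmetic of the extension can be sketched in a few lines of Python. This is only an illustration and not code from RetroPathRL: the `Node` class, its fields, and the `extend_tree` function are hypothetical names, and the sketch assumes that extending a tree simply raises each node's child limit by the extension width and zeroes its visit and value statistics.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node, not the RetroPathRL class."""
    max_children: int              # child limit used in the original search
    visits: int = 0
    value: float = 0.0
    children: List["Node"] = field(default_factory=list)

    def remaining_slots(self) -> int:
        """Number of children that can still be added to this node."""
        return self.max_children - len(self.children)


def extend_tree(root: Node, extension_width: int) -> None:
    """Raise every node's child limit and reset its scores, in place.

    The structure (existing children) is left untouched.
    """
    stack = [root]
    while stack:
        node = stack.pop()
        node.max_children += extension_width
        node.visits = 0
        node.value = 0.0
        stack.extend(node.children)
```

Under these assumptions, a node created with a limit of 10 and holding 10 children reports 10 remaining slots after `extend_tree(node, 10)`, while a node holding only 5 children reports 15.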

python change_config.py --DB_CACHE True --DB_time 0  --use_cache True
python Tree.py  \
    --log_file tree.log  \
    --itermax 1000  \
    --expansion_width 10 \
    --time_budget 7200 \
    --max_depth 7 \
    --UCT_policy Biochemical_UCT_1 \
    --UCTK 20 \
    --bias_k 0 \
    --k_rave 0 \
    --Rollout_policy Rollout_policy_random_uniform_on_biochemical_multiplication_score \
    --max_rollout 3 \
    --chemical_scoring SubandprodChemicalScorer \
    --virtual_visits 0 \
    --progressive_bias_strategy 0 \
    --diameter 10 12 14 16 \
    --folder_to_save test_tree_extension/deoxi_05 \
    --tree_to_complete end_search \
    --folder_tree_to_complete test_tree_extension/deoxi_09 \
    --biological_score_cut_off 0.1  \
    --substrate_only_score_cut_off 0.5 \
    --chemical_score_cut_off 0.5 \
    --minimal_visit_counts 1

Exploiting the DB

The DB is used as a cache: each time a rule is applied to a compound and the computation takes longer than DB_time, the result is stored in the database.
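The caching policy described here can be illustrated with a small Python sketch. It is not the rp3_dcache implementation: a plain dictionary stands in for the MongoDB collection, and `apply_rule_cached`, `apply_rule`, and the `clock` parameter are hypothetical names introduced for the example. The only behaviour taken from the text is that a result is stored when the computation takes longer than `db_time`.

```python
import time
from typing import Any, Callable, Dict, Tuple


def apply_rule_cached(
    rule_id: str,
    compound: str,
    apply_rule: Callable[[str, str], Any],
    cache: Dict[Tuple[str, str], Any],
    db_time: float,
    clock: Callable[[], float] = time.perf_counter,
) -> Any:
    """Apply a rule to a compound, reusing a cached result when available.

    A freshly computed result is stored only if computing it took
    longer than db_time seconds.
    """
    key = (rule_id, compound)
    if key in cache:
        return cache[key]          # cache hit: no computation needed

    start = clock()
    result = apply_rule(rule_id, compound)
    elapsed = clock() - start

    if elapsed > db_time:          # only slow applications are worth storing
        cache[key] = result
    return result
```

With this policy, a threshold such as `--DB_time 0` (as used in the examples of this document) would store essentially every rule application, while a larger threshold would keep only the expensive ones.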

Supplement finder

The aim of the supplement_finder script is to find potential media supplements that would enable additional pathways through simple media supplementation. It is currently limited to 1 supplement to avoid combinatorial explosion. It can check that a supplement is present in a database of interest (here: MetaNetX), previously standardised under the same conditions as the Tree (with or without hydrogens/stereo).
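One way to picture the single-supplement restriction is the Python sketch below. It is an assumption-laden illustration, not the logic of supplement_finder.py: the function `find_single_supplements` and its input format (a mapping from a pathway identifier to the set of InChIKeys that the pathway still lacks) are hypothetical. A pathway counts as rescued only when exactly one compound is missing and that compound appears in the reference database.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def find_single_supplements(
    missing_by_pathway: Dict[str, Set[str]],
    database_inchikeys: Iterable[str],
) -> Dict[str, List[str]]:
    """Map each candidate supplement to the pathways it would complete."""
    known = set(database_inchikeys)
    supplements: Dict[str, List[str]] = defaultdict(list)
    for pathway_id, missing in missing_by_pathway.items():
        # Pathways missing zero or several compounds are skipped:
        # only one supplement is considered at a time.
        if len(missing) != 1:
            continue
        (compound,) = missing
        if compound in known:      # keep only compounds found in the database
            supplements[compound].append(pathway_id)
    return dict(supplements)
```

Restricting the search to one missing compound keeps the work linear in the number of pathways, whereas allowing pairs or larger sets of supplements would require enumerating combinations.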

Please unzip the databases in data/supplement_finder/data before running this script, as well as the search tree in data/supplement_finder/tree_for_testing/TPA/pickles.

Usage:

python supplement_finder.py --folder_tree_to_complete data/supplement_finder/tree_for_testing/TPA \
--database_address data/supplement_finder/data/metanetx_extracted_inchikeys.json \
--folder_to_save testing_supplement_finder/TPA

Remarks on the config file

Files organisation

Important:

Description of object classes

Rule input formatting

Rules are imported from rule_sets_similarity after calculation with calculate_rule_sets_similarity.py. Users can define their own import to replace the default import method.

A rule set as input with calculate_rule_sets_similarity.py needs to have the following characteristics...

... and possess the following keys:

Optional keys are:

MCTS improvements currently implemented

Biosensor working example

We expect one result from this search.

python change_config.py --DB_CACHE True --DB_time 0  --use_cache True --add_Hs True --biosensor True
python Tree.py  \
    --log_file tree.log \
    --itermax 1000  \
    --expansion_width 20 \
    --time_budget 7200 \
    --max_depth 2 \
    --UCT_policy Biochemical_UCT_1 \
    --UCTK 20 \
    --bias_k 0 \
    --k_rave 50 \
    --Rollout_policy Rollout_policy_random_uniform_on_biochemical_multiplication_score \
    --max_rollout 3 \
    --chemical_scoring SubandprodChemicalScorer \
    --virtual_visits 0 \
    --progressive_bias_strategy max_reward  \
    --diameter 10 12 14 16 \
    --c_name pipecolate \
    --c_inchi "InChI=1S/C6H11NO2/c8-6(9)5-3-1-2-4-7-5/h5,7H,1-4H2,(H,8,9)" \
    --folder_to_save pipecolate \
    --EC_filter 1.5.3.7 1.5.3 \
    --biological_score_cut_off 0.1  \
    --substrate_only_score_cut_off 0.9 \
    --chemical_score_cut_off 0.9 \
    --minimal_visit_counts 1

Various remarks

Best move selection:

Standardisation: