This repository includes the scripts used for the manuscript "Synthetic field guided asynchronous chemoenzymatic synthesis planning."
AceRetro (Asynchronous ChemoEnzymatic Retrosynthesis) is a computer-aided chemoenzymatic synthesis planning tool.
Dependencies
Operating systems
./sfscore/sfscore.py
shows the basic usage. If you want to use SFScore elsewhere, follow the steps below.
Create Conda environment though conda env create -f ./process_reaction_database/environment.yml
Activate Conda environment conda activate sfscore_train
sfscore_path = [PROJECT_DIRECTORY]/'sfscore' # replace PROJECT_DIRECTORY
import sys
sys.path.append(str(sfscore_path))
from sfscore import SFScore
sfscore_model = SFScore()
# load the model in 'process_reaction_database/saved_model/ecfp4_4096_3_layer_epoch10.pt' by default
sfscore_model.load()
smiles = 'O=C(COP(=O)(O)O)[C@H](O)[C@H](O)CO'
sfscore = sfscore_model.score_from_smi(smiles)
print('SMILES:',smiles,', SFScore:',sfscore)
# SMILES: O=C(COP(=O)(O)O)[C@H](O)[C@H](O)CO , SFScore: [0.4322359 0.6146086]
The scripts and checkpoints of trained models are in ./process_reaction_database
.
process_reaction_database/fingerprint_embedding.ipynb
: Process USPTO and ECREACT database. Process ECFP4 and MAP4 fingerprint embedding. (~ 2h)
process_reaction_database/model_training.ipynb
: SFScore model training and results processing. (~ 40 min)
evalueate_score/benchmark_zinc_one_step.ipynb
: SFScore results and one-step retrosynthesis(RXN4Chemistry)/retrobiosynthesis(ASKCOS-Enzy) on 11K molecules from ZINC database (Cost ~ 43.5 hours) (Please refer to the next section for one-step retrosynthesis/retrobiosynthesis prediction).
evalueate_score/process_zinc_one_step.ipynb
: Process one-step retrosynthesis results. (Fig. 2)
evalueate_score/benchmark_askcos_results.ipynb
: SFScore results on pathways predicted by ASKCOS-hybrid (https://github.com/itai-levin/hybmind) in Nat Commun 13, 7747 (2022). (https://doi.org/10.1038/s41467-022-35422-y) (Fig. 3 and ASKCOS pathways in Fig. 5)
Installation of ASKCOS (Nat Commun, 13, 7747 (2022)):
./pathway_search_standalone/askcos-core
./pathway_search_standalone/askcos-core/askcos/data
# Download ASKCOS
## Directory: ./pathway_search_standalone
git clone https://github.com/xuanliugit/askcos-core.git
## Change branch to aceretro
cd askcos-core
git switch aceretro
# Download ASKCOS data
## Directory: ./pathway_search_standalone/askcos-core/askcos/
git lfs clone https://github.com/xuanliugit/askcos-data.git
## Rename 'askcos-data' to 'data'
mv askcos-data data
Installation of RXN4Chemistry (Digital Discovery, 2023,2, 489-501):
pathway_search_standalone/rxn_cluster_token_prompt
pathway_search_standalone/rxn_cluster_token_prompt/models
following this link.# Download RXN4Chemistry
## Directory: ./pathway_search_standalone
git clone https://github.com/xuanliugit/rxn_cluster_token_prompt.git
## Change branch to aceretro
cd rxn_cluster_token_prompt
git switch aceretro
## Model Directory: ./pathway_search_standalone/rxn_cluster_token_prompt/models
# Create environmentt
conda create -n aceretro python=3.6.13
conda activate aceretro
pip install rdkit-pypi
# Install packages for ASKCOS
pip install -r ./pathway_search_standalone/askcos-core/requirements.txt
# Install packages for RXN4Chemistry
cd pathway_search_standalone/rxn_cluster_token_prompt
pip install -e .
# Install other packages
pip install pubchempy==1.0.4
pip install git+https://github.com/reymond-group/map4@v1.0
conda install -c tmap tmap
# ./pathway_search_standalone/
from scripts.search_utils import hybridSearch
hybridSearch = hybridSearch()
smiles = 'CC[C@@H](CO)NCCN[C@@H](CC)CO' # Target molecule's SMILES
# Conduct a 3 min asynchronous chemoenzymatic synthesis planning
explored_rxns, explored_nodes, start_node = hybridSearch.get_chemoenzy_path_async(smiles, max_depth=10, chem_topk=10, max_num_templates=250, max_branching=15, time_lim=180)
To process the search result:
# ./pathway_search_standalone/
from scripts.search_utils import build_graph_from_async
import networkx as nx
graph_json = build_graph_from_async(explored_rxns,explored_nodes,start_node)
g = nx.node_link_graph(graph_json) # reaction graph
# Check ./pathway_search_standalone/result_processing.ipynb to extract synthesis routes from reaction graph
pathway_search_standalone/search_rxn4chemistry_askcos.ipynb
: Search hybrid pathways to 1k molecules from ZINC (Fig. 5) (around 6.25 days) and case studies (Fig. 6,7,8).
pathway_search_standalone/result_processing.ipynb
: Process the search results and translate graphs to readable pathways (less than 5 min)