MolecularAI / REINVENT4

AI molecular design tool for de novo design, scaffold hopping, R-group replacement, linker design and molecule optimization.
Apache License 2.0

Error encountered when running DockStream in REINVENT #91

Closed Luzuokun closed 5 months ago

Luzuokun commented 5 months ago

Hi, I attempted to modify the Reinvent_TLRL.ipynb to make it run DockStream. However, I encountered the following error: Traceback (most recent call last): File "/home/luzuokun/anaconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent_plugins/components/run_program.py", line 26, in run_command result = sp.run(command, **args) File "/home/luzuokun/anaconda3/envs/reinvent4/lib/python3.10/subprocess.py", line 526, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['/home/luzuokun/anaconda3/envs/DockStream/bin/python', '/home/luzuokun/Documents/reinvent/DockStream/docker.py', '-conf', '/home/luzuokun/Documents/reinvent/REINVENT4/experiment/BRAF/ADV_docking.json', '-output_prefix', '0', '-smiles', '"FC(F)(F)CCNc1cc(-n2ncc3ccc(Nc4cccnc4)cc32)ccn1;CN1CCN(c2ncccc2-c2cccn3nc(Nc4cccc(Cl)c4)nc23)CC1=O;Cc1ncc(Nc2ccc(F)c(C3(C(F)F)COCC(=N)N3)c2)cn1;CNCCCOc1ccc2[nH]cnc(=Nc3cccc4[nH]ncc34)c2c1;CCC1C2Cc3ccc(OC)cc3C1(C)CCN2C;Cc1ccccc1-c1nnc(-c2ccncc2)n1C;Cc1ccc(C(=O)Nc2cccc(C(C)(C)C)c2)cc1C=CC(C)N;Cc1ccc(C(=O)Nc2cc(N(C)CCN(C)C)cc(C(F)(F)F)c2)cc1Nc1ncnc2cnc(N3CCN(C)CC3)nc12;CN1CCC(C(=O)Nc2cc(O)cc(-c3cc(NS(=O)(=O)c4cccc(F)c4)ccc3F)c2)CC1;CNC(=O)CCC(=O)Nc1ccc(F)c(-c2cncc(=Nc3ccc(OC)cc3)[nH]2)c1;Cc1ccc(C(=O)Nc2ccc(Cl)c(C(F)(F)F)c2)cc1-c1nc2nc(NCCCN3CCN(C)CC3)ncc2cc1-c1ccccc1;CC(C)c1nc(-c2cccc(NS(=O)(=O)c3ccc(F)c(F)c3)c2)c(-c2c[nH]c(Nc3ccc(C#N)cc3)n2)s1;CC(C)(C#N)c1cccc(NC(=O)Nc2ccc(Oc3ccnc(NC(=O)C4CC4)c3)cc2)c1;Cc1cc(C)c2nnc(NC3CCN(Cc4ccccc4)CC3)nc2c1;COc1ccc(NC(=O)c2ccc(C)c(Nc3ncnc4cnc(N5CCN(C(C)C)CC5)nc34)c2)cc1C(F)(F)F;Cc1ccc(NC(=O)C2CC2)cc1Nc1ccccc1C(=O)Nc1ccc(Cl)nc1;Cc1ccc(C(=O)Nc2ccc(C(F)(F)F)c(Cl)c2)cc1Nc1ncnc2c(N)nc(NCCO)nc12;N=c1nc2c(c[nH]1)CCCc1cc(F)ccc1-2;Nc1nc(C2CC3CCC2N3)c(F)s1;Cc1cc(Nc2ncnc3c(N)nc(N4CCN(C)CC4)nc23)cc(C2(N)CCCNC2)c1C;O=C(c1cccc(-c2nnc3n2CCCCC3)c1)N1CCOCC1;NCCCS(=O)(=O)Nc1cc(Cl)cc(C(=O)Nc2c[nH]c3ncc(-c4ccccc4)cc23)c1Cl;CN(C)CCCNC(=O)c1cc(NC(=O)c2cccc3c(N)ncnc23)c[nH]1;O=c1[nH]c2cccc(Oc3cccc(Nc4ccc(Cl)cn4)c3)c2
[nH]1;CC(C)(C)c1noc(NC(=O)Nc2cccc3c(Oc4cc(Oc5ccnc6[nH]ccc56)ccn4)ccnc23)n1;Cc1ccc(C(C)NC(=O)c2ccc(F)c(C(F)(F)F)c2)cc1C(=O)N1Cc2ccncc2C1;CCCS(=O)(=O)Nc1ccc(F)c(C(=O)Nc2cnc3[nH]c(C4(C)CCCN4C)nc3c2)c1F;Cc1cc(NC(=O)Nc2ccc(Oc3ccnc(NC(=O)C4CC4)c3)nc2)cc(C(F)(F)F)c1;O=S(=O)(Nc1ccc(F)c(Nc2ccc3nn(-c4ccccc4)nc3c2)c1F)N1CCCC1;COc1cccc(C(=O)Nc2cc(C(=O)Nc3cccc(Cl)c3)ccc2C)c1;CCCS(=O)(=O)Nc1ccc(F)c(C(=O)c2c[nH]c3ncc(-c4ccc(F)c(Cl)c4)cc23)c1F;CCCS(=O)(=O)Nc1ccc(F)c(-c2ccc3ncnc(NC4CCOCC4)c3c2)c1F;CCCS(=O)(=O)Nc1ccc(F)c(-c2cc(=N)[nH]cc2F)c1F;O=S(=O)(c1ccc(F)c(Cl)c1)N1CCC(Nc2c(F)cccc2F)CC1;CCN(CC)c1ccc(N2C(=O)CC3C2=CCC3(C)C)cc1;COc1ncc(-c2c(N)noc2C(C)(C)C)cc1NS(=O)(=O)c1ccccc1;CCCS(=O)(=O)Nc1ccc(F)c(C(=O)c2c[nH]c3ncc(-c4ccc(Cl)cc4)cc23)c1F;Cc1ccccc1C(=O)N=c1nc(N2CCOCC2)c2ccccc2[nH]1;Cc1ccc(NC(=O)c2cccc(C(F)(F)F)c2)cc1-c1cc(N2CCOCC2)c(=O)n(C)c1;O=C(Nc1ccc(F)c(Cl)c1)C1CCCCC1;Clc1cccc(Nc2ncnc3ccc(-c4ccnc(N5CCN(CC6CCNCC6)CC5)c4)cc23)c1;CN(C)CCN(C)c1ncc2ncnc(Nc3cc(C(=O)Nc4cc(CN5CCCC5)c(Cl)c(C(F)(F)F)c4)ccc3Cl)c2n1;Cc1ccc(NC(=O)c2cccc(C(F)(F)F)c2)cc1C(=O)NCc1ccc(C(F)(F)F)cc1;CC(=O)NCc1ccc(C)c(-c2cc(C(=O)NCc3cc(C(C)C)no3)ccc2F)c1;O=C1C=C(c2cccnc2)CC(c2ccc3c(c2)CCC3)C1;CN(C)S(=O)(=O)c1ccc(-c2nnc(-c3ccccc3)o2)cc1;CCCCCCCNC(=O)N=c1[nH]c2ccc(OCc3ccccc3)nc2s1;COc1cccc(C2(c3cccc(C(O)C(C)C)c3)NC(c3nc(-c4cccs4)c[nH]3)Cc3c2[nH]c2ccccc32)c1;CNC(=O)c1cc(-c2nc(C3CC3)[nH]c2-c2cccc(NS(C)(=O)=O)c2F)cc(Cl)n1;COc1cc(Cl)cc(-c2ncnc3[nH]c(C4CC4)nc23)c1;Cc1ccc(C(=O)Nc2cc(C(C)(C)C)on2)cc1Nc1ncnc2nc(NCCN(C)C)ncc12;COCC(CNC(=O)c1ccc(C(N)=NO)cc1)NC(=O)C(C)(C)C;CCCS(=O)(=O)Nc1ccc(Cl)c(C(=O)Nc2cnc3[nH]c(=O)[nH]c3c2)c1F;Cn1nc(C(C)(C)C)cc1NC(=O)Nc1cccc(N=c2cc(NC3CC3)[nH]c(N3CCCC3)n2)c1;Cc1ccc(NC(=O)c2nnsc2-c2cccc(C(C)(C)C)c2)cc1-n1cc(-c2ccncc2)nn1;CCN(CCO)CCCOc1ccc2c(Nc3cccc(NC(=O)Nc4ccc(N(CCCl)CCCl)cc4)c3)ncnc2c1;CC(CO)CNc1nccc(C2=C(c3cccc(C(N)=O)c3)CC(C)(C)NC2(C)C)n1;O=C1Nc2cc(-c3cccc(-c4cnco4)c3)ccc2N(C2CC2)c2cnccc21;O=C(NCCO)c1cccc(-c2c[nH]n3c(=O)cc(-c4ccc(Cl)cc4)nc23)c1;CN1c2ccccc2Sc2nc(-c3ccccc3)[nH]c(=O)c21;CNc1
cc2[nH]cnc(=Nc3cccc(C(=O)Nc4cc(N5CCN(C)CC5)c(F)c(C(F)(F)F)c4)c3)c2cn1;CCn1cc(-c2ccccc2OC)c(-c2ccncc2)n1;O=C(N=c1cc(C2CC2)[nH]c2c(-c3ccccc3)cnn12)c1ccccc1;Cc1cc(C)c(NC(=O)c2c(NC(=O)c3cc(CNC(C)(C)C)cc(C(F)(F)F)c3)sc3c2CCC3)c(C)c1;CCCS(=O)(=O)Nc1ccc(F)c(C(=O)Nc2cnc3cc(N4CCC(F)C4)nn3c2C)c1F;COc1ccc(NC(=O)c2ccc(C)c(Nc3ncnc4cnc(N(C)CCN(C)C)nc34)c2)cc1C(F)(F)F;O=c1nc(Nc2ccc(Br)cc2)[nH]c2ccccc12;COc1cc2ncnc(Oc3cccc(NC(=O)c4cnn(-c5cccnc5)c4)c3)c2cc1NC(C)=O;Cc1ccc(NC(=O)c2cccc(C(C)(C)C)c2)cc1-n1cc(-c2cncc(N3CCN(C)CC3)c2)nn1;Cc1ccc2c(c1)CC(CN1CCN(CCc3ccc4c(c3)NC(=O)CO4)CC1)N2;O=S(=O)(NCc1cccnc1)c1ccc(F)cc1F;CNC(=O)c1cc(C2c3c(c(O)n(-c4ccc(F)cc4)c3O)C3(C(=O)N4CCOCC4)CN(C4CCC4)CC23)ccc1F;CCCS(=O)(=O)Nc1ccc(F)c(C(=O)Nc2cnc3[nH]ncc3c2)c1F;CN1CCN(C(=O)c2c[nH]c3ncc4cc5c(nc4c23)CC(c2ccccc2Cl)OC5)CC1;CN(C)C(=O)c1ccc(C2NC3CC=CC2C3)c(-c2c(S(=O)(=O)c3cccc(F)c3)cnc3[nH]ccc23)c1;COCCNc1ccc2ncnc(Nc3cc(C(=O)Nc4ccc(OC)c(C(F)(F)F)c4)ccc3F)c2n1;O=C(Nc1cccc(Oc2cccc3[nH]c(=O)[nH]c23)c1)c1ccccc1F;CN1CC=C(c2c[nH]nc2-c2ccc3c(c2)CCC3=NO)CC1;Cc1ccc(NC(=O)c2cccc(C(F)(F)F)c2)cc1-c1noc(CN2CCOCC2)n1;Cc1nc(=O)cc(Nc2cccc(C(=O)NCc3ccnc(NCC(C)(C)C)c3)c2F)[nH]1;O=C(NC1CC1)c1ccc(-n2cc3ccccc3n2)s1;Nc1ccc2[nH]c(N)nc2c1;Cc1ccc(C(=O)Nc2cc(OCCN(C)C)cc(C(F)(F)F)c2)cc1Nc1ncnc2cnc(N3CCCC3)nc12;Cc1ccc(C(=O)Nc2ccc(Cl)c(C(F)(F)F)c2)cc1-n1ncc2c(N3CCN(C4CCN(C)CC4)CC3)ncnc21"', '-print_scores']' returned non-zero exit status 1.

The error appears to arise when running docker.py, because the program passes multiple SMILES strings to the docking step at once. I recall the DockStream documentation saying, "First, as we report only one value per ligand (and a 'consensus score' is not yet supported), you should only use one embedding / pool." Is that the cause of this error? How can I set "one embedding / pool"?
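The traceback only reports that docker.py exited with status 1, not why. One way to see the underlying message is to rebuild the same command by hand with a single SMILES and capture its stderr. The following is a minimal sketch: the flags (`-conf`, `-output_prefix`, `-smiles`, `-print_scores`) and the literal double quotes around the SMILES argument are copied from the traceback, while the helper names `build_dockstream_command` and `run_and_report` are hypothetical and not part of REINVENT or DockStream.

```python
import subprocess


def build_dockstream_command(python_path, script_path, conf_path, smiles, prefix="0"):
    """Assemble the argument list in the form shown in the traceback."""
    # The traceback shows the SMILES joined with ';' and wrapped in literal
    # double quotes; this sketch mirrors that form (an assumption about what
    # docker.py expects, taken from the traceback only).
    joined = '"' + ";".join(smiles) + '"'
    return [
        python_path, script_path,
        "-conf", conf_path,
        "-output_prefix", prefix,
        "-smiles", joined,
        "-print_scores",
    ]


def run_and_report(command):
    """Run a command without raising; return (exit code, stdout, stderr)."""
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.returncode, result.stdout, result.stderr
```

Calling `run_and_report(build_dockstream_command(...))` with the paths from the traceback and a one-element SMILES list should return DockStream's own error text in the third element, which is usually more informative than the `CalledProcessError` raised by REINVENT.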

Below are the configuration files for docking and REINVENT:

{
  "docking": {
    "header": {
      "logging": {
        "logfile": "/home/luzuokun/Desktop/AutoDock_Vina_demo/ADV_docking.log"
      }
    },
    "ligand_preparation": {
      "embedding_pools": [
        {
          "pool_id": "RDkit_pool",
          "type": "RDkit",
          "parameters": {
            "removeHs": false,
            "coordinate_generation": {
              "method": "UFF",
              "maximum_iterations": 300
            }
          },
          "input": {
            "standardize_smiles": false,
            "type": "smi",
            "input_path": "/home/luzuokun/Documents/reinvent/DockStreamCommunity/notebooks/../data/1UYD/ligands_smiles.txt"
          },
          "output": {
            "conformer_path": "/home/luzuokun/Desktop/AutoDock_Vina_demo/ADV_embedded_ligands.sdf",
            "format": "sdf"
          }
        }
      ]
    },
    "docking_runs": [
      {
        "backend": "AutoDockVina",
        "run_id": "AutoDockVina",
        "input_pools": [
          "RDkit_pool"
        ],
        "parameters": {
          "binary_location": "/home/luzuokun/anaconda3/envs/vina/bin",
          "parallelization": {
            "number_cores": 4
          },
          "seed": 42,
          "receptor_pdbqt_path": [
            "/home/luzuokun/Desktop/AutoDock_Vina_demo/ADV_receptor.pdbqt"
          ],
          "number_poses": 2,
          "search_space": {
            "--center_x": 3.3,
            "--center_y": 11.5,
            "--center_z": 24.8,
            "--size_x": 15,
            "--size_y": 10,
            "--size_z": 10
          }
        },
        "output": {
          "poses": {
            "poses_path": "/home/luzuokun/Desktop/AutoDock_Vina_demo/ADV_ligands_docked.sdf"
          },
          "scores": {
            "scores_path": "/home/luzuokun/Desktop/AutoDock_Vina_demo/ADV_scores.csv"
          }
        }
      }
    ]
  }
}
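To check the "one embedding / pool" question directly, the DockStream JSON can be inspected programmatically. The sketch below counts the entries under `embedding_pools` and verifies that every name in a run's `input_pools` refers to a defined `pool_id`. The key names are taken from the configuration above; the functions `load_config` and `summarize_pools` are hypothetical helpers, not part of DockStream.

```python
import json


def load_config(path):
    """Read a DockStream JSON configuration file into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_pools(config):
    """Return (pool ids, {run_id: pool names that no embedding pool defines})."""
    docking = config["docking"]
    pool_ids = [
        pool["pool_id"]
        for pool in docking["ligand_preparation"]["embedding_pools"]
    ]
    missing = {}
    for run in docking["docking_runs"]:
        unknown = [name for name in run["input_pools"] if name not in pool_ids]
        if unknown:
            missing[run["run_id"]] = unknown
    return pool_ids, missing
```

For the configuration shown above, `summarize_pools(load_config(".../ADV_docking.json"))` would be expected to report a single pool, `RDkit_pool`, with no missing references. If so, the number of pools is unlikely to be the cause of the failure.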

REINVENT configuration:

run_type = "staged_learning"
device = "cuda:0"
tb_logdir = "tb_stage2"
json_out_config = "_stage2.json"

[parameters]

prior_file = "/home/luzuokun/anaconda3/envs/reinvent4/lib/python3.10/site-packages/priors/reinvent.prior"
agent_file = "/home/luzuokun/Documents/reinvent/REINVENT4/experiment/BRAF/TL_reinvent.model.20.chkpt"
summary_csv_prefix = "stage2"

batch_size = 100

use_checkpoint = false

[learning_strategy]

type = "dap"
sigma = 128
rate = 0.0001

[[stage]]

max_score = 1.0
max_steps = 500

chkpt_file = "stage2.chkpt"

scoring_function.type = "custom_product"

[stage.scoring]
type = "geometric_mean"

[[stage.scoring.component]]
[stage.scoring.component.custom_alerts]

[[stage.scoring.component.custom_alerts.endpoint]]
name = "Alerts"

params.smarts = [ "[;r8]", "[;r9]", "[;r10]", "[;r11]", "[;r12]", "[;r13]", "[;r14]", "[;r15]", "[;r16]", "[;r17]", "[#8][#8]", "[#6;+]", "[#16][#16]", "[#7;!n][S;!$(S(=O)=O)]", "[#7;!n][#7;!n]", "C#C", "C(=[O,S])[O,S]", "[#7;!n][C;!$(C(=[O,N])[N,O])][#16;!s]", "[#7;!n][C;!$(C(=[O,N])[N,O])][#7;!n]", "[#7;!n][C;!$(C(=[O,N])[N,O])][#8;!o]", "[#8;!o][C;!$(C(=[O,N])[N,O])][#16;!s]", "[#8;!o][C;!$(C(=[O,N])[N,O])][#8;!o]", "[#16;!s][C;!$(C(=[O,N])[N,O])][#16;!s]" ]

[[stage.scoring.component]]
[stage.scoring.component.QED]

[[stage.scoring.component.QED.endpoint]]
name = "QED"
weight = 0.6

[[stage.scoring.component]]
[stage.scoring.component.NumAtomStereoCenters]

[[stage.scoring.component.NumAtomStereoCenters.endpoint]]
name = "Stereo"
weight = 0.4

transform.type = "left_step"
transform.low = 0

[[stage.scoring.component]]
[[stage.scoring.component.DockStream.endpoint]]
name = "Docking into 6CM4"
weight = 1
params.configuration_path = "/home/luzuokun/Documents/reinvent/REINVENT4/experiment/BRAF/ADV_docking.json"
params.docker_script_path = "/home/luzuokun/Documents/reinvent/DockStream/docker.py"
params.docker_python_path = "/home/luzuokun/anaconda3/envs/DockStream/bin/python"
transform.type = "reverse_sigmoid"
transform.high = -7
transform.low = -13.5
transform.k = 0.2

[diversity_filter]

type = "IdenticalMurckoScaffold"
bucket_size = 10
minscore = 0.7

[inception]

smiles_file = ""  # no seed SMILES
memory_size = 50
sample_size = 10
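For reference, the DockStream endpoint in the configuration above maps raw docking scores to the range 0 to 1 with a `reverse_sigmoid` transform (`high = -7`, `low = -13.5`, `k = 0.2`). The sketch below illustrates what such a transform does, assuming the formula used in earlier REINVENT releases; it has not been checked against the REINVENT4 source, so treat the exact expression as an assumption.

```python
def reverse_sigmoid(x, low, high, k):
    """Map a raw score to (0, 1); lower (more negative) scores map closer to 1.

    Assumed form: 1 / (1 + 10 ** (k * (x - midpoint) * 10 / (high - low))),
    with midpoint = (high + low) / 2.
    """
    midpoint = (high + low) / 2
    exponent = k * (x - midpoint) * 10 / (high - low)
    return 1 / (1 + 10 ** exponent)


# Parameters from the DockStream endpoint in the configuration above.
LOW, HIGH, K = -13.5, -7.0, 0.2

for docking_score in (-13.5, -10.25, -7.0):
    print(docking_score, round(reverse_sigmoid(docking_score, LOW, HIGH, K), 3))
```

Under this assumed formula, a docking score at the midpoint (-10.25) gives 0.5, and more negative scores give values closer to 1. In other words, better docking scores are rewarded.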

Best! Have a good day!

Lu

halx commented 5 months ago

Hi,

many thanks for your interest in REINVENT and welcome to the community!

We don't support DockStream anymore. There is already some discussion about this in this forum.

What does the logfile /home/luzuokun/Desktop/AutoDock_Vina_demo/ADV_docking.log say? Your DockStream configuration appears to have only one embedding pool.

Cheers, Hannes.