Open ichxw opened 3 days ago
Hi,
many thanks for your interest in REINVENT and welcome to the community!
I see two problems there:
Many thanks, Hannes.
Hi Hannes, thank you for your quick response. I used the scaffolds.smi input file provided with the program. I ran the staged learning process multiple times, and every run produced the same error message. The output varied, however: most of the time staged_learning_1.csv was empty, but occasionally it contained a few hundred SMILES lines. Below are the last few lines of the log file from a run in which staged_learning_1.csv was not empty.
09:36:53 <INFO> Creating scoring component QED
09:36:53 <INFO> Writing tabular data for stage to staged_learning_1.csv
09:36:53 <INFO> Starting stage 1 <<<
09:36:53 <INFO> Current GPU memory usage: 884 MiB used, 39562 MiB free
09:36:54 <INFO> Score: 0.74 Agent NLL: 20.46 Valid: 100% Step: 1
| Agent Prior Target Score SMILES SMILES_state Input_Scaffold R-groups Scaffold Molecular weight Molecular weight (raw) Unwanted SMARTS Unwanted SMARTS (raw)
| 16.6641 14.4043 113.5311 0.9994959 CC(C)Cc1cncc2ccc(CN3CCCCC3)cc12 1 c12c(C[*])cncc1ccc(C[*])c2 *C(C)C|C1CCCN(*)C1 c1cc2cc(CN3CCCCC3)ccc2cn1 0.9994959 282.4310 1.0000000 1.0000
| 31.1224 32.3684 95.4699 0.9987366 CCc1ccc2cncc(CCCON=C(N)c3ccc(-n4cccn4)cc3CC)c2c1 1 c12c(C[*])cncc1ccc(C[*])c2 *CCON=C(c1ccc(-n2cccn2)cc1CC)N|*C C(=NOCCCc1cncc2ccccc12)c1ccc(-n2cccn2)cc1 0.9987366 427.5520 1.0000000 1.0000
| 33.0129 30.6990 -30.6990 0.0000000 O=C(O)CC(CCO)c1ccc(CCc2ccc3cncc(CCCO)c3c2)c(Cl)c1 1 c12c(C[*])cncc1ccc(C[*])c2 *CCO|c1c(Cl)c(C*)ccc1C(CC(O)=O)CCO c1ccc(CCc2ccc3cnccc3c2)cc1 0.0000000 0.0000 0.0000000 0.0000
| 11.6166 12.1508 107.0825 0.9315100 CCCc1ccc2cncc(CN(C)C)c2c1 1 c12c(C[*])cncc1ccc(C[*])c2 *N(C)C|*CC c1ccc2cnccc2c1 0.9315100 228.3390 1.0000000 1.0000
| 10.3642 8.7407 56.0993 0.5065621 CCc1ccc2cncc(CNC)c2c1 1 c12c(C[*])cncc1ccc(C[*])c2 *NC|*C c1ccc2cnccc2c1 0.5065621 200.2850 1.0000000 1.0000
| 6.6187 6.0158 22.1438 0.2199967 CCc1ccc2cncc(CN)c2c1 1 c12c(C[*])cncc1ccc(C[*])c2 *N|*C c1ccc2cnccc2c1 0.2199967 186.2580 1.0000000 1.0000
| 21.0663 20.5225 107.4715 0.9999537 COc1ccc(CN)cc1Cc1cncc2ccc(CO)cc12 1 c12c(C[*])cncc1ccc(C[*])c2 *c1c(OC)ccc(CN)c1|O* c1ccc(Cc2cncc3ccccc23)cc1 0.9999537 308.3810 1.0000000 1.0000
| 18.7851 18.1979 109.8013 0.9999936 Clc1ccc(Cc2cncc3ccc(Cc4nnn[nH]4)cc23)cc1Cl 1 c12c(C[*])cncc1ccc(C[*])c2 *c1ccc(Cl)c(Cl)c1|n1nn[nH]c1* c1ccc(Cc2cncc3ccc(Cc4nnn[nH]4)cc23)cc1 0.9999936 370.2430 1.0000000 1.0000
| 13.8535 13.2819 114.6822 0.9997200 ClCCc1cncc2ccc(CN3CCCCC3)cc12 1 c12c(C[*])cncc1ccc(C[*])c2 C(*)Cl|C1N(*)CCCC1 c1cc2cc(CN3CCCCC3)ccc2cn1 0.9997200 288.8220 1.0000000 1.0000
| 44.2730 44.1825 -43.7477 0.0033969 CCCc1cncc2ccc(CNc3nc4c(n3C)-c3cc(-c5cc(OC)ccc5C)c(CO)cc3C(=O)NC4)cc12 1 c12c(C[*])cncc1ccc(C[*])c2 C(*)C|c12nc(N*)n(C)c1-c1cc(-c3c(C)ccc(OC)c3)c(CO)cc1C(=O)NC2 O=C1NCc2nc(NCc3ccc4cnccc4c3)[nH]c2-c2cc(-c3ccccc3)ccc21 0.0033969 561.6860 1.0000000 1.0000
09:36:55 <WARN> reinvent_plugins.normalizers.rdkit_smiles: c1(C[*])cncc2c1cc(C[*])cc2*NC(=O)N|c1(*)c(C)nc(NC(c2c(Cl)ccc(N3CCC(F)(F)CCC(N)=NO)c2F)=O)nc1C could not be converted
Here is the log from another run, which produced no output SMILES at all.
09:50:27 <INFO> Creating scoring component QED
09:50:27 <INFO> Writing tabular data for stage to staged_learning_1.csv
09:50:27 <INFO> Starting stage 1 <<<
09:50:27 <INFO> Current GPU memory usage: 884 MiB used, 39562 MiB free
09:50:28 <WARN> reinvent_plugins.normalizers.rdkit_smiles: [*]Cc1cc2c(C[*])cncc2cc1*N1CCN(c2cc3c(c(=O)n3c(-c4ccc(NC(C)=O)cc4)nc(OC)n3)cn2)CC1C|*C could not be converted
09:50:28 <WARN> reinvent_plugins.normalizers.rdkit_smiles: c1(C[*])cc2c(cncc2C[*])nc1*NCCSc1c2sc(=O)cc-2c(O)cc1O|C* could not be converted
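To make warnings like these easier to compare across runs, the rejected strings can be pulled out of the log text with a short script. This is only a rough sketch: the regular expression is copied from the warning lines in this thread, while the function names `failed_smiles` and `split_on_pipe` are made up for illustration and are not part of REINVENT. It only extracts and splits the strings; it does not diagnose why the conversion failed.

```python
import re

# Matches the warning format seen in the logs above. The logger name and the
# trailing message are taken verbatim from those lines.
WARN_RE = re.compile(
    r"<WARN> reinvent_plugins\.normalizers\.rdkit_smiles: "
    r"(?P<smiles>\S+) could not be converted"
)


def failed_smiles(log_text):
    """Return every string the normalizer reported as unconvertible."""
    return [m.group("smiles") for m in WARN_RE.finditer(log_text)]


def split_on_pipe(entry):
    """Split a reported string on '|', the separator visible in the
    R-groups column of the tabular output (hypothetical helper)."""
    return entry.split("|")


# Two warning lines copied from the second log in this thread.
sample_log = """\
09:50:28 <WARN> reinvent_plugins.normalizers.rdkit_smiles: [*]Cc1cc2c(C[*])cncc2cc1*N1CCN(c2cc3c(c(=O)n3c(-c4ccc(NC(C)=O)cc4)nc(OC)n3)cn2)CC1C|*C could not be converted
09:50:28 <WARN> reinvent_plugins.normalizers.rdkit_smiles: c1(C[*])cc2c(cncc2C[*])nc1*NCCSc1c2sc(=O)cc-2c(O)cc1O|C* could not be converted
"""

failures = failed_smiles(sample_log)
for entry in failures:
    parts = split_on_pipe(entry)
    # Report how many '[*]' attachment points and '|'-separated parts each has.
    print(entry.count("[*]"), len(parts), parts[-1])
```

For a real run, `sample_log` would be replaced by the contents of the log file written by the job.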
You can see that the problematic SMILES were generated by the program itself. Please let me know if something is wrong in my TOML configuration. I listed my changes to REINVENT4/configs/toml/staged_learning.toml for libinvent earlier in this thread; the rest of the TOML file is exactly the same as the original. Thanks for your time.
Hello, I was trying to run libinvent, but it failed because of a SMILES conversion problem. Here is the relevant part of the TOML file. Everything else is the same as in staged_learning.toml.
After running, I got the error below:
Looking at the log file, it looks like there was a problem recognizing the scaffold SMILES structures:
I'm using RDKit version 2024.03.6. Any response is appreciated. Thanks.