I've been trying to run sampling with REINVENT4 and keep running into issues. To check whether the problem was with my own molecules, I used the sample scaffolds.smi file that is available:
(reinvent4) [hpc117@cottontail2 it1_rev]$ cat /zfshomes/hpc117/REINVENT4/configs/toml/custom_scaffold_sampling.toml
run_type = "sampling"
device = "cpu" # Use CPU for molecule generation
[parameters]
model_file = "/zfshomes/hpc117/REINVENT4/priors/libinvent.prior" # LibInvent model file
smiles_file = "/zfshomes/hpc117/REINVENT4/configs/toml/scaffolds.smi" # Input: scaffolds with attachment points
output_file = "/zfshomes/hpc117/REINVENT4/output/decorated_scaffolds.smi" # Output: decorated molecules
num_smiles = 100 # Number of molecules to generate per scaffold
unique_molecules = true # Ensure generated molecules are unique
randomize_smiles = true # Randomize SMILES for diversity
and I keep getting the same errors. First, a TypeError caused by the keep_all=True argument:
22:38:39 Sampling 100 SMILES from model /zfshomes/hpc117/REINVENT4/priors/libinvent.prior
Traceback (most recent call last):
File "/zfshomes/hpc117/.conda/envs/reinvent4/bin/reinvent", line 8, in <module>
sys.exit(main())
File "/zfshomes/hpc117/.conda/envs/reinvent4/lib/python3.10/site-packages/reinvent/Reinvent.py", line 334, in main
runner(
File "/zfshomes/hpc117/.conda/envs/reinvent4/lib/python3.10/site-packages/reinvent/runmodes/samplers/run_sampling.py", line 96, in run_sampling
sampled.smilies = normalize(sampled.smilies,keep_all=True)
TypeError: normalize() got an unexpected keyword argument 'keep_all'
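In case it helps with diagnosis, here is a minimal sketch of how one could check whether a given function accepts a particular keyword argument. The helper accepts_keyword and the stand-in function normalize_stand_in are hypothetical names of my own; I have not confirmed where the real normalize() is imported from, so it would need to be passed in instead of the stand-in.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        # Positional-only and *args parameters cannot be passed by keyword.
        kind = params[name].kind
        if kind not in (inspect.Parameter.POSITIONAL_ONLY,
                        inspect.Parameter.VAR_POSITIONAL):
            return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-in with the signature the TypeError implies
# (no keep_all parameter). Replace with the real normalize() to test it.
def normalize_stand_in(smilies):
    return smilies


print(accepts_keyword(normalize_stand_in, "keep_all"))
```

If the check comes back False for the installed function, that would suggest run_sampling.py and the normalizer come from different versions of the code, though that is only a guess on my part.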
I then removed the keep_all argument, since it seemed like a simple fix, and got the following error:
22:41:05 Using generator Libinvent
22:41:05 Writing sampled SMILES to CSV file /zfshomes/hpc117/REINVENT4/output/decorated_scaffolds.smi
22:41:05 Sampling 100 SMILES from model /zfshomes/hpc117/REINVENT4/priors/libinvent.prior
22:41:06 reinvent_plugins.normalizers.rdkit_smiles: C1(O)CCCN(CCCOc2c3c(nc4cc(OC)ccc41)CCSS2)C|C could not be converted
Traceback (most recent call last):
File "/zfshomes/hpc117/.conda/envs/reinvent4/bin/reinvent", line 8, in <module>
sys.exit(main())
File "/zfshomes/hpc117/.conda/envs/reinvent4/lib/python3.10/site-packages/reinvent/Reinvent.py", line 334, in main
runner(
File "/zfshomes/hpc117/.conda/envs/reinvent4/lib/python3.10/site-packages/reinvent/runmodes/samplers/run_sampling.py", line 118, in run_sampling
sampled = filter_valid(sampled)
File "/zfshomes/hpc117/.conda/envs/reinvent4/lib/python3.10/site-packages/reinvent/runmodes/samplers/run_sampling.py", line 162, in filter_valid
smilies = list(np.array(sampled.smilies)[mask_idx])
IndexError: boolean index did not match indexed array along dimension 0; dimension is 99 but corresponding boolean dimension is 100
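For what it's worth, the same kind of IndexError can be reproduced outside REINVENT4 with plain NumPy. This is only a sketch under my own assumption that the list of SMILES lost one entry (the one that "could not be converted") while the boolean mask still had 100 entries; apply_mask and the placeholder data are hypothetical and not taken from run_sampling.py.

```python
import numpy as np


def apply_mask(items, mask):
    """Keep the items whose mask entry is True.

    NumPy raises IndexError when the mask length differs from len(items).
    """
    return list(np.array(items)[np.asarray(mask, dtype=bool)])


# Assumption: one of the 100 sampled SMILES was dropped during normalization,
# but the validity mask was still built for all 100 samples.
smilies = ["C"] * 99
mask = [True] * 100

try:
    apply_mask(smilies, mask)
    mismatch_error = None
except IndexError as exc:
    mismatch_error = exc

print(type(mismatch_error).__name__)
```

If that assumption is right, it would explain why removing keep_all=True only moved the failure further down: the list and the mask would no longer be the same length.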
scaffolds.smi ONLY has:
[*:1]Cc2ccc1cncc(C[*:2])c1c2
I'm confused about where the cleaned/normalized version is getting |*C from.
Apologies for the long post, but I'm not good enough at coding to figure this one out, haha!