MolecularAI / REINVENT4

AI molecular design tool for de novo design, scaffold hopping, R-group replacement, linker design and molecule optimization.
Apache License 2.0

Getting error with libinvent and sampling #162

Open LilMasala opened 2 days ago

LilMasala commented 2 days ago

Hello!

I've been trying to run sampling with REINVENT4 and keep running into errors. To rule out a problem with my own molecules, I used the sample scaffolds.smi file that ships with the repository:

(reinvent4) [hpc117@cottontail2 it1_rev]$ cat /zfshomes/hpc117/REINVENT4/configs/toml/custom_scaffold_sampling.toml
run_type = "sampling"
device = "cpu"  # Use CPU for molecule generation

[parameters]
model_file = "/zfshomes/hpc117/REINVENT4/priors/libinvent.prior"  # LibInvent model file
smiles_file = "/zfshomes/hpc117/REINVENT4/configs/toml/scaffolds.smi"  # Input: scaffolds with attachment points
output_file = "/zfshomes/hpc117/REINVENT4/output/decorated_scaffolds.smi"  # Output: decorated molecules
num_smiles = 100  # Number of molecules to generate per scaffold
unique_molecules = true  # Ensure generated molecules are unique
randomize_smiles = true  # Randomize SMILES for diversity

I get the same errors with this file. First, there is a problem with the keep_all=True argument in run_sampling.py:

22:38:39 Sampling 100 SMILES from model /zfshomes/hpc117/REINVENT4/priors/libinvent.prior
Traceback (most recent call last):
  File "/zfshomes/hpc117/.conda/envs/reinvent4/bin/reinvent", line 8, in <module>
    sys.exit(main())
  File "/zfshomes/hpc117/.conda/envs/reinvent4/lib/python3.10/site-packages/reinvent/Reinvent.py", line 334, in main
    runner(
  File "/zfshomes/hpc117/.conda/envs/reinvent4/lib/python3.10/site-packages/reinvent/runmodes/samplers/run_sampling.py", line 96, in run_sampling
    sampled.smilies = normalize(sampled.smilies,keep_all=True)
TypeError: normalize() got an unexpected keyword argument 'keep_all'
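A TypeError like this usually means the caller and the function it calls disagree about the signature, which could happen if two parts of an installation come from different versions. That is only a guess for this case. As a minimal sketch, the standard-library inspect module can report whether a function accepts a given keyword. The two normalize_* functions below are hypothetical stand-ins, not the real REINVENT4 code:

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for an older and a newer normalizer signature.
def normalize_old(smilies):
    return list(smilies)


def normalize_new(smilies, keep_all=False):
    return list(smilies)


print(accepts_keyword(normalize_old, "keep_all"))
print(accepts_keyword(normalize_new, "keep_all"))
```

Running the same check on the normalize function that run_sampling.py actually imports would show whether the installed version knows about keep_all.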

I then removed the keep_all argument, since that seemed like a simple fix, and got the following error instead:

22:41:05 Using generator Libinvent
22:41:05 Writing sampled SMILES to CSV file /zfshomes/hpc117/REINVENT4/output/decorated_scaffolds.smi
22:41:05 Sampling 100 SMILES from model /zfshomes/hpc117/REINVENT4/priors/libinvent.prior
22:41:06 reinvent_plugins.normalizers.rdkit_smiles: C1(O)CCCN(CCCOc2c3c(nc4cc(OC)ccc41)CCSS2)C|C could not be converted
Traceback (most recent call last):
  File "/zfshomes/hpc117/.conda/envs/reinvent4/bin/reinvent", line 8, in <module>
    sys.exit(main())
  File "/zfshomes/hpc117/.conda/envs/reinvent4/lib/python3.10/site-packages/reinvent/Reinvent.py", line 334, in main
    runner(
  File "/zfshomes/hpc117/.conda/envs/reinvent4/lib/python3.10/site-packages/reinvent/runmodes/samplers/run_sampling.py", line 118, in run_sampling
    sampled = filter_valid(sampled)
  File "/zfshomes/hpc117/.conda/envs/reinvent4/lib/python3.10/site-packages/reinvent/runmodes/samplers/run_sampling.py", line 162, in filter_valid
    smilies = list(np.array(sampled.smilies)[mask_idx])
IndexError: boolean index did not match indexed array along dimension 0; dimension is 99 but corresponding boolean dimension is 100
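The 99 versus 100 in the IndexError matches the single "could not be converted" line in the log: one of the 100 sampled strings appears to have been dropped, while the boolean mask still has 100 entries. The sketch below reproduces that kind of mismatch in plain Python. It assumes, without confirmation from the REINVENT4 source, that keep_all=True keeps a placeholder for each failed string so the list length is preserved. Every function name here is hypothetical, and the validity check is a toy stand-in for real SMILES parsing:

```python
def toy_is_valid(smiles):
    # Toy stand-in for a real validity check; illustration only.
    return "|" not in smiles


def normalize_dropping(smilies):
    # Drops failures, so the result can be shorter than the input.
    return [s for s in smilies if toy_is_valid(s)]


def normalize_keeping(smilies, placeholder="INVALID"):
    # Keeps one entry per input, so the result stays aligned with the input.
    return [s if toy_is_valid(s) else placeholder for s in smilies]


def select(items, mask):
    # Mimics boolean-mask indexing: both sequences must have the same length.
    if len(items) != len(mask):
        raise IndexError(
            f"mask length {len(mask)} does not match {len(items)} items"
        )
    return [item for item, keep in zip(items, mask) if keep]


sampled = ["CCO", "C|C", "c1ccccc1"]
mask = [toy_is_valid(s) for s in sampled]  # one flag per sampled string

kept = normalize_keeping(sampled)      # still 3 entries
dropped = normalize_dropping(sampled)  # only 2 entries

print(select(kept, mask))
try:
    select(dropped, mask)
except IndexError as err:
    print(err)
```

If that assumption holds, removing keep_all=True would only trade the TypeError for this length mismatch, so checking for an inconsistent installation may be the more useful next step.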

scaffolds.smi ONLY has:

[*:1]Cc2ccc1cncc(C[*:2])c1c2

I'm confused about where the cleaned/normalized version is getting the |*C from.

Apologies for the long post, but I'm stuck and not good enough at coding to figure this one out, haha!

halx commented 1 day ago

Hi,

many thanks for your interest in REINVENT and welcome to the community!

See #160 for a related question.

Many thanks, Hannes