MolecularAI / REINVENT4

AI molecular design tool for de novo design, scaffold hopping, R-group replacement, linker design and molecule optimization.
Apache License 2.0

Reinvent4 #3

Closed Rinkumc closed 11 months ago

Rinkumc commented 12 months ago

I am not able to run the Reinvent4 tool. Could you help me execute it?

halx commented 12 months ago

Hi,

thank you very much for your interest in REINVENT.

I would need much more information to be able to help. Have you installed the software? On what platform? What exactly does "not able to run" mean: is there an error message, does it start up but fail later, etc.?

Many thanks, Hannes.

Rinkumc commented 12 months ago

Thank you for the response. I did install the software but don't know how to use it, i.e. which command I should execute. I changed the config file paths as mentioned and moved the file into the tests folder, but running the "pytest tests" command produces an error. Could you provide a tutorial or something that will help me understand Reinvent4 better? Thanks.

halx commented 12 months ago

Ok, let's take this slowly.

If you install the software as per the instructions, at some point you should run

conda activate reinvent4
reinvent --help

Does that work and do you get sensible output? You should get a short description including all the command line options that the tool supports.

You do not have to deal with pytest to run REINVENT or have to copy anything into the test directory. It is for developers, not for end-users.

There is no specific tutorial available at the moment but there are configuration input examples in config/toml with some instructions.

Rinkumc commented 12 months ago

Yes, I created the environment, followed the further steps and got the command line options through help. Okay, so apart from pytest, how do I run modes like sampling, scoring etc. using Reinvent4's config files? I am unable to generate the sampling.csv file. Could you provide the command for that? I did go through the config folder but it's a bit confusing.

halx commented 12 months ago

Many thanks, this is useful information. It would help us to improve the documentation if you could tell us what specifically is confusing to you.

In principle, you run the tool from the command line as

reinvent -l sampling.log sampling.toml

If you do not want to write out the log to a file but rather to the screen you leave out the '-l sampling.log' part. You would copy the configuration file sampling.toml to a directory from where you wish to run your job. The file is located in config/toml.

Next, you edit the file with your preferred editor (ideally one that supports TOML syntax highlighting). There are some details in the various *.md files. In the TOML file you need to point the 'model_file' parameter to the right prior. The priors are in the directory you have downloaded from GitHub. Depending on the model you may also need a 'smiles_file'.

What prior you choose really depends on what you are trying to achieve and what problems you are trying to solve. This is of course a much wider question.
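
The invocation described above can also be assembled in a script, for example when several jobs are launched in a row. The following is a minimal sketch and not part of REINVENT: the helper name `build_reinvent_command` is hypothetical, and it assumes only the `-l` option and the positional TOML file shown in this thread.

```python
import shlex
from typing import List, Optional


def build_reinvent_command(config_file: str, log_file: Optional[str] = None) -> List[str]:
    """Return the argument list for a REINVENT run (hypothetical helper).

    With log_file set, '-l <log_file>' is added so the log goes to a file;
    without it, the log is written to the screen.
    """
    command = ["reinvent"]
    if log_file is not None:
        command += ["-l", log_file]
    command.append(config_file)
    return command


cmd = build_reinvent_command("sampling.toml", log_file="sampling.log")
print(shlex.join(cmd))  # the line to type in the activated conda environment
# To actually launch the job: subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids quoting problems if a path contains spaces.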

Rinkumc commented 12 months ago

Thank you so much for the help. I will try the command you gave and let you know if any error occurs.

Rinkumc commented 12 months ago

For sampling, an empty CSV file is generated, with the following error:

  File "/home/rinku/miniconda3/envs/reinvent4/bin/reinvent", line 8, in <module>
    sys.exit(main())
  File "/home/rinku/miniconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/Reinvent.py", line 284, in main
    runner(input_config, actual_device, tb_logdir, responder_config)
  File "/home/rinku/miniconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/runmodes/samplers/run_sampling.py", line 123, in run_sampling
    sampled = filter_valid(sampled)
  File "/home/rinku/miniconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/runmodes/samplers/run_sampling.py", line 177, in filter_valid
    nlls = list(np.array(sampled.nlls)[mask_idx])
  File "/home/rinku/miniconda3/envs/reinvent4/lib/python3.10/site-packages/torch/_tensor.py", line 757, in __array__
    return self.numpy()
RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.

halx commented 12 months ago

In your Conda environment, can you run the following please

conda list | grep torch

and tell me what your version number of torch is.

Rinkumc commented 12 months ago

torch          1.12.1+cu113    pypi_0    pypi
torchvision    0.13.1+cu113    pypi_0    pypi

halx commented 12 months ago

Many thanks. Would you be able to share your config file with me?

halx commented 12 months ago

Are you running your job on the CPU? I see the same error when run on the CPU but not on the GPU.

Rinkumc commented 12 months ago

sampling.zip

Rinkumc commented 12 months ago

> Are you running your job on the CPU? I see the same error when run on the CPU but not on the GPU.

No, it's on the GPU.

Rinkumc commented 12 months ago

Other run modes, like scoring and transfer learning, produce output; the only error is in sampling. I even tried another sampling mode, but the error is similar.

halx commented 12 months ago

Many thanks for all your help. We will be working on a fix and will let you know when done.

halx commented 12 months ago

Just to be sure, can you provide me with the output of your log file (-l), please?

Rinkumc commented 12 months ago

It's running on the CPU, I rechecked it.

Rinkumc commented 12 months ago

13:03:34 Started REINVENT 4.0.11 (C) AstraZeneca 2017, 2023 on 2023-11-16
13:03:34 Command line: /home/rinku/miniconda3/envs/reinvent4/bin/reinvent -l sampling.log sampling.toml
13:03:34 User rinku on host admin
13:03:34 Python version 3.10.13
13:03:34 PyTorch version 1.12.1+cu113, git 664058fa83f1d8eede5d66418abff6e20bd76ca8
13:03:34 PyTorch compiled with CUDA version 11.3
13:03:34 RDKit version 2022.09.5
13:03:34 Platform Linux-5.15.0-88-generic-x86_64-with-glibc2.35
13:03:34 Number of PyTorch CUDA devices 0
13:03:34 Using CPU x86_64
13:03:34 Writing JSON config file to /home/rinku/REINVENT4/configs/toml/_sampling.json
13:03:34 Starting Sampling
13:03:34 Using generator Reinvent
13:03:34 Writing sampled SMILES to CSV file sampling.csv
13:03:34 Sampling 157 SMILES from model /home/rinku/REINVENT4/priors/reinvent.prior
13:03:35 Time taken in seconds: 1

Output of log file

halx commented 12 months ago

"Using CPU x86_64" tells me that you are actually running on the CPU. As you have set use_cuda = true, this means that you have a problem with your CUDA setup. Do you have Nvidia hardware, and if so, does the tool nvidia-smi work for you?
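
The two log entries that give this away can also be checked automatically. The sketch below scans log text for the "Number of PyTorch CUDA devices" and "Using CPU" entries exactly as they appear in the log posted above; the function names are hypothetical, and the wording of the log entries may differ in other REINVENT versions.

```python
import re
from typing import Optional


def cuda_device_count(log_text: str) -> Optional[int]:
    """Extract N from a 'Number of PyTorch CUDA devices N' entry, if present."""
    match = re.search(r"Number of PyTorch CUDA devices (\d+)", log_text)
    return int(match.group(1)) if match else None


def ran_on_cpu(log_text: str) -> bool:
    """True if the log reports a CPU run or zero visible CUDA devices."""
    return "Using CPU" in log_text or cuda_device_count(log_text) == 0


# Excerpt copied from the log posted in this thread.
log_excerpt = (
    "13:03:34 PyTorch compiled with CUDA version 11.3\n"
    "13:03:34 Number of PyTorch CUDA devices 0\n"
    "13:03:34 Using CPU x86_64\n"
)
print(cuda_device_count(log_excerpt), ran_on_cpu(log_excerpt))
```

A device count of 0 together with a PyTorch build compiled for CUDA points at the driver or hardware setup rather than at the REINVENT configuration.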

halx commented 12 months ago

Ok, I realize that that is what you actually said. Nevertheless, we will be working on the CPU related bug.
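
The error text in the sampling traceback names the general remedy: detach the tensor from the autograd graph before converting it to NumPy. A minimal sketch of that pattern follows, assuming PyTorch and NumPy are installed. It is not the actual REINVENT patch, and the variable names only mirror those in the traceback.

```python
import torch

# Stand-in for the per-molecule negative log-likelihoods in the traceback.
nlls = torch.tensor([1.5, 0.25, 3.0], requires_grad=True)
mask_idx = [0, 2]  # hypothetical indices of the valid SMILES

try:
    nlls.numpy()  # fails because the tensor still tracks gradients
    error_message = None
except RuntimeError as err:
    error_message = str(err)

# detach() drops the autograd history; cpu() is a no-op for CPU tensors and
# copies GPU tensors to host memory, so the same line works on both devices.
kept = nlls.detach().cpu().numpy()[mask_idx].tolist()
print(error_message)
print(kept)
```

Wrapping the sampling call in `torch.no_grad()` is another common way to avoid the problem, since tensors created inside that context do not require gradients.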

halx commented 12 months ago

I think we have fixed this. Please git pull into your repository and let us know how it goes.

Rinkumc commented 12 months ago

Okay, thank you so much for your time. I will let you know by tomorrow.

Rinkumc commented 12 months ago

Sampling is executing perfectly, as are the rest of the modes except transfer learning. Earlier the transfer learning mode was running; now it shows the error below.

(reinvent4) rinku@admin:~/REINVENT4/configs/toml$ reinvent -l transfer_learning.log transfer_learning.toml
Traceback (most recent call last):
  File "/home/rinku/miniconda3/envs/reinvent4/bin/reinvent", line 8, in <module>
    sys.exit(main())
  File "/home/rinku/miniconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/Reinvent.py", line 284, in main
    runner(input_config, actual_device, tb_logdir, responder_config)
  File "/home/rinku/miniconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/runmodes/TL/run_transfer_learning.py", line 149, in run_transfer_learning
    runner = runner_class(adapter, tb_logdir, mode_config, logger_parameters)
  File "/home/rinku/miniconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/runmodes/TL/learning.py", line 136, in __init__
    self.tb_reporter.add_histogram("Tanimoto input SMILES", np.array(sim), 0)
  File "/home/rinku/miniconda3/envs/reinvent4/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py", line 484, in add_histogram
    histogram(tag, values, bins, max_bins=max_bins), global_step, walltime
  File "/home/rinku/miniconda3/envs/reinvent4/lib/python3.10/site-packages/torch/utils/tensorboard/summary.py", line 352, in histogram
    hist = make_histogram(values.astype(float), bins, max_bins)
  File "/home/rinku/miniconda3/envs/reinvent4/lib/python3.10/site-packages/torch/utils/tensorboard/summary.py", line 380, in make_histogram
    cum_counts = np.cumsum(np.greater(counts, 0, dtype=np.int32))
TypeError: No loop matching the specified signature and casting was found for ufunc greater

halx commented 12 months ago

I believe I have fixed that now too.

Rinkumc commented 12 months ago

Yes, it's running! Thank you so much for your help.

Rinkumc commented 12 months ago

In the transfer_learning.toml file, what input should I put in the validation_compounds.smi file?

halx commented 12 months ago

This comes down to the basics of machine learning i.e. how to split a dataset into training set, test and/or validation set.

Having said that, you can also just leave it out. The validation set is mostly a help to keep overfitting under control.

In practice, however, the problem is somewhat ill-defined: you want to use TL to make a model produce molecules more like the input set. But what does "more like" mean in practice? You do not want to be too close to the original model and you do not want the model to just reproduce the molecules from the input data set.
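
For the basic split mentioned above, a random hold-out is the simplest option. The sketch below is only one way to do it: the function name, the 20% fraction and the seed are arbitrary choices for illustration, and other strategies (for example splitting by scaffold) may suit a given data set better.

```python
import random
from typing import List, Tuple


def split_smiles(smiles: List[str], validation_fraction: float = 0.2,
                 seed: int = 42) -> Tuple[List[str], List[str]]:
    """Shuffle reproducibly and split into (training, validation) lists."""
    if not 0.0 <= validation_fraction < 1.0:
        raise ValueError("validation_fraction must be in [0, 1)")
    shuffled = list(smiles)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


molecules = ["CCO", "c1ccccc1", "CC(=O)O", "CCN", "C1CCCCC1",
             "CC(C)O", "c1ccncc1", "CCOC", "CC#N", "OCCO"]
training, validation = split_smiles(molecules, validation_fraction=0.2)
print(len(training), len(validation))
```

Each list could then be written one SMILES per line, for instance to the validation_compounds.smi file asked about; that file layout is an assumption here.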

Rinkumc commented 12 months ago

Okay, I'll look into it. Thanks.

Rinkumc commented 11 months ago

(reinvent4) rinku@admin:~/REINVENT4/configs/toml$ reinvent -l sampling.log sampling.toml
Traceback (most recent call last):
  File "/home/rinku/miniconda3/envs/reinvent4/bin/reinvent", line 8, in <module>
    sys.exit(main())
  File "/home/rinku/miniconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/Reinvent.py", line 284, in main
    runner(input_config, actual_device, tb_logdir, responder_config)
  File "/home/rinku/miniconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/runmodes/samplers/run_sampling.py", line 101, in run_sampling
    sampled = sampler.sample(input_smilies)
  File "/home/rinku/miniconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/runmodes/samplers/mol2mol.py", line 50, in sample
    dataset = Dataset(smilies, self.model.get_vocabulary(), tokenizer)
  File "/home/rinku/miniconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/models/mol2mol/dataset/dataset.py", line 25, in __init__
    enc = self._vocabulary.encode(tokenized)
  File "/home/rinku/miniconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/models/mol2mol/models/vocabulary.py", line 60, in encode
    ohe_vect[i] = self._tokens[token]
KeyError: '[S@+]'

This error is occurring again for sampling.

halx commented 11 months ago

Would you please open a separate issue for this? Would you also be able to share your TOML config and input SMILES file?
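
The traceback shows the lookup of the token '[S@+]' failing in the model's vocabulary, so one way to narrow this down is to screen the input SMILES for tokens the vocabulary does not contain. The sketch below is an illustration only: the tokenizer regex is a simplification and not REINVENT's own, the toy vocabulary is made up, and a real vocabulary would have to be taken from the model.

```python
import re
from typing import Dict, Iterable, List, Set

# Simplified SMILES tokenizer (an assumption, not REINVENT's own): bracket
# atoms, two-letter halogens, two-digit ring closures, then single characters.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def unknown_tokens(smiles: str, vocabulary: Set[str]) -> List[str]:
    """Return the tokens of one SMILES string that the vocabulary lacks."""
    return [tok for tok in TOKEN_PATTERN.findall(smiles) if tok not in vocabulary]


def screen(smiles_list: Iterable[str], vocabulary: Set[str]) -> Dict[str, List[str]]:
    """Map each problematic SMILES to its out-of-vocabulary tokens."""
    report = {}
    for smi in smiles_list:
        missing = unknown_tokens(smi, vocabulary)
        if missing:
            report[smi] = missing
    return report


# Hypothetical toy vocabulary for illustration only.
toy_vocabulary = {"C", "O", "N", "S", "c", "1", "(", ")", "=", "[O-]"}
inputs = ["CCO", "C[S@+]([O-])c1ccccc1"]
report = screen(inputs, toy_vocabulary)
print(report)
```

Molecules flagged this way can be removed from the input file or rewritten, for example without the stereo annotation, before sampling is attempted again.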