TypeError in Reinforcement learning example

nizamibilal commented 9 months ago

Hi, I am trying to run the reinforcement learning example using the provided TOML config file.

Platform: Linux (Ubuntu 22.04)
Conda environment created from the requirements file (requirements-linux-64.lock)

Example:

reinvent -l transfer_learning.log transfer_learning.toml

Output:

2024-02-27 16:50:50.292312: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-02-27 16:50:50.322270: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-27 16:50:50.322315: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-27 16:50:50.323226: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-27 16:50:50.328169: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/home/bilal.nizami/anaconda3/envs/reinvent4/bin/reinvent", line 8, in <module>
    sys.exit(main())
  File "/home/bilal.nizami/anaconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/Reinvent.py", line 292, in main
    runner(input_config, actual_device, tb_logdir, responder_config)
  File "/home/bilal.nizami/anaconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/runmodes/TL/run_transfer_learning.py", line 144, in run_transfer_learning
    runner = runner_class(adapter, tb_logdir, mode_config, logger_parameters)
  File "/home/bilal.nizami/anaconda3/envs/reinvent4/lib/python3.10/site-packages/reinvent/runmodes/TL/learning.py", line 138, in __init__
    self.tb_reporter.add_histogram("Tanimoto input SMILES", np.array(sim), 0)
  File "/home/bilal.nizami/anaconda3/envs/reinvent4/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py", line 484, in add_histogram
    histogram(tag, values, bins, max_bins=max_bins), global_step, walltime
  File "/home/bilal.nizami/anaconda3/envs/reinvent4/lib/python3.10/site-packages/torch/utils/tensorboard/summary.py", line 352, in histogram
    hist = make_histogram(values.astype(float), bins, max_bins)
  File "/home/bilal.nizami/anaconda3/envs/reinvent4/lib/python3.10/site-packages/torch/utils/tensorboard/summary.py", line 380, in make_histogram
    cum_counts = np.cumsum(np.greater(counts, 0, dtype=np.int32))
TypeError: No loop matching the specified signature and casting was found for ufunc greater

Output of the log file:

16:50:41 Started REINVENT 4.0.35 (C) AstraZeneca 2017, 2023 on 2024-02-27
16:50:41 Command line: /home/bilal.nizami/anaconda3/envs/reinvent4/bin/reinvent -l transfer_learning.log transfer_learning.toml
16:50:41 User bilal.nizamimoa-technology.com on host mtdev-bilal
16:50:41 Python version 3.10.13
16:50:41 PyTorch version 1.12.1+cu113, git 664058fa83f1d8eede5d66418abff6e20bd76ca8
16:50:41 PyTorch compiled with CUDA version 11.3
16:50:41 RDKit version 2022.09.5
16:50:41 Platform Linux-5.15.0-97-generic-x86_64-with-glibc2.35
16:50:41 CUDA driver version 550.54.14
16:50:41 Number of PyTorch CUDA devices 1
16:50:41 Using CUDA device:0 NVIDIA L4
16:50:41 GPU memory: 22273 MiB free, 22478 MiB total
16:50:41 Writing TensorBoard summary to /home/bilal.nizami/GenerativeAI/Campestris_Data/Reinvent4_run/tb_TL
16:50:41 Writing JSON config file to /home/bilal.nizami/GenerativeAI/Campestris_Data/Reinvent4_run/json_transfer_learning.json
16:50:41 Starting Transfer Learning
16:50:43 Using generator Mol2Mol
16:50:43 Reading input SMILES from /home/bilal.nizami/GenerativeAI/Campestris_Data/Reinvent4_run/data/campestris_data_smiles_final.filtered.smi
16:50:46 Reading validation SMILES from /home/bilal.nizami/GenerativeAI/Campestris_Data/Reinvent4_run/data/campestris_data_smiles_final.filtered.smi
16:50:50 randomize_smiles set to false for Mol2Mol

Additional information:

It might be related to an expired deprecation, as discussed in this PyTorch issue: https://github.com/pytorch/pytorch/issues/91516
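To narrow this down outside of REINVENT, the failing expression from the last frame of the traceback can be run on its own. The snippet below is only a minimal sketch: the `counts` array and both function names are made up for illustration, and my understanding (an assumption, based on the linked issue) is that newer NumPy releases reject a non-boolean `dtype` for comparison ufuncs, while older ones only warn. The second function is one way to get the same numbers without that argument; it is not necessarily how PyTorch or REINVENT changed the code.

```python
import numpy as np

# Hypothetical histogram bin counts, only for illustration.
counts = np.array([0, 3, 0, 5, 1])


def nonzero_cumsum_legacy(counts):
    # Same expression as in torch/utils/tensorboard/summary.py in the traceback.
    return np.cumsum(np.greater(counts, 0, dtype=np.int32))


def nonzero_cumsum_portable(counts):
    # Let the comparison return booleans, then cast explicitly.
    return np.cumsum(np.greater(counts, 0).astype(np.int32))


try:
    print("legacy call works:", nonzero_cumsum_legacy(counts))
except TypeError as exc:
    # Expected on NumPy versions that no longer accept dtype=np.int32 here.
    print("legacy call fails:", exc)

print("portable result:", nonzero_cumsum_portable(counts))
```

If the legacy call fails in the same environment with the same message, the installed NumPy/PyTorch combination is the likely culprit rather than the input SMILES.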

halx commented 9 months ago

Hi,

many thanks for your feedback and welcome to the community!

I see that you are using 4.0.35. This issue should be fixed in the latest release, so you would need to update to 4.1. Note that you may have to pip install xxhash in your environment. Unfortunately, there is no automatic way to get new Python packages installed.
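To check what is actually installed in the active environment before and after updating, something like the following sketch can help. It uses only the standard library; the distribution names in the loop are assumptions (in particular, that REINVENT is installed under the name `reinvent`), and `installed_version` is a helper written just for this example.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names are assumptions; adjust them to your environment.
for name in ("reinvent", "numpy", "torch", "xxhash"):
    version = installed_version(name)
    print(f"{name}: {version if version is not None else 'not installed'}")
```

If xxhash is reported as not installed after the update, run pip install xxhash in the same environment.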

Let me know should there be further issues, Hannes.

nizamibilal commented 9 months ago

Thank you! It resolved the issue.
