IBM / regression-transformer

Regression Transformer (2023; Nature Machine Intelligence)
https://www.nature.com/articles/s42256-023-00639-z
MIT License

Training and Running Inference Problems #20

Closed: amelie-iska closed this issue 5 months ago

amelie-iska commented 5 months ago

Assuming I have downloaded the box folder with the data, tokenizers, and trained_model folders, could you please provide an example of how to run evaluation on box/data/qed/chembl_selfies_eval.txt and how to run inference on a single selfies example properly? I am finding it quite difficult to understand the desired folder and file structure for the pretrained checkpoints when running the evaluation script.

For example:

(rt) asr50@LZ16-ASR50-DSA:/mnt/e/users/asr50/vs_code_projects/small_molecules/regression-transformer$ python scripts/eval_language_modeling.py \
--output_dir ./box/trained_models/qed \
--eval_file ./box/data/qed/chembl_selfies_eval.txt \
--eval_accumulation_steps 2 \
--param_path configs/qed_eval.json
WARNING:terminator.utils:No checkpoints found that contain  in ./box/trained_models/qed.       
WARNING:terminator.utils:No checkpoints found that contain checkpoint in ./box/trained_models/qed.
Traceback (most recent call last):
  File "scripts/eval_language_modeling.py", line 276, in <module>
    main()
  File "scripts/eval_language_modeling.py", line 68, in main
    model_dir, must_contain=eval_params.get("checkpoint-str", "best")
  File "/mnt/e/users/asr50/vs_code_projects/small_molecules/regression-transformer/terminator/utils.py", line 73, in get_latest_checkpoint
    return get_latest_checkpoint(model_path, must_contain=next_try)
  File "/mnt/e/users/asr50/vs_code_projects/small_molecules/regression-transformer/terminator/utils.py", line 73, in get_latest_checkpoint
    return get_latest_checkpoint(model_path, must_contain=next_try)
  File "/mnt/e/users/asr50/vs_code_projects/small_molecules/regression-transformer/terminator/utils.py", line 73, in get_latest_checkpoint
    return get_latest_checkpoint(model_path, must_contain=next_try)
  [Previous line repeated 983 more times]
  File "/mnt/e/users/asr50/vs_code_projects/small_molecules/regression-transformer/terminator/utils.py", line 69, in get_latest_checkpoint
    f"No checkpoints found that contain {must_contain} in {model_path}."
  File "/home/asr50/miniconda3/envs/rt/lib/python3.7/logging/__init__.py", line 1390, in warning
    self._log(WARNING, msg, args, **kwargs)
  File "/home/asr50/miniconda3/envs/rt/lib/python3.7/logging/__init__.py", line 1514, in _log  
    self.handle(record)
  File "/home/asr50/miniconda3/envs/rt/lib/python3.7/logging/__init__.py", line 1524, in handle
    self.callHandlers(record)
  File "/home/asr50/miniconda3/envs/rt/lib/python3.7/logging/__init__.py", line 1586, in callHandlers
    hdlr.handle(record)
  File "/home/asr50/miniconda3/envs/rt/lib/python3.7/logging/__init__.py", line 894, in handle 
    self.emit(record)
  File "/home/asr50/miniconda3/envs/rt/lib/python3.7/logging/__init__.py", line 1025, in emit  
    msg = self.format(record)
  File "/home/asr50/miniconda3/envs/rt/lib/python3.7/logging/__init__.py", line 869, in format 
    return fmt.format(record)
  File "/home/asr50/miniconda3/envs/rt/lib/python3.7/logging/__init__.py", line 609, in format 
    if self.usesTime():
  File "/home/asr50/miniconda3/envs/rt/lib/python3.7/logging/__init__.py", line 577, in usesTime
    return self._style.usesTime()
  File "/home/asr50/miniconda3/envs/rt/lib/python3.7/logging/__init__.py", line 419, in usesTime
    return self._fmt.find(self.asctime_search) >= 0
RecursionError: maximum recursion depth exceeded while calling a Python object
(rt) asr50@LZ16-ASR50-DSA:/mnt/e/users/asr50/vs_code_projects/small_molecules/regression-transformer$

Additionally, I am finding training runs quite difficult as well; getting one to work would probably help me understand how the checkpoint folders and files are supposed to be structured when they are saved. Perhaps you could also provide an example command for training a QED model?

I have tried running the following, but get errors about the tokenizer not having a particular attribute:

(rt) asr50@LZ16-ASR50-DSA:/mnt/e/users/asr50/vs_code_projects/small_molecules/regression-transformer$ python scripts/run_language_modeling.py --output_dir ./new_trained_models \
    --config_name configs/rt_small.json --tokenizer_name ./vocabs/smallmolecules.txt \
    --do_train --do_eval --learning_rate 1e-4 --num_train_epochs 1 --save_total_limit 2 \
    --save_steps 500 --per_gpu_train_batch_size 16 --evaluate_during_training --eval_steps 5 \
    --eval_data_file ./box/data/qed/chembl_selfies_eval.txt --train_data_file ./box/data/qed/chembl_selfies_train.txt \
    --line_by_line --block_size 510 --seed 42 --logging_steps 100 --eval_accumulation_steps 2 \
    --training_config_path training_configs/qed_alternated_cc.json
PyTorch: setting up devices
WARNING:__main__:Process rank: -1, device: cpu, n_gpu: 0, distributed training: False
INFO:__main__:Training/evaluation parameters CustomTrainingArguments(output_dir='./new_trained_models', overwrite_output_dir=False, do_train=True, do_eval=True, do_predict=False, evaluate_during_training=True, prediction_loss_only=False, per_device_train_batch_size=8, per_device_eval_batch_size=8, per_gpu_train_batch_size=16, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, learning_rate=0.0001, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=1.0, max_steps=-1, warmup_steps=0, logging_dir='runs/Jan21_16-28-06_LZ16-ASR50-DSA', logging_first_step=False, logging_steps=100, save_steps=500, save_total_limit=2, no_cuda=False, seed=42, fp16=False, fp16_opt_level='O1', local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=5, past_index=-1, run_name=None, disable_tqdm=False, remove_unused_columns=True, eval_accumulation_steps=2, training_config_path='training_configs/qed_alternated_cc.json')
loading configuration file configs/rt_small.json
/home/asr50/miniconda3/envs/rt/lib/python3.7/site-packages/transformers/configuration_xlnet.py:211: FutureWarning: This config doesn't use attention memories, a core feature of XLNet. Consider setting `mem_len` to a non-zero value, for example `xlnet = XLNetLMHeadModel.from_pretrained('xlnet-base-cased', mem_len=1024)`, for accurate training performance as well as an order of magnitude faster inference. Starting from version 3.5.0, the default parameter will be 1024, following the implementation in https://arxiv.org/abs/1906.08237
  FutureWarning,
Model config XLNetConfig {
  "architectures": [
    "XLNetLMHeadModel"
  ],
  "attn_type": "bi",
  "bi_data": false,
  "bos_token_id": 14,
  "clamp_len": -1,
  "d_head": 16,
  "d_inner": 1024,
  "d_model": 256,
  "dropout": 0.2,
  "end_n_top": 5,
  "eos_token_id": 14,
  "ff_activation": "gelu",
  "initializer_range": 0.02,
  "language": "selfies",
  "layer_norm_eps": 1e-12,
  "mem_len": null,
  "model_type": "xlnet",
  "n_head": 16,
  "n_layer": 32,
  "numerical_encodings_dim": 16,
  "numerical_encodings_format": "sum",
  "numerical_encodings_type": "float",
  "pad_token_id": 0,
  "reuse_len": null,
  "same_length": false,
  "start_n_top": 5,
  "summary_activation": "tanh",
  "summary_last_dropout": 0.1,
  "summary_type": "last",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 250
    }
  },
  "untie_r": true,
  "use_numerical_encodings": true,
  "vmax": 1.0,
  "vocab_size": 507
}

Model name './vocabs/smallmolecules.txt' not found in model shortcut name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, TurkuNLP/bert-base-finnish-cased-v1, TurkuNLP/bert-base-finnish-uncased-v1, wietsedv/bert-base-dutch-cased). Assuming './vocabs/smallmolecules.txt' is a path, a model identifier, or url to a directory containing tokenizer files.
Calling ExpressionBertTokenizer.from_pretrained() with the path to a single file or url is deprecated
loading file ./vocabs/smallmolecules.txt
INFO:__main__:Training new model from scratch
/home/asr50/miniconda3/envs/rt/lib/python3.7/site-packages/transformers/modeling_auto.py:732: FutureWarning: The class `AutoModelWithLMHead` is deprecated and will be removed in a future version. Please use `AutoModelForCausalLM` for causal language models, `AutoModelForMaskedLM` for masked language models and `AutoModelForSeq2SeqLM` for encoder-decoder models.
  FutureWarning,
INFO:__main__:PyTorch version: 1.13.1
/home/asr50/miniconda3/envs/rt/lib/python3.7/site-packages/transformers/tokenization_utils_base.py:1321: FutureWarning: The `max_len` attribute has been deprecated and will be removed in a future version, use `model_max_length` instead.
  FutureWarning,
Creating features from dataset file at ./box/data/qed/chembl_selfies_train.txt
Creating features from dataset file at ./box/data/qed/chembl_selfies_eval.txt
INFO:__main__:Dataset sizes 1395602, 1000.
INFO:__main__:Number of parameters 27508219 of type <class 'transformers.modeling_xlnet.XLNetLMHeadModel'>
INFO:__main__:Training with alternate tasks
/home/asr50/miniconda3/envs/rt/lib/python3.7/site-packages/transformers/trainer.py:247: FutureWarning: Passing `prediction_loss_only` as a keyword argument is deprecated and won't be possible in a future version. Use `args.prediction_loss_only` instead.
  FutureWarning,
You are instantiating a Trainer but Tensorboard is not installed. You should consider installing it.
You are instantiating a Trainer but W&B is not installed. To use wandb logging, run `pip install wandb; wandb login` see https://docs.wandb.com/huggingface.
INFO:terminator.trainer:Verbose evaluation True
INFO:terminator.trainer:Attempting to use numerical encodings.
Using deprecated `--per_gpu_train_batch_size` argument which will be removed in a future version. Using `--per_device_train_batch_size` is preferred.
Using deprecated `--per_gpu_train_batch_size` argument which will be removed in a future version. Using `--per_device_train_batch_size` is preferred.
Using deprecated `--per_gpu_train_batch_size` argument which will be removed in a future version. Using `--per_device_train_batch_size` is preferred.
INFO:terminator.trainer:***** Running training *****
INFO:terminator.trainer:Model device cpu
INFO:terminator.trainer:  Num examples = 1395602
INFO:terminator.trainer:  Num Epochs = 1
INFO:terminator.trainer:  Instantaneous batch size per device = 8
INFO:terminator.trainer:  Total train batch size (w. parallel, distributed & accumulation) = 16
INFO:terminator.trainer:  Gradient Accumulation steps = 1
INFO:terminator.trainer:  Total optimization steps = 87226
Epoch:   0%|                                                             | 0/1 [00:00<?, ?it/s]
WARNING:terminator.trainer:Loading alternative collator for evaluation.
INFO:terminator.trainer:***** Running Evaluation *****
INFO:terminator.trainer:  Num examples = 1000
INFO:terminator.trainer:  Batch size = 8
Evaluation: 100%|████████████████████████████████████████████| 125/125 [01:09<00:00,  1.79it/s]
{'eval_loss': 4.566575050354004, 'epoch': 5.7322358012519205e-05, 'step': 5}
INFO:terminator.trainer:Evaluation {'eval_loss': 4.566575050354004, 'epoch': 5.7322358012519205e-05}
You are instantiating a Trainer but Tensorboard is not installed. You should consider installing it.
You are instantiating a Trainer but W&B is not installed. To use wandb logging, run `pip install wandb; wandb login` see https://docs.wandb.com/huggingface.
INFO:terminator.trainer:Verbose evaluation True
Traceback (most recent call last):
  File "scripts/run_language_modeling.py", line 361, in <module>
    main()
  File "scripts/run_language_modeling.py", line 330, in main
    trainer.train(model_path=model_path)
  File "/mnt/e/users/asr50/vs_code_projects/small_molecules/regression-transformer/terminator/trainer.py", line 1094, in train
    self.property_evaluate()
  File "/mnt/e/users/asr50/vs_code_projects/small_molecules/regression-transformer/terminator/trainer.py", line 1198, in property_evaluate
    property_collator, save_path=self.args.output_dir
  File "/mnt/e/users/asr50/vs_code_projects/small_molecules/regression-transformer/terminator/evaluator.py", line 181, in property_prediction
    self.tokenizer.decode = self.tokenizer.decode_internal
AttributeError: 'ExpressionBertTokenizer' object has no attribute 'decode_internal'
Epoch:   0%|                                                             | 0/1 [01:35<?, ?it/s]
Iteration:   0%|                                         | 4/87226 [01:35<576:30:18, 23.79s/it]
(rt) asr50@LZ16-ASR50-DSA:/mnt/e/users/asr50/vs_code_projects/small_molecules/regression-transformer$
jannisborn commented 5 months ago

Hi @amelie-iska,

thanks for your interest in the work. In general, this repo is no longer actively maintained in favor of the RT's availability in GT4SD. Therefore, unless you want to reproduce the experiments from the paper exactly, we always recommend using GT4SD.

Please install GT4SD from source; training an RT model can then be done from the CLI with gt4sd-trainer .... The GT4SD repo has several examples of this in the main README and the examples folder, as well as closed issues describing the procedure. Please note that GT4SD uses a slightly updated version of the RT code, which is available in this repo under the gt4sd branch.
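For orientation, a minimal hedged sketch of that CLI workflow; the pipeline name regression-transformer-trainer is an assumption, so confirm it against the list printed by the first command:

# List the training pipelines exposed by your GT4SD installation
gt4sd-trainer --help
# Inspect the arguments of the Regression Transformer pipeline
# (pipeline name assumed; use the matching entry from the list above)
gt4sd-trainer --training_pipeline_name regression-transformer-trainer --help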

Regarding the training code: did you just run the example from the README here? If yes, it might indeed be a bug.

Regarding your current evaluation code: it is failing because of the path to the checkpoint, which either does not exist or is not explicit. The script recursively tries to find a checkpoint and, since it never succeeds, it ends up in an infinite recursion.
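A quick hedged sanity check before re-running the eval script; it assumes the expected layout is a checkpoint-* subfolder directly inside the directory passed as --output_dir:

# If this prints nothing, the recursion above is expected: point --output_dir
# at the folder that actually contains the saved checkpoint folders.
ls -d ./box/trained_models/qed/*checkpoint* 2>/dev/null \
    || echo "no checkpoint folders found in ./box/trained_models/qed"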

amelie-iska commented 5 months ago

I am consistently having issues setting up the conda environment for GT4SD. I've tried creating the environment directly from the gpu_conda.yml file but this does not work. I've tried installing the dependencies individually by hand starting with a python=3.8 conda environment, and installing pytorch using conda/mamba with:

mamba install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

After installing requirements.txt and gpu_requirements.txt there is always an issue with

pip install git+https://github.com/PaccMann/paccmann_generator@0.0.2

and

pip install git+https://github.com/PaccMann/paccmann_gp@0.1.2

in the version control system requirements file. When attempting to run gt4sd-inference --help, the process aborts and I get the following:

(gt4sd-py38b) asr50@LZ16-ASR50-DSA:/mnt/e/users/asr50/vs_code_projects/small_molecules/gt4sd-core$ gt4sd-inference --help
2024-01-22 12:43:50.175484: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-01-22 12:43:50.268301: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-01-22 12:43:50.291889: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-22 12:43:50.743712: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2024-01-22 12:43:50.743864: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2024-01-22 12:43:50.743885: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
/home/asr50/miniconda3/envs/gt4sd-py38b/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/home/asr50/miniconda3/envs/gt4sd-py38b/lib/python3.8/site-packages/torchvision/image.so: undefined symbol: _ZN3c104impl8GPUTrace13gpuTraceStateE'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
Aborted
(gt4sd-py38b) asr50@LZ16-ASR50-DSA:/mnt/e/users/asr50/vs_code_projects/small_molecules/gt4sd-core$
jannisborn commented 5 months ago

It's surprising that you can't manage to set up a GT4SD env. Which OS do you have? Be aware that M1 Macs are not supported: https://github.com/GT4SD/gt4sd-core/issues/200

Also, when creating the env:

git clone https://github.com/GT4SD/gt4sd-core.git
cd gt4sd-core/
conda env create -f conda_gpu.yml 
conda activate gt4sd

Have you tried replacing pip install gt4sd with pip install -e ., i.e., the developer setup?
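For completeness, a hedged sketch of that developer setup end to end; it assumes the editable install is run from the root of the cloned repo (the folder that holds the setup files):

git clone https://github.com/GT4SD/gt4sd-core.git
cd gt4sd-core/
conda env create -f conda_gpu.yml
conda activate gt4sd
# developer setup: editable install instead of `pip install gt4sd`
pip install -e .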

amelie-iska commented 5 months ago

Hmm, I'm getting new errors today. I am using WSL on a Windows machine. I was able to follow the instructions above just fine and get the GPU conda environment working. However, now I'm getting errors when running the following help command for inference:

(gt4sd) asr50@LZ16-ASR50-DSA:/mnt/e/users/asr50/vs_code_projects/small_molecules/gt4sd-core$ gt4sd-inference --help
2024-01-23 13:05:32.154405: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-01-23 13:05:32.261593: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-01-23 13:05:32.282343: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-23 13:05:32.711339: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2024-01-23 13:05:32.711422: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2024-01-23 13:05:32.711429: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Traceback (most recent call last):
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3106, in _dep_map
    return self.__dep_map
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pkg_resources/__init__.py", line 2899, in __getattr__
    raise AttributeError(attr)
AttributeError: _DistInfoDistribution__dep_map

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pkg_resources/_vendor/packaging/requirements.py", line 35, in __init__
    parsed = _parse_requirement(requirement_string)
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pkg_resources/_vendor/packaging/_parser.py", line 64, in parse_requirement
    return _parse_requirement(Tokenizer(source, rules=DEFAULT_RULES))
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pkg_resources/_vendor/packaging/_parser.py", line 82, in _parse_requirement
    url, specifier, marker = _parse_requirement_details(tokenizer)
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pkg_resources/_vendor/packaging/_parser.py", line 120, in _parse_requirement_details
    specifier = _parse_specifier(tokenizer)
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pkg_resources/_vendor/packaging/_parser.py", line 216, in _parse_specifier
    parsed_specifiers = _parse_version_many(tokenizer)
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pkg_resources/_vendor/packaging/_parser.py", line 231, in _parse_version_many
    tokenizer.raise_syntax_error(
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pkg_resources/_vendor/packaging/_tokenizer.py", line 165, in raise_syntax_error
    raise ParserSyntaxError(
pkg_resources.extern.packaging._tokenizer.ParserSyntaxError: .* suffix can only be used with `==` or `!=` operators
    torch (>=1.9.*)
           ~~~~~~^

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/asr50/miniconda3/envs/gt4sd/bin/gt4sd-inference", line 33, in <module>
    sys.exit(load_entry_point('gt4sd', 'console_scripts', 'gt4sd-inference')())
  File "/home/asr50/miniconda3/envs/gt4sd/bin/gt4sd-inference", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/importlib/metadata.py", line 77, in load
    module = import_module(match.group('module'))
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/mnt/e/users/asr50/vs_code_projects/small_molecules/gt4sd-core/src/gt4sd/cli/inference.py", line 35, in <module>
    from ..algorithms.registry import ApplicationsRegistry
  File "/mnt/e/users/asr50/vs_code_projects/small_molecules/gt4sd-core/src/gt4sd/algorithms/__init__.py", line 28, in <module>
    from .conditional_generation.guacamol import (  # noqa: F401
  File "/mnt/e/users/asr50/vs_code_projects/small_molecules/gt4sd-core/src/gt4sd/algorithms/conditional_generation/guacamol/__init__.py", line 26, in <module>
    from .core import (
  File "/mnt/e/users/asr50/vs_code_projects/small_molecules/gt4sd-core/src/gt4sd/algorithms/conditional_generation/guacamol/core.py", line 30, in <module>
    from ....training_pipelines.core import TrainingPipelineArguments
  File "/mnt/e/users/asr50/vs_code_projects/small_molecules/gt4sd-core/src/gt4sd/training_pipelines/__init__.py", line 32, in <module>
    from gt4sd_trainer.hf_pl.core import (
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/gt4sd_trainer/hf_pl/core.py", line 31, in <module>
    from pytorch_lightning import LightningDataModule, LightningModule
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pytorch_lightning/__init__.py", line 34, in <module>
    from pytorch_lightning.callbacks import Callback  # noqa: E402
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pytorch_lightning/callbacks/__init__.py", line 14, in <module>
    from pytorch_lightning.callbacks.callback import Callback
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pytorch_lightning/callbacks/callback.py", line 25, in <module>
    from pytorch_lightning.utilities.types import STEP_OUTPUT
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pytorch_lightning/utilities/__init__.py", line 18, in <module>
    from pytorch_lightning.utilities.apply_func import move_data_to_device  # noqa: F401       
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 29, in <module>
    from pytorch_lightning.utilities.imports import _compare_version, _TORCHTEXT_LEGACY        
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pytorch_lightning/utilities/imports.py", line 22, in <module>
    import pkg_resources
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3325, in <module>
    def _initialize_master_working_set():
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3299, in _call_aside
    f(*args, **kwargs)
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3337, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pkg_resources/__init__.py", line 631, in _build_master
    ws.require(__requires__)
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pkg_resources/__init__.py", line 968, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pkg_resources/__init__.py", line 834, in resolve
    new_requirements = dist.requires(req.extras)[::-1]
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pkg_resources/__init__.py", line 2820, in requires
    dm = self._dep_map
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3108, in _dep_map
    self.__dep_map = self._compute_dependencies()
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3118, in _compute_dependencies
    reqs.extend(parse_requirements(req))
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3171, in __init__
    super(Requirement, self).__init__(requirement_string)
  File "/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/pkg_resources/_vendor/packaging/requirements.py", line 37, in __init__
    raise InvalidRequirement(str(e)) from e
pkg_resources.extern.packaging.requirements.InvalidRequirement: .* suffix can only be used with `==` or `!=` operators
    torch (>=1.9.*)
           ~~~~~~^
(gt4sd) asr50@LZ16-ASR50-DSA:/mnt/e/users/asr50/vs_code_projects/small_molecules/gt4sd-core$
jannisborn commented 5 months ago

Yes, I've seen this one before; it has to do with the installation in editable mode. If you go to the gt4sd folder and do:

pip install .

then I'm pretty sure it will work afterwards.

amelie-iska commented 5 months ago
(gt4sd) asr50@LZ16-ASR50-DSA:/mnt/e/users/asr50/vs_code_projects/small_molecules/gt4sd-core/src/gt4sd$ pip install .
ERROR: Directory '.' is not installable. Neither 'setup.py' nor 'pyproject.toml' found.
(gt4sd) asr50@LZ16-ASR50-DSA:/mnt/e/users/asr50/vs_code_projects/small_molecules/gt4sd-core/src/gt4sd$
jannisborn commented 5 months ago

cd ../.. && pip install .

amelie-iska commented 5 months ago

New error now.

(gt4sd) asr50@LZ16-ASR50-DSA:/mnt/e/users/asr50/vs_code_projects/small_molecules/gt4sd-core$ gt4sd-inference --help
2024-01-23 13:28:36.894862: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.    
2024-01-23 13:28:36.999321: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-01-23 13:28:37.020478: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-23 13:28:37.372029: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2024-01-23 13:28:37.372109: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2024-01-23 13:28:37.372128: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
╭───────────────────────────── Traceback (most recent call last) ─────────────────────────────╮
│ /home/asr50/miniconda3/envs/gt4sd/bin/gt4sd-inference:5 in <module>                         │
│                                                                                             │
│   2 # -*- coding: utf-8 -*-                                                                 │
│   3 import re                                                                               │
│   4 import sys                                                                              │
│ ❱ 5 from gt4sd.cli.inference import main                                                    │
│   6 if __name__ == '__main__':                                                              │
│   7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])                    │
│   8 │   sys.exit(main())                                                                    │
│                                                                                             │
│ /home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/gt4sd/cli/inference.py:35 in  │
│ <module>                                                                                    │
│                                                                                             │
│    32 from dataclasses import dataclass, field                                              │
│    33 from typing import Any, Dict, Iterable, Optional, cast                                │
│    34                                                                                       │
│ ❱  35 from ..algorithms.registry import ApplicationsRegistry                                │
│    36 from .algorithms import (                                                             │
│    37 │   AVAILABLE_ALGORITHMS,                                                             │
│    38 │   AVAILABLE_ALGORITHMS_CATEGORIES,                                                  │
│                                                                                             │
│ /home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/gt4sd/algorithms/__init__.py: │
│ 28 in <module>                                                                              │
│                                                                                             │
│   25 from ..extras import EXTRAS_ENABLED                                                    │
│   26                                                                                        │
│   27 # NOTE: here we import the applications to register them                               │
│ ❱ 28 from .conditional_generation.guacamol import (  # noqa: F401                           │
│   29 │   AaeGenerator,                                                                      │
│   30 │   GraphGAGenerator,                                                                  │
│   31 │   GraphMCTSGenerator,                                                                │
│                                                                                             │
│ /home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/gt4sd/algorithms/conditional_ │
│ generation/guacamol/__init__.py:26 in <module>                                              │
│                                                                                             │
│   23 #                                                                                      │
│   24 """GuacaMol initialization."""                                                         │
│   25                                                                                        │
│ ❱ 26 from .core import (                                                                    │
│   27 │   AaeGenerator,                                                                      │
│   28 │   GraphGAGenerator,                                                                  │
│   29 │   GraphMCTSGenerator,                                                                │
│                                                                                             │
│ /home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/gt4sd/algorithms/conditional_ │
│ generation/guacamol/core.py:30 in <module>                                                  │
│                                                                                             │
│    27                                                                                       │
│    28 from ....domains.materials import SMILES, MoleculeFormat, validate_molecules          │
│    29 from ....exceptions import InvalidItem                                                │
│ ❱  30 from ....training_pipelines.core import TrainingPipelineArguments                     │
│    31 from ....training_pipelines.guacamol_baselines.core import GuacaMolSavingArguments    │
│    32 from ....training_pipelines.moses.core import MosesSavingArguments                    │
│    33 from ...core import AlgorithmConfiguration, GeneratorAlgorithm                        │
│                                                                                             │
│ /home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/gt4sd/training_pipelines/__in │
│ it__.py:102 in <module>                                                                     │
│                                                                                             │
│    99 │   GranularSavingArguments,                                                          │
│   100 │   GranularTrainingPipeline,                                                         │
│   101 )                                                                                     │
│ ❱ 102 from .pytorch_lightning.molformer.core import (                                       │
│   103 │   MolformerDataArguments,                                                           │
│   104 │   MolformerModelArguments,                                                          │
│   105 │   MolformerSavingArguments,                                                         │
│                                                                                             │
│ /home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/gt4sd/training_pipelines/pyto │
│ rch_lightning/molformer/core.py:35 in <module>                                              │
│                                                                                             │
│    32 import torch as _torch                                                                │
│    33 import tensorflow as _tensorflow                                                      │
│    34 import importlib_resources                                                            │
│ ❱  35 from gt4sd_molformer.finetune.finetune_pubchem_light import (                         │
│    36 │   LightningModule as RegressionLightningModule,                                     │
│    37 )                                                                                     │
│    38 from gt4sd_molformer.finetune.finetune_pubchem_light import (                         │
│                                                                                             │
│ /home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/gt4sd_molformer/finetune/fine │
│ tune_pubchem_light.py:25 in <module>                                                        │
│                                                                                             │
│    22 from torch.utils.data import DataLoader                                               │
│    23                                                                                       │
│    24 from .ft_args import parse_args                                                       │
│ ❱  25 from .ft_rotate_attention.ft_rotate_builder import (                                  │
│    26 │   RotateEncoderBuilder as rotate_builder,                                           │
│    27 )                                                                                     │
│    28 from .ft_tokenizer.ft_tokenizer import MolTranBertTokenizer                           │
│                                                                                             │
│ /home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/gt4sd_molformer/finetune/ft_r │
│ otate_attention/ft_rotate_builder.py:1 in <module>                                          │
│                                                                                             │
│ ❱  1 from fast_transformers.builders.attention_builders import AttentionBuilder             │
│    2 from fast_transformers.builders.transformer_builders import (                          │
│    3 │   BaseTransformerEncoderBuilder,                                                     │
│    4 )                                                                                      │
│                                                                                             │
│ /home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/fast_transformers/builders/__ │
│ init__.py:42 in <module>                                                                    │
│                                                                                             │
│   39 # TODO: Should this behaviour change? Namely, should all attention                     │
│   40 #       implementations be imported in order to be useable? This also allows           │
│   41 #       using the library even partially built, for instance.                          │
│ ❱ 42 from ..attention import \                                                              │
│   43 │   FullAttention, \                                                                   │
│   44 │   LinearAttention, CausalLinearAttention, \                                          │
│   45 │   ClusteredAttention, ImprovedClusteredAttention, \                                  │
│                                                                                             │
│ /home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/fast_transformers/attention/_ │
│ _init__.py:13 in <module>                                                                   │
│                                                                                             │
│   10 from .attention_layer import AttentionLayer                                            │
│   11 from .full_attention import FullAttention                                              │
│   12 from .linear_attention import LinearAttention                                          │
│ ❱ 13 from .causal_linear_attention import CausalLinearAttention                             │
│   14 from .clustered_attention import ClusteredAttention                                    │
│   15 from .improved_clustered_attention import ImprovedClusteredAttention                   │
│   16 from .reformer_attention import ReformerAttention                                      │
│                                                                                             │
│ /home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/fast_transformers/attention/c │
│ ausal_linear_attention.py:15 in <module>                                                    │
│                                                                                             │
│    12 from ..attention_registry import AttentionRegistry, Optional, Callable, Int, \        │
│    13 │   EventDispatcherInstance                                                           │
│    14 from ..events import EventDispatcher                                                  │
│ ❱  15 from ..causal_product import causal_dot_product                                       │
│    16 from ..feature_maps import elu_feature_map                                            │
│    17                                                                                       │
│    18                                                                                       │
│                                                                                             │
│ /home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/fast_transformers/causal_prod │
│ uct/__init__.py:9 in <module>                                                               │
│                                                                                             │
│    6                                                                                        │
│    7 import torch                                                                           │
│    8                                                                                        │
│ ❱  9 from .causal_product_cpu import causal_dot_product as causal_dot_product_cpu, \        │
│   10 │   causal_dot_backward as causal_dot_backward_cpu                                     │
│   11                                                                                        │
│   12 try:                                                                                   │
╰─────────────────────────────────────────────────────────────────────────────────────────────╯
ImportError: 
/home/asr50/miniconda3/envs/gt4sd/lib/python3.8/site-packages/fast_transformers/causal_product/
causal_product_cpu.cpython-38-x86_64-linux-gnu.so: undefined symbol:
_ZN8pybind116detail11type_casterIN2at6TensorEvE4loadENS_6handleEb
(gt4sd) asr50@LZ16-ASR50-DSA:/mnt/e/users/asr50/vs_code_projects/small_molecules/gt4sd-core$
jannisborn commented 5 months ago

This is related to gt4sd-molformer. What do you get with:

pip freeze | grep fast
pip freeze | grep mol

amelie-iska commented 5 months ago
(gt4sd) asr50@LZ16-ASR50-DSA:/mnt/e/users/asr50/vs_code_projects/small_molecules/gt4sd-core$ pip freeze | grep fast
fastjsonschema==2.19.1
fastprogress==1.0.3
pytorch-fast-transformers==0.4.0
(gt4sd) asr50@LZ16-ASR50-DSA:/mnt/e/users/asr50/vs_code_projects/small_molecules/gt4sd-core$ pip freeze | grep mol
gt4sd @ file:///mnt/e/users/asr50/vs_code_projects/small_molecules/gt4sd-core
gt4sd-molformer==0.1.3
guacamol==0.5.5
guacamol-baselines @ git+https://github.com/GT4SD/guacamol_baselines.git@d99df1883ad980e78ffa9e97cafa3f6b28cc9ae7
molecule-generation @ git+https://github.com/GT4SD/molecule-generation@120a829056a0c44622a76683cace87053999b103
molgx==0.22.0a1
(gt4sd) asr50@LZ16-ASR50-DSA:/mnt/e/users/asr50/vs_code_projects/small_molecules/gt4sd-core$
jannisborn commented 5 months ago

The part about the CPU is interesting. It's a problem in the interaction between fast-transformers and pytorch. What does this give you?

import torch
print(torch.__version__)
print(torch.cuda.is_available())
jannisborn commented 5 months ago

Oh, I just read that you are using WSL. I have no experience with Windows/WSL, but I would be surprised if you can use your Windows GPU directly through a standard torch installation. The error above indicates a potential GPU/CPU problem.
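A hedged way to check this from inside WSL; nvidia-smi requires the Windows NVIDIA driver with WSL support, and torch.version.cuda prints None for CPU-only wheels:

# Is the GPU visible to WSL at all?
nvidia-smi
# Is the installed torch wheel built with CUDA support?
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"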

amelie-iska commented 5 months ago

Hmm, yeah. I've trained using the GPU and WSL before, so I know it's possible. I was even able to train a Regression Transformer at one point (albeit with errors). The checkpoints saved fine though. I just couldn't get inference to work properly, primarily due to not understanding what folder/file structure to provide when trying to run the script. But you are right, for some reason CUDA doesn't seem available in this environment.

(gt4sd) asr50@LZ16-ASR50-DSA:/mnt/e/users/asr50/vs_code_projects/small_molecules/gt4sd-core$ python
Python 3.8.18 | packaged by conda-forge | (default, Dec 23 2023, 17:21:28) 
[GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
1.12.1
>>> print(torch.cuda.is_available())
False
>>> exit()
(gt4sd) asr50@LZ16-ASR50-DSA:/mnt/e/users/asr50/vs_code_projects/small_molecules/gt4sd-core$