RuntimeError: CUDA error: no kernel image is available for execution on the device

Hi all,

I tried to run the code in three different setups, but I always get the same error:

Traceback (most recent call last):
File "train_task.py", line 34, in <module>
  from volta.task_utils import LoadDataset, LoadLoss, ForwardModelsTrain, ForwardModelsVal
File "/data/volta/task_utils.py", line 19, in <module>
  from volta.datasets import DatasetMapTrain, DatasetMapEval
File "/data/volta/datasets/__init__.py", line 23, in <module>
  from .SVO_Probes_dataset import SVO_ProbesClassificationDataset
File "/data/volta/datasets/SVO_Probes_dataset.py", line 20, in <module>
  p = Pipeline(lang='english', gpu = True, cache_dir = './cache')
File "/root/anaconda3/envs/volta/lib/python3.6/site-packages/trankit/pipeline.py", line 85, in __init__
  self._embedding_layers.half()
File "/root/anaconda3/envs/volta/lib/python3.6/site-packages/torch/nn/modules/module.py", line 757, in half
  return self._apply(lambda t: t.half() if t.is_floating_point() else t)
File "/root/anaconda3/envs/volta/lib/python3.6/site-packages/torch/nn/modules/module.py", line 570, in _apply
  module._apply(fn)
File "/root/anaconda3/envs/volta/lib/python3.6/site-packages/torch/nn/modules/module.py", line 570, in _apply
  module._apply(fn)
File "/root/anaconda3/envs/volta/lib/python3.6/site-packages/torch/nn/modules/module.py", line 570, in _apply
  module._apply(fn)
File "/root/anaconda3/envs/volta/lib/python3.6/site-packages/torch/nn/modules/module.py", line 593, in _apply
  param_applied = fn(param)
File "/root/anaconda3/envs/volta/lib/python3.6/site-packages/torch/nn/modules/module.py", line 757, in <lambda>
  return self._apply(lambda t: t.half() if t.is_floating_point() else t)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
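
The last two lines of the message suggest rerunning with CUDA_LAUNCH_BLOCKING=1. A minimal sketch of a check that isolates the failing step from volta and trankit might look like the following. It assumes that the first CUDA kernel launch really happens in the .half() call, as the traceback suggests. The function name try_half_on_gpu and the layer size are made up for illustration.

```python
import os

# The variable has to be set before CUDA is initialised,
# so it is set here before torch is imported.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def try_half_on_gpu():
    """Move a tiny layer to the GPU and convert it to fp16.

    Returns a short status string instead of raising, so the
    function is safe to call on machines without torch or a GPU.
    """
    try:
        import torch
    except ImportError:
        return "skipped: torch is not installed"
    if not torch.cuda.is_available():
        return "skipped: no CUDA device visible"
    try:
        layer = torch.nn.Linear(4, 4).cuda()
        layer.half()              # same call that fails in the traceback
        torch.cuda.synchronize()  # surface any asynchronous CUDA error here
    except RuntimeError as err:
        return "failed: " + str(err).split("\n")[0]
    return "ok"


if __name__ == "__main__":
    print(try_half_on_gpu())
```

If this small script already reports "failed", the problem would lie in the PyTorch/GPU combination rather than in volta or trankit.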

In each case I followed the repository setup steps in the README file (after the setup I also had to install nltk through conda and trankit through pip). The three setups were:

Run it on a VM with Ubuntu 22.04, NVIDIA RTX 3090, CUDA 12.1 and NVIDIA driver version 530.30.02
Run it on the same virtual machine, but inside a Docker container nvidia/cuda:10.1-devel-ubuntu18.04
Run it on the same virtual machine, but inside a Docker container pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel
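
All three setups use the same GPU, and this error message is commonly reported when a PyTorch build ships no kernels for the GPU's architecture. A rough sketch of how that could be checked is shown here. The helper names parse_arch and can_run and the example architecture lists are hypothetical, and the compatibility rule is simplified: an sm_XY binary is assumed to run on devices with the same major version and an equal or higher minor version, and compute_XY PTX is assumed to be JIT-compilable for equal or newer devices. torch.cuda.get_arch_list() only exists in newer PyTorch releases, so it is looked up defensively.

```python
def parse_arch(arch):
    """Split a string such as 'sm_75' into ('sm', (7, 5))."""
    kind, _, digits = arch.partition("_")
    digits = "".join(ch for ch in digits if ch.isdigit())
    return kind, (int(digits[:-1]), int(digits[-1]))


def can_run(device_capability, arch_list):
    """Simplified check: can a build with arch_list run on this device?"""
    device_capability = tuple(device_capability)
    for arch in arch_list:
        kind, cap = parse_arch(arch)
        # Compiled binaries: same major version, minor not newer than the device.
        if kind == "sm" and cap[0] == device_capability[0] and cap[1] <= device_capability[1]:
            return True
        # PTX: can be JIT-compiled for any device at least as new.
        if kind == "compute" and cap <= device_capability:
            return True
    return False


if __name__ == "__main__":
    # (8, 6) is the compute capability an RTX 3090 reports.
    # The arch list below is illustrative only, not taken from a real build.
    print(can_run((8, 6), ["sm_60", "sm_70", "sm_75"]))

    try:
        import torch
    except ImportError:
        torch = None
    get_arch_list = getattr(getattr(torch, "cuda", None), "get_arch_list", None)
    if torch is not None and get_arch_list is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = get_arch_list()
        print(cap, archs, can_run(cap, archs))
```

On an older PyTorch without get_arch_list(), only torch.cuda.get_device_capability(0) would be available, and the list of compiled architectures would have to be found some other way.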

These are the exact commands I executed, both on the VM and inside the Docker containers:

conda create -n volta python=3.6
conda activate volta
pip install -r requirements.txt
conda install pytorch=1.4.0 torchvision cudatoolkit=10.1 -c pytorch  # torchvision version pin removed because 0.5 is not available

apt install git
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir ./

cd ..
cd tools/refer; make

cd ../..
python setup.py develop

conda install nltk
pip install trankit
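
Because these steps install packages through both conda and pip, it may be worth confirming which torch build the environment actually imports afterwards. The following is only a sketch of such a check. The function name describe_torch is made up for illustration, and the output depends entirely on the environment.

```python
def describe_torch():
    """Collect version details of the torch package that Python imports."""
    try:
        import torch
    except ImportError:
        return {"torch": None}
    return {
        "torch": torch.__version__,        # version of the imported package
        "cuda_build": torch.version.cuda,  # CUDA version the build targets (None for CPU builds)
        "location": torch.__file__,        # shows which site-packages it came from
    }


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(key, "=", value)
```

If the reported version differs from the 1.4.0 requested in the conda command, a later pip install may have replaced it.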

Then I ran the following command, which produced the error above:

python train_task.py \
        --config_file config/vilbert_base.json --from_pretrained ctrl_vilbert.bin \
        --tasks_config_file config_tasks/vilbert_tasks.yml --task 20 \
        --adam_epsilon 1e-6 --adam_betas 0.9 0.999 --weight_decay 0.01 --warmup_proportion 0.1 --clip_grad_norm 0.0 \
        --logdir logs/SVO-Probes/ --task_specific_tokens

Any suggestions? Thanks!

e-bug / volta, issue #22