Closed harpap closed 2 years ago
This looks like a problem with the CUDA or PyTorch version. Can you run the following in Python successfully?
import torch
torch.zeros(1).cuda()
I noticed that the CUDA version your PyTorch build targets (10.0) does not match the CUDA version (11.4) in your environment:
pytorch pkgs/main/linux-64::pytorch-1.3.1-cuda100py36h53c1284_0
Your CUDA version may be too new for this PyTorch build. You could try a lower CUDA version or a higher PyTorch version (PyTorch 1.7 is OK for running the code).
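To see which CUDA version a PyTorch install was built against, something like the sketch below may help. It only uses `torch.__version__`, `torch.version.cuda`, and the `torch.cuda` query functions. The helper name `describe_torch_cuda` is made up for illustration, and the import is guarded so the script still runs where torch is not installed.

```python
import importlib.util


def describe_torch_cuda():
    """Summarize the installed PyTorch build, or return None if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    info = {
        "torch": torch.__version__,
        # CUDA version the binary was compiled against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
        # (major, minor) compute capability of GPU 0.
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


print(describe_torch_cuda())
```

If `built_with_cuda` comes back as `None`, the install is a CPU-only build and `.cuda()` calls will not work regardless of the driver.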
Hi @wangxinyu0922! Thanks for the help.
The command torch.zeros(1).cuda() runs, but very slowly.
I created a new env with torch 1.7 and Python 3.9.7, and it installed:
_libgcc_mutex pkgs/main/linux-64::_libgcc_mutex-0.1-main
_openmp_mutex pkgs/main/linux-64::_openmp_mutex-4.5-1_gnu
_pytorch_select pkgs/main/linux-64::_pytorch_select-0.1-cpu_0
blas pkgs/main/linux-64::blas-1.0-mkl
ca-certificates pkgs/main/linux-64::ca-certificates-2021.10.26-h06a4308_2
certifi pkgs/main/linux-64::certifi-2021.10.8-py39h06a4308_0
cffi pkgs/main/linux-64::cffi-1.14.6-py39h400218f_0
cudatoolkit pkgs/main/linux-64::cudatoolkit-11.3.1-h2bc3f7f_2
cudnn pkgs/main/linux-64::cudnn-8.2.1-cuda11.3_0
intel-openmp pkgs/main/linux-64::intel-openmp-2019.4-243
ld_impl_linux-64 pkgs/main/linux-64::ld_impl_linux-64-2.35.1-h7274673_9
libffi pkgs/main/linux-64::libffi-3.3-he6710b0_2
libgcc-ng pkgs/main/linux-64::libgcc-ng-9.3.0-h5101ec6_17
libgomp pkgs/main/linux-64::libgomp-9.3.0-h5101ec6_17
libmklml pkgs/main/linux-64::libmklml-2019.0.5-0
libstdcxx-ng pkgs/main/linux-64::libstdcxx-ng-9.3.0-hd4cf53a_17
mkl pkgs/main/linux-64::mkl-2020.2-256
mkl-service pkgs/main/linux-64::mkl-service-2.3.0-py39he8ac12f_0
mkl_fft pkgs/main/linux-64::mkl_fft-1.3.0-py39h54f3939_0
mkl_random pkgs/main/linux-64::mkl_random-1.0.2-py39h63df603_0
ncurses pkgs/main/linux-64::ncurses-6.3-h7f8727e_2
ninja pkgs/main/linux-64::ninja-1.10.2-py39hd09550d_3
numpy pkgs/main/linux-64::numpy-1.19.2-py39h89c1606_0
numpy-base pkgs/main/linux-64::numpy-base-1.19.2-py39h2ae0177_0
openssl pkgs/main/linux-64::openssl-1.1.1l-h7f8727e_0
pip pkgs/main/linux-64::pip-21.2.4-py39h06a4308_0
pycparser pkgs/main/noarch::pycparser-2.21-pyhd3eb1b0_0
python pkgs/main/linux-64::python-3.9.7-h12debd9_1
pytorch pkgs/main/linux-64::pytorch-1.7.1-cpu_py39h6a09485_0
readline pkgs/main/linux-64::readline-8.1-h27cfd23_0
setuptools pkgs/main/linux-64::setuptools-58.0.4-py39h06a4308_0
six pkgs/main/noarch::six-1.16.0-pyhd3eb1b0_0
sqlite pkgs/main/linux-64::sqlite-3.36.0-hc218d9a_0
tk pkgs/main/linux-64::tk-8.6.11-h1ccaba5_0
typing-extensions pkgs/main/noarch::typing-extensions-3.10.0.2-hd3eb1b0_0
typing_extensions pkgs/main/noarch::typing_extensions-3.10.0.2-pyh06a4308_0
tzdata pkgs/main/noarch::tzdata-2021e-hda174b7_0
wheel pkgs/main/noarch::wheel-0.37.0-pyhd3eb1b0_1
xz pkgs/main/linux-64::xz-5.2.5-h7b6447c_0
zlib pkgs/main/linux-64::zlib-1.2.11-h7f8727e_4
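Note that in the list above the pytorch entry has a build string starting with `cpu`, and `_pytorch_select` is also the `cpu` variant, so this env appears to have a CPU-only PyTorch even though cudatoolkit 11.3.1 was installed. A rough way to read that off a conda package line is sketched below. The function name `classify_build` is hypothetical, and the parsing assumes the build-string patterns seen in this thread (`cuda100`, `cuda11.3`, `cpu_...`), not a documented conda format.

```python
import re


def classify_build(spec):
    """Classify a conda package spec as 'cpu', 'cuda X.Y', or 'unknown'."""
    build = spec.split("::")[-1]
    match = re.search(r"cuda(\d+)(?:\.(\d+))?", build)
    if match:
        major, minor = match.group(1), match.group(2)
        if minor is None:
            # Compact form such as "cuda100": treat the last digit as the minor version.
            major, minor = major[:-1], major[-1]
        return f"cuda {major}.{minor}"
    if re.search(r"(?:^|[-_])cpu(?:[-_]|$)", build):
        return "cpu"
    return "unknown"


old_env = "pkgs/main/linux-64::pytorch-1.3.1-cuda100py36h53c1284_0"
new_env = "pkgs/main/linux-64::pytorch-1.7.1-cpu_py39h6a09485_0"
print(classify_build(old_env))  # cuda 10.0
print(classify_build(new_env))  # cpu
```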
But in this env it was impossible to install requirements.txt (it throws lots of errors). If you could tell me the versions you used, it would really help. Here is the requirements.txt that I tried:
allennlp==0.9.0
boto3==1.10.45
botocore==1.13.45
bpemb==0.3.0
certifi==2020.4.5.1
conllu==1.3.1
cycler==0.10.0
Deprecated==1.2.6
gensim==3.8.1
h5py==2.8.0
ipython==7.12.0
matplotlib==3.1.3
mock==4.0.1
numpy
overrides==2.8.0
Pillow==7.0.0
pyhocon==0.3.56
pytest==6.1.2
pytorch-transformers==1.1.0
pyyaml==5.2
regex==2019.12.20
requests==2.22.0
scipy==1.4.1
segtok==1.5.7
sklearn==0.0
spacy
tabulate==0.8.6
torch
tqdm==4.41.0
transformers==3.0.0
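One thing that stands out in this file is the mix of exact pins and unpinned names. Several pins (for example scipy==1.4.1 and h5py==2.8.0) were released before Python 3.9 existed, so they may not have prebuilt wheels for a Python 3.9 env. That is only a guess at one source of the errors. The sketch below separates pinned from unpinned entries. `split_pins` is a made-up helper that handles only the plain `name==version` and bare-name lines used here, not the full requirements syntax.

```python
def split_pins(requirement_lines):
    """Return ({name: version} for '==' pins, [names without a pin])."""
    pinned, unpinned = {}, []
    for raw in requirement_lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if "==" in line:
            name, version = line.split("==", 1)
            pinned[name.strip()] = version.strip()
        else:
            unpinned.append(line)
    return pinned, unpinned


# A few lines copied from the requirements.txt above.
sample = ["allennlp==0.9.0", "numpy", "scipy==1.4.1", "spacy", "torch", "transformers==3.0.0"]
pinned, unpinned = split_pins(sample)
print(unpinned)  # ['numpy', 'spacy', 'torch']
```

pip resolves the unpinned entries (numpy, spacy, torch) to whatever recent version fits, which can conflict with the older pins around them.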
You may see this issue
My conda env:
I then ran
pip install -r requirements.txt
which throws an error and also installs the following:
Then when I run:
CUDA_VISIBLE_DEVICES=0 python train.py --config config/conll_03_english.yaml --test
it throws this error:
It runs on an NVIDIA 3090, and I have updated all drivers:
NVIDIA-SMI 470.86 Driver Version: 470.86 CUDA Version: 11.4
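A note on reading this line: the "CUDA Version" that nvidia-smi prints is the highest CUDA runtime the driver supports, not the toolkit version installed in the conda env. So as far as the driver goes, both a CUDA 10.0 and a CUDA 11.3 build fit under 11.4. Separately, as far as I know the RTX 3090 (compute capability 8.6) needs a PyTorch build against CUDA 11.1 or newer to get native kernels, which might explain why the CUDA 10.0 build was so slow on the first `.cuda()` call. The sketch below parses the header and makes both comparisons. The function names and the `MIN_CUDA_FOR_SM86` constant are assumptions for illustration.

```python
import re

# Assumption: native kernels for compute capability 8.6 need CUDA 11.1 or newer.
MIN_CUDA_FOR_SM86 = (11, 1)


def parse_smi_header(line):
    """Extract (driver_version, (cuda_major, cuda_minor)) from an nvidia-smi header."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", line)
    cuda = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", line)
    if not driver or not cuda:
        raise ValueError("not an nvidia-smi header line")
    return driver.group(1), (int(cuda.group(1)), int(cuda.group(2)))


def driver_supports(runtime, driver_max):
    """True if a CUDA runtime version is within what the driver reports."""
    return runtime <= driver_max


header = "NVIDIA-SMI 470.86 Driver Version: 470.86 CUDA Version: 11.4"
driver, driver_cuda = parse_smi_header(header)
print(driver, driver_cuda)  # 470.86 (11, 4)

for runtime in [(10, 0), (11, 3)]:
    print(runtime,
          "driver ok:", driver_supports(runtime, driver_cuda),
          "native sm_86 (assumed):", runtime >= MIN_CUDA_FOR_SM86)
```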