NVIDIA / sentiment-discovery

Unsupervised Language Modeling at scale for robust sentiment classification

Getting: RuntimeError: CUDA error: out of memory #42

Closed · elixuy closed this 5 years ago

elixuy commented 5 years ago

Hi, I got the error below when I tried to run this script:

python classifier.py --load_model lang_model_transfer/sentiment/sst_clf.pt --data data/icbu/icbu_test_reviews.csv

configuring data
generating csv at data/icbu/icbu_test_reviews.sentence.label.csv
Creating mlstm
Traceback (most recent call last):
  File "classifier.py", line 53, in <module>
    model.cuda()
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 258, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 185, in _apply
    module._apply(fn)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 185, in _apply
    module._apply(fn)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 185, in _apply
    module._apply(fn)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 191, in _apply
    param.data = fn(param.data)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 258, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA error: out of memory

There are only two records in the file 'icbu_test_reviews.csv'. Below is the output of nvidia-smi:

$ nvidia-smi
Mon Sep 10 20:42:29 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M40           Off  | 00000000:00:08.0 Off |                  Off |
| N/A   35C    P0    61W / 250W |  11558MiB / 12215MiB |     16%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M40           Off  | 00000000:00:09.0 Off |                  Off |
| N/A   32C    P0    63W / 250W |    337MiB / 12215MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      9815    C   ...dmin/docker_ml/framework/bin/ml_service    3448MiB |
|    0     20415    C   ...dmin/docker_ml/framework/bin/ml_service    7949MiB |
|    0     28935    C   ...dmin/docker_ml/framework/bin/ml_service     148MiB |
|    1      9815    C   ...dmin/docker_ml/framework/bin/ml_service     108MiB |
|    1     20415    C   ...dmin/docker_ml/framework/bin/ml_service     108MiB |
|    1     28935    C   ...dmin/docker_ml/framework/bin/ml_service     108MiB |
+-----------------------------------------------------------------------------+
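Note: the table above already points at the cause. GPU 0 sits at 11558MiB / 12215MiB used because of the three ml_service processes, while GPU 1 is almost entirely free, so model.cuda() (which defaults to the current device, GPU 0) cannot allocate. A small sketch for picking the card with the most free memory before moving the model; it is not part of this repo and only relies on standard nvidia-smi query flags:

import subprocess

def freest_gpu():
    # Ask nvidia-smi for free memory per card; these query flags are standard
    # nvidia-smi options and print one MiB value per GPU, in index order.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"]
    )
    free = [int(x) for x in out.decode().strip().splitlines()]
    return max(range(len(free)), key=free.__getitem__)

print(freest_gpu())  # with the table above this would print 1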

elixuy commented 5 years ago

I fixed this error by assigning my model to another GPU card.
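In case it helps anyone else: I am not sure whether classifier.py exposes a flag for choosing the card, so the sketch below only uses generic mechanisms, the standard CUDA_VISIBLE_DEVICES environment variable and PyTorch's explicit device argument. Device index 1 is assumed because that is the mostly free M40 in the output above, and the Linear layer is just a stand-in for the real mLSTM classifier.

import os

# Pin this process to the second (mostly free) M40 before torch is imported;
# CUDA_VISIBLE_DEVICES is a standard CUDA environment variable, so inside the
# process that card then shows up as device 0.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch
import torch.nn as nn

# Stand-in for the classifier built in classifier.py; the real model is loaded
# from sst_clf.pt, this layer is only here to keep the sketch runnable.
model = nn.Linear(16, 2)

if torch.cuda.is_available():
    # Alternative without the env var: keep all cards visible and pass an
    # explicit index, e.g. torch.cuda.set_device(1) or model.cuda(1)
    # (both are standard PyTorch calls).
    model = model.cuda()  # same call that raised the OOM on the busy GPU 0

The same thing works without touching any code by launching with CUDA_VISIBLE_DEVICES=1 python classifier.py --load_model lang_model_transfer/sentiment/sst_clf.pt --data data/icbu/icbu_test_reviews.csv.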