allenai / bilm-tf

Tensorflow implementation of contextualized word representations from bi-directional language models
Apache License 2.0

Training Issue in new corpus #165

Closed anupamjamatia closed 5 years ago

anupamjamatia commented 5 years ago

Hi, I am facing a problem training ELMo on a new corpus. I followed the instructions from https://github.com/allenai/bilm-tf on my system, which has a single GPU, and the process gets killed. Is there any solution to this problem?

```
(ELMO) anupam@anupam-OMEN-HP:~/Desktop/ElMo/bilm$ python bin/train_elmo.py --train_prefix='/home/anupam/Desktop/ElMo/bilm/training_file/*' --vocab_file en-bn-hi_mixed_voc_file.txt --save_dir out/
Found 1 shards at /home/anupam/Desktop/ElMo/bilm/training_file/*
Loading data from: /home/anupam/Desktop/ElMo/bilm/training_file/Training_File.txt
Loaded 667648 sentences.
Finished loading
Found 1 shards at /home/anupam/Desktop/ElMo/bilm/training_file/*
Loading data from: /home/anupam/Desktop/ElMo/bilm/training_file/Training_File.txt
Loaded 667648 sentences.
Finished loading
USING SKIP CONNECTIONS
[['global_step:0', TensorShape([])],
 ['lm/CNN/W_cnn_0:0', TensorShape([Dimension(1), Dimension(1), Dimension(16), Dimension(32)])],
 ['lm/CNN/W_cnn_1:0', TensorShape([Dimension(1), Dimension(2), Dimension(16), Dimension(32)])],
 ['lm/CNN/W_cnn_2:0', TensorShape([Dimension(1), Dimension(3), Dimension(16), Dimension(64)])],
 ['lm/CNN/W_cnn_3:0', TensorShape([Dimension(1), Dimension(4), Dimension(16), Dimension(128)])],
 ['lm/CNN/W_cnn_4:0', TensorShape([Dimension(1), Dimension(5), Dimension(16), Dimension(256)])],
 ['lm/CNN/W_cnn_5:0', TensorShape([Dimension(1), Dimension(6), Dimension(16), Dimension(512)])],
 ['lm/CNN/W_cnn_6:0', TensorShape([Dimension(1), Dimension(7), Dimension(16), Dimension(1024)])],
 ['lm/CNN/b_cnn_0:0', TensorShape([Dimension(32)])],
 ['lm/CNN/b_cnn_1:0', TensorShape([Dimension(32)])],
 ['lm/CNN/b_cnn_2:0', TensorShape([Dimension(64)])],
 ['lm/CNN/b_cnn_3:0', TensorShape([Dimension(128)])],
 ['lm/CNN/b_cnn_4:0', TensorShape([Dimension(256)])],
 ['lm/CNN/b_cnn_5:0', TensorShape([Dimension(512)])],
 ['lm/CNN/b_cnn_6:0', TensorShape([Dimension(1024)])],
 ['lm/CNN_high_0/W_carry:0', TensorShape([Dimension(2048), Dimension(2048)])],
 ['lm/CNN_high_0/W_transform:0', TensorShape([Dimension(2048), Dimension(2048)])],
 ['lm/CNN_high_0/b_carry:0', TensorShape([Dimension(2048)])],
 ['lm/CNN_high_0/b_transform:0', TensorShape([Dimension(2048)])],
 ['lm/CNN_high_1/W_carry:0', TensorShape([Dimension(2048), Dimension(2048)])],
 ['lm/CNN_high_1/W_transform:0', TensorShape([Dimension(2048), Dimension(2048)])],
 ['lm/CNN_high_1/b_carry:0', TensorShape([Dimension(2048)])],
 ['lm/CNN_high_1/b_transform:0', TensorShape([Dimension(2048)])],
 ['lm/CNN_proj/W_proj:0', TensorShape([Dimension(2048), Dimension(512)])],
 ['lm/CNN_proj/b_proj:0', TensorShape([Dimension(512)])],
 ['lm/RNN_0/rnn/multi_rnn_cell/cell_0/lstm_cell/bias:0', TensorShape([Dimension(16384)])],
 ['lm/RNN_0/rnn/multi_rnn_cell/cell_0/lstm_cell/kernel:0', TensorShape([Dimension(1024), Dimension(16384)])],
 ['lm/RNN_0/rnn/multi_rnn_cell/cell_0/lstm_cell/projection/kernel:0', TensorShape([Dimension(4096), Dimension(512)])],
 ['lm/RNN_0/rnn/multi_rnn_cell/cell_1/lstm_cell/bias:0', TensorShape([Dimension(16384)])],
 ['lm/RNN_0/rnn/multi_rnn_cell/cell_1/lstm_cell/kernel:0', TensorShape([Dimension(1024), Dimension(16384)])],
 ['lm/RNN_0/rnn/multi_rnn_cell/cell_1/lstm_cell/projection/kernel:0', TensorShape([Dimension(4096), Dimension(512)])],
 ['lm/RNN_1/rnn/multi_rnn_cell/cell_0/lstm_cell/bias:0', TensorShape([Dimension(16384)])],
 ['lm/RNN_1/rnn/multi_rnn_cell/cell_0/lstm_cell/kernel:0', TensorShape([Dimension(1024), Dimension(16384)])],
 ['lm/RNN_1/rnn/multi_rnn_cell/cell_0/lstm_cell/projection/kernel:0', TensorShape([Dimension(4096), Dimension(512)])],
 ['lm/RNN_1/rnn/multi_rnn_cell/cell_1/lstm_cell/bias:0', TensorShape([Dimension(16384)])],
 ['lm/RNN_1/rnn/multi_rnn_cell/cell_1/lstm_cell/kernel:0', TensorShape([Dimension(1024), Dimension(16384)])],
 ['lm/RNN_1/rnn/multi_rnn_cell/cell_1/lstm_cell/projection/kernel:0', TensorShape([Dimension(4096), Dimension(512)])],
 ['lm/char_embed:0', TensorShape([Dimension(261), Dimension(16)])],
 ['lm/softmax/W:0', TensorShape([Dimension(1525043), Dimension(512)])],
 ['lm/softmax/b:0', TensorShape([Dimension(1525043)])],
 ['train_perplexity:0', TensorShape([])]]
WARNING:tensorflow:From /home/anupam/.local/lib/python3.5/site-packages/tensorflow/python/util/tf_should_use.py:170: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
2019-02-08 17:07:46.692619: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2019-02-08 17:07:46.692636: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2019-02-08 17:07:46.692641: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2019-02-08 17:07:46.692645: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2019-02-08 17:07:46.692649: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Killed
(ELMO) anupam@anupam-OMEN-HP:~/Desktop/ElMo/bilm$
```

The GPU status can be viewed here:

```
anupam@anupam-OMEN-HP:~$ nvidia-smi
Fri Feb  8 18:24:06 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 107...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   55C    P3    23W /  N/A |    966MiB /  8117MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1665      G   /usr/lib/xorg/Xorg                           837MiB |
|    0      3429      G   compiz                                        50MiB |
|    0      4058      G   ...quest-channel-token=8843362536060243018    55MiB |
|    0      4726      G   ...quest-channel-token=6712252360503423866    13MiB |
|    0      7177      G   /usr/lib/thunderbird/thunderbird               2MiB |
|    0      9805      G   ...-token=82F8B3D416718D1A9486CF4518D0A1FF     4MiB |
+-----------------------------------------------------------------------------+
```

hichiaty commented 5 years ago

You are using CUDA 10.0, which means you are running a TensorFlow version other than 1.2; TensorFlow 1.2 is only compatible with CUDA 8.0.
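If it helps, here is a minimal sketch (TF 1.x API, assuming you run it inside the same ELMO environment) to confirm which TensorFlow build actually gets imported and whether it can see your GPU:

```python
# Minimal sanity check of the active TensorFlow install (TF 1.x API).
import tensorflow as tf

print("TensorFlow version:", tf.__version__)            # bilm-tf is tested against TF 1.2
print("Built with CUDA:", tf.test.is_built_with_cuda())
# If this prints False, the training script will run on CPU only
# (or fail), regardless of what nvidia-smi shows.
print("GPU available:", tf.test.is_gpu_available())
```

If the version printed is not 1.2.x, a clean environment with `tensorflow-gpu==1.2` and the matching CUDA 8.0 toolkit is what the repo expects.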

anupamjamatia commented 5 years ago

So is it not possible to run the code on my system, which has the following configuration?

```
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
```

and

```
$ pip list
Package              Version
-------------------- ----------
absl-py              0.6.1
astor                0.7.1
backcall             0.1.0
backports.weakref    1.0rc1
bilm                 0.1.post5
bleach               1.5.0
certifi              2018.11.29
cycler               0.10.0
decorator            4.3.0
entrypoints          0.2.3
enum34               1.1.6
gast                 0.2.0
grpcio               1.17.1
h5py                 2.9.0
html5lib             0.9999999
ipykernel            5.1.0
ipython              7.2.0
ipython-genutils     0.2.0
ipywidgets           7.4.2
jedi                 0.13.2
Jinja2               2.10
jsonschema           2.6.0
jupyter              1.0.0
jupyter-client       5.2.4
jupyter-console      6.0.0
jupyter-core         4.4.0
Keras                2.2.4
Keras-Applications   1.0.7
keras-metrics        0.0.5
Keras-Preprocessing  1.0.5
kiwisolver           1.0.1
Markdown             2.2.0
MarkupSafe           1.1.0
matplotlib           3.0.2
mistune              0.8.4
mkl-fft              1.0.6
mkl-random           1.0.2
mock                 2.0.0
nbconvert            5.3.1
nbformat             4.4.0
nltk                 3.4
notebook             5.7.4
numpy                1.16.1
pandas               0.23.4
pandas-ml            0.5.0
pandocfilters        1.4.2
parso                0.3.1
pbr                  5.1.1
pexpect              4.6.0
pickleshare          0.7.5
pip                  19.0.1
prometheus-client    0.5.0
prompt-toolkit       2.0.7
protobuf             3.6.1
ptyprocess           0.6.0
pydot-ng             2.0.0
Pygments             2.3.1
pyparsing            2.3.1
Pyphen               0.9.5
python-dateutil      2.7.5
pytz                 2018.7
PyYAML               3.13
pyzmq                17.1.2
qtconsole            4.4.3
scikit-learn         0.20.2
scipy                1.2.0
seaborn              0.9.0
Send2Trash           1.5.0
setuptools           40.6.3
singledispatch       3.4.0.3
six                  1.12.0
sklearn              0.0
tensorboard          1.12.2
tensorflow           1.12.0
tensorflow-gpu       1.2.0
termcolor            1.1.0
terminado            0.8.1
testpath             0.4.2
textblob             0.15.2
Theano               1.0.3
tornado              5.1.1
traitlets            4.3.2
wcwidth              0.1.7
webencodings         0.5.1
Werkzeug             0.14.1
wheel                0.32.3
widgetsnbextension   3.4.2

You are using pip version 19.0.1, however version 19.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
```