allenai / bilm-tf

Tensorflow implementation of contextualized word representations from bi-directional language models
Apache License 2.0
1.62k stars 452 forks

desired configuration to run this repo #162

Closed anupamjamatia closed 5 years ago

anupamjamatia commented 5 years ago

Hi, may I know the package requirements to run this repo? Which TensorFlow version? Markdown version? Bazel version?

I have the following setup:

```
(Elmo) anupam@anupam-OMEN-HP:~/Desktop/ElMo/bilm-tf-master$ nvidia-smi
Thu Jan 31 22:30:53 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 107...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   48C    P8     9W /  N/A |    629MiB /  8117MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1452      G   /usr/lib/xorg/Xorg                           304MiB |
|    0      3082      G   compiz                                       201MiB |
|    0      4537      G   ...quest-channel-token=1692889282734884045    83MiB |
|    0      5506      G   ...-token=8D068D7FE856CA4FB58A16255E8812AC    36MiB |
+-----------------------------------------------------------------------------+
```

and also

```
(Elmo) anupam@anupam-OMEN-HP:~/Desktop/ElMo/bilm-tf-master$ pip list
Package                            Version
---------------------------------- ----------
absl-py 0.6.1
alabaster 0.7.12
anaconda-client 1.7.2
anaconda-navigator 1.8.7
anaconda-project 0.8.2
asn1crypto 0.24.0
astor 0.7.1
astroid 2.1.0
astropy 3.1
atomicwrites 1.2.1
attrs 18.2.0
Babel 2.6.0
backcall 0.1.0
backports.os 0.1.1
backports.shutil-get-terminal-size 1.0.0
backports.weakref 1.0rc1
beautifulsoup4 4.6.3
bilm 0.1.post5
bitarray 0.8.3
bkcharts 0.2
blaze 0.11.3
bleach 1.5.0
bokeh 1.0.2
boto 2.49.0
Bottleneck 1.2.1
certifi 2018.11.29
cffi 1.11.5
chardet 3.0.4
Click 7.0
cloudpickle 0.6.1
clyent 1.2.2
colorama 0.4.1
conda 4.5.12
conda-build 3.17.6
conda-verify 3.1.1
contextlib2 0.5.5
cryptography 2.4.2
cycler 0.10.0
Cython 0.29.2
cytoolz 0.9.0.1
dask 1.0.0
datashape 0.5.4
decorator 4.3.0
defusedxml 0.5.0
distributed 1.25.1
docutils 0.14
entrypoints 0.2.3
et-xmlfile 1.0.1
fastcache 1.0.2
filelock 3.0.10
Flask 1.0.2
Flask-Cors 3.0.7
future 0.17.1
gast 0.2.0
gevent 1.3.7
glob2 0.6
gmpy2 2.0.8
greenlet 0.4.15
grpcio 1.17.1
h5py 2.8.0
heapdict 1.0.0
html5lib 0.9999999
idna 2.8
imageio 2.4.1
imagesize 1.1.0
importlib-metadata 0.6
ipykernel 5.1.0
ipython 7.2.0
ipython-genutils 0.2.0
ipywidgets 7.4.2
isort 4.3.4
itsdangerous 1.1.0
jdcal 1.4
jedi 0.13.2
jeepney 0.4
Jinja2 2.10
jsonschema 2.6.0
jupyter 1.0.0
jupyter-client 5.2.4
jupyter-console 6.0.0
jupyter-core 4.4.0
jupyterlab 0.35.3
jupyterlab-launcher 0.10.5
jupyterlab-server 0.2.0
Keras 2.2.4
Keras-Applications 1.0.6
Keras-Preprocessing 1.0.5
keyring 17.0.0
kiwisolver 1.0.1
lazy-object-proxy 1.3.1
libarchive-c 2.8
lief 0.9.0
llvmlite 0.26.0
locket 0.2.0
lxml 4.2.5
Markdown 2.2.0
MarkupSafe 1.1.0
matplotlib 3.0.2
mccabe 0.6.1
mistune 0.8.4
mkl-fft 1.0.6
mkl-random 1.0.2
mock 2.0.0
more-itertools 4.3.0
mpmath 1.1.0
msgpack 0.5.6
multipledispatch 0.6.0
navigator-updater 0.2.1
nbconvert 5.4.0
nbformat 4.4.0
networkx 2.2
nltk 3.4
nose 1.3.7
notebook 5.7.4
numba 0.41.0
numexpr 2.6.8
numpy 1.16.0
numpydoc 0.8.0
odo 0.5.1
olefile 0.46
openpyxl 2.5.12
packaging 18.0
pandas 0.23.4
pandocfilters 1.4.2
parso 0.3.1
partd 0.3.9
path.py 11.5.0
pathlib2 2.3.3
patsy 0.5.1
pbr 5.1.1
pep8 1.7.1
pexpect 4.6.0
pickleshare 0.7.5
Pillow 5.3.0
pip 19.0.1
pkginfo 1.4.2
pluggy 0.8.0
ply 3.11
prometheus-client 0.5.0
prompt-toolkit 2.0.7
protobuf 3.6.1
psutil 5.4.8
ptyprocess 0.6.0
py 1.7.0
pycodestyle 2.4.0
pycosat 0.6.3
pycparser 2.19
pycrypto 2.6.1
pycurl 7.43.0.2
pydot-ng 2.0.0
pyflakes 2.0.0
Pygments 2.3.1
pylint 2.2.2
pyodbc 4.0.25
pyOpenSSL 18.0.0
pyparsing 2.3.1
PySocks 1.6.8
pytest 4.0.2
pytest-arraydiff 0.3
pytest-astropy 0.5.0
pytest-doctestplus 0.2.0
pytest-openfiles 0.3.1
pytest-remotedata 0.3.1
python-dateutil 2.7.5
pytz 2018.7
PyWavelets 1.0.1
PyYAML 3.13
pyzmq 17.1.2
QtAwesome 0.5.3
qtconsole 4.4.3
QtPy 1.5.2
requests 2.21.0
rope 0.11.0
ruamel-yaml 0.15.46
scikit-image 0.14.1
scikit-learn 0.20.1
scipy 1.1.0
seaborn 0.9.0
SecretStorage 3.1.0
Send2Trash 1.5.0
setuptools 40.6.3
simplegeneric 0.8.1
singledispatch 3.4.0.3
six 1.12.0
snowballstemmer 1.2.1
sortedcollections 1.0.1
sortedcontainers 2.1.0
Sphinx 1.8.2
sphinxcontrib-websupport 1.1.0
spyder 3.3.2
spyder-kernels 0.3.0
SQLAlchemy 1.2.15
statsmodels 0.9.0
sympy 1.3
tables 3.4.4
tblib 1.3.2
tensorboard 1.12.2
tensorflow 1.12.0
tensorflow-gpu 1.2.0
termcolor 1.1.0
terminado 0.8.1
testpath 0.4.2
Theano 1.0.3
toolz 0.9.0
tornado 5.1.1
tqdm 4.28.1
traitlets 4.3.2
typed-ast 1.1.0
typing 3.6.4
unicodecsv 0.14.1
urllib3 1.24.1
wcwidth 0.1.7
webencodings 0.5.1
Werkzeug 0.14.1
wheel 0.32.3
widgetsnbextension 3.4.2
wrapt 1.10.11
wurlitzer 1.0.2
xlrd 1.2.0
XlsxWriter 1.1.2
xlwt 1.3.0
zict 0.1.3
```

But the error still persists:

```
======================================================================
ERROR: test_training (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: test_training
Traceback (most recent call last):
  File "/home/anupam/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/anupam/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/home/anupam/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/home/anupam/anaconda3/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/home/anupam/anaconda3/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcudnn.so.5: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/anupam/anaconda3/lib/python3.6/unittest/loader.py", line 428, in _find_test_path
    module = self._get_module_from_name(name)
  File "/home/anupam/anaconda3/lib/python3.6/unittest/loader.py", line 369, in _get_module_from_name
    __import__(name)
  File "/home/anupam/Desktop/ElMo/bilm-tf-master/tests/test_training.py", line 8, in <module>
    import tensorflow as tf
  File "/home/anupam/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "/home/anupam/.local/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/home/anupam/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 52, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/home/anupam/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/anupam/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/home/anupam/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/home/anupam/anaconda3/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/home/anupam/anaconda3/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcudnn.so.5: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.
```
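It seems tensorflow-gpu 1.2 was built against cuDNN 5 (hence the libcudnn.so.5 lookup), while a CUDA 10 machine will normally carry a newer cuDNN. A quick way to check which cuDNN builds the dynamic loader can actually find (assuming `ldconfig` is available):

```
ldconfig -p | grep libcudnn
```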

matt-peters commented 5 years ago

You have two versions of tensorflow installed:

tensorflow 1.12.0
tensorflow-gpu 1.2.0

Uninstall both of them and then follow the steps in the README: https://github.com/allenai/bilm-tf#installing
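Roughly, the steps look like this (a minimal sketch; `tensorflow-gpu==1.2` and `h5py` are the versions named in the README, and the last command is run from the repo root):

```
# remove both conflicting TensorFlow builds
pip uninstall -y tensorflow tensorflow-gpu

# reinstall the version the README asks for, plus h5py,
# then install bilm itself
pip install tensorflow-gpu==1.2 h5py
python setup.py install
```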

anupamjamatia commented 5 years ago

Thanks Matt, it works now. But now I am facing errors while trying to train a biLM on a new corpus.

I am getting the following errors

```
(ELMO) anupam@anupam-OMEN-HP:~/Desktop/ElMo/bilm$ python bin/train_elmo.py \
    --train_prefix=Training_File.txt \
    --vocab_file en-bn-hi_mixed_voc_file.txt \
    --save_dir checkpoint_out
Found 1 shards at Training_File.txt
Loading data from: Training_File.txt
Loaded 667648 sentences.
Finished loading
Found 1 shards at Training_File.txt
Loading data from: Training_File.txt
Loaded 667648 sentences.
Finished loading
USING SKIP CONNECTIONS
USING SKIP CONNECTIONS
USING SKIP CONNECTIONS
[['global_step:0', TensorShape([])],
 ['lm/CNN/W_cnn_0:0', TensorShape([Dimension(1), Dimension(1), Dimension(16), Dimension(32)])],
 ['lm/CNN/W_cnn_1:0', TensorShape([Dimension(1), Dimension(2), Dimension(16), Dimension(32)])],
 ['lm/CNN/W_cnn_2:0', TensorShape([Dimension(1), Dimension(3), Dimension(16), Dimension(64)])],
 ['lm/CNN/W_cnn_3:0', TensorShape([Dimension(1), Dimension(4), Dimension(16), Dimension(128)])],
 ['lm/CNN/W_cnn_4:0', TensorShape([Dimension(1), Dimension(5), Dimension(16), Dimension(256)])],
 ['lm/CNN/W_cnn_5:0', TensorShape([Dimension(1), Dimension(6), Dimension(16), Dimension(512)])],
 ['lm/CNN/W_cnn_6:0', TensorShape([Dimension(1), Dimension(7), Dimension(16), Dimension(1024)])],
 ['lm/CNN/b_cnn_0:0', TensorShape([Dimension(32)])],
 ['lm/CNN/b_cnn_1:0', TensorShape([Dimension(32)])],
 ['lm/CNN/b_cnn_2:0', TensorShape([Dimension(64)])],
 ['lm/CNN/b_cnn_3:0', TensorShape([Dimension(128)])],
 ['lm/CNN/b_cnn_4:0', TensorShape([Dimension(256)])],
 ['lm/CNN/b_cnn_5:0', TensorShape([Dimension(512)])],
 ['lm/CNN/b_cnn_6:0', TensorShape([Dimension(1024)])],
 ['lm/CNN_high_0/W_carry:0', TensorShape([Dimension(2048), Dimension(2048)])],
 ['lm/CNN_high_0/W_transform:0', TensorShape([Dimension(2048), Dimension(2048)])],
 ['lm/CNN_high_0/b_carry:0', TensorShape([Dimension(2048)])],
 ['lm/CNN_high_0/b_transform:0', TensorShape([Dimension(2048)])],
 ['lm/CNN_high_1/W_carry:0', TensorShape([Dimension(2048), Dimension(2048)])],
 ['lm/CNN_high_1/W_transform:0', TensorShape([Dimension(2048), Dimension(2048)])],
 ['lm/CNN_high_1/b_carry:0', TensorShape([Dimension(2048)])],
 ['lm/CNN_high_1/b_transform:0', TensorShape([Dimension(2048)])],
 ['lm/CNN_proj/W_proj:0', TensorShape([Dimension(2048), Dimension(512)])],
 ['lm/CNN_proj/b_proj:0', TensorShape([Dimension(512)])],
 ['lm/RNN_0/rnn/multi_rnn_cell/cell_0/lstm_cell/bias:0', TensorShape([Dimension(16384)])],
 ['lm/RNN_0/rnn/multi_rnn_cell/cell_0/lstm_cell/kernel:0', TensorShape([Dimension(1024), Dimension(16384)])],
 ['lm/RNN_0/rnn/multi_rnn_cell/cell_0/lstm_cell/projection/kernel:0', TensorShape([Dimension(4096), Dimension(512)])],
 ['lm/RNN_0/rnn/multi_rnn_cell/cell_1/lstm_cell/bias:0', TensorShape([Dimension(16384)])],
 ['lm/RNN_0/rnn/multi_rnn_cell/cell_1/lstm_cell/kernel:0', TensorShape([Dimension(1024), Dimension(16384)])],
 ['lm/RNN_0/rnn/multi_rnn_cell/cell_1/lstm_cell/projection/kernel:0', TensorShape([Dimension(4096), Dimension(512)])],
 ['lm/RNN_1/rnn/multi_rnn_cell/cell_0/lstm_cell/bias:0', TensorShape([Dimension(16384)])],
 ['lm/RNN_1/rnn/multi_rnn_cell/cell_0/lstm_cell/kernel:0', TensorShape([Dimension(1024), Dimension(16384)])],
 ['lm/RNN_1/rnn/multi_rnn_cell/cell_0/lstm_cell/projection/kernel:0', TensorShape([Dimension(4096), Dimension(512)])],
 ['lm/RNN_1/rnn/multi_rnn_cell/cell_1/lstm_cell/bias:0', TensorShape([Dimension(16384)])],
 ['lm/RNN_1/rnn/multi_rnn_cell/cell_1/lstm_cell/kernel:0', TensorShape([Dimension(1024), Dimension(16384)])],
 ['lm/RNN_1/rnn/multi_rnn_cell/cell_1/lstm_cell/projection/kernel:0', TensorShape([Dimension(4096), Dimension(512)])],
 ['lm/char_embed:0', TensorShape([Dimension(261), Dimension(16)])],
 ['lm/softmax/W:0', TensorShape([Dimension(1525043), Dimension(512)])],
 ['lm/softmax/b:0', TensorShape([Dimension(1525043)])],
 ['train_perplexity:0', TensorShape([])]]
WARNING:tensorflow:From /home/anupam/.local/lib/python3.5/site-packages/tensorflow/python/util/tf_should_use.py:170: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
2019-02-01 00:35:40.577872: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2019-02-01 00:35:40.577927: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2019-02-01 00:35:40.577946: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2019-02-01 00:35:40.577962: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2019-02-01 00:35:40.577980: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
WARNING:tensorflow:Error encountered when serializing lstm_output_embeddings.
Type is unsupported, or the types of the items don't match field type in CollectionDef.
'list' object has no attribute 'name'
Training for 10 epochs and 1000840 batches
Killed
(ELMO) anupam@anupam-OMEN-HP:~/Desktop/ElMo/bilm$
```
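The run ends with `Killed`, which on Linux usually means the kernel's out-of-memory killer stopped the process; the softmax above alone is 1525043 x 512 floats. One way to confirm, assuming `dmesg` is readable:

```
dmesg | grep -i -E 'out of memory|killed process'
```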

Is there any solution for this, please?

pzhang84 commented 2 years ago

@anupamjamatia Did you put all sentences into one training file? If so, that single shard (667648 sentences) is loaded into memory at once, which might be too much for your machine. I suggest you allocate fewer sentences to each training file and see what happens (a sketch of how follows below). Let me know if this helps at all.
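For example (a minimal sketch; the `shards/` directory name and the 50000-sentence shard size are placeholders), the corpus can be split with coreutils `split`, and `--train_prefix` accepts a glob so the trainer picks up all the shards:

```
# split the single corpus file into shards of 50k sentences each
mkdir -p shards
split -l 50000 --additional-suffix=.txt Training_File.txt shards/shard_

# point the trainer at all shards via a glob pattern
python bin/train_elmo.py \
    --train_prefix='shards/shard_*.txt' \
    --vocab_file en-bn-hi_mixed_voc_file.txt \
    --save_dir checkpoint_out
```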