Hyperparticle / udify

A single model that parses Universal Dependencies across 75 languages. Given a sentence, jointly predicts part-of-speech tags, morphology tags, lemmas, and dependency trees.
https://arxiv.org/abs/1904.02099
MIT License

Training seems not to begin #1

Closed (ghpu closed this issue 5 years ago)

ghpu commented 5 years ago

I am trying to reproduce the experiment, but it looks as if the training process stays stuck at the start:

```
2019-07-09 17:15:45,951 - INFO - allennlp.training.trainer - Beginning training.
2019-07-09 17:15:45,951 - INFO - allennlp.training.trainer - Epoch 0/79
2019-07-09 17:15:45,951 - INFO - allennlp.training.trainer - Peak CPU memory usage MB: 19202.28
2019-07-09 17:15:46,225 - INFO - allennlp.training.trainer - GPU 0 memory usage MB: 1694
2019-07-09 17:15:46,226 - INFO - allennlp.training.trainer - GPU 1 memory usage MB: 37
2019-07-09 17:15:46,231 - INFO - allennlp.training.trainer - Training
  0%|          | 0/46617 [00:00<?, ?it/s]
```

After a night, the progress bar has not moved at all.

CPU usage is at 100% on one core, memory usage is increasing slowly, and the GPUs are idle.

Could you please indicate which versions of Python, AllenNLP, and PyTorch you are using?

Mine are `python=3.6`, `allennlp==0.8.4`, `pytorch-pretrained-bert==0.6.1`, `pytorch==1.0.0`.
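In case it helps, a small sketch for printing the relevant versions in one go (distribution names assume a pip install; conda package names may differ):

```python
# Print the versions that matter for reproducing this setup.
# A minimal sketch; adjust distribution names for conda installs.
import sys
import pkg_resources
import torch

print("python:", sys.version.split()[0])
print("torch:", torch.__version__)
for dist in ("allennlp", "pytorch-pretrained-bert"):
    try:
        print(dist + ":", pkg_resources.get_distribution(dist).version)
    except pkg_resources.DistributionNotFound:
        print(dist + ": not installed")
```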

ghpu commented 5 years ago

When I truncate train.conllu to a dozen sentences, it works, so I presume a night of waiting was not enough to preprocess the whole UD training corpus.
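For anyone who wants to reproduce the quick test, here is a minimal sketch of the truncation step (file names are illustrative; sentences in CoNLL-U are blocks of lines terminated by a blank line, so it suffices to count blank lines):

```python
# Copy the first N sentences of a CoNLL-U file to a smaller file.
# A minimal sketch; file names here are hypothetical.
N = 12

with open("train.conllu", encoding="utf-8") as src, \
        open("train_small.conllu", "w", encoding="utf-8") as dst:
    sentences = 0
    for line in src:
        dst.write(line)
        if not line.strip():  # a blank line closes a sentence
            sentences += 1
            if sentences >= N:
                break
```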

Hyperparticle commented 5 years ago

Yes, make sure to check your RAM usage. This is the stage where the training set is loaded into memory, and if you don't have enough, it may be waiting forever. I believe it's an issue with AllenNLP being very inefficient by creating several objects per line, but I haven't investigated far enough to verify exactly how it can be fixed.