Closed mchari closed 4 years ago
Why is the number of batches so large for just 1,200 sentences?
A look at bilm/training.py answered my question. I had not set n_train_tokens, and the default value was way too high (likely the number of tokens in the original training data).
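For reference, the batch count in the log message can be reproduced from the options. bilm-tf computes batches per epoch as n_train_tokens divided by (batch_size * unroll_steps), then multiplies by the epoch count. A minimal sketch, assuming the stock options.json values (batch_size=128, unroll_steps=20) and the default n_train_tokens of 768,648,884 from train_elmo.py (roughly the 1B Word Benchmark size); these are assumptions from the downloaded config, not values confirmed in this thread:

```python
# Rough batch-count arithmetic for bilm-tf (a sketch, not the library's code).
# Assumed: batch_size / unroll_steps from the stock options.json,
# n_train_tokens from the default in train_elmo.py.
n_train_tokens = 768648884   # default; approx. the 1B Word Benchmark token count
batch_size = 128
unroll_steps = 20
n_epochs = 5

tokens_per_batch = batch_size * unroll_steps           # 128 * 20 = 2560
batches_per_epoch = n_train_tokens // tokens_per_batch
total_batches = n_epochs * batches_per_epoch
print(total_batches)  # 1501265 -- matches the figure in the log message
```

With n_train_tokens set to the actual token count of a 1,200-sentence file (on the order of tens of thousands of tokens), the same arithmetic gives only a handful of batches per epoch, which is why leaving the default in place made the job look endless.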
To incrementally train ELMo, I have a training set of around 1,200 sentences, all in one file. I am using num_epochs = 5; all the other settings are the same as in the options.json that I downloaded from the bilm-tf repo.
The process has been running for 24 hours on a single Pascal P6000 GPU. What is the expected runtime for such a training job? I also see the message "Training for 5 epochs and 1501265 batches". The last batch processed was numbered 83300, which is well short of that total, and I can't tell how many epochs have been completed.
Any ideas on how I could speed up the training?