allenai / bilm-tf

Tensorflow implementation of contextualized word representations from bi-directional language models
Apache License 2.0

[Question] significance of n_train_tokens parameter #191

Closed: 008karan closed this issue 5 years ago

008karan commented 5 years ago

I am observing that when I set n_train_tokens to a small value, e.g. 400000, it gave around 8000 batches, and each 100 batches took about 200 sec. But when I set n_train_tokens to the true value for my dataset, which is 744566507, the number of batches is now 1450000, with each 100 batches taking nearly the same time. Training will take a huge amount of time.

So my question is: what is the impact of the n_train_tokens parameter here? What if I train with a smaller value of n_train_tokens rather than the true value? Will this affect the final weights of the model?

MLjian commented 5 years ago

Hi, did you get an answer to this question? I have the same question.

008karan commented 5 years ago

What I observed is that the n_train_tokens parameter only decides your number of batches. If your training corpus has a huge number of tokens, put a smaller value here; otherwise, training will take too long.

matt-peters commented 5 years ago

n_train_tokens is used only to determine how many gradient updates to take before stopping training (along with batch_size and n_epochs). Longer training is usually better if your dataset is large enough. I suggest training until your patience is exhausted if you have a very large dataset, otherwise monitor validation perplexity and stop when it starts to increase.
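For illustration, here is a minimal sketch of that relationship. The function, defaults, and GPU handling below are assumptions for exposition, not the exact code in bilm/training.py:

```python
# Illustrative sketch (not the exact bilm-tf code): how n_train_tokens,
# batch_size, unroll_steps, n_gpus and n_epochs together determine the
# number of gradient updates the training loop will run.

def estimate_total_batches(n_train_tokens, batch_size, unroll_steps,
                           n_gpus=1, n_epochs=10):
    """Rough estimate of the total number of gradient updates."""
    # Each batch consumes batch_size * unroll_steps tokens on every GPU.
    tokens_per_batch = batch_size * unroll_steps * n_gpus
    batches_per_epoch = n_train_tokens // tokens_per_batch
    return n_epochs * batches_per_epoch

# Lowering n_train_tokens shrinks the batch count (and hence training time)
# proportionally; it does not change which data is actually fed in.
print(estimate_total_batches(400_000, batch_size=128, unroll_steps=20))
print(estimate_total_batches(744_566_507, batch_size=128, unroll_steps=20))
```

Under this arithmetic, setting n_train_tokens below the true token count simply ends training after fewer gradient updates, which is why the earlier comments observe that training time scales with it.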