Closed. 008karan closed this issue 5 years ago.
Hi, do you have an answer to this question? I have the same question.
What I observed is that the `n_train_tokens` parameter only decides your number of batches. If your corpus is very large, set a smaller value here; otherwise training will take too long.
`n_train_tokens` is used only to determine how many gradient updates to take before stopping training (along with `batch_size` and `n_epochs`). Longer training is usually better if your dataset is large enough. I suggest training until your patience is exhausted if you have a very large dataset; otherwise, monitor validation perplexity and stop when it starts to increase.
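For concreteness, here is a minimal sketch of how those three settings typically translate into a total batch count in bilm-tf-style training. The variable names `unroll_steps` and `n_gpus` and the default values are assumptions based on a common configuration, not a verbatim copy of the library code:

```python
# Sketch: how n_train_tokens translates into a number of gradient updates.
# Names and defaults are assumptions, not verbatim bilm-tf code.

def total_batches(n_train_tokens, batch_size=128, unroll_steps=20,
                  n_gpus=1, n_epochs=10):
    # Each optimizer step consumes one batch per GPU: batch_size
    # sequences, each unroll_steps tokens long.
    n_tokens_per_batch = batch_size * unroll_steps * n_gpus

    # n_train_tokens only sets how many such steps make up an "epoch";
    # the data loader streams the corpus independently of this number.
    n_batches_per_epoch = n_train_tokens // n_tokens_per_batch
    return n_epochs * n_batches_per_epoch

# A smaller n_train_tokens means fewer total gradient updates,
# not a different data pipeline:
print(total_batches(400_000))      # 1,560 updates at these settings
print(total_batches(744_566_507))  # ~2.9M updates at these settings
```

Under these assumptions, wall-clock training time scales linearly with `n_train_tokens`, which matches the per-100-batch timings reported below.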
I am observing that when I set `n_train_tokens` to a small value, e.g. 400,000, it produced around 8,000 batches, and each 100 batches took about 200 seconds. But when I set `n_train_tokens` to the true value for my dataset, 744,566,507, the number of batches became 1,450,000, with roughly the same time per 100 batches; at ~2 s per batch, that is about 1,450,000 × 2 s ≈ 34 days of training. So my question is: what is the impact of the `n_train_tokens` parameter here? What if I train with a smaller value of `n_train_tokens` rather than the true value? Will this affect the final weights of the model?