uskudarli closed this issue 5 years ago
@ugurcanarikan -- this task is in progress, right? It seems to be sitting in the "to do" list. I am trying to track the work being done and what remains in the list. I don't think things are quite being recorded yet. Right?
gotcha. Thanks.
A trial pretraining run with batch size = 32 and train steps = 20 has been completed. Pretraining with batch size = 1024 and train steps = 10,000 is now in progress.
@ugurcanarikan
You are working on the case of steps = 2.5 million now, right?
What is the status?
In order to pretrain BERT for 10 epochs with a batch size of 56, we had calculated the number of training steps to be 26.5 million. But since it would take around 80 days to complete 26.5 million steps on our RTX 2080, we decided to pretrain BERT for at least 2.6 million steps, which makes roughly 1 epoch, and continue pretraining later. Pretraining is currently at step 3.18 million. After Flair is trained with GloVe and Turkish fastText embeddings, I will pause pretraining and extract BERT embeddings as well, to use them in training Flair.
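For reference, the step count above follows from steps = examples × epochs / batch size. A minimal sketch of that arithmetic, where the corpus size (~148.4M training instances) is back-calculated from the figures in this thread and is an assumption, not a measured count:

```python
import math

def num_train_steps(num_examples, epochs, batch_size):
    """Total optimizer steps needed to cover the corpus `epochs` times."""
    return math.ceil(num_examples * epochs / batch_size)

# Corpus size inferred from 26.5M steps / 10 epochs / batch 56
# (an assumption for illustration only).
NUM_EXAMPLES = 148_400_000

ten_epochs = num_train_steps(NUM_EXAMPLES, epochs=10, batch_size=56)
one_epoch = num_train_steps(NUM_EXAMPLES, epochs=1, batch_size=56)
print(ten_epochs)  # 26_500_000
print(one_epoch)   # 2_650_000
```

At roughly 3.8 steps/second on a single GPU, 26.5M steps is indeed on the order of 80 days, which matches the estimate above.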
Use the Turkish corpus provided by Onur using pytorch-pretrained-BERT.