I want to know how many epochs are needed when pretraining BERT, but most articles about BERT only say how many steps are needed.
The paper 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding' gives an approximate number: 40 epochs. But that figure is counted over words (tokens), not sentences, whereas each training sample during pretraining is a sequence of sentences.
Is the epoch not an important concept in NLP? In CV, the epoch is important.
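For context, the "~40 epochs" can be back-computed from the step count: the paper trains for 1M steps with a batch of 256 sequences of up to 512 tokens, over a corpus of roughly 3.3B words. A rough sketch of that arithmetic (treating every step as full 512-token sequences, which is an upper-bound simplification since most steps actually use length 128):

```python
# Back-of-the-envelope conversion from training steps to epochs,
# using the numbers reported in the BERT paper.
batch_size = 256          # sequences per step
max_seq_len = 512         # tokens per sequence (upper bound; most steps use 128)
train_steps = 1_000_000   # total pretraining steps
corpus_tokens = 3.3e9     # BooksCorpus + English Wikipedia, ~3.3B words

tokens_seen = batch_size * max_seq_len * train_steps
epochs = tokens_seen / corpus_tokens
print(round(epochs, 1))   # close to the "~40 epochs" the paper mentions
```

So an "epoch" here just means one full pass over the corpus measured in tokens; whether the samples are sentences or sentence pairs doesn't change the total amount of text seen.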