facebookresearch / SpanBERT

Code for using and evaluating SpanBERT.

Question about training details of the SpanBERT base model #6

Closed · addf400 closed 5 years ago

addf400 commented 5 years ago

The base model has very strong performance! What were the training steps, batch size, and learning rate for the base model? Are they the same as for the large model? Did you use any other corpus for training the base model besides Wikipedia and BookCorpus?

omerlevy commented 5 years ago

We used exactly the same corpus.

The main difference is that we trained the base model with a much bigger batch size (8x larger) for fewer iterations (300k instead of 2.4M, i.e. 8x fewer). We also had to change the learning rate accordingly.

When we tried to do the same for the large models, we ran into many stability issues. It is quite possible that some more optimization tricks could improve the large models as well.
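For concreteness, the schedule change described here works out as follows (a minimal sketch; the large model's batch size is not stated in this thread, so only the 8x ratio is used):

```python
# Arithmetic behind the base-model schedule described above:
# same corpus and the same total token budget, reached with an
# 8x larger batch over 8x fewer optimizer steps.
LARGE_STEPS = 2_400_000   # optimizer steps used for the large model
BATCH_SCALE = 8           # base batch size = 8 x large batch size

base_steps = LARGE_STEPS // BATCH_SCALE
print(base_steps)         # 300000 -> the "300k" figure quoted above
```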

addf400 commented 5 years ago

Can you tell us what learning rate you used for the base model? It would save us a lot of resources and make reproduction much easier. Thank you very much!

omerlevy commented 5 years ago

LR = 0.0005
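
Collecting the numbers from this thread into one place, a hypothetical configuration sketch might look like the following. Only the learning rate, the 300k step count, and the 8x batch ratio are confirmed above; the key names are illustrative assumptions, not the repo's actual config format.

```python
# Hypothetical summary of the SpanBERT-base pretraining schedule,
# based only on the values stated in this issue thread.
base_pretrain_config = {
    "corpus": "Wikipedia + BookCorpus (same as the large model)",
    "peak_lr": 5e-4,            # "LR = 0.0005"
    "max_steps": 300_000,       # 300k steps instead of 2.4M
    "batch_size_vs_large": 8,   # batch is 8x the large model's (absolute size not stated)
}
```

Note that the warmup schedule and other optimizer settings beyond the peak learning rate are not given in this thread.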