facebookresearch / XLM

PyTorch original implementation of Cross-lingual Language Model Pretraining.
Other
2.87k stars 495 forks source link

question about bptt #285

Open 15091444119 opened 4 years ago

15091444119 commented 4 years ago

bptt concat different input sentences together to make a training instance of length bptt; one input sentence could be split into to different training instances; how will this effect training ?

what will the monolingual and bilingual mask lm loss be when training converged?

thanks