Model cannot converge - Githubissues

THUNLP-MT / Mask-Align

Code for our paper "Mask-Align: Self-Supervised Neural Word Alignment" in ACL 2021

BSD 3-Clause "New" or "Revised" License

60 stars, 20 forks

I tried to train a mask_align model with the default config in the repo (changing only the data paths) on the DE-EN training data from https://github.com/lilt/alignment-scripts. At some training steps the losses are NaN, and by the end of training the loss has increased from about 7 to about 70.

epoch = 5, step = 49980, loss: nan, f_loss: nan, b_loss: nan, agree_loss: nan, entropy_loss: nan (0.246 sec)
epoch = 5, step = 49990, loss: 64.210, f_loss: 67.750, b_loss: 60.188, agree_loss: 0.000, entropy_loss: 0.241 (0.507 sec)
epoch = 5, step = 50000, loss: 69.115, f_loss: 72.500, b_loss: 65.312, agree_loss: 0.000, entropy_loss: 0.240 (0.652 sec)
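
To narrow down where the losses first become NaN, the training log can be scanned with a small script along these lines. This is only a minimal sketch and not part of the Mask-Align repo: the function name find_nan_steps and the regular expression are assumptions based on the log format shown above.

```python
import math
import re

# Assumed record format, taken from the log lines above:
# "epoch = 5, step = 49980, loss: nan, f_loss: ..."
RECORD = re.compile(r"epoch = (\d+), step = (\d+), loss: (\S+?),")


def find_nan_steps(log_text):
    """Return (epoch, step) pairs whose total loss is NaN."""
    bad = []
    for epoch, step, loss in RECORD.findall(log_text):
        if math.isnan(float(loss)):
            bad.append((int(epoch), int(step)))
    return bad


sample = (
    "epoch = 5, step = 49980, loss: nan, f_loss: nan, b_loss: nan, "
    "agree_loss: nan, entropy_loss: nan (0.246 sec)\n"
    "epoch = 5, step = 49990, loss: 64.210, f_loss: 67.750, b_loss: 60.188, "
    "agree_loss: 0.000, entropy_loss: 0.241 (0.507 sec)\n"
    "epoch = 5, step = 50000, loss: 69.115, f_loss: 72.500, b_loss: 65.312, "
    "agree_loss: 0.000, entropy_loss: 0.240 (0.652 sec)\n"
)

print(find_nan_steps(sample))  # [(5, 49980)]
```

Running it over the full log would list every step with a NaN loss, which makes it easier to see whether the NaNs start early or only appear shortly before the loss climbs.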


Model cannot converge #1