THUNLP-MT / Mask-Align

Code for our paper "Mask-Align: Self-Supervised Neural Word Alignment" in ACL 2021
BSD 3-Clause "New" or "Revised" License

Model cannot converge #1

Open theoqian opened 3 years ago

theoqian commented 3 years ago

I tried to train a mask_align model with the default config in the repo (changing only the data paths) on the DE-EN training data from https://github.com/lilt/alignment-scripts. At some training steps the losses are nan, and by the end of training the loss has increased from about 7 to about 70.

epoch = 5, step = 49980, loss: nan, f_loss: nan, b_loss: nan, agree_loss: nan, entropy_loss: nan (0.246 sec)
epoch = 5, step = 49990, loss: 64.210, f_loss: 67.750, b_loss: 60.188, agree_loss: 0.000, entropy_loss: 0.241 (0.507 sec)
epoch = 5, step = 50000, loss: 69.115, f_loss: 72.500, b_loss: 65.312, agree_loss: 0.000, entropy_loss: 0.240 (0.652 sec)

carboncoo commented 3 years ago

Hi, this is most likely caused by sentence pairs of length 1 in the training data. Our masking strategy does not support such pairs, so we filter them out. You can run thualign/scripts/remove_single.py to filter the corpus, then try training again.
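
For readers who want to see what such a filter might look like, here is a minimal sketch. It is not the repository's remove_single.py, whose actual interface and behavior are not shown in this thread. The function name `filter_short_pairs`, the `min_len` parameter, and the rule "drop a pair if either side has fewer than 2 whitespace-separated tokens" are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, Tuple


def filter_short_pairs(
    src_lines: Iterable[str],
    tgt_lines: Iterable[str],
    min_len: int = 2,
) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) pairs where both sides have >= min_len tokens.

    Hypothetical helper: assumes the two inputs are line-aligned and that
    tokens are separated by whitespace.
    """
    for src, tgt in zip(src_lines, tgt_lines):
        src, tgt = src.strip(), tgt.strip()
        # Assumption: a pair is dropped if EITHER side is shorter than min_len.
        if len(src.split()) >= min_len and len(tgt.split()) >= min_len:
            yield src, tgt


def filter_files(src_in: str, tgt_in: str, src_out: str, tgt_out: str,
                 min_len: int = 2) -> int:
    """Filter two parallel text files and return the number of pairs kept."""
    kept = 0
    with open(src_in, encoding="utf-8") as f_src, \
         open(tgt_in, encoding="utf-8") as f_tgt, \
         open(src_out, "w", encoding="utf-8") as o_src, \
         open(tgt_out, "w", encoding="utf-8") as o_tgt:
        for src, tgt in filter_short_pairs(f_src, f_tgt, min_len):
            o_src.write(src + "\n")
            o_tgt.write(tgt + "\n")
            kept += 1
    return kept


if __name__ == "__main__":
    # Tiny in-memory demonstration with made-up sentences.
    demo_src = ["ein Haus", "Ja", "das ist gut", "Hallo Welt"]
    demo_tgt = ["a house", "Yes", "that is good", "Hello"]
    for pair in filter_short_pairs(demo_src, demo_tgt):
        print(pair)
```

Filtering both files in a single pass keeps the source and target sides line-aligned, which matters for parallel training data. If your setup differs (for example, only pairs where both sides are single tokens should be removed), adjust the condition accordingly.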