Closed: yayaQAQ closed this issue 3 years ago
We have not experimented with using mT5 for translation. We use a constant learning rate during fine-tuning, but otherwise your plots look fine. Is the one sentence that starts with '39' repeated many times over the course of training, or something like that? Otherwise, I don't know what the issue is, since I don't have access to your data, and this is not a use case of mT5 that we intend to support.
@craffel There is only one sentence beginning with '39' in my corpus, but the generated results all start with '39'. Also, what is the value of the constant learning rate? Maybe I can try it on my task. Thanks!
1e-3; see the paper.
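For reference, a minimal sketch of what constant-LR fine-tuning could look like with Hugging Face Transformers' Adafactor (the optimizer used in the T5 paper); the checkpoint name and the toy sentence pair below are placeholders, not from this thread:

```python
# Minimal sketch: fine-tune mT5 with a constant learning rate of 1e-3
# using Adafactor. Checkpoint name and toy data are placeholders.
from transformers import MT5ForConditionalGeneration, MT5Tokenizer
from transformers.optimization import Adafactor

tokenizer = MT5Tokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# relative_step/warmup_init must be disabled for Adafactor to use a fixed lr.
optimizer = Adafactor(
    model.parameters(),
    lr=1e-3,
    scale_parameter=False,
    relative_step=False,
    warmup_init=False,
)

batch = tokenizer("Hello, world!", return_tensors="pt")
labels = tokenizer("你好，世界！", return_tensors="pt").input_ids
# In a real run, padded label positions should be set to -100 so the loss
# ignores them; this single unpadded pair skips that step.

model.train()
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```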
@craffel Thanks!
I am trying to do a translation task with mT5, specifically English to Chinese. After fine-tuning for many steps on millions of parallel sentence pairs, the loss went down to 2.407 and plateaued there. I then tried using the fine-tuned model.

[Screenshot: training loss curve]
My input is an English sentence and the output is a Chinese sentence. Before generating the Chinese sentence, the model always generates '39' first. I generated 30 sentences, and 29 of them started with '39'. Of course, no sentence in the English input begins with '39'. The generation quality is bad.
[Screenshot: original sentences (top), generated sentences (bottom)]
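One way to debug this is to look at the raw generated token IDs rather than the decoded string, to see which token(s) the leading '39' actually corresponds to. A minimal sketch with Hugging Face Transformers; the checkpoint path and input sentence are placeholders:

```python
# Debugging sketch: print the raw token IDs and SentencePiece pieces of a
# generated sequence. "path/to/finetuned-mt5" is a placeholder path.
from transformers import MT5ForConditionalGeneration, MT5Tokenizer

tokenizer = MT5Tokenizer.from_pretrained("path/to/finetuned-mt5")
model = MT5ForConditionalGeneration.from_pretrained("path/to/finetuned-mt5")

inputs = tokenizer("The weather is nice today.", return_tensors="pt")
output_ids = model.generate(**inputs, max_length=64)[0].tolist()

print(output_ids)                                   # raw token IDs
print(tokenizer.convert_ids_to_tokens(output_ids))  # subword pieces
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```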
I searched my Chinese corpus and found only one sentence that starts with '39'. What could be causing this problem? Should I be using mT5 for a translation task at all? Is there a good approach for translation tasks? Thanks!
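To double-check the data side, a quick scan of the target corpus can count how many lines start with '39'. A small sketch; `train.zh` is a placeholder name for the Chinese side of the parallel corpus, one sentence per line:

```python
# Quick data check: count target-side sentences that start with "39".
count = total = 0
with open("train.zh", encoding="utf-8") as f:
    for line in f:
        total += 1
        if line.lstrip().startswith("39"):
            count += 1
print(f"{count} of {total} sentences start with '39'")
```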