prompt example: "Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"
Hi, I also found it strange when I tried to reproduce the results, but I directly followed their official inference commands and prompt template (the same as in your comment above). I also tried different `max_new_tokens` values, from the default 20 up to 150.
So I am also confused about what is going on with it, and I'm happy to keep in touch if there are any new findings. Thank you.
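For reference, here is a minimal sketch of the reproduction setup described above, assuming the public HuggingFace checkpoint `haoranxu/ALMA-7B`; the exact generation settings are my guesses, not necessarily the benchmark's actual configuration:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "haoranxu/ALMA-7B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# The prompt template quoted above.
prompt = "Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    # Beam search, with max_new_tokens in the 20-150 range discussed above.
    generated = model.generate(**inputs, num_beams=5, max_new_tokens=150)

# Strip the prompt tokens and decode only the continuation.
new_tokens = generated[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```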
Thank you for your amazing work.
I have a question because ALMA's results look surprisingly poor in the paper. Looking at the WMT'20 en -> zh results in Table 5, ALMA gets a very low score (11.3), which is strange: ALMA achieves a high score (39.3) on WMT'22, and the WMT'20 test set used in this benchmark is even part of ALMA's training data.
This is just my personal guess, but it seems that the prompt template used for ALMA's inference was not applied.
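To make the guess concrete: ALMA expects its prompt template at inference time, so scoring it on a bare source sentence would understate its quality. The `alma_prompt` helper below is written here purely for illustration, not a function from the ALMA repo:

```python
def alma_prompt(src: str, src_lang: str = "Chinese", tgt_lang: str = "English") -> str:
    # Same template as the example quoted at the top of this thread.
    return f"Translate this from {src_lang} to {tgt_lang}:\n{src_lang}: {src}\n{tgt_lang}:"

raw_input = "我爱机器翻译。"           # what the benchmark may have fed the model
templated = alma_prompt(raw_input)    # what ALMA's official inference uses
print(templated)
```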