🐛 Bug

Hi, I tried using NLLB to translate some English sentences into Chinese. All of my sentences are shorter than 60 tokens. However, most sentences longer than about 30 tokens are not translated completely: only half of the sentence, or less, appears in the output.

I ran the same code for English to French, and it works: every sentence is translated completely.

I also tried setting min_length, but then, for short sentences, the last part of the sentence is sometimes generated repeatedly. My code is below; please help:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    r"nllb-200-distilled-600M", token=True, src_lang="eng_Latn"
)
model = AutoModelForSeq2SeqLM.from_pretrained(r"nllb-200-distilled-600M", token=True)

input_path = r"eng_test_short.txt"
output_path = "./nllb_chn.txt"

# Target language token id; with "fra_Latn" instead, every sentence comes out complete.
tgt_lang_id = tokenizer.convert_tokens_to_ids("zho_Hans")
print(tgt_lang_id)

with open(input_path, "r", encoding="utf-8") as input_file, \
        open(output_path, "w", encoding="utf-8") as f:
    for article in input_file:
        article = article.strip()  # drop the trailing newline
        if not article:
            continue  # skip blank lines
        print(article)

        inputs = tokenizer(article, return_tensors="pt")
        translated_tokens = model.generate(
            # **inputs, forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"), max_length=200
            **inputs, forced_bos_token_id=tgt_lang_id, max_length=512
        )

        # Decode the single sequence in the batch, without special/language tokens.
        output = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
        print(output)
        f.write(output + "\n")
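One way to narrow this down (a sketch, not a confirmed diagnosis) is to check why generation stopped for each sentence. If the sequence ends with the EOS token well before `max_length`, the model chose to stop early on its own; if it runs into `max_length`, the limit is the problem. The helper below works on a plain list of token ids, such as one row of `translated_tokens`. `stop_reason` is a hypothetical name, and the EOS and pad ids are passed in rather than assumed; in practice they would come from `tokenizer.eos_token_id` and `tokenizer.pad_token_id`.

```python
from typing import List


def stop_reason(ids: List[int], eos_id: int, pad_id: int, max_length: int) -> str:
    """Classify why a generated sequence ended (hypothetical diagnostic helper).

    ids: one generated sequence as a Python list of token ids.
    Returns "eos" if the sequence ends with the end-of-sequence token,
    "max_length" if it reached the length limit without EOS,
    and "unknown" otherwise.
    """
    # Drop trailing padding, which appears when sequences in a batch differ in length.
    trimmed = list(ids)
    while trimmed and trimmed[-1] == pad_id:
        trimmed.pop()
    if trimmed and trimmed[-1] == eos_id:
        return "eos"
    if len(trimmed) >= max_length:
        return "max_length"
    return "unknown"


if __name__ == "__main__":
    # Toy ids for illustration only: 2 stands for EOS and 1 for padding here.
    print(stop_reason([2, 7, 15, 27, 2], eos_id=2, pad_id=1, max_length=512))  # → eos
```

Inside the loop this could be called as `stop_reason(translated_tokens[0].tolist(), tokenizer.eos_token_id, tokenizer.pad_token_id, 512)`. If the truncated Chinese outputs mostly report `"eos"`, raising `max_length` further would not help, because the model is ending the sequence by itself.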

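The output looks like this:

input: Politicians are loath to raise the tax even one penny when gas prices are high.
output: 政客们不愿意在高昂的燃油价格时,

The output roughly reads "Politicians are unwilling, when fuel prices are high," and breaks off before the part about raising the tax.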
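The example output stops at a comma, mid-clause. To count how many lines in `nllb_chn.txt` look cut off, a rough heuristic (an assumption on my side, not something NLLB or fairseq provides) is to flag any output whose last character is not sentence-final punctuation. The function names and the punctuation set below are hypothetical and may need adjusting, for example when a closing quote follows the full stop.

```python
from typing import Iterable, List

# Hypothetical set of characters that can legitimately end a sentence,
# covering full-width Chinese and ASCII punctuation.
SENTENCE_FINAL = set("。！？.!?…")


def looks_truncated(text: str) -> bool:
    """Return True if the text is empty or does not end with sentence-final punctuation."""
    stripped = text.rstrip()
    return not stripped or stripped[-1] not in SENTENCE_FINAL


def truncated_lines(lines: Iterable[str]) -> List[int]:
    """Return the 1-based numbers of lines that look cut off."""
    return [i for i, line in enumerate(lines, start=1) if looks_truncated(line)]


if __name__ == "__main__":
    outputs = ["政客们不愿意在高昂的燃油价格时,", "这是一个完整的句子。"]
    print(truncated_lines(outputs))  # → [1]
```

For the file written by the script, `truncated_lines(open("./nllb_chn.txt", encoding="utf-8"))` lists the suspicious lines. It will also flag lines whose English source had no final punctuation, so the hits still need a manual look.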

facebookresearch / fairseq

NLLB is unable to translate into a complete long sentence in Chinese. #5549
