lena-kru opened this issue 2 years ago
@ldevyataykina read https://huggingface.co/blog/how-to-generate and try changing num_beams, no_repeat_ngram_size and the other generation parameters described in the article.
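For reference, no_repeat_ngram_size (covered in the linked post) forbids any n-gram of that size from appearing twice in the generated output. Below is a minimal pure-Python sketch of that blocking rule on token IDs. The function name `banned_next_tokens` is hypothetical, and this is an illustration of the idea, not the transformers implementation.

```python
from typing import Dict, Sequence, Set, Tuple


def banned_next_tokens(tokens: Sequence[int], n: int) -> Set[int]:
    """Return the token IDs that would recreate an n-gram already in `tokens`."""
    if n <= 0 or len(tokens) + 1 < n:
        return set()
    # Map every (n-1)-token prefix seen so far to the tokens that followed it.
    seen: Dict[Tuple[int, ...], Set[int]] = {}
    for i in range(len(tokens) - n + 1):
        prefix = tuple(tokens[i:i + n - 1])
        seen.setdefault(prefix, set()).add(tokens[i + n - 1])
    # The last n-1 tokens form the prefix that the next token would complete.
    current = tuple(tokens[len(tokens) - n + 1:])
    return set(seen.get(current, ()))


# The 3-gram (5, 7, 9) already occurred, so 9 may not follow (5, 7) again.
print(banned_next_tokens([5, 7, 9, 5, 7], 3))  # {9}
```

During beam search these banned IDs get their scores set to minus infinity, so beams cannot loop on the same phrase. In the pipeline above, the parameter would presumably be passed the same way as num_beams (for example no_repeat_ngram_size=3); treat that value as a starting point to tune, not a recommendation from this thread.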
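```python
from transformers import pipeline

# `model`, `tokenizer`, `sent_tokenizer` and `text` are defined earlier (not shown here)
translation_pipeline = pipeline(
    "translation",
    model=model,
    tokenizer=tokenizer,
    src_lang="rus_Cyrl",
    tgt_lang="eng_Latn",
    max_length=512,
    num_beams=5,
)

# Translate sentence by sentence instead of passing the whole text at once
for sent in sent_tokenizer.tokenize(text):
    print(sent, ' ---> ', translation_pipeline(sent)[0]['translation_text'])
```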
Output:
```
самописное по добрый день, просьба добавить в исключение файл (прикреплен). ---> self-published on good day, please add the file to the exclusion (attached).
возможности изменить самописное по нет. ---> I don't have the ability to change the self-publishing.
```
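The snippet does not show what `sent_tokenizer` is; presumably it is a sentence tokenizer such as NLTK's Punkt (an assumption). For a dependency-free illustration, here is a naive regex splitter, `split_sentences` (a hypothetical stand-in), that collapses newlines into spaces and splits after `.`, `!` or `?`. On the example text it yields the same two sentences as the output shown.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: a hypothetical stand-in for sent_tokenizer.tokenize."""
    # Collapse all whitespace (including the "\n" in the example) into single spaces.
    normalized = re.sub(r"\s+", " ", text).strip()
    if not normalized:
        return []
    # Split wherever ., ! or ? is followed by whitespace; the punctuation is kept.
    return re.split(r"(?<=[.!?])\s+", normalized)


sample_text = """ самописное по\nдобрый день, просьба добавить в исключение файл (прикреплен). возможности изменить самописное по нет."""
for sent in split_sentences(sample_text):
    print(sent)
```

Unlike a trained tokenizer, this rule will wrongly split after abbreviations such as "т.е.", so it is only suitable for quick experiments.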
P.S. Don't set max_length to more than 512 tokens.
I'm trying to translate some texts, and sometimes I get really unexpected results.
For example, I tried to translate this text:
```python
text = """ самописное по\nдобрый день, просьба добавить в исключение файл (прикреплен). возможности изменить самописное по нет."""
```
And it gives me
Code for reproducing it:
Python version: 3.8.13
transformers: 4.21.1