mo-shahab opened 11 months ago
This seems to be the case for all of their models that originate from the Tatoeba Challenge. Only the models included here seem to work with Hugging Face. Until a month ago, I hadn't encountered such problems.
That's probably also why translation is so slow! The ru-en model must be one of the older models that still work.
Hope that helps, although I haven't found a fix yet!
Yes, this narrows things down, though I'm not really sure what the Tatoeba Challenge is. In this thread, the author explains the possible problem. Hope it helps.
Yeah, I solved the problem. It is mainly a sampling/decoding issue: the default decoding approach for all of these models is essentially greedy search. This article is very helpful for learning more about how to sample/decode generated text: https://huggingface.co/blog/how-to-generate
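To illustrate why greedy search can get stuck in a loop, here is a self-contained toy in plain Python (not the transformers API). The bigram table `NEXT` and its probabilities are hypothetical and made up for illustration: greedy search keeps picking "Cookie" because it is always the locally best token, while a 2-beam search finds a higher-probability sentence that actually ends.

```python
import math

# Hypothetical toy "model": next-token probabilities given only the last token.
# Real Marian/Opus-MT models condition on the full prefix and the source sentence.
NEXT = {
    "<s>": {"Cookie": 0.5, "The": 0.4, "</s>": 0.1},
    "Cookie": {"Cookie": 0.6, "</s>": 0.4},
    "The": {"cat": 0.9, "</s>": 0.1},
    "cat": {"sleeps": 0.9, "</s>": 0.1},
    "sleeps": {"</s>": 1.0},
}


def strip_special(seq):
    """Drop the start/end markers for display."""
    return [t for t in seq if t not in ("<s>", "</s>")]


def greedy(max_len=8):
    """Greedy search: always take the single most probable next token."""
    seq, logp = ["<s>"], 0.0
    while seq[-1] != "</s>" and len(seq) <= max_len:
        token, p = max(NEXT[seq[-1]].items(), key=lambda kv: kv[1])
        seq.append(token)
        logp += math.log(p)
    return strip_special(seq), logp


def beam_search(num_beams=2, max_len=8):
    """Simplified beam search: keep the num_beams best partial sequences.

    No length normalisation, unlike real implementations.
    """
    beams = [(["<s>"], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = [
            (seq + [token], logp + math.log(p))
            for seq, logp in beams
            for token, p in NEXT[seq[-1]].items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, logp in candidates[:num_beams]:
            # Sequences that emitted </s> are complete and stop expanding.
            (finished if seq[-1] == "</s>" else beams).append((seq, logp))
        if not beams:
            break
    best_seq, best_logp = max(finished + beams, key=lambda c: c[1])
    return strip_special(best_seq), best_logp


print(greedy()[0])       # eight "Cookie" tokens
print(beam_search()[0])  # ['The', 'cat', 'sleeps']
```

With real models, the corresponding knobs are arguments to `model.generate()`, for example `num_beams`, `no_repeat_ngram_size` or `repetition_penalty`; which values work best for a given Opus-MT model is something to try rather than assume.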
The Tatoeba Challenge models are trained on this data compilation: https://github.com/Helsinki-NLP/Tatoeba-Challenge/
For speed, I recommend using the native Marian-NMT models rather than the PyTorch versions from the transformers library. Alternatively, you can convert them to CTranslate2 for fast decoding.
Apart from speed, is the output still broken when using the transformers models? I think this has been fixed, hasn't it? If not, it would be a question for the Hugging Face repositories.
The Hebrew-to-English model's output really is nonsensical:
Gen Gen Terrorism Terrorism Terrorism Terrorism Terrorism Terrorism Cookie Cookie discussions Cookie discussions Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie acknowledg acknowledg acknowledg acknowledg acknowledg acknowledg discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions assembly assembly assembly assembly assembly assembly assembly assembly assembly assembly assembly assembly assembly assembly Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie acknowledg acknowledg acknowledg acknowledg acknowledg acknowledg acknowledg acknowledg acknowledg acknowledg acknowledg acknowledg acknowledg acknowledg acknowledg discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions 
discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions 
discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions discussions
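The repeated tokens above are the kind of degenerate loop that the how-to-generate article addresses with n-gram penalties. Below is a minimal plain-Python sketch of the idea behind the `no_repeat_ngram_size` option of transformers' `generate()`: if the last n-1 tokens appeared earlier, whatever followed them then is banned now. It is not the library's implementation, and `toy_next_probs` is a hypothetical scorer standing in for a real model.

```python
def banned_next_tokens(tokens, n):
    """Return tokens that would complete an n-gram already present in `tokens`."""
    if n <= 0 or len(tokens) < n - 1:
        return set()
    prefix = tuple(tokens[len(tokens) - (n - 1):])
    banned = set()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned


def greedy_no_repeat(next_probs, n, max_len=6):
    """Greedy decoding that skips tokens banned by n-gram blocking (n <= 0 disables it)."""
    tokens = []
    for _ in range(max_len):
        banned = banned_next_tokens(tokens, n)
        options = {t: p for t, p in next_probs(tokens).items() if t not in banned}
        if not options:
            break
        token = max(options, key=options.get)
        if token == "</s>":
            break
        tokens.append(token)
    return tokens


def toy_next_probs(tokens):
    """Hypothetical scorer that always prefers "Cookie", mimicking the loop above."""
    return {"Cookie": 0.6, "discussions": 0.3, "</s>": 0.1}


print(greedy_no_repeat(toy_next_probs, n=0))  # six "Cookie" tokens
print(greedy_no_repeat(toy_next_probs, n=2))  # ['Cookie', 'Cookie', 'discussions', 'Cookie']
```

Blocking only kicks in once an n-gram has occurred, so the first "Cookie Cookie" is still allowed; it stops the run from growing indefinitely, but it cannot repair a model whose scores are broken in the first place.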
def translate_text_file(input_filename, output_filename):
    # Load tokenizer and model