argosopentech / argos-translate

Open-source offline translation library written in Python
https://www.argosopentech.com
MIT License
3.67k stars, 268 forks

Doesn't understand Chinese #275

Open NRHGDW opened 2 years ago

NRHGDW commented 2 years ago

Hello. [image]

How are you? [image]

I'm too/very tired. [image]

DSPerson commented 2 years ago

Yes, you are right. idiot

PJ-Finlay commented 2 years ago

Yes, the Chinese translations aren't very good. I think the root cause is that there isn't much training data available for Chinese.

rafael3382 commented 1 year ago

It looks like there is plenty of data for Chinese: https://github.com/Helsinki-NLP/Tatoeba-Challenge/blob/master/data/README-v2021-08-07.md

Can someone train an Argos package with it, please? I really need good Chinese-to-English translation.

pierotofy commented 1 year ago

Maybe you can help us train a better Chinese model, @rafael3382? See https://github.com/argosopentech/argos-train

PJ-Finlay commented 1 year ago

The Chinese model was updated recently; hopefully the new one is better.

https://community.libretranslate.com/t/improving-chinese-translations/364/

If we can find more data, we could retrain again too.

BackMountainDevil commented 1 year ago

Still bad. How many GPUs would I need if I wanted to train it?

mkunz7 commented 11 months ago

https://huggingface.co/Helsinki-NLP/opus-mt-zh-en does a pretty good job. I wonder if we can use that.


```python
# pip install transformers
# pip install torch
# pip install sentencepiece
# pip install sacremoses

from transformers import MarianMTModel, MarianTokenizer

MODEL_NAME = "Helsinki-NLP/opus-mt-zh-en"

# Load the model and tokenizer once rather than on every call
model = MarianMTModel.from_pretrained(MODEL_NAME)
tokenizer = MarianTokenizer.from_pretrained(MODEL_NAME)


def chinese_to_english(text):
    # Tokenize the text into PyTorch tensors
    inputs = tokenizer(text, return_tensors="pt")

    # Translate the tokenized text
    translated_tokens = model.generate(**inputs)

    # Decode the translated tokens to a string
    return tokenizer.decode(translated_tokens[0], skip_special_tokens=True)


if __name__ == "__main__":
    chinese_text = input("Enter Chinese text: ")
    translated_text = chinese_to_english(chinese_text)
    print(f"Translated Text: {translated_text}")
```

pierotofy commented 10 months ago

New Simplified/Traditional Chinese models (from OPUS-MT) are up:

https://libretranslate.com/?source=zh&target=en&q=%E4%BD%A0%E5%A5%BD

How do they score?

Link to models thread: https://community.libretranslate.com/t/opus-mt-language-models-port-thread/757/2
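
For anyone who wants a rough number rather than an impression when answering "How do they score?", here is a minimal sketch of a character n-gram F-score in plain Python. It is a simplified, chrF-style measure written for illustration, not the official chrF or BLEU implementation, and the function names `char_ngrams` and `simple_chrf` are made up for this example. Scores are only meaningful when comparing models against the same reference translations.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str, max_n: int = 3, beta: float = 2.0) -> float:
    """Simplified chrF-style score in [0, 1]; higher means closer to the reference."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one of the texts is too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta score; beta=2 weights recall more heavily than precision
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

To compare two models, translate the same Chinese sentences with each, score every output against a trusted English reference, and average the results. For anything beyond a quick sanity check, an established tool such as sacreBLEU is the better choice.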

gkielian commented 10 months ago

Thanks @pierotofy! After reinstalling, not only zh but also pl is now working great :)
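
For readers using the Python library directly, the reinstall step mentioned above can be sketched roughly as follows. This is a sketch under assumptions rather than the maintainers' recommended procedure: the helper names `pick_package` and `reinstall` are hypothetical, and it assumes that installing a freshly downloaded package replaces the older copy. The `argostranslate.package` calls are the ones shown in the project's README.

```python
def pick_package(packages, from_code, to_code):
    """Return the first package matching the language pair, or None."""
    return next(
        (p for p in packages if p.from_code == from_code and p.to_code == to_code),
        None,
    )


def reinstall(from_code="zh", to_code="en"):
    # Imported here so pick_package can be used without argostranslate installed.
    import argostranslate.package

    # Refresh the local copy of the package index, then list what is available.
    argostranslate.package.update_package_index()
    available = argostranslate.package.get_available_packages()

    pkg = pick_package(available, from_code, to_code)
    if pkg is None:
        raise LookupError(f"No package found for {from_code} -> {to_code}")

    # Download the package file and install it from the downloaded path.
    # Assumption: this replaces any previously installed copy of the package.
    argostranslate.package.install_from_path(pkg.download())
```

Calling `reinstall("zh", "en")` needs network access and fetches the current Chinese-to-English package; `reinstall("pl", "en")` would do the same for Polish.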