argosopentech / argos-translate

Open-source offline translation library written in Python
https://www.argosopentech.com
MIT License
3.86k stars 282 forks

Poor Chinese translation? #225

Open chaodreaming opened 2 years ago

chaodreaming commented 2 years ago

I seem to get completely unrelated translation results. I don't know whether I am using the library incorrectly or the model quality is poor.

dingedi commented 2 years ago

Could you give us some examples?

chaodreaming commented 2 years ago

Vehicle detection technology is of great significance for realizing automatic monitoring and AI-assisted driving systems. The state-of-the-art object detection method, namely, a class of YOLOv5, has often been used to detect vehicles. However, it suffers some challenges, such as a high computational load and undesirable detection rate. To address these issues, an improved lightweight YOLOv5 method is proposed for vehicle detection in this paper. In the presented method, C3Ghost and Ghost modules are introduced into the YOLOv5 neck network to reduce the floating-point operations (FLOPs) in the feature channel fusion process and enhance the feature expression performance. A convolutional block attention module (CBAM) is introduced to the YOLOv5 backbone network to select the information critical to the vehicle detection task and suppress uncritical information, thus improving the detection accuracy of the algorithm. Furthermore, CIoU_Loss is considered the bounding box regression loss function to accelerate the bounding box regression rate and improve the localization accuracy of the algorithm. To verify the performance of the proposed approach, we tested our model via two case studies, i.e., the PASCAL VOC dataset and MS COCO dataset. The results show that the detection precision of the proposed model increased 3.2%, the FLOPs decreased 15.24%, and the number of model parameters decreased 19.37% compared with those of the existing YOLOv5. Through case studies and comparisons, the effectiveness and superiority of the presented approach are demonstrated.

dingedi commented 2 years ago

Could you give us more details? The languages used, the result, etc.

chaodreaming commented 2 years ago

en-zh

PJ-Finlay commented 2 years ago

The Chinese model isn't very good; we should train a new one.

goodspeed34 commented 1 year ago

I found this issue too. In zh->en, even the simple word "hello" in Chinese can't be translated correctly.

So I translated the above sentence into English, and I got this:

I have also found the same problems. In Chinese English translation, even your good translation error

BackMountainDevil commented 1 year ago

It just doesn't translate...

$ cat test.py
import argostranslate.package
import argostranslate.translate

from_code = "en"
to_code = "zh"

# Download and install Argos Translate package
argostranslate.package.update_package_index()
available_packages = argostranslate.package.get_available_packages()
package_to_install = next(
    filter(
        lambda x: x.from_code == from_code and x.to_code == to_code, available_packages
    )
)
argostranslate.package.install_from_path(package_to_install.download())

# Translate
translatedText = argostranslate.translate.translate("Hello World", from_code, to_code)
print(translatedText)
# '你好 世界'

$ python test.py
Hello World

chaodreaming commented 1 year ago

I don't know what you mean.

BackMountainDevil commented 1 year ago

> I don't know what you mean.

The Python script is supposed to translate "Hello World" into Chinese, but when I run it, the output is not translated; it is still English.

Maybe argosopentech is good at sentences but bad at single words.
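A quick way to flag this kind of failure automatically is to check whether the en->zh output came back unchanged or contains no Chinese characters at all. The following is only a minimal sketch: `looks_untranslated` is a hypothetical helper, not part of argostranslate, and the character range is an assumption that covers the basic CJK block only.

```python
import re

# Basic CJK Unified Ideographs block. Assumption: this range is enough to
# recognise Chinese output for a quick sanity check.
CJK_PATTERN = re.compile(r"[\u4e00-\u9fff]")


def looks_untranslated(source: str, output: str) -> bool:
    """Return True if an en->zh result appears not to have been translated."""
    # Case 1: the text came back unchanged, as with "Hello World" above.
    if output.strip() == source.strip():
        return True
    # Case 2: the result contains no Chinese characters at all.
    return CJK_PATTERN.search(output) is None


print(looks_untranslated("Hello World", "Hello World"))  # True
print(looks_untranslated("Hello World", "你好 世界"))  # False
```

Running this check over a list of short test phrases would show whether the problem is limited to single words or also affects whole sentences.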

mkunz7 commented 1 year ago

https://huggingface.co/Helsinki-NLP/opus-mt-zh-en does a pretty good job. I wonder if we can use that.


```python
# pip install transformers
# pip install torch
# pip install sentencepiece
# pip install sacremoses

from transformers import MarianMTModel, MarianTokenizer

def chinese_to_english(text):
    model_name = 'Helsinki-NLP/opus-mt-zh-en'
    model = MarianMTModel.from_pretrained(model_name)
    tokenizer = MarianTokenizer.from_pretrained(model_name)

    # Tokenize the text
    tokenized_text = tokenizer.encode(text, return_tensors="pt")

    # Translate the tokenized text
    translated_tokens = model.generate(tokenized_text)

    # Decode the translated tokens to a string
    translated_text = tokenizer.decode(translated_tokens[0], skip_special_tokens=True)
    return translated_text

if __name__ == "__main__":
    chinese_text = input("Enter Chinese text: ")
    translated_text = chinese_to_english(chinese_text)
    print(f"Translated Text: {translated_text}")
```

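One caveat with the function above: it passes the whole input to the model in a single call, and Marian models only accept a limited number of tokens per input, so long passages may need to be split first. Here is a rough sketch of sentence-by-sentence translation. `split_sentences` and `translate_by_sentence` are hypothetical helpers, splitting on 。！？ is a simplifying assumption, and `translate_fn` stands in for any translator such as `chinese_to_english`.

```python
import re
from typing import Callable, List

# A run of non-terminator characters followed by any closing 。！？ marks.
# Assumption: these three marks are enough to delimit sentences.
SENTENCE = re.compile(r"[^。！？]+[。！？]*")


def split_sentences(text: str) -> List[str]:
    """Split Chinese text into sentences, keeping the closing punctuation."""
    return [s.strip() for s in SENTENCE.findall(text) if s.strip()]


def translate_by_sentence(text: str, translate_fn: Callable[[str], str]) -> str:
    """Translate each sentence separately and join the results with spaces."""
    return " ".join(translate_fn(s) for s in split_sentences(text))


if __name__ == "__main__":
    # Stand-in translator so the sketch runs without downloading a model.
    print(translate_by_sentence("你好。世界！", lambda s: f"<{s}>"))
```

With a real model, `translate_by_sentence(text, chinese_to_english)` would work, although loading the model once outside the function would avoid reloading it for every sentence.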
pierotofy commented 1 year ago

We can! https://community.libretranslate.com/t/opus-mt-language-models-port-thread/757/2

https://github.com/LibreTranslate/Locomotive/#convert-helsinki-nlp-opus-mt-models