UBC-NLP / araT5

AraT5: Text-to-Text Transformers for Arabic Language Understanding

English-to-Arabic Translation Issue #16

Open · Amshaker opened 9 months ago

Amshaker commented 9 months ago

Hi @Nagoudi @elmadany,

Thank you so much for open-sourcing your awesome models. I have a question: I want to use AraT5 or AraT5v2 for machine translation from English to Arabic. Could you please share an example of how to do that? I tried to use your models with the following code, but the output does not make sense. Here is the code:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the AraT5 MSA base checkpoint and its tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("UBC-NLP/AraT5-msa-base")
tokenizer = AutoTokenizer.from_pretrained("UBC-NLP/AraT5-msa-base")

# Attempt to set the source and target languages
tokenizer.src_lang = "English"
tokenizer.tgt_lang = "Arabic"

# English source sentence to translate
ar_prompt = "The scene displays a group of people gathered around a wooden dining table in an indoor setting."
input_ids = tokenizer(ar_prompt, return_tensors="pt").input_ids

# Generate with the default generation settings
outputs = model.generate(input_ids)
print("Tokenized input:", tokenizer.tokenize(ar_prompt))
print("Decoded output:", tokenizer.decode(outputs[0], skip_special_tokens=True))
```

This is the current output:

```
Tokenized input: ['▁The', '▁scene', '▁display', 's', '▁a', '▁group', '▁of', '▁people', '▁gathered', '▁around', '▁a', '▁wooden', '▁di', 'ning', '▁table', '▁in', '▁an', '▁indoor', '▁setting', '.']

Decoded output: هحب هحب هحب هحب هحب هحب هحب هحب هحب هحب هحب هحب هحب هحب هحب هحب هحب هحب هحب
```
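The decoded output is a single word repeated over and over rather than a translation. When comparing checkpoints or generation settings, a small check like the following could flag this kind of collapsed output automatically. It is a minimal sketch: the helper name `is_degenerate_repetition` and its `min_repeats` threshold are hypothetical, and splitting on whitespace only approximates the model's actual subword tokens.

```python
# Hypothetical helper: flag decoded outputs that collapse into one repeated word,
# like the "هحب هحب ..." output above. Whitespace splitting is a rough
# approximation, not a tokenizer-level analysis.
def is_degenerate_repetition(text: str, min_repeats: int = 3) -> bool:
    """Return True if `text` is one word repeated at least `min_repeats` times."""
    words = text.split()
    return len(words) >= min_repeats and len(set(words)) == 1


if __name__ == "__main__":
    sample = " ".join(["هحب"] * 19)
    print(is_degenerate_repetition(sample))           # True
    print(is_degenerate_repetition("الطاولة خشبية"))  # False (only two words)
```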

Please let me know what is wrong with the code above, or share an example of how to use your model for English-to-Arabic translation.
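If, as is common for T5-style releases, AraT5-msa-base is a pre-trained checkpoint that still needs task-specific fine-tuning, English-Arabic training pairs would first have to be cast into text-to-text form. The sketch below covers only that data-preparation step, in plain Python; the helper name `to_text_to_text` and the `translate English to Arabic: ` prefix are hypothetical and are not taken from the AraT5 documentation.

```python
from typing import Dict, List, Tuple

# Hypothetical task prefix; not a documented AraT5 convention.
TASK_PREFIX = "translate English to Arabic: "


def to_text_to_text(
    pairs: List[Tuple[str, str]], prefix: str = TASK_PREFIX
) -> List[Dict[str, str]]:
    """Turn (english, arabic) pairs into source/target records for seq2seq fine-tuning."""
    examples = []
    for english, arabic in pairs:
        english, arabic = english.strip(), arabic.strip()
        if not english or not arabic:
            continue  # skip pairs with a missing side
        examples.append({"source": prefix + english, "target": arabic})
    return examples


if __name__ == "__main__":
    pairs = [("The table is wooden.", "الطاولة خشبية."), ("", "فارغ")]
    for example in to_text_to_text(pairs):
        print(example)
```

The resulting `source`/`target` strings could then be tokenized for training, for example with `tokenizer(source, text_target=target)` in recent versions of transformers.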