UBC-NLP / araT5

AraT5: Text-to-Text Transformers for Arabic Language Understanding

What is the mask token in AraT5-base? #10

HMJW opened this issue 1 year ago

HMJW commented 1 year ago

I can't find any token like <mask> in the vocab. What is the mask token in AraT5-base, and how do I get the mask token id with Hugging Face code?
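
For context, this is how I would expect to look up a mask-style token with Hugging Face (a sketch; the Hub id UBC-NLP/AraT5-base and the AutoTokenizer path are assumptions on my side):

from transformers import AutoTokenizer

# Assumed checkpoint id; adjust to the AraT5 variant you actually use.
tokenizer = AutoTokenizer.from_pretrained("UBC-NLP/AraT5-base")

# T5-style models usually rely on sentinel tokens (<extra_id_0>, <extra_id_1>, ...)
# rather than a single <mask> token, so list every special token the tokenizer knows about.
print(tokenizer.all_special_tokens)
print(tokenizer.additional_special_tokens)

# Look up the id of the first sentinel token; if it is not in the vocab,
# this typically falls back to the unknown-token id.
print(tokenizer.convert_tokens_to_ids("<extra_id_0>"), tokenizer.unk_token_id)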

NoraAlt commented 1 year ago

Same question, please.

AMR-KELEG commented 1 year ago

@Nagoudi @elmadany Could you please advise on this? I need to use the AraT5 model in the same way as the code snippet below, but the model is not behaving as expected.

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# <extra_id_0>, <extra_id_1>, ... are T5's sentinel tokens; each one marks a masked span in the input
input_ids = tokenizer("The <extra_id_0> walks in <extra_id_1> park", return_tensors="pt").input_ids
# the target reproduces the masked spans, each introduced by its sentinel token
labels = tokenizer("<extra_id_0> cute dog <extra_id_1> the <extra_id_2>", return_tensors="pt").input_ids

# the forward function automatically creates the correct decoder_input_ids
loss = model(input_ids=input_ids, labels=labels).loss
print(loss.item())

Am I missing anything?

Thanks 🙏🏽
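
For reference, below is a minimal sketch of the same denoising call pointed at an AraT5 checkpoint. The Hub id UBC-NLP/AraT5-base is an assumption, and whether its vocabulary actually contains the <extra_id_N> sentinel tokens is exactly the open question in this issue.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed checkpoint id; swap in the AraT5 variant you actually use.
tokenizer = AutoTokenizer.from_pretrained("UBC-NLP/AraT5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("UBC-NLP/AraT5-base")

# If the sentinel tokens are missing from the vocab, they get split into ordinary
# subwords here and the denoising objective will not behave as expected.
input_ids = tokenizer("The <extra_id_0> walks in <extra_id_1> park", return_tensors="pt").input_ids
labels = tokenizer("<extra_id_0> cute dog <extra_id_1> the <extra_id_2>", return_tensors="pt").input_ids

loss = model(input_ids=input_ids, labels=labels).loss
print(loss.item())

In practice the text would of course be Arabic; the English sentence is kept only so this run is directly comparable to the t5-small snippet above. Printing tokenizer.tokenize("The <extra_id_0> walks") is a quick way to confirm whether the sentinel survives as a single token.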