VinAIResearch / PhoBERT

PhoBERT: Pre-trained language models for Vietnamese (EMNLP-2020 Findings)
MIT License

phoBERT model output is not compatible with neural machine translation #31

Closed minhsphuc12 closed 3 years ago

minhsphuc12 commented 3 years ago
from transformers import AutoModel, AutoTokenizer

model_choice = 'vinai/phobert-large'
tokenizer = AutoTokenizer.from_pretrained(model_choice)
model = AutoModel.from_pretrained(model_choice)
text = 'Trái_Đất'
batch = tokenizer.prepare_seq2seq_batch(src_texts=[text], return_tensors='pt')
translation = model.generate(**batch)  # error happens here
tokenizer.batch_decode(translation, skip_special_tokens=True)

The error message "'BaseModelOutputWithPoolingAndCrossAttentions' object has no attribute 'logits'" indicates that PhoBERT returns a BaseModelOutputWithPoolingAndCrossAttentions, while the generation utility only works with an output such as CausalLMOutput, which has a logits attribute.

From what I have read on the official PhoBERT pages, I do not assume that machine translation is a supported feature. I just want to ask the authors to confirm: is the real reason for the error above that PhoBERT does not support machine translation yet, or am I missing something?

Thanks a lot.

datquocnguyen commented 3 years ago

Yes, PhoBERT does not support machine translation.
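
For readers who hit the same error, here is a minimal sketch of why the bare encoder cannot feed `generate()`. It is an illustration under assumptions, not the authors' code: it uses a tiny, randomly initialised RoBERTa configuration (PhoBERT follows the RoBERTa architecture) so nothing has to be downloaded, and the sizes and token ids are made up. It compares the output of the bare encoder, which is what `AutoModel` loads, with the output of the same encoder plus a masked-LM head.

```python
import torch
from transformers import RobertaConfig, RobertaModel, RobertaForMaskedLM

# Tiny, randomly initialised config; all sizes are hypothetical and chosen
# only so that the example runs quickly without downloading any weights.
config = RobertaConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    max_position_embeddings=40,
)

# Arbitrary token ids standing in for a tokenised sentence (<s> ... </s>).
input_ids = torch.tensor([[0, 5, 6, 2]])

encoder = RobertaModel(config).eval()          # bare encoder, as loaded by AutoModel
with_head = RobertaForMaskedLM(config).eval()  # encoder plus a masked-LM head

with torch.no_grad():
    encoder_out = encoder(input_ids)
    head_out = with_head(input_ids)

# The bare encoder returns hidden states only; there is no `logits` field.
has_logits_encoder = hasattr(encoder_out, "logits")
# The masked-LM head returns one score per vocabulary entry at each position.
has_logits_head = hasattr(head_out, "logits")

print(type(encoder_out).__name__, "has logits:", has_logits_encoder)
print(type(head_out).__name__, "has logits:", has_logits_head)
print("hidden states:", tuple(encoder_out.last_hidden_state.shape))
print("logits:", tuple(head_out.logits.shape))
```

Having a `logits` field does not by itself make a model a translator: a masked-LM head only fills in masked tokens. Translation would need a sequence-to-sequence model trained for that task, which matches the answer above that PhoBERT does not support it.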