Helsinki-NLP / Tatoeba-Challenge


Batch-mode prediction #8

Closed antoine-isnardy-danone closed 1 year ago

antoine-isnardy-danone commented 3 years ago

Hi,

Thank you for providing these tremendous resources. I'm currently trying to leverage the models that were uploaded to Hugging Face (e.g. this one).

Is it expected that tokenization/generation does not work in batch mode? See the example below:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-es-en")
inputs = tokenizer.encode("mango manzana y pera", return_tensors="pt")
inputs

tensor([[34090, 29312, 11, 306, 75, 0]])

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-es-en")
inputs = tokenizer.encode(["mango manzana y pera"], return_tensors="pt")
inputs

tensor([[1, 0]])


jorgtied commented 3 years ago

I am not sure how compatible the Hugging Face tokenizers are with the SentencePiece unigram models that we provide for the models converted to their interfaces. This would be a question to ask at Hugging Face. Good luck!
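For reference, the usual way to batch with the `transformers` library is to pass the list of sentences to the tokenizer's `__call__` (with `padding=True`) rather than to `encode`, which expects a single string. A minimal sketch, assuming the converted opus-mt-es-en model behaves like other Marian models on the hub:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-es-en")
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-es-en")

# Calling the tokenizer directly on a list (not tokenizer.encode) tokenizes
# each sentence; padding=True pads them to a rectangular tensor.
batch = tokenizer(["mango manzana y pera", "hola mundo"],
                  return_tensors="pt", padding=True)

# Generate translations for the whole batch at once.
generated = model.generate(**batch)
translations = tokenizer.batch_decode(generated, skip_special_tokens=True)
print(translations)
```

Here `batch_decode` strips the padding and special tokens, so `translations` is a plain list with one English string per input sentence.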