UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Fast tokenizer for stsb-xlm-r-multilingual model #860

Open Matthieu-Tinycoaching opened 3 years ago

Matthieu-Tinycoaching commented 3 years ago

Hi,

I am blocked by high response latency caused by the tokenizer computation of the stsb-xlm-r-multilingual model.

Does anyone have an idea how to get a fast tokenizer for the stsb-xlm-r-multilingual model?

Is there any way to run the tokenizer on the GPU?

Thanks!

nreimers commented 3 years ago

When you use Hugging Face Transformers v4, the fast tokenizer is used by default.

There is a fast tokenizer available for XLM-R: https://huggingface.co/transformers/model_doc/xlmroberta.html#xlmrobertatokenizerfast

You can check the tokenizer like this:

print(type(model.tokenizer))

Running tokenizer on GPU is not possible (and not sensible).
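While the tokenizer itself cannot run on the GPU, one common way to hide its CPU cost is to overlap tokenization of the next batch with model inference on the current one. Below is a minimal producer/consumer sketch of that idea; `tokenize_batch` and `encode_batch` are hypothetical stand-ins (not part of either library) for the real tokenizer call and the GPU forward pass:

```python
import queue
import threading

def tokenize_batch(batch):
    # Stand-in for the CPU-bound tokenizer call, e.g. tokenizer(batch, ...).
    return [s.split() for s in batch]

def encode_batch(tokens):
    # Stand-in for the GPU forward pass over a tokenized batch.
    return [len(t) for t in tokens]

def encode_pipelined(batches, prefetch=2):
    """Tokenize batch i+1 on a background thread while batch i is encoded."""
    q = queue.Queue(maxsize=prefetch)
    sentinel = object()

    def producer():
        for batch in batches:
            q.put(tokenize_batch(batch))
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    results = []
    while (item := q.get()) is not sentinel:
        results.extend(encode_batch(item))
    return results
```

With a real model, the queue keeps the GPU fed so tokenization time is hidden behind inference time rather than added to it.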

Matthieu-Tinycoaching commented 3 years ago

Hi @nreimers

Do you mean this code instead?:

from transformers import AutoTokenizer

model_name = "sentence-transformers/stsb-xlm-r-multilingual"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
print(type(tokenizer))

I got the following output: <class 'transformers.models.xlm_roberta.tokenization_xlm_roberta_fast.XLMRobertaTokenizerFast'>

So this means I am already using XLMRobertaTokenizerFast?

Is there a speed increase if I use Sentence-Transformers instead of the Hugging Face model repository for the stsb-xlm-r-multilingual model (https://huggingface.co/sentence-transformers/stsb-xlm-r-multilingual)?

nreimers commented 3 years ago

Yes, you are using the fast tokenizer.

Speedup: not necessarily. Sentence-Transformers relies on the tokenizer and model from HF Transformers. It applies some optimizations to reduce padding and compute overhead when you encode a larger batch of sentences, but the speed will be about the same.
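The padding optimization mentioned above can be illustrated with a small self-contained sketch: if sentences are sorted by length before batching, each batch only pads to the length of its own longest member instead of mixing very short and very long sentences in one batch. The function below is illustrative, not the library's actual API:

```python
def padded_tokens(sentences, batch_size):
    """Total token slots (real tokens + padding) when batching in the given order."""
    total = 0
    for i in range(0, len(sentences), batch_size):
        batch = sentences[i:i + batch_size]
        # Every sentence in a batch is padded to the batch's longest sentence.
        total += max(len(s.split()) for s in batch) * len(batch)
    return total

sentences = ["a", "a b c d e f", "a b", "a b c d e"]

# Naive order: each batch mixes a short and a long sentence.
naive = padded_tokens(sentences, batch_size=2)

# Length-sorted order, as Sentence-Transformers does internally in encode():
sorted_by_len = sorted(sentences, key=lambda s: len(s.split()))
optimized = padded_tokens(sorted_by_len, batch_size=2)
```

Here `naive` is 22 slots while `optimized` is 16, because the sorted batches waste fewer slots on padding; the per-sentence tokenizer and model cost is unchanged, which is why the overall speed is about the same.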