MartinoMensio / spacy-universal-sentence-encoder

Google USE (Universal Sentence Encoder) for spaCy
MIT License
176 stars, 12 forks

Is this only for English? #2

Closed: adamjleonard closed this issue 4 years ago

adamjleonard commented 4 years ago

I am curious whether this can be used with other languages, as long as the model corresponds to that language. I am currently working in Spanish and have been using spaCy's Spanish news model.

Thanks!

MartinoMensio commented 4 years ago

Hi Adam! TF Hub also has multilingual models. You can see the full set of Universal Sentence Encoder models here: https://tfhub.dev/google/collections/universal-sentence-encoder/1

So you would be interested in one of the following:

I think I can wrap these models for spaCy with just a few lines. Which model would you like to have first?

I didn't find a Spanish-only Universal Sentence Encoder model.
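A wrapper along those lines could be sketched as follows. It assumes spaCy's `Doc.user_hooks` dictionary, which `doc.vector`, `doc.has_vector` and `doc.similarity` consult before falling back to their defaults, and a hypothetical `embed_texts` callable that turns a list of strings into a list of vectors (for instance a thin function around one of the TF Hub models). It is an illustration under those assumptions, not the code this repository actually ships.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SentenceEncoderHooks:
    """Pipeline step that overrides doc.vector, doc.has_vector and doc.similarity.

    `embed_texts` is a hypothetical callable (list of str -> list of vectors),
    e.g. a wrapper around a Universal Sentence Encoder model from TF Hub.
    """

    def __init__(self, embed_texts: Callable[[List[str]], Sequence[Sequence[float]]]):
        self.embed_texts = embed_texts

    def __call__(self, doc):
        # Encode the whole document text once; the closures below reuse it.
        vector = list(self.embed_texts([doc.text])[0])
        # Doc.vector / Doc.has_vector call these hooks when they are present.
        doc.user_hooks["vector"] = lambda d: vector
        doc.user_hooks["has_vector"] = lambda d: True
        # Doc.similarity calls user_hooks["similarity"](doc, other) when present.
        doc.user_hooks["similarity"] = lambda d, other: cosine(vector, other.vector)
        return doc
```

Under spaCy v2 an instance such as `SentenceEncoderHooks(embed_texts)` can be passed straight to `nlp.add_pipe`; spaCy v3 expects a registered component name instead (for example via `@Language.factory`). `Span` objects use the separate `doc.user_span_hooks` dictionary, which this sketch does not set.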

adamjleonard commented 4 years ago

I believe the multilingual model is fine, since a lot of the text I am working with actually contains some English.

I'll probably go with multilingual-large. I think I can manage it; I just wanted to make sure I wasn't missing anything beyond loading one of these models!

MartinoMensio commented 4 years ago

Great! This repository just wraps these models, so you can also use them directly from TF Hub. I am wrapping them because I would like a smooth integration with spaCy, as was done, for example, with spacy-transformers.
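Using one of the multilingual models straight from TF Hub could look roughly like the sketch below. It assumes `tensorflow`, `tensorflow_hub` and `tensorflow_text` are installed and picks the `universal-sentence-encoder-multilingual-large/3` handle as an example; the helper names `load_encoder`, `embed` and `rank` are hypothetical. The TensorFlow imports live inside `load_encoder`, so the ranking helper can be tried without downloading anything.

```python
import math
from typing import List, Sequence, Tuple

# Assumed model handle; any model from the USE collection could be substituted.
MODEL_URL = "https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3"


def load_encoder(url: str = MODEL_URL):
    """Load a USE model from TF Hub (downloaded and cached on first use)."""
    import tensorflow_hub as hub
    import tensorflow_text  # noqa: F401  registers the SentencePiece ops the multilingual models use
    return hub.load(url)


def embed(encoder, texts: List[str]) -> List[List[float]]:
    """Encode a batch of sentences into one vector per input."""
    return encoder(texts).numpy().tolist()


def rank(query: Sequence[float],
         candidates: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (candidate index, cosine similarity) pairs, most similar first."""
    def cos(a: Sequence[float], b: Sequence[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    scores = [(i, cos(query, c)) for i, c in enumerate(candidates)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

To compare a Spanish query against English candidates one would call something like `enc = load_encoder()` followed by `rank(embed(enc, ["¿Dónde está la biblioteca?"])[0], embed(enc, ["Where is the library?", "I like football."]))`; the semantically closer candidate should come first if the cross-lingual alignment holds for that pair.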

MartinoMensio commented 4 years ago

I managed to wrap the first two multilingual models here.

With the other models, however, I have issues with an unregistered op, SentencepieceEncodeSparse, that I cannot get working on my operating system right now. I hope that helps.
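Errors like `Op type not registered 'SentencepieceEncodeSparse'` generally mean that the Python package defining those SentencePiece ops was not imported (or has no build for the platform) before the model was loaded. Which package provides which op is an assumption here: `tensorflow_text` is the one the TF2 `/3` multilingual models document, and `tf_sentencepiece` is believed to cover older TF1-format models. The sketch tries both imports before `hub.load` and turns the error into a clearer message; `register_sentencepiece_ops`, `missing_op_name` and `load_model` are hypothetical names.

```python
import importlib
import re
from typing import List, Optional, Sequence

# Assumed list of packages whose import registers SentencePiece ops in TensorFlow.
CANDIDATE_OP_PACKAGES = ("tensorflow_text", "tf_sentencepiece")

_MISSING_OP = re.compile(r"Op type not registered '(\w+)'")


def missing_op_name(error_message: str) -> Optional[str]:
    """Extract the op name from a TensorFlow 'Op type not registered' message, if any."""
    match = _MISSING_OP.search(error_message)
    return match.group(1) if match else None


def register_sentencepiece_ops(candidates: Sequence[str] = CANDIDATE_OP_PACKAGES) -> List[str]:
    """Import whichever candidate packages work; importing them registers their ops."""
    loaded = []
    for name in candidates:
        try:
            importlib.import_module(name)
        except Exception:  # missing package, or a native build that does not match this TF/OS
            continue
        loaded.append(name)
    return loaded


def load_model(url: str):
    """Register the extra ops first, then load the model from TF Hub."""
    import tensorflow_hub as hub
    register_sentencepiece_ops()
    try:
        return hub.load(url)
    except Exception as err:
        op = missing_op_name(str(err))
        if op is None:
            raise
        raise RuntimeError(
            f"TensorFlow op {op!r} is not registered; install a package that provides it "
            "for this platform and TensorFlow version (e.g. tensorflow-text)."
        ) from err
```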

MartinoMensio commented 4 years ago

Can I consider this issue solved?