NTMC-Community / MatchZoo

Facilitating the design, comparison and sharing of deep text matching models.
Apache License 2.0
3.83k stars · 899 forks

How to use fasttext (subwords) vectors in tensorflow embedding layer? #746

Closed datistiquo closed 5 years ago

datistiquo commented 5 years ago

Hey,

I'm struggling to figure out how to use fastText word vectors for OOV words (in general) in a Keras/TensorFlow embedding layer. There is nothing out there on this. Maybe someone has thought about this too and has some hints for me?

The usual embedding lookup works via indices, e.g. `tf.nn.embedding_lookup(word_embeddings, x1)`.

And you could reserve an index for a single OOV token. But how can I assign a specific vector (from a different, custom source like fastText) at runtime?

I am not so experienced with the inner workings of TensorFlow's `embedding_lookup`.

Up to now I have handled OOV words by splitting them into known subwords, i.e. I make several known words out of one OOV word.
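One way to realize the idea above is to resolve OOV tokens *before* the lookup: extend the embedding matrix with a fastText-style vector for each unseen word, then index as usual. The sketch below is only illustrative and uses NumPy; `ft_vector` is a hypothetical stand-in for a real fastText subword lookup (e.g. gensim's `FastText.wv[word]`), not MatchZoo's API.

```python
import numpy as np

DIM = 4
vocab = {"<pad>": 0, "hello": 1, "world": 2}
embedding_matrix = [np.zeros(DIM), np.ones(DIM), np.ones(DIM) * 2]

def ft_vector(word, dim=DIM):
    """Hypothetical stand-in for a fastText subword lookup:
    deterministic average of character-trigram hash vectors."""
    grams = [word[i:i + 3] for i in range(max(1, len(word) - 2))]
    vecs = [np.random.RandomState(abs(hash(g)) % (2 ** 32)).randn(dim)
            for g in grams]
    return np.mean(vecs, axis=0)

def index_for(word):
    """Return the embedding row for `word`, appending a new row
    (filled with its subword vector) if the word is OOV."""
    if word not in vocab:
        vocab[word] = len(embedding_matrix)
        embedding_matrix.append(ft_vector(word))
    return vocab[word]

ids = [index_for(w) for w in ["hello", "zzzzz", "world"]]
weights = np.stack(embedding_matrix)  # set this as the layer's weight matrix
vectors = weights[ids]                # NumPy equivalent of embedding_lookup
```

In Keras terms, `weights` would be passed to the `Embedding` layer (with the matrix rebuilt, or the weights re-set, whenever new OOV rows are appended), so the lookup itself stays a plain index operation.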

bwanglzu commented 5 years ago

I have 0 experience with Fasttext, maybe others can help.

datistiquo commented 5 years ago

@bwanglzu It has nothing to do with fastText itself! It is about manipulating/customising word vectors conditionally! If you encounter an OOV word during prediction, how can you assign a specific vector to this specific word (i.e. you check the word for known subwords with your own method, or by just using fastText...)?
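The "check the word for known subwords" idea mentioned above could be sketched as a greedy longest-match split over the known vocabulary, averaging the matched subword vectors. All names here are illustrative assumptions, not MatchZoo or fastText API:

```python
import numpy as np

# Toy vocabulary of known word vectors (illustrative only).
vecs = {"foot": np.array([1.0, 0.0]), "ball": np.array([0.0, 1.0])}

def split_known(word, vocab):
    """Greedily split `word` into the longest substrings found in `vocab`."""
    parts, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try longest match first
            if word[i:j] in vocab:
                parts.append(word[i:j])
                i = j
                break
        else:
            i += 1  # no known subword starts here; skip one character
    return parts

def oov_vector(word, vecs, dim=2):
    """Average the vectors of the known subwords; zeros if none match."""
    parts = split_known(word, vecs)
    if not parts:
        return np.zeros(dim)
    return np.mean([vecs[p] for p in parts], axis=0)
```

For example, `split_known("football", vecs)` yields `["foot", "ball"]`, and the resulting OOV vector is the mean of the two subword vectors. This mirrors what fastText does internally with character n-grams, just with a hand-picked subword inventory.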

bwanglzu commented 5 years ago

@datistiquo take a look at Vocabulary class here.

btw, we're volunteers maintaining this open source project; you're not my customer. Please ask your question in a modest way.

uduse commented 5 years ago

I hope things are working well for you now. I’ll go ahead and close this issue, but I’m happy to continue further discussion whenever needed.