dugongdingo / WEEL

WEEL - Word Embeddings Experiments with Linguality

Add embeddings to NLG Pipeline #3

Closed dugongdingo closed 6 years ago

dugongdingo commented 6 years ago

Retrieve word embeddings for each input word.

dugongdingo commented 6 years ago

requirements: e974c39d629360e9cfb13773b12eb63eaff05aec

dugongdingo commented 6 years ago

The FastText algorithm works by first computing all subwords (character n-grams) of an input word, and then computing a hash for each subword. These hashes are used as indices to retrieve vectors from the input weight matrix.
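To make the lookup concrete, here is a rough sketch of the subword-to-index step. It is not the code from this repository or from the reference fastText implementation: the function names, the `minn`/`maxn` defaults, and the `nwords`/`bucket` layout (subword rows stored after the word rows) are assumptions for illustration. The hash is plain 32-bit FNV-1a over UTF-8 bytes, which may differ from the reference implementation for non-ASCII input.

```python
FNV_OFFSET = 2166136261  # 32-bit FNV-1a offset basis
FNV_PRIME = 16777619     # 32-bit FNV prime


def fnv1a_32(text: str) -> int:
    """Plain 32-bit FNV-1a over the UTF-8 bytes of `text`."""
    h = FNV_OFFSET
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * FNV_PRIME) & 0xFFFFFFFF  # keep the value in 32 bits
    return h


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> list[str]:
    """All character n-grams of '<word>' with minn <= n <= maxn."""
    wrapped = f"<{word}>"  # boundary markers around the word
    return [
        wrapped[i:i + n]
        for n in range(minn, maxn + 1)
        for i in range(len(wrapped) - n + 1)
    ]


def subword_indices(word: str, nwords: int, bucket: int,
                    minn: int = 3, maxn: int = 6) -> list[int]:
    """Row indices for the subwords of `word`.

    Assumed layout (hypothetical): rows [0, nwords) hold whole-word
    vectors, rows [nwords, nwords + bucket) hold hashed subword vectors.
    """
    return [nwords + fnv1a_32(g) % bucket
            for g in char_ngrams(word, minn, maxn)]
```

Under this layout every index is bounded by `nwords + bucket`, so if the values of `nwords` or `bucket` used at lookup time disagree with the shape of the loaded matrix, indices can fall outside it.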

Some of the hashes map to indices beyond the actual range of the input matrix, which leads to a sad segfault.
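One way to narrow this down would be to validate the indices before touching the matrix, so a bad index surfaces as a readable error instead of a crash. The sketch below is only a diagnostic idea under assumed names (`find_out_of_range`, `lookup_rows`), with the matrix modelled as a plain list of rows; it is not taken from the project's code.

```python
def find_out_of_range(indices, n_rows):
    """Return the indices that do not address a valid row in [0, n_rows)."""
    return [i for i in indices if not 0 <= i < n_rows]


def lookup_rows(matrix, indices):
    """Fetch rows by index, failing loudly on any out-of-range index.

    Negative indices are rejected explicitly, because Python would
    otherwise wrap them around silently.
    """
    bad = find_out_of_range(indices, len(matrix))
    if bad:
        raise IndexError(
            f"{len(bad)} index(es) outside [0, {len(matrix)}): {bad[:5]}"
        )
    return [matrix[i] for i in indices]
```

Logging the offending indices together with the word that produced them should help tell apart a wrong matrix size from a word with a broken encoding.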

The problem might lie with invalid encoding or corrupted data.