Lotus-King-Research / Requests

Common repository for RFCs

[DRAFT] RFC0005: "canonical" Tibetan vectors #5

Closed by mikkokotila 1 year ago

mikkokotila commented 4 years ago

OBJECTIVE

Train several open-source "canonical" Tibetan word vectors, starting with word2vec. The vectors will be published openly to enable vector/embedding-based modelling with common NLP libraries such as spaCy, and to support building completely custom workflows with libraries such as TensorFlow.

There are several important distinctions here:

MATERIALS

TOOLS

The fastest and most robust way to get started is Gensim, which already has word2vec training built in. spaCy also recommends using Gensim for training word2vec vectors.
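As a rough illustration of what a first Gensim pass could look like, here is a minimal sketch. It assumes Gensim 4.x, and several things in it are placeholders rather than part of this RFC: the helper names `syllable_sentences` and `train_word2vec`, the output filename, and all hyperparameter values. Splitting on the tsheg and shad marks yields syllables, not words, so it only stands in for whatever tokenizer is eventually chosen.

```python
import re

TSHEG = "\u0f0b"  # Tibetan syllable delimiter
SHAD = "\u0f0d"   # Tibetan clause/sentence delimiter


def syllable_sentences(text):
    """Split raw text into lists of syllables, one list per clause.

    Assumption: syllable-level splitting is a stand-in for real word
    segmentation, which would need a proper Tibetan tokenizer.
    """
    sentences = []
    # Clauses are separated by shad marks and/or whitespace.
    for clause in re.split(f"[{SHAD}\\s]+", text):
        syllables = [s for s in clause.split(TSHEG) if s]
        if syllables:
            sentences.append(syllables)
    return sentences


def train_word2vec(sentences, path="tibetan_word2vec.txt"):
    """Train skip-gram word2vec vectors and save them in word2vec text format.

    The hyperparameters and the default path are illustrative only.
    """
    # Imported here so the tokenizing helper works without Gensim installed.
    from gensim.models import Word2Vec  # Gensim 4.x parameter names

    model = Word2Vec(
        sentences=sentences,
        vector_size=100,  # dimensionality of the vectors
        window=5,         # context window size
        min_count=1,      # keep every token; raise this for a real corpus
        sg=1,             # 1 = skip-gram, 0 = CBOW
        workers=4,
        epochs=5,
    )
    # Plain-text word2vec format, readable by most downstream tools.
    model.wv.save_word2vec_format(path, binary=False)
    return model
```

With a corpus loaded into a string `corpus_text`, usage would be `model = train_word2vec(syllable_sentences(corpus_text))`, after which `model.wv.most_similar(token)` gives a quick sanity check on the vectors.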

The core vectorization workflow code should be implementable as an extension to Signs.