dominiek / word2vec-explorer

Tool for exploring Word Vector models
MIT License

Word2Vec Explorer

This tool helps you visualize, query and explore Word2Vec models. Word2Vec is a technique that trains a shallow neural network on massive amounts of text; the resulting word vectors can then be used to solve a variety of NLP and ML problems.

Word2Vec Explorer uses Gensim to list and compare vectors, and t-SNE to visualize a dimensionality reduction of the vector space. Scikit-Learn is used for K-Means clustering.
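To give a feel for what "comparing vectors" means, here is a minimal pure-Python sketch of cosine similarity, the measure Gensim uses when ranking similar words. It is not the explorer's own code: the helper names and the three toy vectors are invented for illustration, and a real model would have hundreds of dimensions per word.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors, topn=3):
    """Rank every other word by cosine similarity to `word`, best first."""
    query = vectors[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Toy 3-dimensional vectors, made up for this example only.
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "apple": [0.1, 0.2, 0.95],
}

# "queen" ranks ahead of "apple" because its vector points in nearly
# the same direction as "king".
print(most_similar("king", toy_vectors, topn=2))
```

With a real model loaded, Gensim performs the same kind of ranking over the whole vocabulary, which is why a vectorized library is used rather than Python loops like these.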

The UI is built using React, Babel, Browserify, StandardJS, D3 and Three.js.

Screenshots: TSNE 10K, TSNE Labels, Vector Comparisons

Setup

To install all Python dependencies:

pip install -r requirements.txt

Usage

Load the explorer with a Word2Vec model:

./explore GoogleNews-vectors-negative300.bin

Now point your browser at localhost:8080 to load the explorer!

Obtaining Pre-Trained Models

A classic example of Word2Vec is the Google News model trained on 600M sentences: GoogleNews-vectors-negative300.bin.gz

[More pre-trained models](https://github.com/3Top/word2vec-api#where-to-get-a-pretrained-models)
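The Google News download is gzip-compressed (.bin.gz), while the usage example above passes an uncompressed .bin file; whether ./explore accepts the compressed file directly is not stated here. Before loading a multi-gigabyte model, it can help to sanity-check it. The sketch below reads only the first line of a file, assuming the original word2vec binary layout, where that line is ASCII text holding the vocabulary size and the vector dimension. The function names and the tiny synthetic model are hypothetical and exist only so the example runs without a download.

```python
import gzip
import os
import struct
import tempfile


def read_word2vec_header(path):
    """Return (vocab_size, dimensions) from a word2vec binary file.

    Assumes the first line is ASCII text "<vocab_size> <dimensions>\n",
    as written by the original word2vec tool. Handles .gz transparently.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as handle:
        header = handle.readline()
    vocab_size, dimensions = (int(field) for field in header.decode("ascii").split())
    return vocab_size, dimensions


def write_toy_model(path):
    """Write a made-up 2-word, 3-dimensional model for demonstration."""
    toy = (("hello", (0.1, 0.2, 0.3)), ("world", (0.4, 0.5, 0.6)))
    with open(path, "wb") as handle:
        handle.write(b"2 3\n")
        for word, vector in toy:
            handle.write(word.encode("utf-8") + b" ")
            # Little-endian float32 values; an assumption for this toy file.
            handle.write(struct.pack("<3f", *vector))
            handle.write(b"\n")


with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy.bin")
    write_toy_model(toy_path)
    print(read_word2vec_header(toy_path))  # (2, 3)
```

Pointing read_word2vec_header at a downloaded model should report its vocabulary size and dimension; if the header does not parse, the file is probably truncated or not in word2vec binary format.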

Development

To make changes to the user interface, install the NPM dependencies and start the build:

npm install
npm start

The command npm start will automatically transpile and bundle any code changes in the ui/ folder. All backend code can be found in explorer.py and ./explore.

Before submitting code changes, make sure all code complies with StandardJS as well as PEP 8:

standard
pep8 --max-line-length=100 *.py explore

Todo