-
In preparation for v2.0, we will deprecate `[[]]` and `$` operators for tokens
-
Facebook's recently open-sourced `fasttext` (https://github.com/facebookresearch/fastText) improves on the `word2vec` skip-gram model. It follows a similar output format of `word`-`vector` key-value pairs,…
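This sketch shows that key-value layout. fastText's `.vec` text file, like the word2vec text format, starts with a header line giving the vocabulary size and the vector dimension. Each line after it holds one word followed by its space-separated components. The reader assumes exactly that layout, plus a possible trailing space on each line. `read_vec` is a hypothetical helper and is not part of either tool.

```python
import io
from typing import Dict, List, TextIO


def read_vec(stream: TextIO) -> Dict[str, List[float]]:
    """Parse a word2vec/fastText-style text file into a {word: vector} dict."""
    # Header line: "<number of words> <dimension>"
    n_words, dim = (int(x) for x in stream.readline().split())
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        # Strip the newline (and any trailing space) before splitting on spaces.
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank lines
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            raise ValueError(f"{word!r}: expected {dim} values, got {len(values)}")
        vectors[word] = [float(v) for v in values]
    if len(vectors) != n_words:
        raise ValueError(f"header promised {n_words} words, read {len(vectors)}")
    return vectors


sample = io.StringIO("2 3\nthe 0.1 0.2 0.3 \nof 0.4 0.5 0.6\n")
vecs = read_vec(sample)
print(vecs["of"])  # [0.4, 0.5, 0.6]
```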
-
Hi,
I am trying to use Snorkel for my problem setting, where I need character-level mappings. I want every character to be a candidate while keeping n-grams for its context. For example,
If…
jbkoh updated
5 years ago
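One way to picture "every character as a candidate, with n-grams for context" is one candidate record per character position that also carries a fixed window of neighbouring characters. The following is a plain-Python sketch under that assumption. `CharCandidate`, `char_candidates`, and `context_ngrams` are hypothetical names and are not Snorkel's candidate API.

```python
from typing import List, NamedTuple


class CharCandidate(NamedTuple):
    index: int   # position of the character in the text
    char: str    # the candidate character itself
    left: str    # up to `window` characters before it
    right: str   # up to `window` characters after it


def char_candidates(text: str, window: int = 2) -> List[CharCandidate]:
    """Make every character a candidate, keeping a context window on each side."""
    return [
        CharCandidate(i, c, text[max(0, i - window):i], text[i + 1:i + 1 + window])
        for i, c in enumerate(text)
    ]


def context_ngrams(cand: CharCandidate, n: int = 3) -> List[str]:
    """Character n-grams over the candidate's context span (left + char + right)."""
    span = cand.left + cand.char + cand.right
    return [span[i:i + n] for i in range(len(span) - n + 1)]


cands = char_candidates("abcde")
print(cands[2])                  # CharCandidate(index=2, char='c', left='ab', right='de')
print(context_ngrams(cands[2]))  # ['abc', 'bcd', 'cde']
```

The window size is a free parameter. At the edges of the text the context is simply shorter rather than padded.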
-
**Problem:** `FastText` in gensim and the official version still produce different output on the FB pretrained model (issue with OOV words **without n-grams**).
**Prepare data:**
```bash
curl https://dl.…
```
-
fastText uses the hashing trick to map n-grams to an index in [0, N]. Gensim supports loading models trained with the original fastText implementation from Facebook Research. It is therefore important th…
leezu updated
5 years ago
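Here is a minimal sketch of that hashing trick. It assumes the scheme mirrors the 32-bit FNV-1a hash in fastText's C++ `Dictionary::hash`, which reads each byte as a signed `int8_t` before widening it to `uint32_t`. Because of that cast, ASCII n-grams hash exactly as in textbook FNV-1a. Bytes of 0x80 and above (that is, any non-ASCII UTF-8 text) do not, and a reimplementation has to match that detail. The default of 2,000,000 buckets is an assumption about fastText's `-bucket` setting. `fasttext_hash`, `fnv1a_unsigned`, and `ngram_bucket` are hypothetical helper names.

```python
def fasttext_hash(data: bytes) -> int:
    """32-bit FNV-1a where each byte is first read as a signed char (assumed fastText behaviour)."""
    h = 2166136261                            # FNV-1a 32-bit offset basis
    for b in data:
        signed = b - 256 if b > 127 else b    # int8_t(byte)
        h ^= signed & 0xFFFFFFFF              # uint32_t(int8_t(byte)): sign-extended
        h = (h * 16777619) & 0xFFFFFFFF       # FNV prime, kept in 32-bit unsigned range
    return h


def fnv1a_unsigned(data: bytes) -> int:
    """Textbook FNV-1a (bytes treated as unsigned), for comparison."""
    h = 2166136261
    for b in data:
        h ^= b
        h = (h * 16777619) & 0xFFFFFFFF
    return h


def ngram_bucket(ngram: str, bucket: int = 2_000_000) -> int:
    """Map an n-gram string to a bucket index in [0, bucket)."""
    return fasttext_hash(ngram.encode("utf-8")) % bucket


print(hex(fasttext_hash(b"a")))                                   # 0xe40c292c
print(fasttext_hash(b"\xe9") == fnv1a_unsigned(b"\xe9"))          # False
```

In the C++ code the bucket index is then offset by the vocabulary size (`nwords`) to select a row in the input matrix. Word vectors and n-gram vectors therefore share one table.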
-
Hi there,
Let's take a case where we are training on a corpus that doesn't contain a given word (say "foo").
If this word shows up in a previously unseen test sentence, you generally see a `KeyError` for …
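A subword model typically avoids that `KeyError` by building a vector for the unseen word from its character n-grams. This minimal sketch assumes fastText-style boundary markers `<` and `>`, n-gram lengths 3 to 6, and averaging of the n-gram vectors. The toy `ngram_vectors` dict stands in for a trained n-gram table. Real fastText hashes every n-gram into a bucket instead of looking it up by string. `char_ngrams` and `oov_vector` are hypothetical helpers.

```python
from typing import Dict, List


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """All character n-grams of '<word>' with lengths minn..maxn."""
    token = f"<{word}>"
    return [
        token[i:i + n]
        for n in range(minn, maxn + 1)
        for i in range(len(token) - n + 1)
    ]


def oov_vector(word: str, ngram_vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of the word's known n-grams (zero vector if none are known)."""
    known = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    if not known:
        return [0.0] * dim
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]


word_vectors = {"food": [1.0, 1.0]}                       # toy word table without "foo"
ngram_vectors = {"<fo": [1.0, 0.0], "foo": [0.0, 1.0]}    # toy n-gram table

try:
    word_vectors["foo"]
except KeyError:
    print("plain lookup fails")                 # the KeyError from the question
print(oov_vector("foo", ngram_vectors, dim=2))  # [0.5, 0.5]
```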
-
Hi guys, I'm confused about something: I trained my fastText model on an English corpus and then, out of curiosity, used it on Chinese, and it still returned a word embedding. I know we can g…
-
Consider reading just the `bin` file, as suggested in https://github.com/RaRe-Technologies/gensim/issues/814#issuecomment-289464725
Compare to the C++ code in https://github.com/salestock/fastText.py/blo…
tmylk updated
5 years ago
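When reading only the `bin` file, a useful first sanity check is its header. This sketch assumes a recent fastText `.bin` layout that begins with two little-endian 32-bit integers: a magic number (793712314) and a format version (12 in recent releases). Files written by older versions, before the magic number was introduced, may fail this check. `read_bin_header` is a hypothetical helper and is not gensim's loader.

```python
import io
import struct
from typing import BinaryIO, Tuple

FASTTEXT_MAGIC = 793712314  # assumed magic number at the start of fastText .bin files


def read_bin_header(f: BinaryIO) -> Tuple[int, int]:
    """Return (magic, version) from the first 8 bytes of a fastText .bin file."""
    raw = f.read(8)
    if len(raw) != 8:
        raise ValueError("file too short for a fastText header")
    magic, version = struct.unpack("<ii", raw)  # two little-endian int32 values
    if magic != FASTTEXT_MAGIC:
        raise ValueError(f"unexpected magic {magic}; not a (recent) fastText .bin file?")
    return magic, version


# With a real file (hypothetical path):
#   with open("model.bin", "rb") as fh:
#       print(read_bin_header(fh))
fake = io.BytesIO(struct.pack("<ii", FASTTEXT_MAGIC, 12))
print(read_bin_header(fake))  # (793712314, 12)
```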
-
Hey,
I need to use n-grams with `featurizer_count_vectors`, but I think it would considerably improve classification if they could also be used together with the word itself, so that untrained word…
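This plain-Python sketch illustrates using word features and character n-grams together. It emits both kinds of count features for each token, so an unseen misspelling still overlaps with known words through shared n-grams. It is a hypothetical helper, not the `featurizer_count_vectors` component itself. The `w:`/`c:` prefixes and the n-gram range of 2 to 4 are assumptions.

```python
from collections import Counter


def word_and_char_features(text: str, min_n: int = 2, max_n: int = 4) -> Counter:
    """Count whole-word features plus character n-grams taken within each word."""
    feats: Counter = Counter()
    for word in text.lower().split():
        feats["w:" + word] += 1                     # the word itself
        padded = f" {word} "                        # spaces mark the word start and end
        for n in range(min_n, max_n + 1):
            for i in range(len(padded) - n + 1):
                feats["c:" + padded[i:i + n]] += 1  # character n-gram
    return feats


train = word_and_char_features("running fast")
test = word_and_char_features("runing")   # misspelled, never seen as a word
shared = set(train) & set(test)
print("w:runing" in train)                # False: no word-level match
print("c:ing" in shared)                  # True: n-grams still overlap
```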
-
Some Unicode characters are sneaking into our n-grams. We need to find out why and get rid of them.
When this is done, we will need to regenerate the unit test files.
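A first step toward finding out why is to list which non-ASCII characters occur and which n-grams contain them. This sketch is a hypothetical helper that uses only Python's standard `unicodedata` module to name each offending code point. It does not cover how the n-grams are produced in the first place.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List


def find_non_ascii(ngrams: Iterable[str]) -> Dict[str, List[str]]:
    """Map 'U+XXXX NAME' for each non-ASCII character to the n-grams containing it."""
    report: Dict[str, List[str]] = defaultdict(list)
    for gram in ngrams:
        for ch in sorted(set(gram)):   # each distinct character once per n-gram
            if ord(ch) > 127:
                key = f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
                report[key].append(gram)
    return dict(report)


for key, grams in find_non_ascii(["the", "caf\u00e9", "a\u200bb"]).items():
    print(key, grams)
```

Invisible characters such as U+200B ZERO WIDTH SPACE are a common culprit because they do not show up when the n-grams are printed.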
```
python3 testing_framework.…
```