facebookresearch / fastText

Library for fast text representation and classification.
https://fasttext.cc/
MIT License
25.85k stars 4.71k forks

Learn words representation with CBOW plus position-weights #445

Open loretoparisi opened 6 years ago

loretoparisi commented 6 years ago

According to the recent paper "Learning Word Vectors for 157 Languages", the CBOW model is used with position-dependent weights, and the new pre-trained models were produced that way. Is it possible to train an unsupervised model with CBOW in this version of fastText using the same position-weights approach?
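For readers unfamiliar with the idea, here is a minimal pure-Python sketch of how a position-weighted CBOW context vector can be formed, as I understand it from the papers: each context word vector is multiplied element-wise by a learned vector for its relative position, and the results are summed. The function name `positional_context_vector`, the toy vectors, and the dictionary layout are assumptions made for illustration; this is not fastText code, and no training step is shown.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def positional_context_vector(
    tokens: Sequence[str],
    center: int,
    window: int,
    word_vectors: Dict[str, Vector],
    position_vectors: Dict[int, Vector],
) -> Vector:
    """Sum the context word vectors around `center`, each reweighted
    element-wise by the vector for its relative position (hypothetical helper)."""
    dim = len(next(iter(position_vectors.values())))
    context = [0.0] * dim
    for offset in range(-window, window + 1):
        index = center + offset
        # Skip the center word itself and positions outside the sentence.
        if offset == 0 or index < 0 or index >= len(tokens):
            continue
        u = word_vectors[tokens[index]]   # context word vector
        d = position_vectors[offset]      # position-dependent weights
        for k in range(dim):
            context[k] += d[k] * u[k]
    return context


# Toy data, invented for the example.
tokens = ["the", "cat", "sat"]
word_vectors = {"the": [1.0, 2.0], "cat": [3.0, 4.0], "sat": [5.0, 6.0]}
position_vectors = {-1: [0.5, 0.5], 1: [2.0, 0.0]}

context = positional_context_vector(tokens, 1, 1, word_vectors, position_vectors)
print(context)
```

Plain CBOW corresponds to the special case where every position vector is all ones, so the same word contributes identically wherever it appears in the window.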

luthfianto commented 6 years ago

Also, which fastText command corresponds to positional weighting?

matanox commented 6 years ago

I don't think this is implemented in the publicly available codebase.

loretoparisi commented 6 years ago

@matanster you are right, it is not, at least not yet, although the two most recent papers show models that have been trained with fastText using position-dependent weights.

ghost commented 6 years ago

Do you know if there are any updates on positional weighting in the fastText codebase?

loretoparisi commented 6 years ago

@omriFdna so far I'm not aware of a version that implements CBOW with positional weights, but I will have a look; maybe someone did...

ghost commented 6 years ago

@loretoparisi any luck?

loretoparisi commented 6 years ago

@omriFdna not so far.

loretoparisi commented 5 years ago

A set of word vectors trained using CBOW with position weights has recently been released for Wikipedia and Common Crawl: https://fasttext.cc/docs/en/crawl-vectors.html

The model parameters were:

| DIM | NGRAM | WS | NEG |
|-----|-------|----|-----|
| 300 | 5     | 5  | 10  |

It would be great to have this as training option as well.
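As a rough sketch, the parameters listed above can be mapped onto the existing `fasttext cbow` command-line flags. The helper name `build_cbow_command` and the file names are made up for illustration, and setting both `-minn` and `-maxn` to 5 is my assumption about what "NGRAM 5" means. Note that this only reproduces the plain CBOW settings: it does not enable position weights, since no such option is available in the public code.

```python
from typing import List


def build_cbow_command(
    input_path: str,
    output_prefix: str,
    dim: int = 300,
    ngram: int = 5,
    ws: int = 5,
    neg: int = 10,
    binary: str = "fasttext",
) -> List[str]:
    """Build an argv list for plain CBOW training (hypothetical helper).

    Position weights are NOT covered; only the listed hyperparameters are.
    """
    return [
        binary, "cbow",
        "-input", input_path,
        "-output", output_prefix,
        "-dim", str(dim),       # DIM
        "-minn", str(ngram),    # NGRAM, assumed to mean n-grams of length 5 only
        "-maxn", str(ngram),
        "-ws", str(ws),         # WS: context window size
        "-neg", str(neg),       # NEG: number of negative samples
    ]


command = build_cbow_command("corpus.txt", "vectors")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where the `fasttext` binary is installed.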

ghost commented 5 years ago

I agree. They also report in their papers that position weights improve performance; I wish it were part of the training options.

adam2326 commented 5 years ago

+1

Witiko commented 4 years ago

I am attempting to add this feature to Gensim, see https://github.com/RaRe-Technologies/gensim/pull/2905.

Witiko commented 2 years ago

The work from https://github.com/RaRe-Technologies/gensim/pull/2905 has been accepted as a journal paper that is to appear in J.UCS 28:2. To conduct experiments for the paper, I have produced the high-level PInE library, which uses a fork of Gensim 3.8.3 for model training. Perhaps the PInE library can serve as inspiration for an implementation in Facebook's fastText and in current Gensim. 💡