Can fastText train doc/sentence embedding?

facebookresearch / fastText

Library for fast text representation and classification.

https://fasttext.cc/

MIT License

Can fastText train doc/sentence embedding? #623

Open letotefrank opened 6 years ago

letotefrank commented 6 years ago

Hello! The question is in the title. I have no labeled data, and this is not a classification task. Thank you for your reply!

yunhenk commented 6 years ago

You may refer to this work: https://github.com/epfml/sent2vec

letotefrank commented 6 years ago

@yunhenk How can I continue training (a second round of training based on an existing model) in Python? Thanks!

yunhenk commented 6 years ago

fastText doesn't seem to be able to resume training from a trained model, but you can try hacking the code. Alternatively, you can start from pretrained vectors by setting the 'pretrainedVectors' parameter.

letotefrank commented 6 years ago

@yunhenk Can you give an example of setting the 'pretrainedVectors' parameter? Thank you.

yunhenk commented 6 years ago

see https://github.com/facebookresearch/fastText#full-documentation
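
For illustration, here is a minimal sketch of what setting `pretrainedVectors` can look like with the Python bindings. It assumes the `fasttext` package is installed and that the pretrained file is in the textual `.vec` format, whose first line holds the vocabulary size and the vector dimension. The helper names `read_vec_header` and `train_from_pretrained` and the demo file are hypothetical. fastText rejects pretrained vectors whose dimension differs from `dim`, so the sketch reads the dimension from the header instead of hard-coding it.

```python
import os
import tempfile


def read_vec_header(path):
    """Return (vocab_size, dim) from the first line of a textual .vec file."""
    with open(path, encoding="utf-8") as f:
        vocab_size, dim = f.readline().split()
    return int(vocab_size), int(dim)


def train_from_pretrained(corpus_path, vec_path, model="skipgram", min_count=1):
    """Train unsupervised vectors initialised from a .vec file.

    Hypothetical helper: fasttext is imported lazily so the header check
    above can be used even where the package is not installed.
    """
    import fasttext  # assumption: the official Python bindings are installed

    _, dim = read_vec_header(vec_path)
    return fasttext.train_unsupervised(
        input=corpus_path,
        model=model,
        minCount=min_count,
        dim=dim,  # must equal the dimension of the pretrained vectors
        pretrainedVectors=vec_path,
    )


# Tiny self-check of the header reader on a made-up two-word .vec file.
with tempfile.TemporaryDirectory() as tmp:
    demo_vec = os.path.join(tmp, "demo.vec")
    with open(demo_vec, "w", encoding="utf-8") as f:
        f.write("2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
    demo_header = read_vec_header(demo_vec)
```

With a real corpus, the call would look like `train_from_pretrained('./data/my_data.txt', './vec/wiki.zh.vec')`. Calling `get_sentence_vector(...)` on the returned model then gives one vector per line of text, which is the closest built-in answer to the question in the title.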

letotefrank commented 5 years ago

@yunhenk I'm trying to fine-tune in Python starting from wiki.zh.vec, but the results are bad: the original word vectors (from wiki.zh.vec) became zero.

Note: 'my_data.txt' is my corpus, with one sentence per line.

    import os
    from fasttext import train_unsupervised

    # fit
    model = train_unsupervised(
        input=os.path.join('./data', 'my_data.txt'),
        model='skipgram',
        minCount=1,
        dim=300,
        pretrainedVectors="./vec/wiki.zh.vec"
    )

Thank you for your reply!

yunhenk commented 5 years ago

@letotefrank You can try debugging the code to find the problem.

letotefrank commented 5 years ago

@yunhenk Is it convenient to contact you? my WeChat: lcy20101129

jaytimbadia commented 3 years ago

You may refer to this work: https://github.com/epfml/sent2vec

Hi, Really great source!!

I just have one question.

Which one (fastText, word2vec, GloVe) gives better sentence embeddings when the respective word vectors are averaged?

Which, in your opinion, will give better search results for sentences if I embed them this way?
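
To make the averaging approach concrete, here is a minimal sketch in plain Python. It assumes sentences are whitespace-tokenised and that word vectors are available as a dictionary. The vectors in `toy_vectors` are made up for illustration and come from none of the three models. The function names are hypothetical as well.

```python
import math


def average_embedding(tokens, word_vectors):
    """Mean of the vectors of known tokens; None if no token is known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, sentences, word_vectors):
    """Rank sentences by cosine similarity to the query, best first."""
    q = average_embedding(query.split(), word_vectors)
    scored = []
    for s in sentences:
        e = average_embedding(s.split(), word_vectors)
        if q is not None and e is not None:
            scored.append((cosine(q, e), s))
    return sorted(scored, reverse=True)


# Made-up 2-dimensional vectors, for illustration only.
toy_vectors = {
    "cat": [1.0, 0.0], "kitten": [0.9, 0.1],
    "car": [0.0, 1.0], "truck": [0.1, 0.9],
    "small": [0.5, 0.5],
}
ranked = search("small cat", ["kitten", "truck"], toy_vectors)
```

In this toy setup `ranked` puts "kitten" ahead of "truck" for the query "small cat". Swapping in vectors loaded from fastText, word2vec, or GloVe only changes the `word_vectors` dictionary, which makes the three easy to compare on the same queries.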

yunhenk commented 3 years ago

BERT would be better for sentence search.

jaytimbadia commented 3 years ago

My hardware restricts me from using BERT. Can you rank the above three by how good their averaged sentence embeddings are?

yunhenk commented 3 years ago

I think fastText's word embeddings are almost the same as word2vec's; GloVe might be better according to GloVe's authors. But I recommend evaluating them on your own dataset.
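
One possible shape for such an evaluation is sketched below. It assumes you can write down a few (query, expected sentence) pairs from your own data, and it scores each embedding by top-1 retrieval accuracy. The names `make_averager` and `top1_accuracy` are hypothetical, and `table_a` and `table_b` are made-up stand-ins for vectors loaded from two different models. They are not real fastText, word2vec, or GloVe output.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def make_averager(word_vectors, dim):
    """Build an embed(sentence) function that averages known word vectors."""
    def embed(sentence):
        known = [word_vectors[w] for w in sentence.split() if w in word_vectors]
        if not known:
            return [0.0] * dim
        return [sum(v[i] for v in known) / len(known) for i in range(dim)]
    return embed


def top1_accuracy(embed, pairs, candidates):
    """Fraction of (query, expected) pairs whose nearest candidate is expected."""
    cand_vecs = [(c, embed(c)) for c in candidates]
    hits = 0
    for query, expected in pairs:
        q = embed(query)
        best = max(cand_vecs, key=lambda cv: cosine(q, cv[1]))[0]
        hits += best == expected
    return hits / len(pairs)


# Made-up data: two candidate sentences and two labelled queries.
candidates = ["red car", "small kitten"]
pairs = [("cat", "small kitten"), ("truck", "red car")]

# Made-up vector tables standing in for two different embedding models.
table_a = {"cat": [1.0, 0.0], "kitten": [0.9, 0.1], "small": [0.6, 0.4],
           "car": [0.0, 1.0], "truck": [0.1, 0.9], "red": [0.4, 0.6]}
table_b = {"cat": [1.0, 0.0], "kitten": [0.0, 1.0], "small": [0.0, 1.0],
           "car": [1.0, 0.0], "truck": [0.0, 1.0], "red": [1.0, 0.0]}

scores = {
    "table_a": top1_accuracy(make_averager(table_a, 2), pairs, candidates),
    "table_b": top1_accuracy(make_averager(table_b, 2), pairs, candidates),
}
```

On this toy data `table_a` retrieves both expected sentences and `table_b` retrieves neither. This says nothing about the real models; it only shows that the same harness can rank whichever vector sets you load.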