facebookresearch / fastText

Library for fast text representation and classification.
https://fasttext.cc/
MIT License

Get word for vector #304

Open genixpro opened 7 years ago

genixpro commented 7 years ago

Hello,

I have been using fasttext to get word vectors for my text. Works great, love it.

However, I am training my neural networks to predict a word-vector as output as well as taking them as inputs. Given a word-vector, is it possible to use fasttext to get back to the original word?

Brad

pommedeterresautee commented 7 years ago

You can search for the word whose vector is most similar to your vector, which should give you back the original word.

genixpro commented 7 years ago

Thank you @pommedeterresautee for your prompt response. I have tried this; however, I don't think the command is designed to accept a vector as input. Am I doing something wrong?

bradley@training1:~ fastText/fasttext print-word-vectors wiki.en.bin
test
test -0.22845 0.37265 -0.22408 0.3532 -0.14209 -0.13462 -0.1012 -0.14947 -0.2009 0.162 [… remaining components of the 300-dimensional vector elided …] -0.41783 -0.016608
^C
bradley@training1:~ fastText/fasttext nn wiki.en.bin
Pre-computing word vectors... done.
Query word? -0.22845 0.37265 -0.22408 0.3532 -0.14209 [… same 300-dimensional vector pasted in, elided …] -0.41783 -0.016608
taluqan 0.507104
tulkara 0.491032
y&nr 0.486769
lenud 0.486037
shakarganj 0.471436
jamkandorna 0.468951
horowpathana 0.467936
dudhala 0.467701
sanawbari 0.466951
عصمت 0.465585
Query word?
francoist 0.376442
cobelligerent 0.371926
nonradical 0.364669
preconquest 0.36421
stalemated 0.363388
ismb/eccb 0.3597
uruguay#topography 0.359458
complutenses 0.357127
reimposition 0.356362
complutense 0.355167
Query word? ^C
bradley@training1:~

pommedeterresautee commented 7 years ago

nn expects a word, not a vector. You will need to perform this operation through a wrapper.
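Such a wrapper only needs to score a query vector against every word vector by cosine similarity and return the best match. A minimal sketch in NumPy, with toy 2-dimensional data standing in for a real vocabulary/matrix parsed from a .vec file (the words and numbers below are illustrative, not fastText output):

```python
import numpy as np

def nearest_word(query, words, matrix):
    """Return (word, similarity) for the row of `matrix` with the highest
    cosine similarity to `query`. `matrix` has one row per word in `words`."""
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    sims = m @ q                       # cosine similarity to every word
    best = int(np.argmax(sims))
    return words[best], float(sims[best])

# Toy vocabulary; a real wrapper would fill these from the model's .vec file.
words = ["king", "queen", "apple"]
matrix = np.array([[0.9, 0.1],
                   [0.8, 0.2],
                   [0.1, 0.9]])

word, sim = nearest_word(np.array([0.9, 0.1]), words, matrix)
```

For a large vocabulary, normalizing the matrix once up front and reusing it across queries keeps each lookup to a single matrix-vector product.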

genixpro commented 7 years ago

@pommedeterresautee I have created a pull-request for the library, implementing this functionality on the command line tool: https://github.com/facebookresearch/fastText/pull/305

cpuhrsch commented 6 years ago

Hello @genixpro,

Thank you for your post and pull request. I'd also like to point out that we now have Python bindings that should make it easier for you to implement and use this feature. In particular, you can load a model and retrieve word embeddings for any word. See our Python README for examples.
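The workflow described here can be sketched with the `fasttext` pip package; the package and the model path "model.bin" are assumptions (the thread itself only points to the README), so the calls are guarded to keep the snippet importable where they are unavailable:

```python
# Sketch, assuming the `fasttext` pip package and a trained binary model file.
try:
    import fasttext
    model = fasttext.load_model("model.bin")    # load a binary fastText model
    vec = model.get_word_vector("electronics")  # embedding for any word,
                                                # including OOV via subwords
    print(vec.shape)                            # (dim,)
except (ImportError, ValueError):
    pass  # bindings or model file not available in this environment
```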

Thanks, Christian

masud-technope commented 6 years ago

Right now, print-word-vectors outputs only numbers. Can I get the contextually similar words along with the numbers as well? I would like to determine the proximity of two words A and B in the semantic space.