deeppavlov / intent_classifier

Apache License 2.0

About non-english language #4

Closed uugan closed 6 years ago

uugan commented 6 years ago

Is it possible to train on other languages (using a fastText model in this project)?

Parameter fasttext_model contains path to pre-trained binary skipgram fastText [2] model for English language. If one prefers to use default model, it will be downloaded when one will train model.

But if I understood correctly, fastText supports UTF-8 (Cyrillic, Latin, Chinese, etc.). Also, how did you generate the http://lnsigo.mipt.ru/export/embeddings/reddit_fasttext_model.bin file?

After downloading everything, it returns only 'SearchCreativeWork':

>python intent_classifier.py snips_pretrained\snips_config.json
F:\Projects\python\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
2018-03-23 07:06:39.529491: I C:\tf_jenkins\workspace\rel-win\M\windows\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
>weather in moscow?
(0.378094, 'SearchCreativeWork')
dilyararimovna commented 6 years ago

Hello!

  1. Yes, it is possible to train your own model on any language using https://github.com/facebookresearch/fastText

You just need to provide a dataset in the target language.

Note that fastText itself has been updated to a new version. The version this repo currently relies on, fasttext==0.8.3 (available here https://pypi.python.org/pypi/fasttext), is therefore not compatible with the one provided on GitHub (link above).

If you prefer the new version of fastText, you can use the DeepPavlov repository for intent classification (https://github.com/deepmipt/DeepPavlov/tree/master/deeppavlov/models/classifiers/intents).

  2. For the second question, I could not reproduce your result. I got:
    weather in moscow?
    (0.9999671, 'GetWeather')
    weather in london?
    (0.99995196, 'GetWeather')
    play me rihanna
    (0.99532384, 'PlayMusic')

    I realized that requirements.txt did not pin a fasttext version, and I have fixed that. Please install the requirements again and check that the installed version is fasttext==0.8.3.
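For the first point, training embeddings for another language might look roughly like the sketch below. It assumes the official `fasttext` Python package from the GitHub repository linked above, not the 0.8.3 wrapper this repo pins, and the helper name `train_skipgram` and the file names are hypothetical. Whether the resulting file loads under fasttext==0.8.3 is exactly the compatibility concern described above, so treat this as a starting point only.

```python
import os


def train_skipgram(corpus_path, output_path, dim=100):
    """Train skip-gram fastText embeddings on a plain-text UTF-8 corpus.

    Assumption: the official `fasttext` package is installed. This is not
    the fasttext==0.8.3 wrapper pinned in requirements.txt.
    """
    if not os.path.isfile(corpus_path):
        raise FileNotFoundError("corpus not found: " + corpus_path)

    # Imported lazily so the path check above runs even without fasttext.
    import fasttext

    # The corpus is raw text in the target language (Cyrillic, Chinese, ...).
    model = fasttext.train_unsupervised(corpus_path, model="skipgram", dim=dim)
    model.save_model(output_path)  # writes a binary model file
    return model


# Example call with hypothetical file names:
# train_skipgram("corpus_ru.txt", "my_fasttext_model.bin")
```

The saved file would then be the candidate value for the `fasttext_model` parameter in the config, provided the installed fasttext version can read it.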

uugan commented 6 years ago

The requirements.txt file now contains:

numpy
nltk
keras
tensorflow
gensim
pandas
sklearn
h5py
tqdm
fasttext==0.8.3

pip install -r requirements.txt
...
Installing collected packages: fasttext
Successfully installed fasttext-0.8.3
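The pinned version can also be confirmed from inside the same Python interpreter that runs the classifier, which rules out a mismatch between environments. This is a minimal sketch: the helper names `installed_version` and `matches_pin` are made up for illustration, and the `pkg_resources` fallback is only an assumption for older interpreters such as Python 3.6.

```python
def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        # Standard library from Python 3.8 onwards.
        from importlib.metadata import version, PackageNotFoundError
    except ImportError:
        # Older interpreters: fall back to setuptools' pkg_resources.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(package, expected):
    """True only if `package` is installed at exactly version `expected`."""
    return installed_version(package) == expected


print("fasttext:", installed_version("fasttext"))
print("matches 0.8.3 pin:", matches_pin("fasttext", "0.8.3"))
```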

After that I ran:

python intent_classifier.py snips_pretrained/snips_config.json

Result:

F:\Projects\python\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
2018-03-25 11:56:04.398535: I C:\tf_jenkins\workspace\rel-win\M\windows\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
>weather in moscow?
(0.378094, 'SearchCreativeWork')

Something is wrong here. I use Windows 10, Python 3.6.3 :: Anaconda Custom (64-bit), TensorFlow 1.5.0, and Keras 2.1.4, and I downloaded the model file from your site.

dilyararimovna commented 6 years ago

Thank you for the reply. Unfortunately, the main issue here is that fasttext does not work correctly on Windows, even when it reports a successful installation. Try this on Linux or macOS.
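Since the failure is silent (the install succeeds but predictions are wrong), a small guard at startup could make the problem visible. The following is only a sketch and is not part of this repo. The function name `fasttext_platform_note` is hypothetical.

```python
import platform


def fasttext_platform_note(system_name=None):
    """Return a warning string on Windows, otherwise None."""
    # platform.system() yields e.g. "Windows", "Linux" or "Darwin" (macOS).
    system_name = system_name or platform.system()
    if system_name == "Windows":
        return ("fasttext may install without errors on Windows yet give "
                "wrong predictions; try Linux or macOS instead.")
    return None


note = fasttext_platform_note()
if note:
    print("WARNING:", note)
```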