@deepiksdev
I installed rasa_nlu successfully on my MacBook, but when I ran the server according to the tutorial I got an error:
ERROR:root:Failed to load model 'model_20170613-113720'. Error: 1
This is the same as https://github.com/RasaHQ/rasa_nlu/issues/284
Then I updated the rasa_nlu repo and reinstalled rasa_nlu together with all of its dependencies from requirements.txt and dev-requirements.txt, and it worked.
So I suggest we update the repo.
@deepiksdev
Successfully tested the English version.
I sent a request to the server:
curl -XPOST localhost:5000/parse -d '{"q":"I am looking for Chinese food"}' | python -mjson.tool
and got:
{
    "entities": [
        {
            "end": 24,
            "entity": "NORP",
            "extractor": "ner_spacy",
            "start": 17,
            "value": "Chinese"
        }
    ],
    "intent": {
        "confidence": 0.7805998076148585,
        "name": "restaurant_search"
    },
    "intent_ranking": [
        {
            "confidence": 0.7805998076148585,
            "name": "restaurant_search"
        },
        {
            "confidence": 0.0941985709844514,
            "name": "greet"
        },
        {
            "confidence": 0.08374048180790579,
            "name": "goodbye"
        },
        {
            "confidence": 0.04146113959278425,
            "name": "affirm"
        }
    ],
    "text": "I am looking for Chinese food"
}
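(For reference, the same parse request can also be sent from Python; a minimal sketch using the requests package, which is an assumption on my side and not something used in this thread:)

```python
# Send the same parse request as the curl command above to the rasa_nlu server
# assumed to be running on localhost:5000.
import requests

response = requests.post(
    "http://localhost:5000/parse",
    json={"q": "I am looking for Chinese food"},
)
result = response.json()

# Fields match the JSON response shown above.
print(result["intent"]["name"], result["intent"]["confidence"])
for entity in result["entities"]:
    print(entity["entity"], entity["value"], entity["start"], entity["end"])
```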
@deepiksdev I finished segmenting the Chinese Wikipedia dataset.
@deepiksdev I created a repo on S3 and uploaded 4 files to it.
You can find them at https://s3.amazonaws.com/deepiks-training/datasets/rasa_chinese
@ALLENWAN1 Great, thanks. Please commit the Python file.
@deepiksdev I just finished training word vectors on the Chinese Wikipedia data, and I ran a quick unit test. The results are interesting: I looked at the words most similar to "吃饭" ("to have a meal") and to "中文" ("the Chinese language").
Judging from my background as a Chinese speaker, I think the results are very good.
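(For context, a minimal sketch of this kind of nearest-neighbour check, assuming the vectors were trained with gensim's word2vec on a pre-segmented, one-sentence-per-line corpus; neither the training tool nor the corpus file name is stated in this thread:)

```python
# Hedged sketch of the "most similar words" check described above. Assumes the
# Chinese Wikipedia text was already segmented (e.g. with Jieba) into one
# sentence per line, and that gensim is the word2vec implementation used;
# parameter names follow the gensim releases current at the time (pre-4.0).
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

sentences = LineSentence("zhwiki_segmented.txt")  # hypothetical corpus file name
model = Word2Vec(sentences, size=300, window=5, min_count=5, workers=4)

# The unit test described above: nearest neighbours of a query word.
print(model.wv.most_similar("吃饭"))  # "to have a meal"
print(model.wv.most_similar("中文"))  # "the Chinese language"
```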
@ALLENWAN1 Congrats, this sounds great! Can you please upload the model to S3?
I uploaded the model to S3. You can find it at
https://s3.amazonaws.com/deepiks-training/models/rasa_chinese
Brilliant, many thanks.
@deepiksdev I am going to work on defining the Chinese-language rules needed to generate a Rasa model.
@ALLENWAN1
I am going to work on defining the Chinese-language rules needed to generate a Rasa model.
I am not sure I understand what you mean. Can you point me to some documentation?
@deepiksdev Maybe I didn't explain it well; see https://spacy.io/docs/usage/adding-languages
OK, thanks. I was not aware Rasa was using spaCy.
@deepiksdev Rasa provides two backend options; see the training part of http://rasa-nlu.readthedocs.io/en/stable/tutorial.html
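(For reference, a hedged sketch of training with the rasa_nlu Python API as the 0.x tutorial linked above describes it; the config and data file names are the tutorial's examples, not files from this issue, and the module paths may differ in later releases:)

```python
# Sketch of training a rasa_nlu model with the 0.x-era Python API from the
# tutorial linked above. The backend (spaCy + scikit-learn, or MITIE) is
# selected in the JSON config file passed to RasaNLUConfig.
from rasa_nlu.converters import load_data
from rasa_nlu.config import RasaNLUConfig
from rasa_nlu.model import Trainer

training_data = load_data("data/examples/rasa/demo-rasa.json")  # tutorial sample data
trainer = Trainer(RasaNLUConfig("config_spacy.json"))           # tutorial sample config
trainer.train(training_data)
model_directory = trainer.persist("./models/")                  # path of the saved model
print(model_directory)
```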
@deepiksdev While studying the rules for Chinese, I found some differences between Chinese and English. For example, take "I want to swim" and "swimming is interesting": in English, "swim" is a verb and "swimming" is a noun.
But in Chinese we say "我要去游泳" ("I want to go swimming") and "游泳很有意思" ("swimming is interesting"), so "游泳" can be both a verb and a noun. That may create a conflict.
@ALLENWAN1 Thanks for this explanation. Can you explain why this is a problem for the current issue?
@deepiksdev Because I am afraid it will cause mistakes in NER. I am not sure; we will see when we finish the language model.
@ALLENWAN1 OK, we will see.
@ALLENWAN1 Have you looked at how spaCy implements Chinese (https://github.com/explosion/spaCy/tree/master/spacy/zh)?
@deepiksdev I am working on the adding-languages steps.
The first step, the language subclass: I have finished it.
The second step, stop words, tag map, and tokenizer exceptions: I am working on it. It is a little complicated because it differs from language to language, and Chinese is quite different from the others.
The third part, the data: I have almost finished it.
The last step needs the results of the first three parts.
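(A rough sketch of how the pieces above fit together, written against the spaCy 1.x/2.x-style API that was current at the time; the class layout mirrors spaCy's own zh module, but the tiny stop-word sample and the example sentence are only illustrative, not the code produced for this issue:)

```python
# Sketch of the "adding a language" steps listed above for Chinese:
# a Language subclass (step 1) plus stop words and a tokenizer (step 2).
# Word segmentation is delegated to Jieba because spaCy's default tokenizer
# relies on whitespace, which Chinese text does not use between words.
import jieba
from spacy.attrs import LANG
from spacy.language import Language
from spacy.tokens import Doc


class ChineseDefaults(Language.Defaults):
    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
    lex_attr_getters[LANG] = lambda text: "zh"    # language ID for the subclass
    stop_words = {"的", "了", "和", "是"}          # tiny illustrative stop-word sample


class Chinese(Language):
    lang = "zh"
    Defaults = ChineseDefaults

    def make_doc(self, text):
        # Segment with Jieba and build a Doc with no spaces between tokens.
        words = [w for w in jieba.cut(text) if w.strip()]
        return Doc(self.vocab, words=words, spaces=[False] * len(words))


nlp = Chinese()
doc = nlp.make_doc("我要去游泳")  # "I want to go swimming"
print([token.text for token in doc])
```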
@deepiksdev
Have you looked at how spaCy implements Chinese (https://github.com/explosion/spaCy/tree/master/spacy/zh)?
I had already found it, and I am doing the same thing they did. But nobody has finished it, so I need to continue the work.
@ALLENWAN1
I believe the spaCy tokenizer will probably not work, because it is based on spaces (see https://spacy.io/docs/usage/customizing-tokenizer). This may prevent us from using spaCy.
@ALLENWAN1
I believe the spaCy tokenizer will probably not work, because it is based on spaces (see https://spacy.io/docs/usage/customizing-tokenizer). This may prevent us from using spaCy.
This is wrong: you need to install Jieba (https://pypi.python.org/pypi/jieba/). Have you done so?
@deepiksdev This problem has been solved. Using Jieba, I got the result.
@ALLENWAN1
This problem has been solved. Using Jieba, I got the result.
Sorry, I don't understand. What problem? What result?
@deepiksdev Sorry, I forgot to copy the link:
I believe the spaCy tokenizer will probably not work, because it is based on spaces (see https://spacy.io/docs/usage/customizing-tokenizer). This may prevent us from using spaCy.
I mean that I can do the tokenization now, so this will not be a problem. The result is that I used Jieba to segment the Wikipedia data, and the output looks good.
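(A minimal example of the Jieba segmentation described here; the sample sentences are the ones from the swim/swimming discussion above, and the segmentations shown in the comments are typical Jieba output rather than results reported in this thread:)

```python
# Minimal example of word segmentation with Jieba, as discussed above.
# jieba.lcut returns the segmented words as a list; for a large corpus such as
# the Wikipedia dump you would stream the text line by line instead.
import jieba

print(jieba.lcut("我要去游泳"))    # "I want to go swimming" -> e.g. ['我', '要', '去', '游泳']
print(jieba.lcut("游泳很有意思"))  # "Swimming is interesting" -> e.g. ['游泳', '很', '有意思']
```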
@ALLENWAN1 Are you still working on this, or is it on hold?
@deepiksdev It's on hold.
OK, thanks. Closing it for the time being.
See https://rasa-nlu.readthedocs.io/en/stable/languages.html