RasaHQ / rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
https://rasa.com/docs/rasa/
Apache License 2.0
Spacy NER bad results #168

Closed: jithurjacob closed this issue 7 years ago

jithurjacob commented 7 years ago

Hi guys,

I tried Rasa on Python 3 by using the files from src and doing a lot of reordering. But when I test it with the toy data set, this is the result. How can the output from spaCy be this bad, e.g. {"entity": "location", "start": 13, "end": 16, "value": "for"}?

query: I am looking for Chinese food

{"confidence": 0.6662957848930847, "intent": "restaurant_search", "text": "I am looking for Chinese food", "entities": [{"entity": "location", "start": 5, "end": 12, "value": "looking"}, {"entity": "location", "start": 13, "end": 16, "value": "for"}, {"entity": "cuisine", "start": 17, "end": 24, "value": "Chinese"}, {"entity": "cuisine", "start": 25, "end": 29, "value": "food"}]}
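A quick way to see what went wrong in this response is to check each predicted entity against the text. The sketch below is only an illustration (the helper names `check_entities` and `tagged_fraction` are made up here, not part of Rasa). It confirms that the character offsets are consistent, so the problem is the labels themselves: four of the six tokens were tagged as entities.

```python
import json

# The response quoted above, copied verbatim.
response = json.loads("""
{"confidence": 0.6662957848930847, "intent": "restaurant_search",
 "text": "I am looking for Chinese food",
 "entities": [
   {"entity": "location", "start": 5, "end": 12, "value": "looking"},
   {"entity": "location", "start": 13, "end": 16, "value": "for"},
   {"entity": "cuisine", "start": 17, "end": 24, "value": "Chinese"},
   {"entity": "cuisine", "start": 25, "end": 29, "value": "food"}]}
""")


def check_entities(result):
    """Return (label, value, offsets_ok) for each predicted entity."""
    text = result["text"]
    return [
        (e["entity"], e["value"], text[e["start"]:e["end"]] == e["value"])
        for e in result["entities"]
    ]


def tagged_fraction(result):
    """Fraction of whitespace-separated tokens that received an entity label.

    Assumes every predicted entity covers exactly one token, which holds
    for the response above.
    """
    tokens = result["text"].split()
    return len(result["entities"]) / len(tokens)


for label, value, ok in check_entities(response):
    print(f"{label:10} {value!r:10} offsets_ok={ok}")
print(f"tagged fraction: {tagged_fraction(response):.2f}")
```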

jithurjacob commented 7 years ago

Wow, spaCy uses an averaged perceptron, so it expects something like 5000 samples for training. This would be a good point to add to the Rasa documentation as a warning about using the spacy+scikit backend.

source: https://github.com/explosion/spaCy/issues/773

jithurjacob commented 7 years ago

Has anyone used MITIE on Python 3?

tmbo commented 7 years ago

You are completely right there: spaCy needs a lot of training data to perform well when annotating entities. I just added an option that allows reusing pretrained spaCy NER models (e.g. for locations or dates).

There is a separate issue for Python 3: #68. Status: new code we write is compatible with both (2.7 and 3.6), but we have not ported all parts of the existing code base yet.
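As a rough illustration of what reusing a pretrained NER model can look like (this is a sketch, not the option added to Rasa), the snippet below converts spaCy-style entity spans into the entity format shown earlier in this thread. The label mapping and the `pretrained_entities` helper are hypothetical, and `FakeSpan` is a hand-written stand-in so the example runs without a spaCy model installed.

```python
from collections import namedtuple

# Hypothetical mapping from a pretrained model's labels to application labels.
SPACY_TO_APP_LABEL = {"GPE": "location", "LOC": "location", "DATE": "date"}


def pretrained_entities(ents, label_map=SPACY_TO_APP_LABEL):
    """Convert spaCy-style spans into the dict format used in this thread.

    `ents` is any iterable of objects exposing .text, .label_, .start_char
    and .end_char, as spaCy Span objects do. Labels missing from
    `label_map` are dropped.
    """
    return [
        {
            "entity": label_map[e.label_],
            "start": e.start_char,
            "end": e.end_char,
            "value": e.text,
        }
        for e in ents
        if e.label_ in label_map
    ]


# Stand-in spans, written by hand for illustration only. With spaCy and an
# English model installed, the spans would instead come from something like:
#     import spacy
#     nlp = spacy.load("en_core_web_sm")  # model name is an assumption
#     spans = nlp(text).ents
FakeSpan = namedtuple("FakeSpan", "text label_ start_char end_char")

text = "show me Chinese restaurants in Berlin"
spans = [
    FakeSpan("Chinese", "NORP", text.index("Chinese"), text.index("Chinese") + 7),
    FakeSpan("Berlin", "GPE", text.index("Berlin"), text.index("Berlin") + 6),
]

entities = pretrained_entities(spans)
print(entities)
```

Only general-purpose labels such as locations or dates can be reused this way; domain-specific entities like cuisine would still need their own training data.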

jithurjacob commented 7 years ago

@tmbo Can you share the code for reusing the NER models?

Also, can you please tell me whether you are getting good results with MITIE?

I'll try to contribute on the weekend towards making it compatible with Py2/3.

alfredfrancis commented 7 years ago

@jithurjacob How about this open source project on GitHub? It has all the functionality of Rasa and needs only a few training examples. It uses pycrfsuite instead of spaCy NER.

amn41 commented 7 years ago

Yes, using a conditional random field as an alternative for parsing entities is on our roadmap. It will make more or less sense than the MITIE/spaCy approaches depending on people's use cases, so I would also want to provide good docs and guidelines on when to use which.
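To make the CRF idea more concrete, here is a minimal sketch of the data preparation such an extractor typically needs: BIO labels derived from character-offset annotations, plus hand-crafted per-token features. The function names, the feature set, and the single "cuisine" annotation are assumptions for illustration; the resulting feature and label sequences would then be passed to a CRF trainer (for example python-crfsuite), which is not shown here.

```python
import re


def tokenize(text):
    """Split on whitespace, keeping offsets: [(token, start, end), ...]."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def bio_labels(tokens, entities):
    """Assign B-/I-/O labels to tokens from character-offset annotations."""
    labels = ["O"] * len(tokens)
    for ent in entities:
        inside = False
        for i, (_, start, end) in enumerate(tokens):
            # A token belongs to the entity if it lies fully within its span.
            if start >= ent["start"] and end <= ent["end"]:
                labels[i] = ("I-" if inside else "B-") + ent["entity"]
                inside = True
    return labels


def token_features(tokens, i):
    """Simple features for token i: its surface form and its neighbours."""
    word = tokens[i][0]
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "suffix3": word[-3:].lower(),
        "prev": tokens[i - 1][0].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1][0].lower() if i < len(tokens) - 1 else "<EOS>",
    }


text = "I am looking for Chinese food"
# Hypothetical gold annotation for the query from this issue.
gold = [{"entity": "cuisine", "start": 17, "end": 24, "value": "Chinese"}]

tokens = tokenize(text)
labels = bio_labels(tokens, gold)
features = [token_features(tokens, i) for i in range(len(tokens))]

for (word, _, _), label in zip(tokens, labels):
    print(f"{word:10} {label}")
```

Because features such as the neighbouring words are specified by hand, a CRF has less to learn from the data itself, which is one reason it can work with fewer examples than a model that learns everything from scratch.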

jithurjacob commented 7 years ago

@alfredfrancis Thank you for bringing it to my notice. I'll definitely try it out.

I'm doing a comparison of the various open source bot development frameworks available for Python. Could you please provide links to other frameworks that you are aware of?

jithurjacob commented 7 years ago

@amn41 Absolutely, that makes sense. It would be great if you could list the possible alternatives to sklearn_spacy or MITIE, so that others could build the backend and contribute to Rasa.

Could you please share your wish list for Rasa backends? Depending on the use case, people could then select one.

For me, the training should be very minimal, and I'm happy with average performance.

alfredfrancis commented 7 years ago

@jithurjacob A CRF can give pretty good results with minimal training. I'm talking about more than 80% accuracy with just 4-5 examples. Check Chatterbot.

jithurjacob commented 7 years ago

@alfredfrancis I couldn't find any mention of CRF being used in ChatterBot. Can you please point me to the correct source?

alfredfrancis commented 7 years ago

@jithurjacob How about this one?

jithurjacob commented 7 years ago

@alfredfrancis Sorry, I got confused with ChatterBot. I tried your project with the book cab example. The results are good for a small POC, and I will test it further. One issue I'm facing is that I'm not able to compile the library for Windows 64-bit/Python 3.5. I tested it on Python 2.7, and it works fine.

Could you please share the wheel file for Py3.5/Win64 if you have one?

alfredfrancis commented 7 years ago

@jithurjacob I haven't tested it on Py3/Win.

jithurjacob commented 7 years ago

@tmbo Thank you for adding CRF support to Rasa.