This is a simple Chatbot written in Python 3.5 with a MySQL database backend. The code builds on my other SimpleBot demo (https://github.com/edbullen/SimpleBot) and introduces some NLP with Python NLTK and basic Machine Learning capabilities to demonstrate Sentence Classification using the NLTK and scikit-learn. The Stanford CoreNLP package, written in Java, is also used to parse grammar and extract sentence topics, subject, object etc.
The ChatBot conversation operates in 3 modes
As it currently stands (May 2017), the functionality is far from perfect, but it demonstrates the concepts of Natural Language Processing, Sentence Classification and a very basic level of Natural Language Grammar processing.
This version is still at a basic experimentation level - there is no concept of authentication, security etc.
Stanford CoreNLP Parser This is a Java package that needs to be download and located in suitable dir for future ref
Java - tested with Java 8, java version "1.8.0_131"
Core Functionality
chatbot.py
- main ChatBot library utils.py
- generic function utilities used by ChatBot (config, DB conn etc) features.py
- library for extracting features from sentences using NLTKbotserver.py
- Multi-Threaded server to allow multiple clients to connect to the ChatBot via network socketssimpleclient.py
- Simple network sockets client to connect to botserver
Default botserver.py
logging location is
./log/bostserver.log
Setup and Test
./config/config.ini
- template config file, requires editing before starting the chatbot for the first time. pwdutil.py
- store an encoded password for connecting to the database schema.setupDatabase.py
- drop and recreate the database tables (existing data gets lost).pingDB.py
- test the database configuration: create test table, insert data, query it, drop the test table.mlClassGenerateRfModel.py
- generate a scikit-learn Random Forest Model for sentence classification based on input CSV. Default output file-name is "RFmodel.ml" Tools and Utilities
All Dump and Load utilities use the
./dump
subdirectory by default.
dataDump.py
- Dump out a database table in CSV formatdataLoad.py
- Load a database table in CSV formatfeaturesDump.py
- read in a CSV of sentences, dump out features using features.py
into a CSVtestClassifyModel.py
- test the Sentence Classification logic and Model for a given sentence (using the pre-built Random Forest RFmodel.ml
model)testGetAnswer.py
- test retreiving an Answer from the chatbot for given sentencetestGetGrammar.py
- extract grammar structure for given sentence (using Stanford CoreNLP package)testStoreStatement.py
- parse and store a given Statement sentence in the database.Details of installing dependancies for the NLPBot to function are documented in the python_server_config.md note in this repo.
In summary, for a new install, the following steps are required:
These steps are documented in detail here
[MySQL]
server: localhost
dbuser: nlpbot
dbname: nlpbot
dbcharset: utf8mb4
[Server]
listen_host: 0.0.0.0
tcp_socket: 9999
listen_queue: 10
[Java] #required for Stanford CoreNLP
bin: /usr/bin/java
[StanfordNLP]
corejar: /home/botuser/StanfordParser/coreNLP/stanford-corenlp-3.7.0.jar
modelsjar: /home/botuser/StanfordParser/coreNLP/stanford-corenlp-3.7.0-models.jar
[DEBUG]
assoc: False
weight: False
itemid: False
match: False
server: False
answer: False
The chatbot can be started with a multi-threaded server scheduler (botserver.py
) that listens for connections on a TCP port. This is a very simple "bare-bones" multi session framework with no authentication and just relying on TCP sockets for connection.
Remote TCP Socket Connection requests are given a thread and their own session connection.
The botserver gives each session a connection to the shared database server.
nohup python3.5 botserver.py &
Bot server output is logged to
~/NLPBot/log/botserver.log
$ ps -ef | grep botserver
botuser **22915** 22854 0 08:15 pts/0 00:00:01 python3.5 botserver.py
botuser 23165 22854 0 08:54 pts/0 00:00:00 grep botserver
kill -9 22915
python3.5 simpleclient.py -a localhost -p 9999
Make sure the botserver port is allowed through the firewall.
python3.5 simpleclient.py -a 192.168.10.101 -p 9999
This sample conversation shows the capabilities of the NLPBot ChatBot as of 1 May 2017. There are some good examples of limitations which are most noticeable around the very basic Question mode (looking for a previously learnt Statement). It does however generally determine whether it is "just chatting", being given a Statement to store or being asked a question.
$ python simpleclient.py -a 192.168.10.1 -p 1234
Connecting to server 192.168.10.1 at port 1234
>>> Hi
Bot> hello
>>> The latest Python source distribution is always available from python.org at https://www.python.org/downloads/
Bot> Shall I store that as a fact for future reference? ("yes" to store)
>>> yes
Thanks, I've made a note of that.
>>> There is a Python newsgroup comp.lang.python and a mailing list: python-list .
Bot> Shall I store that as a fact for future reference? ("yes" to store)
>>> yes
Bot> OK, I've stored that information.
>>> Where can I get the latest Python source code from?
Bot> Sorry, I can't find an answer to that.
>>> Where is the latest Python source distribution?
Bot> The latest Python source distribution is always available from python.org at https://www.python.org/downloads/
>>> Thanks.
Bot> How are you?
>>> I am fine thanks
Bot> How are you?
>>> Good
Bot> Please train me - enter a response for me to learn (or "skip" to skip)'
>>> skip
Bot> OK, moving on...
>>> Is there a Python newsgroup or mailing list?
Bot> There is a Python newsgroup comp.lang.python and a mailing list: python-list.
>>> quit
Connection closed