AbrahamSanders / seq2seq-chatbot

A sequence2sequence chatbot implementation with TensorFlow.
MIT License

Cloud Training and Learning #2

Closed: sagarhingal closed this issue 5 years ago

sagarhingal commented 6 years ago
  1. How can I deploy this code to the cloud to train faster, e.g. on Google's Compute Engine or something similar?
  2. Have you thought of implementing on-the-go learning, like updating the weights after each session? If not, any thoughts on that?
AbrahamSanders commented 6 years ago

Hey @sagarhingal,

  1. I personally have not done this, as I use local GPUs for training. However, I have looked at some of the cloud options.

On Microsoft Azure, you can rent time on a "Deep Learning Virtual Machine", where you remote into a VM and execute code directly with Anaconda.

On Google Cloud, instructions are here, although you would need to add support for Google Cloud training jobs to the code; the jobs are executed and monitored over a command line.

  2. Yes, I have considered online learning and placed it on the roadmap.

I am currently focused on trying to figure out how to objectively (and automatically) evaluate a chatbot model's conversational quality in order to compare it to other models. I feel this is the most important problem to solve first. Here is why:

The current validation metric (cross-entropy loss) does not accurately reflect the conversational capabilities of the bot. I have found that lower losses (< 1) produce very rigid models that give mostly generic responses, while losses between 1 and 2 are usually better. However, this "gut feel" metric is not an acceptable scientific way to measure the performance of a model. So, I am reading lots of research papers on neural machine translation and neural conversational models to see what metrics are being used by researchers at the forefront of the field. This was cited as an unsolved problem in Google's 2015 paper A Neural Conversational Model, although it is now 2018 and someone may well have solved it since then.
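As a quick aside on those loss numbers: cross-entropy loss maps directly to perplexity via perplexity = e^loss, which is the metric most NMT papers report. A minimal illustration in Python:

import math

# Cross-entropy loss L corresponds to a perplexity of e^L -- roughly the
# number of equally likely next-word choices the model is deciding between.
for loss in (0.8, 1.0, 1.5, 2.0):
    print("loss=%.1f -> perplexity=%.1f" % (loss, math.exp(loss)))

So a loss of 1.5 corresponds to a perplexity of about 4.5.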

Once this problem is solved I will move on to online learning and other things, but feel free to fork the repository and get a head start ;-)

sagarhingal commented 6 years ago

Okay. I just wanted to know how many epochs you ran the v1 trained model for, because I didn't get any good results. Also, have you updated the dataset or are you sticking with the same one? (P.S. thanks for the answer)

AbrahamSanders commented 6 years ago

Somewhere between 100 and 200 complete epochs (I don't have the exact number, since the model I posted was selected from a checkpoint in the middle of training where the loss was between 1 and 1.5). Training took several days on an NVIDIA Titan V.

How many epochs did you train your model for (or did you mean the v1 model didn't give any good results)?

The dataset I used was the original Cornell movie dialog dataset used by the Udemy course, although I have implemented a generic pattern to easily bring in other datasets (just implement another DatasetReader subclass; a rough sketch follows below). I think it is very important to experiment with different datasets... although as I mentioned above, until there is an objective and accurate way to evaluate the performance of the model, those experiments would be good for entertainment value only.
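For reference, a minimal sketch of what a new reader could look like (the base-class module path and method name here are assumptions for illustration; check the dataset_readers package in the repo for the actual interface):

from dataset_readers.dataset_reader import DatasetReader  # assumed import path

class MyCorpusReader(DatasetReader):
    """Hypothetical reader for a new dialog corpus."""
    def __init__(self):
        super().__init__("my_corpus")

    def _get_dialog_lines_and_conversations(self, dataset_dir):
        # Parse the raw corpus into a dict of {line_id: line_text} and a
        # list of conversations, each an ordered list of line_ids.
        dialog_lines = {}
        conversations = []
        # ... read files under dataset_dir and populate both structures ...
        return dialog_lines, conversations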

sagarhingal commented 6 years ago

Yeah... the v1 model gave me below-average results, and I can't tweak the parameters and test it because of my hardware. I checked Azure (too expensive), and Google too. Is there any source for pre-trained models so that I can just test the hyperparameters? Not everybody has a Titan :P

AbrahamSanders commented 6 years ago

I am planning to publish more pre-trained models in the weeks to come.

Can you share your chat log with the below-average results? (Access it from models\cornell_movie_dialog\trained_model_v1.) Also, what are you comparing the dialog quality to (human dialog, another chatbot, published research results, etc.) when you determine it is below average? This information could help me with the research I am doing now on objective dialog metrics.

For now there are some tweaks you can make to the inference hyperparameters.

  1. In the chat window, use the --convhistlength setting to reduce (or increase) the number of question-reply pairs that are prepended as context to each question. By default it is set to 6, but you can try changing it to 2 or 4 and see if that improves your dialog. Ex. --convhistlength=4

  2. In the chat window, use the --beamlenpenalty setting to influence the beam search ranking algorithm... higher positive values favor longer responses and negative values favor shorter ones. By default it is set to 1.25, but you can try setting it between 0 and 0.75. Ex. --beamlenpenalty=0.5 (see the length-penalty sketch after the examples below)

  3. Chat more like a natural conversation rather than an interrogation. The Cornell movie dialog dataset is mostly conversational speech, so the bot won't do so well on rote Q&A tasks. Here are some good examples from my chatlog:

---------------------------------------- Example 1: ----------------------------------------

You: hi
ChatBot: how is it going?

You: good, and you?
ChatBot: great.

You: I am tired today.
ChatBot: oh hi that is good.

You: what about you?
ChatBot: I do not know.

You: I'm at work right now.
ChatBot: yeah?

You: yup. where are you?
ChatBot: I am pregnant.

You: congrats! when did you find out?
ChatBot: yes I said it is saturday

You: Wow!
ChatBot: I am sorry.

You: why are you sorry?
ChatBot: I am fine.

You: what's bothering you?
ChatBot: what are you talking about?

You: you are acting strange.
ChatBot: what the hell is that supposed to mean?

You: nothing.
ChatBot: that is exactly what I am feeling.

---------------------------------------- Example 2: ----------------------------------------

You: Good morning
ChatBot: you have got to leave.

You: Are you going to make me leave
ChatBot: now.

You: I refuse
ChatBot: I do not want to be.

You: Tell me why I need to leave
ChatBot: because I am not alone.

---------------------------------------- Example 3: ----------------------------------------

You: what is your name?
ChatBot: I am not sure.

You: my name is John.
ChatBot: she is not here.

You: who is she?
ChatBot: a woman.

You: really?
ChatBot: yeah.

You: ok then where is she?
ChatBot: I do not know.
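For intuition on what --beamlenpenalty does under the hood: TensorFlow's stock beam search decoder applies the GNMT-style length penalty, dividing each candidate's log-probability by a length term. A rough sketch (assuming this project uses that stock decoder):

def length_penalty(length, alpha):
    # alpha is the --beamlenpenalty value
    return ((5.0 + length) / 6.0) ** alpha

def rescore(log_prob, length, alpha):
    # Beam candidates are ranked by log-probability divided by the penalty.
    return log_prob / length_penalty(length, alpha)

# Log-probabilities are negative, so dividing by a larger penalty pulls the
# score toward zero -- with alpha > 0, longer responses rank higher.
print(rescore(-8.0, 12, 1.25))  # ~ -2.18
print(rescore(-8.0, 12, 0.0))   # -8.0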

sagarhingal commented 6 years ago

Example 1 (without your suggested tweaks):

You: hello
ChatBot: how is it going?

You: excellent!
ChatBot: I am sorry. are you ok?

You: im absolutely fine
ChatBot: sit down. what are you doing in my bed?

You: I don't know
ChatBot: where are you?

You: In hospital
ChatBot: and is that right?

You: no
ChatBot: why are you doing this?

You: Because i hate you
ChatBot: because?

You: i hate you
ChatBot: and who are you

You: your master
ChatBot: if I am talking to you now.

You: --goodbye
ChatBot: who told me to bother?

Example 2 (with tweaks):

You: hey
ChatBot: what?

You: how are you
ChatBot: you do not know me.

You: okay
ChatBot: I do not.

You: Are you ok
ChatBot: I am not married.

You: very happy to know that
ChatBot: do not be ridiculous.

You: what happen
ChatBot: it is time for me.

You: yes i agree
ChatBot: tell me what

You: what do you wany
ChatBot: I mean I do not care.

You: okay got it thanks
ChatBot: look!

You: yes
ChatBot: tell me about it.

You: I have nothing to say now
ChatBot: okay.

You: cool
ChatBot: do not be too sure.

You: okay
ChatBot: . I am not crazy.

You: yes i know that
ChatBot: theres nothing to talk about.

I think I'm going to play with the tweaks some more for a while.

  1. So the next batch of models that you are going to upload, they are with different hyperparameters, right? I think you can reduce the batch size, increase the rnn_layer size from 512 to 1024, and set learning_rate to 0.0001, something like that. I noticed you kept the learning rate higher and the rnn_layer size smaller?

  2. One question I had: can I train the model using a cluster of machines, like with Hadoop or Kubernetes or something?

AbrahamSanders commented 6 years ago

Thanks for sharing the chatlogs. If you have used another neural-network-based chatbot or read a research paper yielding better conversational results, please send me the link to aid my research on objective conversation quality metrics.

In response to your questions:

  1. The rnn_size for the v1 model was 1024, not 512. Check the hparams.json file in the trained_model_v1 folder. The values in hparams.py are always overwritten with the values from the JSON file. When training a new model, the JSON file is copied from the main seq2seq-chatbot folder to the specific model folder (this allows multiple models with different hparams to be used without changing any files).
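For example, to confirm what a trained model actually used, you can read its JSON file directly (a trivial sketch; the exact key layout may differ, so just inspect the printed dict):

import json

# Read the effective hyperparameters for the v1 model from its folder.
with open("models/cornell_movie_dialog/trained_model_v1/hparams.json") as f:
    hparams = json.load(f)
print(hparams)  # rnn_size should show 1024 here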

The learning rate is higher because it is using a regular SGD optimizer with a learning rate decay schedule (instead of Adam or RMSProp). This approach is similar to Google's NMT model hparams.
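Roughly, the optimizer setup looks like this in TF 1.x (a sketch only; the real starting rate and decay schedule come from hparams.json):

import tensorflow as tf

global_step = tf.train.get_or_create_global_step()
# Plain SGD with a decaying learning rate, in the spirit of Google's NMT hparams.
learning_rate = tf.train.exponential_decay(
    learning_rate=1.0,     # illustrative starting value
    global_step=global_step,
    decay_steps=10000,     # illustrative decay schedule
    decay_rate=0.5,
    staircase=True)
optimizer = tf.train.GradientDescentOptimizer(learning_rate)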

Based on my observations, I think the best place to look for improving the model is the word embeddings. I noticed that the embeddings trained on the Cornell dataset are extremely densely clustered, with the exception of a few very commonly used words (I used PCA projection with the TensorBoard projector on the checkpoint file to observe this). This means the model receives very similar input vectors for the vast majority of words in the vocabulary, which can degrade its ability to understand questions. This is probably a symptom of not having enough word usage examples in the dataset.
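If you want to reproduce that inspection outside TensorBoard, something like this works (the checkpoint path and embedding variable name are assumptions; list the checkpoint variables to find the real name):

import tensorflow as tf
from sklearn.decomposition import PCA

reader = tf.train.NewCheckpointReader(
    "models/cornell_movie_dialog/trained_model_v1/best_weights_training.ckpt")
# print(reader.get_variable_to_shape_map())  # discover the embedding variable name
embeddings = reader.get_tensor("embedding_matrix")  # assumed variable name
projected = PCA(n_components=2).fit_transform(embeddings)  # 2D view of the clusters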

I think the best improvements will come from re-training using pre-trained word embeddings such as Word2Vec or GloVe. The new models I plan to upload will be trained with this approach (pre-trained embeddings), with the same layer count and size as before. I will also release models trained on other datasets.
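The idea is simply to initialize the embedding matrix from a GloVe text file instead of randomly. A minimal sketch (here vocab is assumed to be a word-to-index dict built by the vocabulary code):

import numpy as np

def load_glove_embeddings(glove_path, vocab, embedding_dim=300):
    # Start from small random vectors so words missing from GloVe still get an embedding.
    matrix = np.random.uniform(-0.1, 0.1,
                               (len(vocab), embedding_dim)).astype(np.float32)
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if word in vocab:
                matrix[vocab[word]] = np.asarray(values, dtype=np.float32)
    return matrix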

  2. To train using a cluster or on parallel GPUs, some additional changes to the model are necessary. I have not implemented these yet. See here.

Thanks!

sagarhingal commented 6 years ago

Okay. Looking forward to your next upload. Thanks!

AbrahamSanders commented 6 years ago

@sagarhingal I just updated the code and uploaded a new trained model (trained_model_v2). Get latest and download the new model - let me know what you think.

Thanks!

sagarhingal commented 6 years ago

@AbrahamSanders Cool, will test it.

AbrahamSanders commented 6 years ago

@sagarhingal Did you get around to testing the new model? If so do you have any feedback?

Thanks!

sagarhingal commented 6 years ago

Hey @AbrahamSanders, yes, I tested it the same day, and tested it again today as well. I'm attaching the log files.

  1. I am facing problems running the web interface; can you provide the steps to run the server?
  2. I see some improvements, but it's very hard to examine the output because of the dataset.

Should we change the dataset, or make a dataset of our own, maybe for some specific domain? What do you think?

chatlog_20180805_144649.txt chatlog_20180831_143552.txt

AbrahamSanders commented 6 years ago

Thanks @sagarhingal for the log files. Would you be ok with me adding them to the repository here?

1. I am facing problems running the web interface; can you provide the steps to run the server?

To run the web server, make sure you have flask 0.12.4 and flask-restful installed.

On Windows:
a) Run chat_web_best_weights_training.bat from the model directory.

On Mac / Linux:
a) Open an Anaconda prompt in the seq2seq-chatbot directory.
b) Run:

export FLASK_APP=chat_web.py
flask serve_chat "models/cornell_movie_dialog/trained_model_v2/best_weights_training.ckpt" -p 8080

Once the web server is running, you can open a browser to http://localhost:8080/chat_ui.html

2. I see some improvements, but it's very hard to examine the output because of the dataset.

Can you elaborate on the difficulty you are having examining the output because of the dataset?

Keep in mind that since this is a generalized dialog dataset, you cannot expect the bot to always spit out exact lines from the dataset; rather, it learns an approximation of the rules of the language as demonstrated by the examples it sees during training. The best way to evaluate the bot is to determine whether its responses make sense in the context of the question and the history of the conversation.

For example, in your chatlog:

You: you are very smart my friend
ChatBot: can I talk to you about it?

You: yes please
ChatBot: but I don't think I am?

These responses make a lot of sense in the context of the conversation, but you probably cannot find the exact exchange word-for-word in the original cornell dataset.

If we were evaluating the bot on a smaller, domain-specific dataset (e.g. sales Q&A about a specific product), then we would also want to evaluate the factual correctness of the bot, and we would expect the responses to be closer to the originals from the dataset. However, we would restrict the question domain of the bot to only questions relevant to the product and would not allow general chit-chat.

I would like to train the bot on datasets other than cornell, including some domain-specific ones. If you have suggestions for datasets to use, let me know!

sagarhingal commented 6 years ago

Sure @AbrahamSanders, go ahead with the logs; I will continue to test and contribute where I can.

About the domain: one sector I have been researching myself is recruiting. It is one space where a good chatbot can speed up the process, but the crucial thing is the dataset.

One question I had: if I want to create my own dataset in an easy way, is there any tool that you are aware of, or is the only way to manually scrape online?

Also, I have yet to study your whole codebase, as it is quite modular. Give me some time.

Last thing: I get some warnings when I run the code about deprecated floating-point conversions, probably some version mismatch. Can you specify the versions of Python and TensorFlow?

Also, I'm running a Mac for now, but will soon be building a PC. What are your suggestions on the OS, Windows or Linux? I've read it's better to compile manually for your specific CPU instruction set for faster performance.

(screenshot attached: 2018-09-01 at 1:11 AM)
AbrahamSanders commented 6 years ago

Thanks @sagarhingal I will add the logs.

Recruiting is an interesting field in which to apply chatbots. What kind of use cases do you have in mind?

On creating a dataset: there is no tool I know of for assembling a dialog dataset. It really depends on where your data lives. If it is on Twitter, you can script against the Twitter API and assemble the results into a text or CSV file... the same goes for other social sites like Reddit. If there is no API, you will have to scrape. I actually built a web scraping tool which you can use for scriptless web scraping; it can export scraped data to XML or insert it into a database. It is Windows-only, so you will need to run a Windows VM on your Mac to use it.

On versions: Python 3.6, TensorFlow 1.9 (or latest, assuming there are no breaking changes).

On OS: my preferred OS is Windows, just because that's what I'm used to. If you are concerned about squeezing out as much performance as possible, perhaps Linux is better... I wouldn't worry about it unless you are ready to roll something into production at a large scale. Also, GPU acceleration is much faster than CPU. I have not tried compiling for AVX, etc., but I doubt it will be faster than a GPU.

sagarhingal commented 6 years ago

@AbrahamSanders if you are aware of the initial process where recruiters need to gather data on the candidate through natural conversation (e.g. they call and enquire), that part can be automated.

On the dataset part, we can't use Twitter, as these conversations use a proper format and a formal tone. I even built a web scraper myself (I was thinking of building a knowledge base from Wikipedia), but it can be tweaked for different sites, of course.

On the OS front, I will be using a 1080 Ti (of course the GPU will be much faster), but I figure I should get as much performance as I can from my machine.

(P.S. have you tried any other algorithms or techniques, like reinforcement learning, apart from seq2seq?)

AbrahamSanders commented 6 years ago

Yes, I am familiar with the way recruiters gather data about candidates. I assume the goal would be to insert this data into a database. What aspects of this do you think require machine learning and cannot be handled by a traditional rule-based bot engine?

If a seq2seq model is to be used for a task like this, the dataset would need to include special tokens that a program can interpret as instructions to read or write data from/to a database.

For example:

Bot: Did you go to college? 

User: Yes
Bot: { write: attended_college; bool; True }
Bot: Which university did you attend? 

User: I went to Stanford
Bot: { write: college_name; string; "Stanford" }
Bot: What did you study at { read: college_name }?

User: Electrical Engineering
Bot: { write: major; string; "Electrical Engineering" }

There are many ways the same exact information can be conveyed. Consider the following alternative:

Bot: Did you go to college?

User: Yes I graduated from Stanford with a Bachelors in Electrical Engineering.
Bot: { write: attended_college; bool; True, college_name; string; "Stanford", major; string; "Electrical Engineering" }
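A program sitting between the bot and the database would then parse these tokens. A toy sketch (this token grammar is just the illustrative one above, not anything implemented in the repo):

import re

WRITE_PATTERN = re.compile(r"\{\s*write:\s*(.+?)\s*\}")

def extract_writes(bot_output):
    # Yield (field, type, value) triples from each { write: ... } token.
    for match in WRITE_PATTERN.finditer(bot_output):
        for assignment in match.group(1).split(","):
            field, type_name, value = [p.strip() for p in assignment.split(";")]
            yield field, type_name, value.strip('"')

# list(extract_writes('{ write: college_name; string; "Stanford" }'))
# -> [('college_name', 'string', 'Stanford')]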

Since many permutations need to be supported, a large, diverse dataset would need to be constructed in which the same question is answered in many different ways. I think there are two ways to go about it:

1) Partner with a recruiting firm that is willing to provide this data (and presumably buy the finished chatbot)
2) Try to come up with all of these permutations yourself and test incrementally (could take a very long time)

I don't think there is a public dataset of recruiter dialog with candidates, considering the personal nature of these conversations.

On other techniques: I think there are many avenues to explore to continue developing better conversational models, and reinforcement learning is definitely one of them. I don't see it as an alternative to seq2seq but rather a different way of training it. RL depends on an external reward signal to train (did a user like the response I just gave?) instead of a supervised error signal (how close is my response to the one in the dataset?). I have not experimented with RL yet, but I see it as a critical part of any successful AI system. I hope to take Hadelin & Kirill's new reinforcement learning course on Udemy this year.

Also check out the roadmap page to see other ideas I have to continue improving the chatbot model by incorporating more recent research in deep NLP.

On another note, a quick Google search revealed that this company appears to already be doing exactly what you are suggesting:

STEP 3 — Engaging with the chatbot

Candidates engage with the chatbot specifically designed for your business. Our system automatically synchronizes the data with your ATS.

sagarhingal commented 5 years ago

@AbrahamSanders actually there are many companies which provide that "recruiter" service, but building a general chatbot is very tough, because there are limitless permutations of topics.

The reason I don't want to go with a rule-based engine is that that kind of system isn't natural in conversation. Coding different types of responses and then assigning them to queries randomly or sequentially is too machine-like. Currently, this chatbot generates sentences on its own from the dataset, which is what the target should be. I was wondering how we could develop a general conversational chatbot; I have some ideas. Let's discuss this in detail on some other platform if you want to.

I went through the roadmap you provided; I think you should target the live training item, because it would save you a lot of training time.

sagarhingal commented 5 years ago

@AbrahamSanders I'm having some issues on Mac with flask: it can't find the "serve_chat" command.

(screenshot attached: 2018-09-03 at 8:10 PM)
AbrahamSanders commented 5 years ago

Hey @sagarhingal -

Sure you can email me at abraham.sanders@gmail.com if you would like to discuss your ideas.

For the flask issue: what version are you using? I had issues with the latest version of flask; I am using 0.12.4. Also, make sure your Anaconda environment is active in your console and that the console working directory is set to the directory that contains the file chat_web.py.

Let me know if this works. If so I will close this issue and we can continue our conversation via email.

AbrahamSanders commented 5 years ago

Hey @sagarhingal, since there has been no response on this issue for 10 weeks, I am closing it. Feel free to reach out via email if you ever want to discuss your idea or if you have any questions.

Thanks!

sagarhingal commented 5 years ago

Hey Abraham,

Sorry for not responding, I got stuck with some things. I have some ideas; we'll discuss them one day. Thanks!
