Conchylicultor / DeepQA

My TensorFlow implementation of "A Neural Conversational Model", a deep-learning-based chatbot
Apache License 2.0

how to test on my own dataset? #130

Open SeekPoint opened 7 years ago

SeekPoint commented 7 years ago

I trained on my own dataset like this:

python3 main.py --corpus lightweight --datasetTag qa_new_dataset_l2l

However, when I run: python3 main.py --corpus lightweight --datasetTag qa_new_dataset_l2l --test interactive

it still seems to use the Cornell dataset:

python3 main.py --corpus lightweight --datasetTag qa_new_dataset_l2l --test interactive
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Welcome to DeepQA v0.1 !

TensorFlow detected: v1.0.1

Warning: Restoring parameters:
globStep: 37627
watsonMode: False
autoEncode: False
corpus: cornell
datasetTag:
maxLength: 10
filterVocab: 1
skipLines: False
vocabularySize: 40000
hiddenSize: 512
numLayers: 2
softmaxSamples: 0
initEmbeddings: False
embeddingSize: 64
embeddingSource: GoogleNews-vectors-negative300.bin

Loading dataset from /gruntdata/app_data/yike.yk/DeepQA/data/samples/dataset-cornell-length10-filter1-vocabSize40000.pkl
Loaded cornell: 24643 words, 159653 QA
Model creation...

EMCP commented 7 years ago

My suggestion is to make sure you clear out all the other data and models that it might pick up, especially .pkl files. Secondly, try adding --modelTag qa_new_dataset_l2l; this will put the resulting model into a sub-folder that makes it clear you made a new model.
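For example, something along these lines, reusing the dataset tag from above (adjust the tag names to whatever you actually used):

python3 main.py --corpus lightweight --datasetTag qa_new_dataset_l2l --modelTag qa_new_dataset_l2l
python3 main.py --corpus lightweight --datasetTag qa_new_dataset_l2l --modelTag qa_new_dataset_l2l --test interactive

That way the checkpoints should end up in their own sub-folder (something like save/model-qa_new_dataset_l2l) instead of the default one.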

hoondongkim commented 6 years ago

If you did training before, you must use the --reset option. If you do not, you will see the "Warning: Restoring parameters:" message and your input arguments will be ignored. The source code explains the reason for the warning message; the relevant line is below.

if not self.args.reset and not self.args.createDataset and os.path.exists(configName):
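So retraining on the new dataset should look something like this (note that --reset discards the previously saved model, so back it up first if you still need it):

python3 main.py --reset --corpus lightweight --datasetTag qa_new_dataset_l2l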

EMCP commented 6 years ago

@PoojaPatel05 after you've successfully trained a model, it should drop the files into DeepQA/save/model or sub-folders underneath that path, depending on whether you added tag parameters when executing the training.

Also, just check the DeepQA/data/ sub-folders for leftover .pkl files, in case an old dataset is being picked up.
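For example, from the DeepQA root, something like this shows what is actually on disk (adjust the paths if your layout differs):

ls save/model*/
ls data/samples/
ls data/lightweight/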

PoojaPatel05 commented 6 years ago

It is not picking up my .txt file for training, which I have placed in DeepQA/data/lightweight, and it is not giving proper responses.

EMCP commented 6 years ago

Are you seeing the new model get generated? What command are you running, exactly?

Also, how large is your dataset? I've noticed, using this implementation out of the box, that if the dataset is relatively small, training will overfit quite quickly and give you back poor results.

Another thing that can happen is that if you run --test interactive on your first go, the model doesn't actually train properly. Double-check your steps and, if you can, wipe the model, redo your steps, and post the commands you run to this ticket.

@PoojaPatel05

PoojaPatel05 commented 6 years ago

I have a .txt file that includes 80+ QA pairs. The questions are single lines, but the answers are long, so I edited the file in Notepad++ and combined the lines so that each answer is a single line. The data is in QA format: one line is a question and the next line is the answer (an example layout is shown below). I put that file in DeepQA/data/lightweight and ran:

python main.py --corpus lightweight --datasetTag dataqa
python main.py --test --corpus lightweight --datasetTag dataqa
python main.py --test interactive --corpus lightweight --datasetTag dataqa

Another thing: after these 3 steps, the file named model_predictions.txt (in DeepQA/save/model) does not contain the data from my dataqa.txt file. It contains another QA dataset; I don't know where that comes from, and the system gives responses based on that data.
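To show the layout (placeholder text here, not my real questions and answers), dataqa.txt alternates one question line and one answer line like this:

What are your support hours?
Our support team is available from 9am to 6pm, Monday to Friday.
How do I reset my password?
Click the "Forgot password" link on the login page and follow the emailed instructions.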

More than that, I have commented out these lines in DeepQA/chatbot/textdata.py:

from chatbot.corpus.cornelldata import CornellData
from chatbot.corpus.opensubsdata import OpensubsData
from chatbot.corpus.scotusdata import ScotusData
from chatbot.corpus.ubuntudata import UbuntuData

and these entries:

('cornell', CornellData),
('opensubs', OpensubsData),
('scotus', ScotusData),
('ubuntu', UbuntuData),

Even then it does not respond according to my data. Help me.


PoojaPatel05 commented 6 years ago

I have tried with --modelTag also, and cleared the previous data too. It is still not working; it is picking up other data.

EMCP commented 6 years ago

80 questions is way too small for the default parameters, I'd imagine.

I haven't done this yet, but I have had similar issues with my own QA dataset; the model here needs to be tweaked for smaller datasets.

One thing you can try is to download the free Community Edition of PyCharm (https://www.jetbrains.com/pycharm/download/). Put a breakpoint anywhere in the code that loads the model, and you can inspect what is going on at runtime.

Example: here I have opened the DeepQA project inside the editor and added a breakpoint by clicking in the line gutter; a red dot appears.

I've added one inside chatbot.py where it loads the ModelParams.

(screenshot: breakpoint set in chatbot.py)

Right-click main.py and choose debug 'main'.

(screenshot: debugging main.py from the right-click menu)

The line will light up, and at the bottom is the list of runtime variables. Inspect the variables to ensure they are what you expect (that it's got the correct model type, etc.).

Step through the code line by line with the F7 or F8 keys; explore the codebase and you will learn a lot.
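If you'd rather not attach a debugger, you can also just dump the saved configuration that the "Restoring parameters" warning reads from. A minimal sketch, assuming it is the INI-style file checked by the os.path.exists(configName) line quoted earlier and that it sits under save/model/ as params.ini (adjust the path and file name if your install writes it somewhere else):

import configparser
import os

# Assumed location of the saved model configuration; adjust if yours differs.
config_path = os.path.join('save', 'model', 'params.ini')

if not os.path.exists(config_path):
    print('No saved config at {}, so nothing will be restored.'.format(config_path))
else:
    config = configparser.ConfigParser()
    config.read(config_path)
    # Print every stored parameter (corpus, datasetTag, globStep, ...)
    for section in config.sections():
        for key, value in config.items(section):
            print('{}: {}'.format(key, value))

If corpus or datasetTag in there is not what you expect, the interactive run will keep using the old dataset no matter which flags you pass, as explained above.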

EMCP commented 6 years ago

You should debug the code using the steps I outlined; play around with setting breakpoints at different stages of the code's execution and inspect the Variables section.

If the code is picking up a different model set, you'll see that in the Variables panel.

If I get time, I will try the lightweight corpus. I'm assuming you're on a Mac or Linux machine?

PoojaPatel05 commented 6 years ago

I have created a text file in the lightweight directory and run "python main.py"; it should take my data (which includes long answers) to train along with the other data. Maybe it is training on my data but not giving proper responses. I have not made any changes to the code.