Conchylicultor / DeepQA

My tensorflow implementation of "A neural conversational model", a Deep learning based chatbot
Apache License 2.0

Error with ubuntu corpus #103

Open drophit opened 7 years ago

drophit commented 7 years ago

D:\drophit\Documents\TensorFlowChatBots\3.5PythonTensor1.0-DeepQABot>python main.py --corpus ubuntu --modelTag ubuntu
Welcome to DeepQA v0.1 !

TensorFlow detected: v1.1.0
Training samples not found. Creating dataset...
Constructing full dataset...
Ubuntu dialogs subfolders: 0%| | 0/350 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 29, in <module>
    chatbot.main()
  File "D:\drophit\Documents\TensorFlowChatBots\3.5PythonTensor1.0-DeepQABot\chatbot\chatbot.py", line 158, in main
    self.textData = TextData(self.args)
  File "D:\drophit\Documents\TensorFlowChatBots\3.5PythonTensor1.0-DeepQABot\chatbot\textdata.py", line 97, in __init__
    self.loadCorpus()
  File "D:\drophit\Documents\TensorFlowChatBots\3.5PythonTensor1.0-DeepQABot\chatbot\textdata.py", line 260, in loadCorpus
    corpusData = TextData.availableCorpus[self.args.corpus](self.corpusDir + optional)
  File "D:\drophit\Documents\TensorFlowChatBots\3.5PythonTensor1.0-DeepQABot\chatbot\corpus\ubuntudata.py", line 49, in __init__
    self.conversations.append({"lines": self.loadLines(f.path)})
  File "D:\drophit\Documents\TensorFlowChatBots\3.5PythonTensor1.0-DeepQABot\chatbot\corpus\ubuntudata.py", line 61, in loadLines
    for line in f:
  File "C:\Users\drophit\AppData\Local\Programs\Python\Python35\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 174: character maps to <undefined>
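
The traceback above ends in `cp1252.py`, which suggests the corpus file was opened with the Windows default encoding rather than an explicit one. A minimal sketch of reading with an explicit encoding follows. It assumes the corpus files are UTF-8, which the thread does not confirm, and `load_lines` is a hypothetical helper for illustration, not DeepQA's actual `loadLines`.

```python
import os
import tempfile


def load_lines(path, encoding="utf-8"):
    """Read a text file with an explicit encoding instead of the platform default.

    Hypothetical helper for illustration; not DeepQA's actual loadLines.
    """
    # errors="replace" swaps undecodable bytes for U+FFFD rather than raising.
    with open(path, "r", encoding=encoding, errors="replace") as f:
        return [line.rstrip("\n") for line in f]


# Build a small UTF-8 file that contains the byte 0x8f
# (U+010F is encoded as the two bytes C4 8F in UTF-8).
raw = "hello \u010f world\n".encode("utf-8")
fd, tmp_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(raw)

# Decoding the same bytes as cp1252 fails, because 0x8f has no mapping there.
try:
    raw.decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

lines = load_lines(tmp_path)
os.remove(tmp_path)
print(cp1252_failed, lines)
```

If the files turn out to use a different encoding, the same call works with that encoding name passed in; `errors="replace"` only hides the mismatch, so it is worth checking the decoded text.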

EMCP commented 7 years ago

I get a similar error using the default corpus. My suspicion is that this is related to the OS or to Python 3: https://stackoverflow.com/questions/23917729/switching-to-python-3-causing-unicodedecodeerror. Have you tried running this on Linux or OS X?

Also, it helps to enclose your terminal output in backticks, like so:

C:\Users\emcp\Dev\github\Conchylicultor\DeepQA>python main.py --test
Welcome to DeepQA v0.1 !

TensorFlow detected: v1.3.0

Warning: Restoring parameters:
globStep: 18912
watsonMode: False
autoEncode: False
corpus: cornell
datasetTag:
maxLength: 10
filterVocab: 1
skipLines: False
vocabularySize: 40000
hiddenSize: 512
numLayers: 2
softmaxSamples: 0
initEmbeddings: False
embeddingSize: 64
embeddingSource: GoogleNews-vectors-negative300.bin

Loading dataset from C:\Users\emcp\Dev\github\Conchylicultor\DeepQA\data/samples/dataset-cornell-length10-filter1-vocabSize40000.pkl
Loaded cornell: 24643 words, 159657 QA
Model creation...
2017-09-04 09:56:57.844120: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-09-04 09:56:57.844359: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-04 09:56:59.427558: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:955] Found device 0 with properties:
name: Quadro M1000M
major: 5 minor: 0 memoryClockRate (GHz) 1.0715
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.65GiB
2017-09-04 09:56:59.427759: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:976] DMA: 0
2017-09-04 09:56:59.430359: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:986] 0:   Y
2017-09-04 09:56:59.430955: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Quadro M1000M, pci bus id: 0000:01:00.0)
Initialize variables...
Start predicting...
Restoring previous model from C:\Users\emcp\Dev\github\Conchylicultor\DeepQA\save/model\model.ckpt
Testing...
Sentences:   9%|#################6                                                                                                                                                                         | 31/328 [00:01<00:13, 21.54it/s]
Traceback (most recent call last):
  File "main.py", line 29, in <module>
    chatbot.main()
  File "C:\Users\emcp\Dev\github\Conchylicultor\DeepQA\chatbot\chatbot.py", line 207, in main
    self.predictTestset(self.sess)
  File "C:\Users\emcp\Dev\github\Conchylicultor\DeepQA\chatbot\chatbot.py", line 310, in predictTestset
    f.write(predString)
  File "C:\Users\emcp\AppData\Local\Programs\Python\Python35\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\x97' in position 45: character maps to <undefined>

C:\Users\emcp\Dev\github\Conchylicultor\DeepQA>
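
This second traceback is the write-side counterpart: `f.write(predString)` fails because the output file is encoded as cp1252, which cannot represent the character `'\x97'`. A minimal sketch of writing with an explicit encoding is below. `write_predictions` and the sample string are hypothetical and only illustrate the idea; this is not the code in `chatbot.py`, and choosing UTF-8 for the output file is an assumption.

```python
import os
import tempfile


def write_predictions(path, lines, encoding="utf-8"):
    """Write text lines with an explicit encoding instead of the platform default.

    Hypothetical helper for illustration; not DeepQA's actual predictTestset.
    """
    with open(path, "w", encoding=encoding) as f:
        for line in lines:
            f.write(line + "\n")


# Made-up prediction string containing the character from the traceback.
pred = "A: wait \x97 what?"

# cp1252 has no byte for U+0097, so encoding with it raises.
try:
    pred.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

fd, tmp_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
write_predictions(tmp_path, [pred])

# Read the file back with the same encoding to confirm the round trip.
with open(tmp_path, "r", encoding="utf-8") as f:
    round_trip = f.read()
os.remove(tmp_path)
print(cp1252_ok, round_trip == pred + "\n")
```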

EMCP commented 7 years ago

@drophit I added some fixes for Windows paths, and this solved the issue for me. Perhaps try this PR and see whether it works for you as well:

https://github.com/Conchylicultor/DeepQA/pull/144