Conchylicultor / DeepQA

My TensorFlow implementation of "A Neural Conversational Model", a deep learning based chatbot
Apache License 2.0

download corpus? #11

Closed windmaple closed 8 years ago

windmaple commented 8 years ago

Encountered this error straight out of the box. It seems I need to download the corpus manually?

windmaple@windmaple-VirtualBox:~/DeepQA$ python3 main.py
Welcome to DeepQA v0.1 !

TensorFlow detected: v0.11.0rc0
Training samples not found. Creating dataset...
Extract conversations:   0%|          | 0/83097 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 29, in <module>
    chatbot.main()
  File "/home/windmaple/DeepQA/chatbot/chatbot.py", line 145, in main
    self.textData = TextData(self.args)
  File "/home/windmaple/DeepQA/chatbot/textdata.py", line 69, in __init__
    self.loadCorpus(self.samplesDir)
  File "/home/windmaple/DeepQA/chatbot/textdata.py", line 215, in loadCorpus
    self.createCorpus(cornellData.getConversations())
  File "/home/windmaple/DeepQA/chatbot/textdata.py", line 268, in createCorpus
    self.extractConversation(conversation)
  File "/home/windmaple/DeepQA/chatbot/textdata.py", line 283, in extractConversation
    inputWords = self.extractText(inputLine["text"])
  File "/home/windmaple/DeepQA/chatbot/textdata.py", line 300, in extractText
    sentencesToken = nltk.sent_tokenize(line)
  File "/usr/local/lib/python3.5/dist-packages/nltk/tokenize/__init__.py", line 90, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
  File "/usr/local/lib/python3.5/dist-packages/nltk/data.py", line 801, in load
    opened_resource = _open(resource_url)
  File "/usr/local/lib/python3.5/dist-packages/nltk/data.py", line 919, in _open
    return find(path_, path_ + ['']).open()
  File "/usr/local/lib/python3.5/dist-packages/nltk/data.py", line 641, in find
    raise LookupError(resource_not_found)
LookupError:
  Resource 'tokenizers/punkt/PY3/english.pickle' not found.  Please
  use the NLTK Downloader to obtain the resource:  >>> nltk.download()
  Searched in:


Conchylicultor commented 8 years ago

The dataset is present. From the error message, it seems more like a problem with nltk. Did you install it using pip3 install nltk? Otherwise, as the message suggests, launch python and run

import nltk
nltk.download()

Then manually download the English support for nltk.
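For a non-interactive fix, the same resource can also be fetched directly by name from the Python prompt (this assumes a recent enough nltk where the 'punkt' package identifier is available to the downloader):

import nltk
# Fetch only the Punkt sentence tokenizer data used by nltk.sent_tokenize()
nltk.download('punkt')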

windmaple commented 8 years ago

Ok, I was able to work it out by downloading 'punkt' through nltk.download(). It might be helpful to add that to the README.
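For example, a one-line setup command that could go in the README (assuming nltk is already installed via pip3) would be something like:

python3 -m nltk.downloader punkt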