windmaple closed this issue 8 years ago
The dataset is present. From the error message, it seems more like a problem with nltk. Did you install it using pip3 install nltk?
Otherwise, as the message suggests, try launching python and running
import nltk
nltk.download()
Then manually download the English support for nltk.
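If you'd rather skip the interactive downloader, the missing model can also be fetched directly; a minimal sketch ('punkt' is the sentence-tokenizer package that contains the english.pickle the traceback complains about):

import nltk
nltk.download('punkt')  # fetches tokenizers/punkt, including PY3/english.pickle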
Ok, I was able to work it out by downloading 'punkt' through nltk.download(). Might be helpful to add that to the README...
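For the README, the non-interactive form is probably the easiest to copy-paste; this uses NLTK's stock downloader CLI, nothing DeepQA-specific:

python3 -m nltk.downloader punkt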
Encountered this error straight out of the box. It seems I need to download the corpus manually?
windmaple@windmaple-VirtualBox:~/DeepQA$ python3 main.py
Welcome to DeepQA v0.1 !
TensorFlow detected: v0.11.0rc0
Training samples not found. Creating dataset...
Extract conversations: 0%| | 0/83097 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 29, in <module>
    chatbot.main()
  File "/home/windmaple/DeepQA/chatbot/chatbot.py", line 145, in main
    self.textData = TextData(self.args)
  File "/home/windmaple/DeepQA/chatbot/textdata.py", line 69, in __init__
    self.loadCorpus(self.samplesDir)
  File "/home/windmaple/DeepQA/chatbot/textdata.py", line 215, in loadCorpus
    self.createCorpus(cornellData.getConversations())
  File "/home/windmaple/DeepQA/chatbot/textdata.py", line 268, in createCorpus
    self.extractConversation(conversation)
  File "/home/windmaple/DeepQA/chatbot/textdata.py", line 283, in extractConversation
    inputWords = self.extractText(inputLine["text"])
  File "/home/windmaple/DeepQA/chatbot/textdata.py", line 300, in extractText
    sentencesToken = nltk.sent_tokenize(line)
  File "/usr/local/lib/python3.5/dist-packages/nltk/tokenize/__init__.py", line 90, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
  File "/usr/local/lib/python3.5/dist-packages/nltk/data.py", line 801, in load
    opened_resource = _open(resource_url)
  File "/usr/local/lib/python3.5/dist-packages/nltk/data.py", line 919, in _open
    return find(path_, path + ['']).open()
  File "/usr/local/lib/python3.5/dist-packages/nltk/data.py", line 641, in find
    raise LookupError(resource_not_found)
LookupError:
  Resource 'tokenizers/punkt/PY3/english.pickle' not found. Please
  use the NLTK Downloader to obtain the resource: >>> nltk.download()
  Searched in: