This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better understand a domain (or topic) as it is represented on the Web.
I tried to start a crawler and DDT died (see log below).
I only noticed the crash because I happened to check the terminal. We should show users a message saying that the server has died and needs to be restarted.
Note that this happened while the accuracy of the online classifier was displayed as 0%, even though there were many pages labeled relevant and irrelevant.
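One way to detect a dead server so that a message can be shown is a periodic liveness probe. The sketch below is only an illustration under assumptions: `server_is_alive` is a hypothetical helper, not part of DDT, and it is written for Python 3 using only the standard library, whereas the log shows DDT running on Python 2.7. It treats any HTTP answer, even an error status, as "process is up", and a refused connection or timeout as "server died".

```python
import urllib.error
import urllib.request


def server_is_alive(url, timeout=2.0):
    """Return True if something answers HTTP at `url`, False otherwise.

    Hypothetical helper for illustration; not part of the DDT code base.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so the process is running.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, host unreachable, or timed out.
        return False
```

A caller could poll the address from the log, for example `server_is_alive("http://0.0.0.0:8084/")`, every few seconds and show the "server died, please restart" message once it returns False. The same idea could be implemented in the browser front end instead.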
New relevant samples 318
New irrelevant samples 497
[13/Jun/2017:03:27:27] HTTP
Request Headers:
Content-Length: 709
REFERER: http://0.0.0.0:8084/
HOST: 0.0.0.0:8084
ORIGIN: http://0.0.0.0:8084
CONNECTION: keep-alive
Remote-Addr: 172.17.0.1
ACCEPT: */*
USER-AGENT: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
X-REQUESTED-WITH: XMLHttpRequest
ACCEPT-LANGUAGE: en-US,en;q=0.8
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
ACCEPT-ENCODING: gzip, deflate
[13/Jun/2017:03:27:27] HTTP
Traceback (most recent call last):
File "/opt/conda/envs/ddt/lib/python2.7/site-packages/cherrypy/_cprequest.py", line 670, in respond
response.body = self.handler()
File "/opt/conda/envs/ddt/lib/python2.7/site-packages/cherrypy/lib/encoding.py", line 220, in __call__
self.body = self.oldhandler(*args, **kwargs)
File "/opt/conda/envs/ddt/lib/python2.7/site-packages/cherrypy/_cpdispatch.py", line 60, in __call__
return self.callable(*self.args, **self.kwargs)
File "/ddt/domain_discovery_API/server.py", line 310, in updateOnlineClassifier
return self._model.updateOnlineClassifier(session)
File "/ddt/domain_discovery_API/models/domain_discovery_model.py", line 1902, in updateOnlineClassifier
[traindata,] = self._onlineClassifiers[domainId]["onlineClassifier"].vectorize(pos_text+neg_text)
File "/ddt/domain_discovery_API/online_classifier/online_classifier.py", line 15, in vectorize
[Xtrain, , _] = self.tfidf_vector.tfidf(train)
File "/ddt/domain_discovery_API/online_classifier/tfidf_vector.py", line 18, in tfidf
[X_counts, features] = self.vectorize(data)
File "/ddt/domain_discovery_API/online_classifier/tf_vector.py", line 20, in vectorize
X_counts = self.count_vect.transform(data)
File "/opt/conda/envs/ddt/lib/python2.7/site-packages/sklearn/feature_extraction/text.py", line 890, in transform
self._check_vocabulary()
File "/opt/conda/envs/ddt/lib/python2.7/site-packages/sklearn/feature_extraction/text.py", line 278, in _check_vocabulary
check_is_fitted(self, 'vocabulary_', msg=msg),
File "/opt/conda/envs/ddt/lib/python2.7/site-packages/sklearn/utils/validation.py", line 690, in check_is_fitted
raise _NotFittedError(msg % {'name': type(estimator).__name__})
NotFittedError: CountVectorizer - Vocabulary wasn't fitted.
172.17.0.1 - - [13/Jun/2017:03:27:27] "POST /updateOnlineClassifier HTTP/1.1" 500 2636 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
Not enough data for calibration
172.17.0.1 - - [13/Jun/2017:03:27:28] "POST /updateOnlineClassifier HTTP/1.1" 200 21 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:03:27:29] "POST /getPages HTTP/1.1" 200 4860 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:03:27:29] "POST /getPages HTTP/1.1" 200 4860 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
Using default negative tags
/ddt/run_ddt: line 33: 102 Killed python $DDT_HOME/server/server.py
Stopping elastisearch container
elastic
Removing elastisearch container
elastic
Stopping DD Tool container
dd_tool
Removing DD Tool container
dd_tool
Julianas-MacBook-Pro-2:Downloads juliana$
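The `NotFittedError` in the traceback means `transform` was called on a `CountVectorizer` that had not been fitted yet. A fitted scikit-learn `CountVectorizer` exposes a `vocabulary_` attribute, so one possible guard is sketched below. `transform_or_fit` is a hypothetical helper, not DDT code, and whether fitting at this point is the right behaviour for DDT is an assumption. Note that the server process was later `Killed`, which may be a separate problem from this exception.

```python
def transform_or_fit(count_vect, docs):
    """Vectorize `docs`, fitting `count_vect` first if it has no vocabulary.

    `count_vect` is assumed to behave like sklearn's CountVectorizer:
    it gains a `vocabulary_` attribute once it has been fitted.
    Hypothetical helper for illustration; not part of the DDT code base.
    """
    if not docs:
        # An empty corpus cannot produce a vocabulary.
        raise ValueError("no documents to vectorize")
    if hasattr(count_vect, "vocabulary_"):
        # Already fitted: reuse the existing vocabulary.
        return count_vect.transform(docs)
    # Not fitted yet: learn the vocabulary from these documents.
    return count_vect.fit_transform(docs)
```

With a guard like this, the first call on a fresh vectorizer would build the vocabulary instead of raising, and later calls would reuse it.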