VIDA-NYU / domain_discovery_tool

This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better understand a domain (or topic) as it is represented on the Web.
http://domain-discovery-tool.readthedocs.io/en/latest/index.html
GNU General Public License v3.0

DDT crash #30

Closed: julianafreire closed this issue 6 years ago

julianafreire commented 7 years ago

I tried to start a crawler and DDT died (see log below).

I could only tell the crash had occurred because I checked the terminal. We should show users a message saying that the server died and needs to be restarted.
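A minimal sketch of the kind of liveness check that could drive such a message, assuming the UI is served at port 8084 as in the log below. The names `http_probe`, `server_status_message` and `DEFAULT_URL` are hypothetical and not part of DDT. The point is that a crashed server produces no HTTP response at all, so "connection refused or timed out" is the signal to show a restart message, while any HTTP status (even a 500) means the process is still alive.

```python
import urllib.error
import urllib.request
from typing import Callable

# Hypothetical endpoint: the log shows the UI served at http://0.0.0.0:8084/.
DEFAULT_URL = "http://localhost:8084/"


def http_probe(url: str = DEFAULT_URL, timeout: float = 2.0) -> bool:
    """Return True if the server answers an HTTP request at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (e.g. 500), so it is still running.
        # HTTPError subclasses URLError, so it must be caught first.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, reset or timed out: treat the server as dead.
        return False


def server_status_message(probe: Callable[[], bool] = http_probe) -> str:
    """Turn a liveness probe into a message the UI can show to the user."""
    if probe():
        return "ok"
    return ("The DDT server is not responding. It may have crashed; "
            "please restart it (e.g. re-run run_ddt) and reload the page.")
```

The probe is injectable so the message logic can be checked without a running server. In the browser UI the same idea would apply to any request that fails with no response at all, as opposed to one that returns an HTTP error.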

Note that the crash happened while the online classifier's accuracy was displayed as 0%, even though there were many relevant and irrelevant labeled pages.


New relevant samples 318
New irrelevant samples 497

[13/Jun/2017:03:27:27] HTTP Request Headers:
  Content-Length: 709
  REFERER: http://0.0.0.0:8084/
  HOST: 0.0.0.0:8084
  ORIGIN: http://0.0.0.0:8084
  CONNECTION: keep-alive
  Remote-Addr: 172.17.0.1
  ACCEPT: */*
  USER-AGENT: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
  X-REQUESTED-WITH: XMLHttpRequest
  ACCEPT-LANGUAGE: en-US,en;q=0.8
  Content-Type: application/x-www-form-urlencoded; charset=UTF-8
  ACCEPT-ENCODING: gzip, deflate
[13/Jun/2017:03:27:27] HTTP Traceback (most recent call last):
  File "/opt/conda/envs/ddt/lib/python2.7/site-packages/cherrypy/_cprequest.py", line 670, in respond
    response.body = self.handler()
  File "/opt/conda/envs/ddt/lib/python2.7/site-packages/cherrypy/lib/encoding.py", line 220, in __call__
    self.body = self.oldhandler(*args, **kwargs)
  File "/opt/conda/envs/ddt/lib/python2.7/site-packages/cherrypy/_cpdispatch.py", line 60, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/ddt/domain_discovery_API/server.py", line 310, in updateOnlineClassifier
    return self._model.updateOnlineClassifier(session)
  File "/ddt/domain_discovery_API/models/domain_discovery_model.py", line 1902, in updateOnlineClassifier
    [traindata,] = self._onlineClassifiers[domainId]["onlineClassifier"].vectorize(pos_text+neg_text)
  File "/ddt/domain_discovery_API/online_classifier/online_classifier.py", line 15, in vectorize
    [Xtrain, _, _] = self.tfidf_vector.tfidf(train)
  File "/ddt/domain_discovery_API/online_classifier/tfidf_vector.py", line 18, in tfidf
    [X_counts, features] = self.vectorize(data)
  File "/ddt/domain_discovery_API/online_classifier/tf_vector.py", line 20, in vectorize
    X_counts = self.count_vect.transform(data)
  File "/opt/conda/envs/ddt/lib/python2.7/site-packages/sklearn/feature_extraction/text.py", line 890, in transform
    self._check_vocabulary()
  File "/opt/conda/envs/ddt/lib/python2.7/site-packages/sklearn/feature_extraction/text.py", line 278, in _check_vocabulary
    check_is_fitted(self, 'vocabulary_', msg=msg),
  File "/opt/conda/envs/ddt/lib/python2.7/site-packages/sklearn/utils/validation.py", line 690, in check_is_fitted
    raise _NotFittedError(msg % {'name': type(estimator).__name__})
NotFittedError: CountVectorizer - Vocabulary wasn't fitted.
172.17.0.1 - - [13/Jun/2017:03:27:27] "POST /updateOnlineClassifier HTTP/1.1" 500 2636 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"

Not enough data for calibration

172.17.0.1 - - [13/Jun/2017:03:27:28] "POST /updateOnlineClassifier HTTP/1.1" 200 21 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:03:27:29] "POST /getPages HTTP/1.1" 200 4860 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:03:27:29] "POST /getPages HTTP/1.1" 200 4860 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
Using default negative tags
/ddt/run_ddt: line 33: 102 Killed python $DDT_HOME/server/server.py
Stopping elastisearch container elastic
Removing elastisearch container elastic
Stopping DD Tool container dd_tool
Removing DD Tool container dd_tool
Julianas-MacBook-Pro-2:Downloads juliana$
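The log shows two separate failures: first, `updateOnlineClassifier` returned HTTP 500 because `count_vect.transform(...)` was called on a CountVectorizer whose vocabulary had never been fitted (later requests still got 200); then the server process itself was killed. The unfitted vectorizer may be related to the 0% accuracy shown in the UI, though the log alone does not confirm that. Below is a minimal, stdlib-only sketch of a fit-before-transform guard. `TinyCountVectorizer` is a hypothetical stand-in that mimics the behavior in the traceback (it is not scikit-learn), and `vectorize_training_text` is a hypothetical helper, not DDT's actual code.

```python
from collections import Counter
from typing import Dict, List, Optional


class NotFittedError(RuntimeError):
    """Stand-in for sklearn.exceptions.NotFittedError."""


class TinyCountVectorizer:
    """Hypothetical stand-in for CountVectorizer: fit() builds a vocabulary,
    transform() refuses to run before that, like the error in the log."""

    def __init__(self) -> None:
        self.vocabulary_: Optional[Dict[str, int]] = None

    def fit(self, docs: List[str]) -> "TinyCountVectorizer":
        # Map each distinct lowercase token to a column index (sorted for stability).
        terms = sorted({tok for doc in docs for tok in doc.lower().split()})
        self.vocabulary_ = {term: i for i, term in enumerate(terms)}
        return self

    def transform(self, docs: List[str]) -> List[List[int]]:
        if self.vocabulary_ is None:
            raise NotFittedError("CountVectorizer - Vocabulary wasn't fitted.")
        rows = []
        for doc in docs:
            # Count only tokens that are in the fitted vocabulary.
            counts = Counter(t for t in doc.lower().split() if t in self.vocabulary_)
            row = [0] * len(self.vocabulary_)
            for term, n in counts.items():
                row[self.vocabulary_[term]] = n
            rows.append(row)
        return rows


def vectorize_training_text(vect: TinyCountVectorizer,
                            pos_text: List[str],
                            neg_text: List[str]) -> List[List[int]]:
    """Hypothetical guard for an updateOnlineClassifier-style step: fit the
    vocabulary on the labeled text if that never happened, instead of letting
    transform() raise and the request fail with HTTP 500."""
    train = pos_text + neg_text
    if not train:
        raise ValueError("no labeled pages to train on")
    if vect.vocabulary_ is None:
        vect.fit(train)
    return vect.transform(train)
```

With the real scikit-learn `CountVectorizer`, the equivalent check could be `hasattr(count_vect, "vocabulary_")`, or calling `fit_transform` the first time labeled data arrives. Fitting only once keeps the feature space stable across incremental updates; refitting on every update would change the number of columns the online classifier sees.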

yamsgithub commented 6 years ago

This issue no longer occurs after the recent changes.