VIDA-NYU / domain_discovery_tool

This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better understand a domain (or topic) as it is represented on the Web.
http://domain-discovery-tool.readthedocs.io/en/latest/index.html
GNU General Public License v3.0

issues with running the crawler #32

Closed: julianafreire closed this issue 6 years ago

julianafreire commented 7 years ago

I started the crawler; it ran for a few minutes and then stopped retrieving new pages. I labeled additional pages and started the crawler again, but no new pages were retrieved. To check what was happening, I tried to open the crawler monitor, but got the following:

This page isn’t working

localhost didn’t send any data. ERR_EMPTY_RESPONSE

julianafreire commented 7 years ago

I re-started DDT and tried to run the crawler. When I clicked on the crawler monitor, DDT crashed. See log below.

Accuracy = 88.0 %

172.17.0.1 - - [13/Jun/2017:04:09:13] "POST /updateOnlineClassifier HTTP/1.1" 200 24 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:04:09:14] "POST /getStatus HTTP/1.1" 200 2 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
Using default negative tags
Seeds path /ddt/domain_discovery_tool_react/server/data/research_on_spatio-temporal_data/seeds.txt
EXCEPTION IN (/ddt/domain_discovery_API/models/domain_discovery_model.py, LINE 1729 "with open(file_positive, 'w') as f:"): [Errno 36] File name too long: u'/ddt/domain_discovery_tool_react/server/data/research_on_spatio-temporal_data/training_data/positive/http%3A%2F%2Fsearch.ebscohost.com%2Flogin.aspx%3Fdirect%3Dtrue%26profile%3Dehost%26scope%3Dsite%26authtype%3Dcrawler%26jrnl%3D19352727%26AN%3D111783097%26h%3DfO9L1E%252BFUwJmf6UoFl8%252BVw6qK95XbzwTlQbH%252Bmu7Hn2vInbZmzpUmy1i9PG12iR4b7jTq0VBX9IrfVD4MQyesg%253D%253D%26crl%3Dc'

ACHE Crawler 0.9.0-SNAPSHOT

Preparing training data... POSITIVE:510 NEGATIVE:616
/ddt/run_ddt: line 33: 102 Killed python $DDT_HOME/server/server.py
Stopping elastisearch container elastic
Removing elastisearch container elastic
Stopping DD Tool container dd_tool
Removing DD Tool container dd_tool
Julianas-MacBook-Pro-2:Downloads juliana$
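
The `[Errno 36] File name too long` exception in the log above is raised when the percent-encoded page URL is used directly as a file name and exceeds the per-name limit of the filesystem. One possible workaround, shown here only as a minimal Python 3 sketch and not as the fix adopted in DDT, is to truncate long names and append a hash of the full URL so that names stay unique. The helper `safe_filename` and the 255-byte limit are assumptions made for illustration; they do not come from the DDT code base.

```python
import hashlib
from urllib.parse import quote

# Assumed limit: many Linux filesystems cap a single file name at 255 bytes.
MAX_NAME_BYTES = 255


def safe_filename(url, max_bytes=MAX_NAME_BYTES):
    """Return a percent-encoded file name for `url` of at most `max_bytes` bytes.

    Hypothetical helper: short URLs keep their full encoded form; long ones
    are truncated and suffixed with a SHA-1 digest of the original URL.
    """
    # Encode every reserved character, as in the file name seen in the log.
    encoded = quote(url, safe="")
    if len(encoded) <= max_bytes:  # quote() output is ASCII, so chars == bytes
        return encoded

    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()  # 40 hex characters
    keep = max_bytes - len(digest) - 1  # leave room for "_" and the digest
    # Truncation may cut a percent escape in half; the name is still a valid
    # file name, but it can no longer be decoded back into the URL.
    return encoded[:keep] + "_" + digest


if __name__ == "__main__":
    long_url = "http://example.com/login.aspx?token=" + "x" * 400
    name = safe_filename(long_url)
    print(len(name), name[:60] + "...")
```

Because a truncated name can no longer be decoded back into the URL, a caller using this approach would need to store the original URL elsewhere, for example inside the file itself.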

julianafreire commented 7 years ago

I tried again, and DDT crashes even if I don't click on the crawler monitor.

Could it be that there are too many examples?


ACHE Crawler 0.9.0-SNAPSHOT

Preparing training data... POSITIVE:510 NEGATIVE:616
/ddt/run_ddt: line 33: 102 Killed python $DDT_HOME/server/server.py
Stopping elastisearch container elastic
Removing elastisearch container elastic
Stopping DD Tool container dd_tool
Removing DD Tool container dd_tool

yamsgithub commented 6 years ago

This is no longer relevant, as we are now using the ACHE server.