This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better understand a domain (or topic) as it is represented on the Web.
I tried to use the seed finder with the keywords: political news
Then DDT said:
Query failed. Try Bing.
But under the SeedFinder, there is no option to switch the search engine.
I went back to the Search Tab and selected Bing. Then I re-started the SeedFinder and DDT crashed. Log attached below.
172.17.0.1 - - [13/Jun/2017:18:37:58] "POST /updateOnlineClassifier HTTP/1.1" 200 25 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:38:06] "POST /getPages HTTP/1.1" 200 1192 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
Using default negative tags
172.17.0.1 - - [13/Jun/2017:18:38:57] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:38:58] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:38:59] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
Seeds path /ddt/domain_discovery_tool_react/server/data/political_news/seeds.txt
EXCEPTION IN (/ddt/domain_discovery_API/models/domain_discovery_model.py, LINE 1729 "with open(file_positive, 'w') as f:"): [Errno 36] File name too long: u'/ddt/domain_discovery_tool_react/server/data/political_news/training_data/positive/http%3A%2F%2Fnytimes.com%2F2015%2F03%2F17%2Fnyregion%2Fnight-of-drug-overdoses-jolts-wesleyans-liberal-tradition.html%3Faction%3Dclick%26contentCollection%3DOpinion%26module%3DMostEmailed%26version%3DFull%26region%3DMarginalia%26src%3Dme%26pgtype%3Darticle'
EXCEPTION IN (/ddt/domain_discovery_API/models/domain_discovery_model.py, LINE 1729 "with open(file_positive, 'w') as f:"): [Errno 36] File name too long: u'/ddt/domain_discovery_tool_react/server/data/political_news/training_data/positive/http%3A%2F%2Fnytimes.com%2F2014%2F08%2F16%2Fupshot%2Fmapping-migration-in-the-united-states-since-1900.html%3FWT.mc_id%3D2015-Q1-KWP-AUD_DEV-0101-0331%26WT.mc_ev%3Dclick%26bicmp%3DAD%26bicmlukp%3DWT.mc_id%26bicmst%3D1420088400%26bicmet%3D1451624400%26ad-keywords%3DAUDDEVMAR%26kwp_0%3D10713%26kwp_4%3D78883%26kwp_1%3D125942'
EXCEPTION IN (/ddt/domain_discovery_API/models/domain_discovery_model.py, LINE 1729 "with open(file_positive, 'w') as f:"): [Errno 36] File name too long: u'/ddt/domain_discovery_tool_react/server/data/political_news/training_data/positive/http%3A%2F%2Fnytimes.com%2F2014%2F08%2F16%2Fupshot%2Fmapping-migration-in-the-united-states-since-1900.html%3FWT.mc_id%3D2015-Q1-KWP-AUD_DEV-0101-0331%26WT.mc_ev%3Dclick%26bicmp%3DAD%26bicmlukp%3DWT.mc_id%26bicmst%3D1420088400%26bicmet%3D1451624400%26ad-keywords%3DAUDDEVMAR%26kwp_0%3D10713%26kwp_4%3D78883%26kwp_1%3D125942%26_r%3D0'
EXCEPTION IN (/ddt/domain_discovery_API/models/domain_discovery_model.py, LINE 1729 "with open(file_positive, 'w') as f:"): [Errno 36] File name too long: u'/ddt/domain_discovery_tool_react/server/data/political_news/training_data/positive/http%3A%2F%2Fnytimes.com%2F2014%2F08%2F16%2Fupshot%2Fmapping-migration-in-the-united-states-since-1900.html%3FWT.mc_id%3D2015-Q1-KWP-AUD_DEV-0101-0331%26WT.mc_ev%3Dclick%26bicmp%3DAD%26bicmlukp%3DWT.mc_id%26bicmst%3D1420088400%26bicmet%3D1451624400%26ad-keywords%3DAUDDEVMAR%26kwp_0%3D10713%26kwp_4%3D78883%26kwp_1%3D125942%26_r%3D0%26abt%3D0002%26abg%3D1'
172.17.0.1 - - [13/Jun/2017:18:38:59] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:38:59] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:00] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:00] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
ACHE Crawler 0.9.0-SNAPSHOT
Preparing training data...
POSITIVE:408
NEGATIVE:2
172.17.0.1 - - [13/Jun/2017:18:39:01] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:01] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:02] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:02] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:03] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:03] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:04] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:04] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:05] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:05] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:06] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:06] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:07] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:07] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:08] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:08] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:09] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:09] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:10] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:10] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:11] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:11] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:12] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:12] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:13] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:13] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:14] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:14] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
Training model...
Training SMO model...
172.17.0.1 - - [13/Jun/2017:18:39:15] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:15] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:16] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:16] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:17] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:17] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
Options: -M -C 0.01
SMO
Kernel used:
Linear Kernel: K(x,y) = <x,y>
Classifier for classes: CLASS_0, CLASS_1
BinarySMO
Machine linear: showing attribute weights, not support vectors.
The message shown was incorrect and has been fixed. The search engine cannot be selected for the SeedFinder. The SeedFinder process can now be monitored in the process monitor.
Number of kernel evaluations: 22213 (72.492% cached)
Logistic Regression with ridge parameter of 1.0E-8
Coefficients...
Class
Variable CLASS_0
pred -131.7197
Intercept -110.8882
Odds Ratios...
Class
Variable CLASS_0
pred 0
Time taken to build model: 0.47 seconds
Time taken to test model on training data: 0.14 seconds
=== Error on training data ===
Correctly Classified Instances 410 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
Mean absolute error 0
Root mean squared error 0
Relative absolute error 0 %
Root relative squared error 0 %
Total Number of Instances 410
=== Confusion Matrix ===
a b <-- classified as
408 0 | a = CLASS_0
0 2 | b = CLASS_1
=== Stratified cross-validation ===
Correctly Classified Instances 406 99.0244 %
Incorrectly Classified Instances 4 0.9756 %
Kappa statistic -0.0049
Mean absolute error 0.0104
Root mean squared error 0.0982
Relative absolute error 84.1457 %
Root relative squared error 140.5804 %
Total Number of Instances 410
=== Confusion Matrix ===
a b <-- classified as
406 2 | a = CLASS_0
2 0 | b = CLASS_1
Creating feature file... done.
None
RUN SEED FINDER politics news
EXEC SEED FINDER
172.17.0.1 - - [13/Jun/2017:18:39:17] "POST /runSeedFinder HTTP/1.1" 200 9 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
politics news /ddt/domain_discovery_tool_react/server
COLLECT SEED URLS politics news /ddt/domain_discovery_tool_react/server/data/political_news/seedFinder/politics_news_results.csv
172.17.0.1 - - [13/Jun/2017:18:39:18] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:18] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:19] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:19] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:20] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:20] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:21] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:21] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:22] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:22] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:23] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:23] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:24] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:24] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:25] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:25] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:26] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
172.17.0.1 - - [13/Jun/2017:18:39:26] "POST /getPages HTTP/1.1" 200 37 "http://0.0.0.0:8084/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
Jun 13, 2017 6:39:59 PM org.elasticsearch.plugins.PluginsService
INFO: [Black Goliath] loaded [], sites []
Exception in thread "pool-1-thread-9" java.lang.IllegalArgumentException: Illegal character in query at index 49: https://twitter.com/foxnewspolitics?ref_src=twsrc^google|twcamp^serp|twgr^author
at java.net.URI.create(URI.java:852)
at org.apache.http.client.methods.HttpGet.<init>(HttpGet.java:69)
at Download_URL.run(Download_URL.java:177)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.URISyntaxException: Illegal character in query at index 49: https://twitter.com/foxnewspolitics?ref_src=twsrc^google|twcamp^serp|twgr^author
at java.net.URI$Parser.fail(URI.java:2848)
at java.net.URI$Parser.checkChars(URI.java:3021)
at java.net.URI$Parser.parseHierarchical(URI.java:3111)
at java.net.URI$Parser.parse(URI.java:3053)
at java.net.URI.<init>(URI.java:588)
at java.net.URI.create(URI.java:850)
... 5 more
org.apache.http.client.ClientProtocolException: Unexpected response status: 403
at Download_URL.run(Download_URL.java:285)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Time Elapsed time for http://www.huffingtonpost.com/section/politics thread = 0.739 secs
Jun 13, 2017 6:40:01 PM org.apache.http.client.protocol.ResponseProcessCookies processCookies
WARNING: Invalid cookie header: "Set-Cookie: ABtestingV2=B; expires=Sun, 10 Dec 2017 18:40:01 GMT; path=/;". Invalid 'expires' attribute: Sun, 10 Dec 2017 18:40:01 GMT
Jun 13, 2017 6:40:01 PM org.apache.http.client.protocol.ResponseProcessCookies processCookies
WARNING: Invalid cookie header: "Set-Cookie: visid_incap_121505=ZPUklx9OR7uiiwijKZemilgxQFkAAAAAQUIPAAAAAADUXHfViB0ZiO5vj2/e69Lk; expires=Wed, 13 Jun 2018 08:02:50 GMT; path=/; Domain=.economist.com". Invalid 'expires' attribute: Wed, 13 Jun 2018 08:02:50 GMT
Jun 13, 2017 6:40:01 PM org.apache.http.client.protocol.ResponseProcessCookies processCookies
WARNING: Invalid cookie header: "Set-Cookie: ABtestingV2=A; expires=Sun, 10 Dec 2017 18:40:01 GMT; path=/;". Invalid 'expires' attribute: Sun, 10 Dec 2017 18:40:01 GMT
/ddt/run_ddt: line 33: 101 Killed python $DDT_HOME/server/server.py
Stopping elastisearch container elastic
Removing elastisearch container elastic
Stopping DD Tool container dd_tool
Removing DD Tool container dd_tool
Julianas-MacBook-Pro-2:Downloads juliana$
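A note on the `[Errno 36] File name too long` lines in the log above: each failing path ends in a percent-encoded URL used as the file name, and most Linux filesystems cap a single file name at 255 bytes, which URLs with long query strings can exceed. The sketch below is one possible workaround, not the fix that was applied in DDT. `safe_filename` and `MAX_NAME_BYTES` are hypothetical names, and since the log does not show how `domain_discovery_model.py` builds `file_positive`, the percent-encoding step is an assumption based on the paths in the log. It is written for Python 3, whereas the `u'...'` strings in the log suggest DDT was running on Python 2.

```python
import hashlib
from urllib.parse import quote

# Assumed per-component limit; 255 bytes is typical on Linux filesystems,
# but the real value depends on the filesystem holding the data directory.
MAX_NAME_BYTES = 255


def safe_filename(url, max_bytes=MAX_NAME_BYTES):
    """Return a file name for `url` that is at most `max_bytes` bytes long.

    Short URLs keep their readable percent-encoded form. Long URLs are
    truncated and suffixed with a SHA-1 digest of the full URL, so two
    URLs that share a long prefix still map to different file names.
    """
    encoded = quote(url, safe="")  # ASCII only, e.g. "http%3A%2F%2F..."
    if len(encoded) <= max_bytes:
        return encoded
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()  # 40 hex chars
    keep = max_bytes - len(digest) - 1  # leave room for "_" and the digest
    return encoded[:keep] + "_" + digest
```

With something like this, the training-data writer could join the positive-examples directory with `safe_filename(url)` instead of using the encoded URL directly. The trade-off is that a truncated name can no longer be decoded back into the original URL, so the URL would need to be stored elsewhere if it is required later.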