essandess / isp-data-pollution

ISP Data Pollution to Protect Private Browsing History with Obfuscation
MIT License

Error on ubuntu 16.04 #6

Closed aeametal closed 7 years ago

aeametal commented 7 years ago

I get the following output after installing and running the script:

Seeding with search for 'catfish'...
Expecting value: line 1 column 1 (char 0)
Traceback (most recent call last):
  File "isp_data_pollution.py", line 483, in <module>
    ISPDataPollution(debug=True)
  File "isp_data_pollution.py", line 126, in __init__
    self.pollute_forever()
  File "isp_data_pollution.py", line 211, in pollute_forever
    self.seed_links()
  File "isp_data_pollution.py", line 246, in seed_links
    self.get_websearch(word)
  File "isp_data_pollution.py", line 367, in get_websearch
    if len(self.links) < self.max_links_cached: self.add_url_links(new_links)
  File "isp_data_pollution.py", line 423, in add_url_links
    if self.debug: print('Added {:d} links, {:d} total at url \'{}\'.'.format(k,len(self.links),self.session.current_url))
  File "/usr/lib/python3/dist-packages/selenium/webdriver/remote/webdriver.py", line 454, in current_url
    return self.execute(Command.GET_CURRENT_URL)['value']
  File "/usr/lib/python3/dist-packages/selenium/webdriver/remote/webdriver.py", line 201, in execute
    self.error_handler.check_response(response)
  File "/usr/lib/python3/dist-packages/selenium/webdriver/remote/errorhandler.py", line 102, in check_response
    value = json.loads(value_json)
  File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.5/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

essandess commented 7 years ago

It appears that you haven't installed phantomjs, or that it's not in your PATH.

Here's the SSCCE for phantomjs. If this doesn't run, the script won't run.

which phantomjs
python3 -c 'from selenium import webdriver; driver = webdriver.PhantomJS(); driver.get("https://github.com"); print(driver.title); driver.quit()'

aeametal commented 7 years ago

Thanks for your quick response.

  1. phantomjs is installed on my system. Issuing "which phantomjs" returns /usr/bin/phantomjs
  2. Running the SSCCE prints "The world's leading software development platform · GitHub"
  3. I get the following output when I run $ python3 isp_data_pollution.py

     Downloading the blacklist... done.
     Seeding with search for 'superstition'...
     Expecting value: line 1 column 1 (char 0)
     Traceback (most recent call last):
       File "isp_data_pollution.py", line 483, in <module>
         ISPDataPollution(debug=True)
       File "isp_data_pollution.py", line 126, in __init__
         self.pollute_forever()
       File "isp_data_pollution.py", line 211, in pollute_forever
         self.seed_links()
       File "isp_data_pollution.py", line 246, in seed_links
         self.get_websearch(word)
       File "isp_data_pollution.py", line 367, in get_websearch
         if len(self.links) < self.max_links_cached: self.add_url_links(new_links)
       File "isp_data_pollution.py", line 423, in add_url_links
         if self.debug: print('Added {:d} links, {:d} total at url \'{}\'.'.format(k,len(self.links),self.session.current_url))
       File "/usr/lib/python3/dist-packages/selenium/webdriver/remote/webdriver.py", line 454, in current_url
         return self.execute(Command.GET_CURRENT_URL)['value']
       File "/usr/lib/python3/dist-packages/selenium/webdriver/remote/webdriver.py", line 201, in execute
         self.error_handler.check_response(response)
       File "/usr/lib/python3/dist-packages/selenium/webdriver/remote/errorhandler.py", line 102, in check_response
         value = json.loads(value_json)
       File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
         return _default_decoder.decode(s)
       File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
       File "/usr/lib/python3.5/json/decoder.py", line 357, in raw_decode
         raise JSONDecodeError("Expecting value", s, err.value) from None
     json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

essandess commented 7 years ago

Hmmm. Looks like the current_url method is puking on your box. What happens when you do this?

python3 -c 'from selenium import webdriver; driver = webdriver.PhantomJS(); driver.get("https://github.com"); print(driver.title); print(driver.current_url); driver.quit()'

This throws an error for me. I'll wrap this part in a try/except.
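
For readers following along, here is a minimal sketch of what such a guard might look like. It is not the code that was committed: safe_current_url and BrokenSession are hypothetical names used only for illustration, and catching Exception broadly is an assumption made to keep the debug print from killing the run.

```python
import json


def safe_current_url(session, default="<unknown>"):
    """Return session.current_url, or `default` if reading it raises.

    Hypothetical helper; the broad `except Exception` is an assumption.
    """
    try:
        return session.current_url
    except Exception as err:
        # Report the problem (e.g. a JSONDecodeError from the driver reply) and carry on.
        print(err)
        return default


class BrokenSession:
    """Stand-in for a webdriver session whose current_url fails as in the traceback."""

    @property
    def current_url(self):
        # An empty driver reply makes json.loads raise JSONDecodeError.
        return json.loads("")


print(safe_current_url(BrokenSession()))
```

With a guard like this, a failed current_url lookup prints the error message and falls back to a placeholder instead of raising out of add_url_links.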

aeametal commented 7 years ago

The script is running after the last commit... but I get the following:

Seeding with search for 'terrain grippe'...
Expecting value: line 1 column 1 (char 0)
Expecting value: line 1 column 1 (char 0)
Added 0 links, 22 total at url 'http://www.google.com/search?q=terrain grippe'.

A suggestion: let users add a list.csv containing URLs of their choosing, so the history is also polluted with non-random searches. Clustering techniques will have more trouble isolating the clean data if part of the non-random list differs from user to user.

essandess commented 7 years ago

There is a list of non-random searches at the start of the script. I'll think about looking for a specific file to add. It would just take a moment to fork the code and add your own entries to the non-random list.
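
As a rough sketch of the list.csv idea, assuming one entry per row in the first column (the thread does not specify a format), something like the following could read the user's file. load_user_urls is a hypothetical name and is not part of isp_data_pollution.py.

```python
import csv
import os


def load_user_urls(path="list.csv"):
    """Return the non-empty first-column entries of `path`, or [] if the file is absent.

    Hypothetical helper; the one-entry-per-row layout is an assumption.
    """
    if not os.path.isfile(path):
        return []
    with open(path, newline="", encoding="utf-8") as f:
        # Skip blank rows and rows whose first cell is only whitespace.
        return [row[0].strip() for row in csv.reader(f) if row and row[0].strip()]
```

The returned entries could then be appended to whatever non-random list the script already keeps, so a missing file simply leaves the built-in list unchanged.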