ACAHNN / adscape

Crawler code and ad detection algorithm
MIT License

[Errno 61] Connection refused #2

Open dhowe opened 8 years ago

dhowe commented 8 years ago
$ python firefly.py adscape 6000 www.yahoo.com

['firefly.py', 'adscape', '6000', 'www.yahoo.com']
Firefly timeout, 3 retries left
[Errno 61] Connection refused
Firefly timeout, 2 retries left
[Errno 61] Connection refused
Firefly timeout, 1 retries left
[Errno 61] Connection refused
^CTraceback (most recent call last):
  File "firefly.py", line 62, in <module>
    print Firefly(port).get_visual_elements(website)
  File "firefly.py", line 42, in get_visual_elements
    return self._send_command('GOTO %s' % url)
  File "firefly.py", line 39, in _send_command
    return self._send_command(command, retries)
  File "firefly.py", line 39, in _send_command
    return self._send_command(command, retries)
  File "firefly.py", line 38, in _send_command
    time.sleep([30, 10, 2][retries])
KeyboardInterrupt
dhowe commented 8 years ago

This occurs after enabling remote debugging and changing the firefox call to:

p = subprocess.Popen([ffbin, '-no-remote', '-P', profile, '-start-debugger-server'],

which starts the debugger service on port 6000

Then I get the following prompt 3 times before seeing the error:

[Screenshot of the prompt, taken 2016-07-31 at 5:05:38 pm]
dhowe commented 8 years ago

I've now tried FF 3.6 as you suggested, but with this version, I don't even get the prompts or error messages. The program just exits after a few seconds with no output beyond the pid.

$ python firefox_startup.py
12858
ACAHNN commented 8 years ago

dhowe,

Have you run the profile_setup.py script? I get that type of error output (first message) when the extension isn't installed and therefore no server is listening on a port. Run something like:

python profile_setup.py <profile_name> <port>

Example: python profile_setup.py debug 10024

Let me know the output of that (it should be nothing). Then run:

python cube.py debug 10024

to test and let me know what the output is. If it's the same, check that the extension is installed in Firefox and enabled. That may be the issue; you may need to add the extension manually (luckily that's easy). We'll go from there.
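Since "[Errno 61] Connection refused" means nothing accepted the connection on that port, a quick independent check is to try a plain TCP connection yourself. This is only a minimal sketch: `port_is_listening` is a hypothetical helper that is not part of this repo, and it assumes the extension listens on 127.0.0.1 on the port you passed (10024 in the example above).

```python
import socket


def port_is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError or a
        # timeout) when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical usage, assuming Firefox was started with the "debug" profile:
# print(port_is_listening("127.0.0.1", 10024))
```

If this returns False while Firefox is running, the server side (the extension) is probably not up, and the crawler's retries will fail the same way.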

adrbv commented 6 years ago

I know it's been a while. I am trying to understand the crawler after reading the paper, which is really interesting.

I have the same problem as @dhowe. I followed your instructions and the error is the same. I think the extension is installed: when the crawler opens Firefox, I check the Add-ons section in the browser and Firefly is there and enabled. Unfortunately, Firefly still can't be reached. What else should I try?

I would also like to understand what data the crawler collects from the page source. Do you have any samples of your crawls that you could share? Specifically, how do you get the landing page? According to the paper, you use an HTTP library. Which one is it? And which part of the element or iframe do you use to fetch the landing page?

Please help

Thanks a lot.