DannyBen / snapcrawl

Crawl a website and take screenshots
MIT License
57 stars 12 forks

Unable to connect to local address on windows docker #31

Closed chiongsterx closed 4 years ago

chiongsterx commented 4 years ago

Hi,

I recently installed Docker on Windows and pulled the latest version of the snapcrawl image. I assume there is no need to download anything else, as the prerequisites mentioned should already be inside the image.

latest: Pulling from dannyben/snapcrawl
Digest: sha256:6423274929cadbb63ba4adc158d30108d354ed4e58293d6ca96797628875e798
Status: Image is up to date for dannyben/snapcrawl:latest
docker.io/dannyben/snapcrawl:latest

I tried the command below to crawl my internal router, but it was not successful.

Command:
docker run --rm -it dannyben/snapcrawl 192.168.1.254 depth=2

Error received:
INFO : processing http://192.168.1.254, depth: 0
INFO : capturing screenshot for /
ERROR : screenshot error on / - Snapcrawl::ScreenshotError: Webshot::WebshotError Capybara error: "Request to 'http://192.168.1.254' failed to reach server, check DNS and/or server status - Timed out with the following resources still waiting http://192.168.1.254/js/menu.js"

I am also unable to find where the snaps are being saved.

Can you help me? Thank you

DannyBen commented 4 years ago

Hi,

When using the docker image, you need to mount the current working directory to the container's /app folder. This lets the container read your config file from the current directory (if present) and write the snaps and cache back to your host.

In addition, if you want it to be able to access your local network, you need to use the --network host option when running docker.

I recommend you use this command:

$ alias snapcrawl='docker run --rm -it --network host --volume $PWD:/app dannyben/snapcrawl'

Then you can just use:

$ snapcrawl 192.168.1.254 depth=2

There is already a mention of some of what I said in the readme, but I will add the --network host bit.

Let me know if it works.
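
As a way to double-check the options described above before running anything, here is a minimal sketch of a helper that only prints the docker command for a given host directory. The function name `snapcrawl_cmd` is hypothetical and not part of snapcrawl; the docker flags are the ones already quoted in this thread.

```shell
# Hypothetical helper (not part of snapcrawl): print the docker command
# that mounts a host directory at /app and uses the host network.
# First argument: host directory to mount. Remaining arguments: snapcrawl arguments.
snapcrawl_cmd() {
  host_dir=$1
  shift
  printf 'docker run --rm -it --network host --volume %s:/app dannyben/snapcrawl' "$host_dir"
  # Append every remaining argument, e.g. the URL and depth=2.
  for arg in "$@"; do
    printf ' %s' "$arg"
  done
  printf '\n'
}

# Example: show the command for the current directory without running it.
snapcrawl_cmd "$PWD" 192.168.1.254 depth=2
```

Once the printed command looks right, it can be copied and run as-is.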

chiongsterx commented 4 years ago

Hi,

Sorry, I am new to this. How do I know whether the alias has been created? I ran alias and received this instead.

alias : This command cannot find a matching alias because an alias with the name 'snapcrawl=docker run --rm -it --network host --volume $PWD:/app dannyben/snapcrawl' does not exist. At line:1 char:1

DannyBen commented 4 years ago

Oh, you are on Windows - I am not sure if you can alias there. You might be able to create a batch file or something else similar to a Linux alias.

But, the docker command you need is this:

$ docker run --rm -it --network host --volume $PWD:/app dannyben/snapcrawl

Just run it and append any parameter to it:

docker run --rm -it --network host --volume $PWD:/app dannyben/snapcrawl 192.168.1.254 depth=2

I am not sure if the environment variable $PWD is present in your shell. If it isn't, just replace it with the absolute path of the current directory.
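
For POSIX-style shells where $PWD might be unset, one possible fallback is to ask the shell for the directory's absolute path directly. This is a minimal sketch; the helper name `abs_dir` is hypothetical, and it assumes a POSIX shell (it does not apply to cmd.exe or PowerShell).

```shell
# Hypothetical helper: print the absolute path of a directory
# (defaults to the current directory when no argument is given).
abs_dir() {
  # The subshell keeps the caller's working directory unchanged;
  # clearing CDPATH stops cd from printing or searching elsewhere.
  (CDPATH= cd -- "${1:-.}" && pwd)
}

# Usage sketch (shown as a comment, not executed here):
# docker run --rm -it --network host --volume "$(abs_dir):/app" dannyben/snapcrawl 192.168.1.254 depth=2
```

Quoting the `--volume` value keeps the command intact when the path contains spaces.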

chiongsterx commented 4 years ago

This is what I am currently receiving. I tried the Windows equivalent of a UNIX alias (Doskey) and changed $PWD to the Windows equivalent (cd). However, it does not crawl to the specified depth, as follows. In addition, I am still unable to locate the screenshot images.

DannyBen commented 4 years ago

Well, there are too many factors specific to your case, and so far they don't look Snapcrawl-related, but rather related to docker on Windows.

I suggest that you read a little about how to use docker volumes. I am sure this will help you.

Now, from what I see:

  1. The --volume cd:/app you have used is invalid. The first part, before the :, needs to be an absolute local path.
  2. Depth works properly when running the following command. If it does not seem to work with your specific site, the site may define its links differently. Snapcrawl looks for <a href='...'> tags.
     $ docker run --rm -it --network host --volume $PWD:/app dannyben/snapcrawl https://duckduckgo.com depth=2
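
Regarding point 1, a value such as cd is not a path at all, and docker generally treats a non-path value in that position as a volume name rather than a host directory, which would explain why no snaps appeared in the current folder. The sketch below checks whether a value looks like an absolute path before it is passed to --volume. The helper name `volume_source_kind` is hypothetical, and the Windows drive-letter pattern is an assumption about how such paths are usually written.

```shell
# Hypothetical helper: report whether the part of --volume before the
# colon looks like an absolute path.
volume_source_kind() {
  case "$1" in
    /*)              echo "absolute" ;;      # Unix-style, e.g. /home/me/snaps
    [A-Za-z]:[\\/]*) echo "absolute" ;;      # Windows-style, e.g. C:\Users\me\snaps
    *)               echo "not-absolute" ;;  # e.g. cd, snaps, ./snaps
  esac
}

volume_source_kind cd               # prints: not-absolute
volume_source_kind /home/me/snaps   # prints: absolute
```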