Closed andrewexton373 closed 3 years ago
It should not get confused by the hash part of the URL. I need to check it further.
Well - found the issue.
Since your command had no protocol, the assumed site is http://ransomit.com, and then links to https://ransomit.com are considered external and are therefore not crawled.
Just run it with snapcrawl https://ransomit.com depth=2 width=1024 and you should be fine.
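To illustrate why the missing protocol matters, here is a minimal Python sketch of the kind of rule described above. It is not snapcrawl's actual code (snapcrawl is a Ruby tool and I have not copied its logic); the `normalize` and `is_external` helpers are hypothetical names, and the "default to http" and "same scheme and host" rules are assumptions based only on the behavior reported in this thread.

```python
from urllib.parse import urljoin, urlsplit


def normalize(url, default_scheme="http"):
    """Hypothetical helper: prepend a scheme when only a bare host was given."""
    return url if "://" in url else f"{default_scheme}://{url}"


def is_external(link, base):
    """Hypothetical rule: a link is internal only if scheme and host match the base."""
    # Resolve relative links (such as "/about" or "#") against the base first.
    resolved = urlsplit(urljoin(base + "/", link))
    origin = urlsplit(base)
    return (resolved.scheme, resolved.netloc) != (origin.scheme, origin.netloc)


# A bare host name is assumed to be http, so https links look external.
base = normalize("ransomit.com")
print(base)
print(is_external("https://ransomit.com/about", base))

# A bare fragment link resolves to the base page itself, so it is internal.
print(is_external("#", base))

# With the protocol spelled out, the same https link is internal.
print(is_external("https://ransomit.com/about", normalize("https://ransomit.com")))
```

Under these assumptions, the `href='#'` links are harmless, and the scheme mismatch alone is enough to stop the crawl after the first page.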
I will see if I can improve that unexpected behavior.
Ahhh, that makes sense. Appreciate the help!
I'm trying to capture images from the site http://ransomit.com.
snapcrawl seems to get confused about the structure of the website. I believe the issue is caused by href='#' included in the HTML of the example website provided.
Are there any steps I can take to successfully crawl and screenshot all levels of this site? Maybe utilizing the url_blacklist feature? Also, is this a bug, or an expected result? I'm not completely sure.