Nekmo / dirhunt

Find web directories without bruteforce
MIT License
1.76k stars 253 forks

Ignore invalid protocols #3

Closed (Nekmo closed this issue 6 years ago)

Nekmo commented 6 years ago

Original asset:

<link rel="stylesheet" href="css:custom.css">
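
The `href` above uses a `css:` prefix, which is parsed as a URL scheme rather than a relative path. One possible way to ignore such links before they reach `requests` is sketched below. This is a minimal illustration under stated assumptions, not dirhunt's actual fix: `resolve_link` and `ALLOWED_SCHEMES` are hypothetical names, and it assumes only `http` and `https` links are worth crawling.

```python
from urllib.parse import urljoin, urlsplit

# Assumption: the crawler only follows http(s) links.
ALLOWED_SCHEMES = {"http", "https"}


def resolve_link(base_url, href):
    """Return an absolute http(s) URL for href, or None if it should be skipped.

    Hypothetical helper for illustration; not part of dirhunt.
    """
    # urljoin resolves relative paths against the base and returns hrefs
    # with an unrecognised scheme (such as "css:custom.css") unchanged.
    absolute = urljoin(base_url, href.strip())
    parts = urlsplit(absolute)
    # Skip anything that is not http(s) or that has no host to connect to.
    if parts.scheme not in ALLOWED_SCHEMES or not parts.netloc:
        return None
    return absolute
```

With this check, `resolve_link("http://website.com/", "css:custom.css")` returns `None`, so the link would be skipped instead of being requested, while an ordinary relative path such as `css/custom.css` still resolves against the base URL.
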
Traceback (most recent call last):
  File "/home/nekmo/.virtualenvs/dirhunt/lib/python3.6/site-packages/requests/models.py", line 371, in prepare_url
    scheme, auth, host, port, path, query, fragment = parse_url(url)
  File "/home/nekmo/.virtualenvs/dirhunt/lib/python3.6/site-packages/urllib3/util/url.py", line 199, in parse_url
    raise LocationParseError(url)
urllib3.exceptions.LocationParseError: Failed to parse: website.comcss:custom.css

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nekmo/Workspace/dirhunt/dirhunt/crawler.py", line 67, in start
    resp = session.get(self.url.url, stream=True, timeout=TIMEOUT, allow_redirects=False)
  File "/home/nekmo/Workspace/dirhunt/dirhunt/crawler.py", line 119, in get
    response = self.session.get(url, **kwargs)
  File "/home/nekmo/.virtualenvs/dirhunt/lib/python3.6/site-packages/requests/sessions.py", line 521, in get
    return self.request('GET', url, **kwargs)
  File "/home/nekmo/.virtualenvs/dirhunt/lib/python3.6/site-packages/requests/sessions.py", line 494, in request
    prep = self.prepare_request(req)
  File "/home/nekmo/.virtualenvs/dirhunt/lib/python3.6/site-packages/requests/sessions.py", line 437, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/home/nekmo/.virtualenvs/dirhunt/lib/python3.6/site-packages/requests/models.py", line 305, in prepare
    self.prepare_url(url, params)
  File "/home/nekmo/.virtualenvs/dirhunt/lib/python3.6/site-packages/requests/models.py", line 373, in prepare_url
    raise InvalidURL(*e.args)
requests.exceptions.InvalidURL: Failed to parse: domain.comcss:custom.css

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nekmo/Workspace/dirhunt/dirhunt/crawler.py", line 29, in wrapped
    return func(*args, **kwargs)
  File "/home/nekmo/Workspace/dirhunt/dirhunt/crawler.py", line 70, in start
    self.close()
  File "/home/nekmo/Workspace/dirhunt/dirhunt/crawler.py", line 110, in close
    del self.crawler.processing[self.url.url]