djm / python-scrapyd-api

A Python wrapper for working with Scrapyd's API.
BSD 2-Clause "Simplified" License

API and Twisted #16

Closed. Jeffwahl closed this issue 4 years ago.

Jeffwahl commented 5 years ago

Twisted came out with a new version (19.x), and scrapyd-api now throws an error: something expects a str but receives an int. This prevents using any API function to get the status of running spiders. See the error below.

Solution: Downgrade to Twisted version 18.9.0

<class 'scrapyd_api.exceptions.ScrapydResponseError'>
Scrapyd returned an invalid JSON response: Traceback (most recent call last):
  File "/data/workspaces/.virtualenvs/new-platform/lib/python3.6/site-packages/scrapyd/webservice.py", line 21, in render
    return JsonResource.render(self, txrequest).encode('utf-8')
  File "/data/workspaces/.virtualenvs/new-platform/lib/python3.6/site-packages/scrapyd/utils.py", line 21, in render
    return self.render_object(r, txrequest)
  File "/data/workspaces/.virtualenvs/new-platform/lib/python3.6/site-packages/scrapyd/utils.py", line 29, in render_object
    txrequest.setHeader('Content-Length', len(r))
  File "/data/workspaces/.virtualenvs/new-platform/lib/python3.6/site-packages/twisted/web/http.py", line 1271, in setHeader
    self.responseHeaders.setRawHeaders(name, [value])
  File "/data/workspaces/.virtualenvs/new-platform/lib/python3.6/site-packages/twisted/web/http_headers.py", line 220, in setRawHeaders
    for v in self._encodeValues(values)]
  File "/data/workspaces/.virtualenvs/new-platform/lib/python3.6/site-packages/twisted/web/http_headers.py", line 220, in <listcomp>
    for v in self._encodeValues(values)]
  File "/data/workspaces/.virtualenvs/new-platform/lib/python3.6/site-packages/twisted/web/http_headers.py", line 40, in _sanitizeLinearWhitespace
    return b' '.join(headerComponent.splitlines())
AttributeError: 'int' object has no attribute 'splitlines'
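
Before downgrading, you can confirm which versions an environment is actually running (a quick sketch; run it where Scrapyd itself is installed, since the server-side Twisted is what triggers this):

# Sketch: print the versions relevant to this incompatibility.
# The error appears after upgrading to Twisted 19.x.
import pkg_resources

for pkg in ('Twisted', 'scrapyd', 'python-scrapyd-api'):
    try:
        print(pkg, pkg_resources.get_distribution(pkg).version)
    except pkg_resources.DistributionNotFound:
        print(pkg, 'not installed')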

Digenis commented 5 years ago

Hi @Jeffwahl,

This was a scrapyd bug, now fixed in scrapyd 1.2.1, which was released just two days ago. Update your scrapyd.

This is not a python-scrapyd-api bug. You can close the issue.
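
For context, the incompatibility boils down to Twisted 19.x sanitizing header values with .splitlines(), while older scrapyd passed Content-Length as an int; as I understand it, the fix in scrapyd 1.2.1 amounts to passing it as a string. A simplified illustration (not the actual Twisted or scrapyd code):

# Simplified stand-in for the whitespace sanitization that Twisted >= 19.x
# applies to header values (see _sanitizeLinearWhitespace in the traceback).
def sanitize(header_value):
    return b' '.join(header_value.splitlines())

body = b'{"status": "ok"}'

try:
    sanitize(len(body))                        # int value -> the AttributeError above
except AttributeError as exc:
    print(exc)                                 # 'int' object has no attribute 'splitlines'

print(sanitize(str(len(body)).encode()))       # string value works, matching the scrapyd 1.2.1 fix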

azmirfakkri commented 5 years ago

Hi @Digenis,

I am using Python 3.6. I am getting this error when I try to deploy scrapyd in a Docker container. http://localhost:6800/ is running, but http://localhost:6800/schedule.json returns this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/scrapyd/webservice.py", line 21, in render
    return JsonResource.render(self, txrequest).encode('utf-8')
  File "/usr/local/lib/python3.6/site-packages/scrapyd/utils.py", line 20, in render
    r = resource.Resource.render(self, txrequest)
  File "/usr/local/lib/python3.6/site-packages/twisted/web/resource.py", line 264, in render
    raise UnsupportedMethod(allowedMethods)
twisted.web.error.UnsupportedMethod: Expected one of [b'HEAD', b'object', b'POST']
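
As a side note, /schedule.json only accepts POST, so opening it in a browser issues a GET and will always show the UnsupportedMethod error above. A minimal sketch of scheduling through this wrapper instead (the project and spider names here are placeholders):

# Sketch only: /schedule.json rejects GET, so it must be POSTed to.
from scrapyd_api import ScrapydAPI

scrapyd = ScrapydAPI('http://localhost:6800')           # assumed Scrapyd URL
job_id = scrapyd.schedule('scraper', 'example_spider')  # POSTs to /schedule.json
print(job_id)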

Full traceback:

Traceback (most recent call last):
  File "/Users/af/PycharmProjects/data-reclusa-scraper-env/lib/python3.6/site-packages/scrapyd_api/client.py", line 26, in _handle_response
    json = response.json()
  File "/Users/af/PycharmProjects/data-reclusa-scraper-env/lib/python3.6/site-packages/requests/models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "daily_scrape.py", line 154, in <module>
    scrape()
  File "daily_scrape.py", line 134, in scrape
    job_id = scrapyd.schedule(project=PROJECT, spider=SPIDER, search_text=make, location=random_location)
  File "/Users/af/PycharmProjects/data-reclusa-scraper-env/lib/python3.6/site-packages/scrapyd_api/wrapper.py", line 188, in schedule
    json = self.client.post(url, data=data, timeout=self.timeout)
  File "/Users/af/PycharmProjects/data-reclusa-scraper-env/lib/python3.6/site-packages/requests/sessions.py", line 581, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/Users/af/PycharmProjects/data-reclusa-scraper-env/lib/python3.6/site-packages/scrapyd_api/client.py", line 38, in request
    return self._handle_response(response)
  File "/Users/af/PycharmProjects/data-reclusa-scraper-env/lib/python3.6/site-packages/scrapyd_api/client.py", line 29, in _handle_response
    "response: {0}".format(response.text))
scrapyd_api.exceptions.ScrapydResponseError: Scrapyd returned an invalid JSON response: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/scrapyd/webservice.py", line 21, in render
    return JsonResource.render(self, txrequest).encode('utf-8')
  File "/usr/local/lib/python3.6/site-packages/scrapyd/utils.py", line 20, in render
    r = resource.Resource.render(self, txrequest)
  File "/usr/local/lib/python3.6/site-packages/twisted/web/resource.py", line 265, in render
    return m(request)
  File "/usr/local/lib/python3.6/site-packages/scrapyd/webservice.py", line 49, in render_POST
    spiders = get_spider_list(project, version=version)
  File "/usr/local/lib/python3.6/site-packages/scrapyd/utils.py", line 137, in get_spider_list
    raise RuntimeError(msg.encode('unicode_escape') if six.PY2 else msg)
RuntimeError: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/site-packages/scrapyd/runner.py", line 40, in <module>
    main()
  File "/usr/local/lib/python3.6/site-packages/scrapyd/runner.py", line 37, in main
    execute()
  File "/usr/local/lib/python3.6/site-packages/scrapy/cmdline.py", line 114, in execute
    settings = get_project_settings()
  File "/usr/local/lib/python3.6/site-packages/scrapy/utils/project.py", line 68, in get_project_settings
    settings.setmodule(settings_module_path, priority='project')
  File "/usr/local/lib/python3.6/site-packages/scrapy/settings/__init__.py", line 294, in setmodule
    module = import_module(module)
  File "/usr/local/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'scraper.settings'

Scrapyd log:

reclusa_1       | 2019-10-17T13:19:41+0000 [-] Loading /usr/local/lib/python3.6/site-packages/scrapyd/txapp.py...
reclusa_1       | 2019-10-17T13:19:41+0000 [-] Scrapyd web console available at http://0.0.0.0:6800/
reclusa_1       | 2019-10-17T13:19:41+0000 [-] Loaded.
reclusa_1       | 2019-10-17T13:19:41+0000 [twisted.scripts._twistd_unix.UnixAppLogger#info] twistd 19.7.0 (/usr/local/bin/python 3.6.9) starting up.
reclusa_1       | 2019-10-17T13:19:41+0000 [twisted.scripts._twistd_unix.UnixAppLogger#info] reactor class: twisted.internet.epollreactor.EPollReactor.
reclusa_1       | 2019-10-17T13:19:41+0000 [-] Site starting on 6800
reclusa_1       | 2019-10-17T13:19:41+0000 [twisted.web.server.Site#info] Starting factory <twisted.web.server.Site object at 0x7f2d71557048>
reclusa_1       | 2019-10-17T13:19:41+0000 [Launcher] Scrapyd 1.2.1 started: max_proc=16, runner='scrapyd.runner'
reclusa_1       | 2019-10-17T13:20:53+0000 [twisted.python.log#info] "192.168.48.1" - - [17/Oct/2019:13:20:52 +0000] "POST /schedule.json HTTP/1.1" 200 2049 "-" "python-requests/2.22.0"
reclusa_1       | 2019-10-17T13:30:29+0000 [twisted.python.log#info] "192.168.48.1" - - [17/Oct/2019:13:30:29 +0000] "GET / HTTP/1.1" 200 743 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36"
reclusa_1       | 2019-10-17T13:30:30+0000 [twisted.python.log#info] "192.168.48.1" - - [17/Oct/2019:13:30:30 +0000] "GET /favicon.ico HTTP/1.1" 404 153 "http://localhost:6800/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36"
reclusa_1       | 2019-10-17T13:30:33+0000 [twisted.python.log#info] "192.168.48.1" - - [17/Oct/2019:13:30:33 +0000] "GET /jobs HTTP/1.1" 200 471 "http://localhost:6800/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36"
reclusa_1       | 2019-10-17T13:30:38+0000 [twisted.python.log#info] "192.168.48.1" - - [17/Oct/2019:13:30:38 +0000] "GET /schedule.json HTTP/1.1" 200 544 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36"
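
Note that the final error in the full traceback is ModuleNotFoundError: No module named 'scraper.settings', which suggests the project egg has not been deployed to (or is not visible inside) the Scrapyd instance running in the container; that would be separate from the Twisted header issue discussed above. A quick check (sketch, same local URL as above):

# Sketch: if the project egg is missing from the Scrapyd instance in the
# container, it will not be listed here, which would explain the
# ModuleNotFoundError for 'scraper.settings' in the traceback above.
from scrapyd_api import ScrapydAPI

scrapyd = ScrapydAPI('http://localhost:6800')
print(scrapyd.list_projects())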
djm commented 4 years ago

Closing this as it wasn't an issue with this library. Thanks!