Jeffwahl closed this issue 4 years ago
Hi @Jeffwahl,
This was a scrapyd bug, now fixed in scrapyd-1.2.1 which was released just 2 days ago. Update your scrapyd.
This is not a python-scrapyd-api bug. You can close the issue.
Hi @Digenis,
I am using Python 3.6. I get this error when I try to deploy to scrapyd running in a Docker container. http://localhost:6800/ is reachable, but http://localhost:6800/schedule.json returns this error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/scrapyd/webservice.py", line 21, in render
    return JsonResource.render(self, txrequest).encode('utf-8')
  File "/usr/local/lib/python3.6/site-packages/scrapyd/utils.py", line 20, in render
    r = resource.Resource.render(self, txrequest)
  File "/usr/local/lib/python3.6/site-packages/twisted/web/resource.py", line 264, in render
    raise UnsupportedMethod(allowedMethods)
twisted.web.error.UnsupportedMethod: Expected one of [b'HEAD', b'object', b'POST']
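Note that the UnsupportedMethod error above is what schedule.json returns for a GET request (e.g. opening the URL in a browser): the endpoint only accepts POST. A minimal check with curl, where the project and spider names are placeholders for your own:

```shell
# schedule.json only accepts POST; a browser GET raises UnsupportedMethod
curl http://localhost:6800/schedule.json -d project=scraper -d spider=example
```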
Full traceback:
Traceback (most recent call last):
File "/Users/af/PycharmProjects/data-reclusa-scraper-env/lib/python3.6/site-packages/scrapyd_api/client.py", line 26, in _handle_response
json = response.json()
File "/Users/af/PycharmProjects/data-reclusa-scraper-env/lib/python3.6/site-packages/requests/models.py", line 897, in json
return complexjson.loads(self.text, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "daily_scrape.py", line 154, in <module>
scrape()
File "daily_scrape.py", line 134, in scrape
job_id = scrapyd.schedule(project=PROJECT, spider=SPIDER, search_text=make, location=random_location)
File "/Users/af/PycharmProjects/data-reclusa-scraper-env/lib/python3.6/site-packages/scrapyd_api/wrapper.py", line 188, in schedule
json = self.client.post(url, data=data, timeout=self.timeout)
File "/Users/af/PycharmProjects/data-reclusa-scraper-env/lib/python3.6/site-packages/requests/sessions.py", line 581, in post
return self.request('POST', url, data=data, json=json, **kwargs)
File "/Users/af/PycharmProjects/data-reclusa-scraper-env/lib/python3.6/site-packages/scrapyd_api/client.py", line 38, in request
return self._handle_response(response)
File "/Users/af/PycharmProjects/data-reclusa-scraper-env/lib/python3.6/site-packages/scrapyd_api/client.py", line 29, in _handle_response
"response: {0}".format(response.text))
scrapyd_api.exceptions.ScrapydResponseError: Scrapyd returned an invalid JSON response: Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/scrapyd/webservice.py", line 21, in render
return JsonResource.render(self, txrequest).encode('utf-8')
File "/usr/local/lib/python3.6/site-packages/scrapyd/utils.py", line 20, in render
r = resource.Resource.render(self, txrequest)
File "/usr/local/lib/python3.6/site-packages/twisted/web/resource.py", line 265, in render
return m(request)
File "/usr/local/lib/python3.6/site-packages/scrapyd/webservice.py", line 49, in render_POST
spiders = get_spider_list(project, version=version)
File "/usr/local/lib/python3.6/site-packages/scrapyd/utils.py", line 137, in get_spider_list
raise RuntimeError(msg.encode('unicode_escape') if six.PY2 else msg)
RuntimeError: Traceback (most recent call last):
File "/usr/local/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.6/site-packages/scrapyd/runner.py", line 40, in <module>
main()
File "/usr/local/lib/python3.6/site-packages/scrapyd/runner.py", line 37, in main
execute()
File "/usr/local/lib/python3.6/site-packages/scrapy/cmdline.py", line 114, in execute
settings = get_project_settings()
File "/usr/local/lib/python3.6/site-packages/scrapy/utils/project.py", line 68, in get_project_settings
settings.setmodule(settings_module_path, priority='project')
File "/usr/local/lib/python3.6/site-packages/scrapy/settings/__init__.py", line 294, in setmodule
module = import_module(module)
File "/usr/local/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'scraper.settings'
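The ModuleNotFoundError at the bottom of the chain usually means the egg that scrapyd unpacked does not actually contain the settings module named in the project's scrapy.cfg. A sketch of the expected configuration, assuming the project package is called `scraper` (as in the error message):

```ini
# scrapy.cfg at the project root; the egg built by scrapyd-deploy must
# ship the scraper/ package containing settings.py
[settings]
default = scraper.settings

[deploy]
url = http://localhost:6800/
project = scraper
```

If the egg is built from the wrong directory, or the package directory is excluded from the build, scrapyd's runner cannot import `scraper.settings` and every schedule.json call fails this way.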
Scrapyd log:
reclusa_1 | 2019-10-17T13:19:41+0000 [-] Loading /usr/local/lib/python3.6/site-packages/scrapyd/txapp.py...
reclusa_1 | 2019-10-17T13:19:41+0000 [-] Scrapyd web console available at http://0.0.0.0:6800/
reclusa_1 | 2019-10-17T13:19:41+0000 [-] Loaded.
reclusa_1 | 2019-10-17T13:19:41+0000 [twisted.scripts._twistd_unix.UnixAppLogger#info] twistd 19.7.0 (/usr/local/bin/python 3.6.9) starting up.
reclusa_1 | 2019-10-17T13:19:41+0000 [twisted.scripts._twistd_unix.UnixAppLogger#info] reactor class: twisted.internet.epollreactor.EPollReactor.
reclusa_1 | 2019-10-17T13:19:41+0000 [-] Site starting on 6800
reclusa_1 | 2019-10-17T13:19:41+0000 [twisted.web.server.Site#info] Starting factory <twisted.web.server.Site object at 0x7f2d71557048>
reclusa_1 | 2019-10-17T13:19:41+0000 [Launcher] Scrapyd 1.2.1 started: max_proc=16, runner='scrapyd.runner'
reclusa_1 | 2019-10-17T13:20:53+0000 [twisted.python.log#info] "192.168.48.1" - - [17/Oct/2019:13:20:52 +0000] "POST /schedule.json HTTP/1.1" 200 2049 "-" "python-requests/2.22.0"
reclusa_1 | 2019-10-17T13:30:29+0000 [twisted.python.log#info] "192.168.48.1" - - [17/Oct/2019:13:30:29 +0000] "GET / HTTP/1.1" 200 743 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36"
reclusa_1 | 2019-10-17T13:30:30+0000 [twisted.python.log#info] "192.168.48.1" - - [17/Oct/2019:13:30:30 +0000] "GET /favicon.ico HTTP/1.1" 404 153 "http://localhost:6800/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36"
reclusa_1 | 2019-10-17T13:30:33+0000 [twisted.python.log#info] "192.168.48.1" - - [17/Oct/2019:13:30:33 +0000] "GET /jobs HTTP/1.1" 200 471 "http://localhost:6800/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36"
reclusa_1 | 2019-10-17T13:30:38+0000 [twisted.python.log#info] "192.168.48.1" - - [17/Oct/2019:13:30:38 +0000] "GET /schedule.json HTTP/1.1" 200 544 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36"
Closing this as it wasn't an issue with this library. Thanks!
Twisted released a new 19.x version. With it installed, scrapyd throws an error when setting a response header: Twisted expects a str but receives an int. This prevents using any API function to get the status of running spiders. See the error below.
Solution: Downgrade to Twisted version 18.9.0
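In a pip-managed environment (an assumption about the setup), the workaround pin looks like:

```shell
# Pin Twisted below 19.x so scrapyd's setHeader call keeps working
pip install "Twisted==18.9.0"
```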
<class 'scrapyd_api.exceptions.ScrapydResponseError'>
Scrapyd returned an invalid JSON response: Traceback (most recent call last):
  File "/data/workspaces/.virtualenvs/new-platform/lib/python3.6/site-packages/scrapyd/webservice.py", line 21, in render
    return JsonResource.render(self, txrequest).encode('utf-8')
  File "/data/workspaces/.virtualenvs/new-platform/lib/python3.6/site-packages/scrapyd/utils.py", line 21, in render
    return self.render_object(r, txrequest)
  File "/data/workspaces/.virtualenvs/new-platform/lib/python3.6/site-packages/scrapyd/utils.py", line 29, in render_object
    txrequest.setHeader('Content-Length', len(r))
  File "/data/workspaces/.virtualenvs/new-platform/lib/python3.6/site-packages/twisted/web/http.py", line 1271, in setHeader
    self.responseHeaders.setRawHeaders(name, [value])
  File "/data/workspaces/.virtualenvs/new-platform/lib/python3.6/site-packages/twisted/web/http_headers.py", line 220, in setRawHeaders
    for v in self._encodeValues(values)]
  File "/data/workspaces/.virtualenvs/new-platform/lib/python3.6/site-packages/twisted/web/http_headers.py", line 220, in <listcomp>
    for v in self._encodeValues(values)]
  File "/data/workspaces/.virtualenvs/new-platform/lib/python3.6/site-packages/twisted/web/http_headers.py", line 40, in _sanitizeLinearWhitespace
    return b' '.join(headerComponent.splitlines())
AttributeError: 'int' object has no attribute 'splitlines'
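The failure mode can be reproduced without Twisted itself. Per the traceback, Twisted 19.x sanitizes header values by calling splitlines() on them, which assumes str/bytes, while scrapyd passed len(r) (an int) as the Content-Length value. A minimal sketch mirroring that sanitization step (the helper name follows the traceback; it is a simplified stand-in, not Twisted's actual code):

```python
def sanitize_linear_whitespace(header_component):
    # Simplified stand-in for twisted.web.http_headers._sanitizeLinearWhitespace:
    # collapse line breaks in a header value to single spaces
    return b' '.join(header_component.splitlines())

body_length = 42  # what scrapyd passed as the Content-Length value

try:
    sanitize_linear_whitespace(body_length)  # mirrors setHeader('Content-Length', len(r))
except AttributeError as exc:
    print(exc)  # 'int' object has no attribute 'splitlines'

# Coercing the value to bytes first avoids the error entirely
print(sanitize_linear_whitespace(str(body_length).encode('ascii')))  # b'42'
```

This is why downgrading Twisted below 19.x works: older versions did not run header values through this sanitization, so the int was tolerated.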