apify / actor-templates

This project is the home of Apify actor template projects to help users quickly get started.
https://apify.com/

Python Scrapy template does not work #178

Closed by vdusek 1 year ago

vdusek commented 1 year ago

A full log from the execution of the Actor created by the Scrapy template:

$ apify run --purge
Info: All default local stores were purged.
Run: /home/vdusek/Apify/my-scrapy-actor-2/.venv/bin/python3 -m src
INFO  Initializing actor...
INFO  System info ({"apify_sdk_version": "1.1.3", "apify_client_version": "1.4.0", "python_version": "3.11.4", "os": "linux"})
DEBUG APIFY_ACTOR_EVENTS_WS_URL env var not set, no events from Apify platform will be emitted.
INFO  Scrapy 2.9.0 started (bot: titlebot)
INFO  Versions: lxml 4.9.3.0, libxml2 2.10.3, cssselect 1.2.0, parsel 1.8.1, w3lib 2.1.2, Twisted 23.8.0, Python 3.11.4 (main, Jun  7 2023, 00:00:00) [GCC 13.1.1 20230511 (Red Hat 13.1.1-2)], pyOpenSSL 23.2.0 (OpenSSL 3.1.2 1 Aug 2023), cryptography 41.0.3, Platform Linux-6.4.11-200.fc38.x86_64-x86_64-with-glibc2.37
INFO  Overridden settings:
      {'BOT_NAME': 'titlebot',
       'DEPTH_LIMIT': 1,
       'NEWSPIDER_MODULE': 'src.spiders',
       'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
       'ROBOTSTXT_OBEY': True,
       'SPIDER_MODULES': ['src.spiders'],
       'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'}
DEBUG Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor
DEBUG Using asyncio event loop: asyncio.unix_events._UnixSelectorEventLoop
INFO  Telnet Password: 61c93874b7a93a39
INFO  Enabled extensions:
      ['scrapy.extensions.corestats.CoreStats',
       'scrapy.extensions.telnet.TelnetConsole',
       'scrapy.extensions.memusage.MemoryUsage',
       'scrapy.extensions.logstats.LogStats'] ({"crawler": "<scrapy.crawler.Crawler object at 0x7f92ccf75b10>"})
INFO  Enabled downloader middlewares:
      ['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
       'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
       'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
       'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
       'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
       'scrapy.downloadermiddlewares.retry.RetryMiddleware',
       'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
       'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
       'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
       'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
       'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
       'scrapy.downloadermiddlewares.stats.DownloaderStats'] ({"crawler": "<scrapy.crawler.Crawler object at 0x7f92ccf75b10>"})
INFO  Enabled spider middlewares:
      ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
       'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
       'scrapy.spidermiddlewares.referer.RefererMiddleware',
       'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
       'scrapy.spidermiddlewares.depth.DepthMiddleware'] ({"crawler": "<scrapy.crawler.Crawler object at 0x7f92ccf75b10>"})
INFO  Enabled item pipelines:
      [<class 'src.pipelines.ActorDatasetPushPipeline'>] ({"crawler": "<scrapy.crawler.Crawler object at 0x7f92ccf75b10>"})
INFO  Spider opened ({"spider": "<TitleSpider 'title_spider' at 0x7f92cc478f10>"})
INFO  Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) ({"spider": "<TitleSpider 'title_spider' at 0x7f92cc478f10>"})
INFO  Telnet console listening on 127.0.0.1:6023 ({"crawler": "<scrapy.crawler.Crawler object at 0x7f92ccf75b10>"})
ERROR Actor failed with an exception
      Traceback (most recent call last):
        File "/home/vdusek/Apify/my-scrapy-actor-2/src/main.py", line 25, in main
          process.start()
        File "/home/vdusek/Apify/my-scrapy-actor-2/.venv/lib64/python3.11/site-packages/scrapy/crawler.py", line 383, in start
          install_shutdown_handlers(self._signal_shutdown)
        File "/home/vdusek/Apify/my-scrapy-actor-2/.venv/lib64/python3.11/site-packages/scrapy/utils/ossignal.py", line 19, in install_shutdown_handlers
          reactor._handleSignals()
          ^^^^^^^^^^^^^^^^^^^^^^
      AttributeError: 'AsyncioSelectorReactor' object has no attribute '_handleSignals'
INFO  Exiting actor ({"exit_code": 91})
DEBUG Not calling sys.exit(91) because actor is running in a nested event loop
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/vdusek/Apify/my-scrapy-actor-2/src/__main__.py", line 58, in <module>
    asyncio.run(main())
  File "/home/vdusek/Apify/my-scrapy-actor-2/.venv/lib64/python3.11/site-packages/nest_asyncio.py", line 31, in run
    return loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vdusek/Apify/my-scrapy-actor-2/.venv/lib64/python3.11/site-packages/nest_asyncio.py", line 99, in run_until_complete
    return f.result()
           ^^^^^^^^^^
  File "/usr/lib64/python3.11/asyncio/futures.py", line 203, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "/usr/lib64/python3.11/asyncio/tasks.py", line 267, in __step
    result = coro.send(None)
             ^^^^^^^^^^^^^^^
  File "/home/vdusek/Apify/my-scrapy-actor-2/src/main.py", line 25, in main
    process.start()
  File "/home/vdusek/Apify/my-scrapy-actor-2/.venv/lib64/python3.11/site-packages/scrapy/crawler.py", line 383, in start
    install_shutdown_handlers(self._signal_shutdown)
  File "/home/vdusek/Apify/my-scrapy-actor-2/.venv/lib64/python3.11/site-packages/scrapy/utils/ossignal.py", line 19, in install_shutdown_handlers
    reactor._handleSignals()
    ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'AsyncioSelectorReactor' object has no attribute '_handleSignals'
Error: /home/vdusek/Apify/my-scrapy-actor-2/.venv/bin/python3 exited with code 1