apify / actor-templates

This project is the home of Apify Actor template projects, which help users get started quickly.
https://apify.com/

Scrapy Actor: Some log lines are logged twice #256

Closed: vdusek closed this issue 8 months ago.

vdusek commented 8 months ago

Description

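Example of such a log. Every record from the Scrapy loggers (and from `twisted`) appears twice, while the `ACTOR` and `[apify]` lines appear only once. The second copy of each duplicated record also carries the formatted text as a `"message"` field in its trailing JSON extras: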

2023-12-15T15:10:18.262Z ACTOR: Pulling Docker image of build r9WYYLPAanPEMEmN2 from repository.
2023-12-15T15:10:18.498Z ACTOR: Creating Docker container.
2023-12-15T15:10:18.601Z ACTOR: Starting Docker container.
2023-12-15T15:10:20.553Z [apify] INFO  Initializing actor...
2023-12-15T15:10:20.556Z [apify] INFO  System info ({"apify_sdk_version": "1.4.0", "apify_client_version": "1.6.0", "python_version": "3.11.6", "os": "linux"})
2023-12-15T15:10:20.558Z [apify] INFO  Actor is being executed...
2023-12-15T15:10:21.125Z [apify] INFO  proxy_url: ParseResult(scheme='http', netloc='auto:*********@10.0.82.23:8011', path='', params='', query='', fragment='')
2023-12-15T15:10:21.128Z [apify] INFO  proxy_settings = {'url': 'http://auto:*********@10.0.82.23:8011', 'username': 'auto', 'password': '*********'}
2023-12-15T15:10:21.218Z [scrapy.utils.log] INFO  Scrapy 2.11.0 started (bot: titlebot)
2023-12-15T15:10:21.222Z [scrapy.utils.log] INFO  Scrapy 2.11.0 started (bot: titlebot) ({"message": "Scrapy 2.11.0 started (bot: titlebot)"})
2023-12-15T15:10:21.224Z [scrapy.utils.log] INFO  Versions: lxml 4.9.3.0, libxml2 2.10.3, cssselect 1.2.0, parsel 1.8.1, w3lib 2.1.2, Twisted 22.10.0, Python 3.11.6 (main, Nov 29 2023, 04:47:02) [GCC 10.2.1 20210110], pyOpenSSL 23.3.0 (OpenSSL 3.1.4 24 Oct 2023), cryptography 41.0.7, Platform Linux-6.1.52-71.125.amzn2023.x86_64-x86_64-with-glibc2.31
2023-12-15T15:10:21.227Z [scrapy.utils.log] INFO  Versions: lxml 4.9.3.0, libxml2 2.10.3, cssselect 1.2.0, parsel 1.8.1, w3lib 2.1.2, Twisted 22.10.0, Python 3.11.6 (main, Nov 29 2023, 04:47:02) [GCC 10.2.1 20210110], pyOpenSSL 23.3.0 (OpenSSL 3.1.4 24 Oct 2023), cryptography 41.0.7, Platform Linux-6.1.52-71.125.amzn2023.x86_64-x86_64-with-glibc2.31 ({"message": "Versions: lxml 4.9.3.0, libxml2 2.10.3, cssselect 1.2.0, parsel 1.8.1, w3lib 2.1.2, Twisted 22.10.0, Python 3.11.6 (main, Nov 29 2023, 04:47:02) [GCC 10.2.1 20210110], pyOpenSSL 23.3.0 (OpenSSL 3.1.4 24 Oct 2023), cryptography 41.0.7, Platform Linux-6.1.52-71.125.amzn2023.x86_64-x86_64-with-glibc2.31"})
2023-12-15T15:10:21.229Z [scrapy.addons] INFO  Enabled addons:
2023-12-15T15:10:21.237Z       [] ({"crawler": "<scrapy.crawler.Crawler object at 0x7f2e846aee10>"})
2023-12-15T15:10:21.242Z [scrapy.addons] INFO  Enabled addons:
2023-12-15T15:10:21.245Z       [] ({"crawler": "<scrapy.crawler.Crawler object at 0x7f2e846aee10>", "message": "Enabled addons:\n[]"})
2023-12-15T15:10:21.251Z [scrapy.extensions.telnet] INFO  Telnet Password: 0cf7f4b3360ba082
2023-12-15T15:10:21.254Z [scrapy.extensions.telnet] INFO  Telnet Password: 0cf7f4b3360ba082 ({"message": "Telnet Password: 0cf7f4b3360ba082"})
2023-12-15T15:10:21.333Z [scrapy.middleware] INFO  Enabled extensions:
2023-12-15T15:10:21.336Z       ['scrapy.extensions.corestats.CoreStats',
2023-12-15T15:10:21.338Z        'scrapy.extensions.telnet.TelnetConsole',
2023-12-15T15:10:21.341Z        'scrapy.extensions.memusage.MemoryUsage',
2023-12-15T15:10:21.344Z        'scrapy.extensions.logstats.LogStats'] ({"crawler": "<scrapy.crawler.Crawler object at 0x7f2e846aee10>"})
2023-12-15T15:10:21.346Z [scrapy.middleware] INFO  Enabled extensions:
2023-12-15T15:10:21.348Z       ['scrapy.extensions.corestats.CoreStats',
2023-12-15T15:10:21.351Z        'scrapy.extensions.telnet.TelnetConsole',
2023-12-15T15:10:21.353Z        'scrapy.extensions.memusage.MemoryUsage',
2023-12-15T15:10:21.355Z        'scrapy.extensions.logstats.LogStats'] ({"crawler": "<scrapy.crawler.Crawler object at 0x7f2e846aee10>", "message": "Enabled extensions:\n['scrapy.extensions.corestats.CoreStats',\n 'scrapy.extensions.telnet.TelnetConsole',\n 'scrapy.extensions.memusage.MemoryUsage',\n 'scrapy.extensions.logstats.LogStats']"})
2023-12-15T15:10:21.357Z [scrapy.crawler] INFO  Overridden settings:
2023-12-15T15:10:21.360Z       {'BOT_NAME': 'titlebot',
2023-12-15T15:10:21.362Z        'DEPTH_LIMIT': 1,
2023-12-15T15:10:21.364Z        'LOG_LEVEL': 'INFO',
2023-12-15T15:10:21.366Z        'NEWSPIDER_MODULE': 'src.spiders',
2023-12-15T15:10:21.368Z        'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
2023-12-15T15:10:21.371Z        'ROBOTSTXT_OBEY': True,
2023-12-15T15:10:21.373Z        'SCHEDULER': 'apify.scrapy.scheduler.ApifyScheduler',
2023-12-15T15:10:21.376Z        'SPIDER_MODULES': ['src.spiders']}
2023-12-15T15:10:21.378Z [scrapy.crawler] INFO  Overridden settings:
2023-12-15T15:10:21.381Z       {'BOT_NAME': 'titlebot',
2023-12-15T15:10:21.383Z        'DEPTH_LIMIT': 1,
2023-12-15T15:10:21.386Z        'LOG_LEVEL': 'INFO',
2023-12-15T15:10:21.388Z        'NEWSPIDER_MODULE': 'src.spiders',
2023-12-15T15:10:21.391Z        'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
2023-12-15T15:10:21.393Z        'ROBOTSTXT_OBEY': True,
2023-12-15T15:10:21.395Z        'SCHEDULER': 'apify.scrapy.scheduler.ApifyScheduler',
2023-12-15T15:10:21.398Z        'SPIDER_MODULES': ['src.spiders']} ({"message": "Overridden settings:\n{'BOT_NAME': 'titlebot',\n 'DEPTH_LIMIT': 1,\n 'LOG_LEVEL': 'INFO',\n 'NEWSPIDER_MODULE': 'src.spiders',\n 'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',\n 'ROBOTSTXT_OBEY': True,\n 'SCHEDULER': 'apify.scrapy.scheduler.ApifyScheduler',\n 'SPIDER_MODULES': ['src.spiders']}"})
2023-12-15T15:10:21.435Z [scrapy.middleware] INFO  Enabled downloader middlewares:
2023-12-15T15:10:21.438Z       ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
2023-12-15T15:10:21.441Z        'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
2023-12-15T15:10:21.443Z        'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
2023-12-15T15:10:21.446Z        'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
2023-12-15T15:10:21.448Z        'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
2023-12-15T15:10:21.451Z        'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
2023-12-15T15:10:21.453Z        'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
2023-12-15T15:10:21.456Z        'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
2023-12-15T15:10:21.458Z        'scrapy.downloadermiddlewares.stats.DownloaderStats',
2023-12-15T15:10:21.460Z        'src.middlewares.ApifyHttpProxyMiddleware',
2023-12-15T15:10:21.462Z        'apify.scrapy.middlewares.ApifyRetryMiddleware'] ({"crawler": "<scrapy.crawler.Crawler object at 0x7f2e846aee10>"})
2023-12-15T15:10:21.465Z [scrapy.middleware] INFO  Enabled downloader middlewares:
2023-12-15T15:10:21.467Z       ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
2023-12-15T15:10:21.469Z        'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
2023-12-15T15:10:21.472Z        'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
2023-12-15T15:10:21.474Z        'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
2023-12-15T15:10:21.476Z        'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
2023-12-15T15:10:21.479Z        'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
2023-12-15T15:10:21.481Z        'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
2023-12-15T15:10:21.484Z        'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
2023-12-15T15:10:21.486Z        'scrapy.downloadermiddlewares.stats.DownloaderStats',
2023-12-15T15:10:21.489Z        'src.middlewares.ApifyHttpProxyMiddleware',
2023-12-15T15:10:21.492Z        'apify.scrapy.middlewares.ApifyRetryMiddleware'] ({"crawler": "<scrapy.crawler.Crawler object at 0x7f2e846aee10>", "message": "Enabled downloader middlewares:\n['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',\n 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',\n 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',\n 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',\n 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',\n 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',\n 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',\n 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',\n 'scrapy.downloadermiddlewares.stats.DownloaderStats',\n 'src.middlewares.ApifyHttpProxyMiddleware',\n 'apify.scrapy.middlewares.ApifyRetryMiddleware']"})
2023-12-15T15:10:21.495Z [scrapy.middleware] INFO  Enabled spider middlewares:
2023-12-15T15:10:21.497Z       ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
2023-12-15T15:10:21.500Z        'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
2023-12-15T15:10:21.502Z        'scrapy.spidermiddlewares.referer.RefererMiddleware',
2023-12-15T15:10:21.505Z        'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
2023-12-15T15:10:21.507Z        'scrapy.spidermiddlewares.depth.DepthMiddleware'] ({"crawler": "<scrapy.crawler.Crawler object at 0x7f2e846aee10>"})
2023-12-15T15:10:21.510Z [scrapy.middleware] INFO  Enabled spider middlewares:
2023-12-15T15:10:21.514Z       ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
2023-12-15T15:10:21.516Z        'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
2023-12-15T15:10:21.519Z        'scrapy.spidermiddlewares.referer.RefererMiddleware',
2023-12-15T15:10:21.521Z        'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
2023-12-15T15:10:21.523Z        'scrapy.spidermiddlewares.depth.DepthMiddleware'] ({"crawler": "<scrapy.crawler.Crawler object at 0x7f2e846aee10>", "message": "Enabled spider middlewares:\n['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',\n 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',\n 'scrapy.spidermiddlewares.referer.RefererMiddleware',\n 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',\n 'scrapy.spidermiddlewares.depth.DepthMiddleware']"})
2023-12-15T15:10:21.526Z [scrapy.middleware] INFO  Enabled item pipelines:
2023-12-15T15:10:21.528Z       ['src.pipelines.TitleItemPipeline',
2023-12-15T15:10:21.531Z        'apify.scrapy.pipelines.ActorDatasetPushPipeline'] ({"crawler": "<scrapy.crawler.Crawler object at 0x7f2e846aee10>"})
2023-12-15T15:10:21.533Z [scrapy.middleware] INFO  Enabled item pipelines:
2023-12-15T15:10:21.535Z       ['src.pipelines.TitleItemPipeline',
2023-12-15T15:10:21.538Z        'apify.scrapy.pipelines.ActorDatasetPushPipeline'] ({"crawler": "<scrapy.crawler.Crawler object at 0x7f2e846aee10>", "message": "Enabled item pipelines:\n['src.pipelines.TitleItemPipeline',\n 'apify.scrapy.pipelines.ActorDatasetPushPipeline']"})
2023-12-15T15:10:21.540Z [scrapy.core.engine] INFO  Spider opened ({"spider": "<TitleSpider 'title_spider' at 0x7f2e82e9b7d0>"})
2023-12-15T15:10:21.542Z [scrapy.core.engine] INFO  Spider opened ({"spider": "<TitleSpider 'title_spider' at 0x7f2e82e9b7d0>", "message": "Spider opened"})
2023-12-15T15:10:21.545Z [scrapy.extensions.logstats] INFO  Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) ({"spider": "<TitleSpider 'title_spider' at 0x7f2e82e9b7d0>"})
2023-12-15T15:10:21.547Z [scrapy.extensions.logstats] INFO  Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) ({"spider": "<TitleSpider 'title_spider' at 0x7f2e82e9b7d0>", "message": "Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)"})
2023-12-15T15:10:21.550Z [twisted] INFO  TelnetConsole starting on 6023
2023-12-15T15:10:21.552Z [twisted] INFO  TelnetConsole starting on 6023 ({"message": "TelnetConsole starting on 6023"})
2023-12-15T15:10:21.554Z [scrapy.extensions.telnet] INFO  Telnet console listening on 127.0.0.1:6023 ({"crawler": "<scrapy.crawler.Crawler object at 0x7f2e846aee10>"})
2023-12-15T15:10:21.557Z [scrapy.extensions.telnet] INFO  Telnet console listening on 127.0.0.1:6023 ({"crawler": "<scrapy.crawler.Crawler object at 0x7f2e846aee10>", "message": "Telnet console listening on 127.0.0.1:6023"})
2023-12-15T15:10:22.709Z [title_spider] INFO  TitleSpider is parsing <200 https://apify.com/>... ({"spider": "<TitleSpider 'title_spider' at 0x7f2e82e9b7d0>"})
...
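The pattern above is consistent with each record passing through two handlers that both use a formatter which appends non-standard record attributes as JSON. The first copy lists only the `extra` fields (`crawler`, `spider`), while the second also contains `message`, an attribute that `logging.Formatter.format()` sets on the record as a side effect of the first formatting pass. The sketch below reproduces that pattern with the standard library only, as a hypothesis rather than a confirmed root cause. The logger names `demo` and `demo.scrapy`, `ExtrasFormatter`, and `run_demo` are hypothetical; none of this is taken from the Apify SDK or the Scrapy template.

```python
import io
import json
import logging

# Attributes every LogRecord has after construction. Anything else was passed
# via `extra=` or added later (e.g. `message`, set by Formatter.format()).
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", None, None)))


class ExtrasFormatter(logging.Formatter):
    """Hypothetical stand-in formatter that appends extra record fields as JSON."""

    def format(self, record: logging.LogRecord) -> str:
        # Collect extras BEFORE formatting; on a second pass over the same
        # record, `message` (set by the first pass) shows up as an "extra".
        extras = {k: v for k, v in vars(record).items() if k not in _STANDARD_ATTRS}
        line = super().format(record)  # side effect: sets record.message
        if extras:
            line += f" ({json.dumps(extras, default=str)})"
        return line


def run_demo(child_propagates: bool) -> list[str]:
    """Log one record through a child/parent logger pair sharing a handler."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(ExtrasFormatter("[%(name)s] %(levelname)s  %(message)s"))

    parent = logging.getLogger("demo")        # plays the role of the root logger
    child = logging.getLogger("demo.scrapy")  # plays the role of the "scrapy" logger
    for lg in (parent, child):
        lg.handlers.clear()
        lg.setLevel(logging.INFO)
        lg.addHandler(handler)  # a handler attached at both levels
    parent.propagate = False    # keep the demo isolated from the real root logger
    child.propagate = child_propagates

    child.info("Spider opened", extra={"spider": "<TitleSpider>"})
    return stream.getvalue().splitlines()


if __name__ == "__main__":
    print("\n".join(run_demo(child_propagates=True)))   # two lines, second has "message"
    print("\n".join(run_demo(child_propagates=False)))  # one line
```

If this is the mechanism, one possible remedy is to make sure each record reaches only one such handler, for example by attaching the handler either to the parent (root) logger or to the `scrapy` logger with `propagate = False`, but not both.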