scientes closed this 3 years ago
Using docker-compose to pass variables should print logs from scrapy.

Running it with

```
docker build .
```

works, but `docker-compose up` produces no output.
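For context, a minimal `docker-compose.yml` sketch of the setup being described; the service name and environment variable below are placeholder assumptions, not taken from the repo:

```yaml
version: "3.8"
services:
  scraper:            # placeholder service name
    build: .
    environment:
      # placeholder variable; substitute whatever the scraper actually expects
      - LOG_LEVEL=DEBUG
```

With a file like this, `docker-compose up --build` should stream the container's stdout/stderr (including scrapy's log output) to the terminal, and `docker-compose logs -f scraper` shows the same output for an already running stack.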
I found a few issues I have a fix for.
Issues while running docker-compose or issues in general?
In general. Docker and docker-compose seem to be working, but the scrapy stuff does weird things.
Could you share some logs?
Scrapy somehow crashes the whole process
```
2021-01-05 09:06:58 [scrapy.extensions.throttle] INFO: slot: www.howstat.com | conc: 1 | delay: 1000 ms (+0) | latency: 892 ms | size: 1990 bytes
2021-01-05 09:06:58 [scrapy.core.engine] DEBUG: Crawled (500) <GET http://www.howstat.com/cricket/Statistics/Players/PlayerProgressSummary.asp?PlayerID=4644> (referer: http://www.howstat.com/cricket/Statistics/Players/PlayerListCurrent.asp)
Killed
[2021-01-05 09:07:00 +0000] [66] [INFO] Started server process [66]
[2021-01-05 09:07:00 +0000] [66] [INFO] Waiting for application startup.
[2021-01-05 09:07:00 +0000] [66] [INFO] Application startup complete.
```
A more recent one (note: it's not the whole log):
```
2021-01-05 09:15:08 [scrapy.extensions.throttle] INFO: slot: www.howstat.com | conc: 1 | delay: 3640 ms (+81) | latency: 3640 ms | size: 92507 bytes
2021-01-05 09:15:08 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.howstat.com/cricket/Statistics/Matches/MatchScorecard.asp?MatchCode=2385> (referer: http://www.howstat.com/cricket/Statistics/Players/PlayerProgressSummary.asp?PlayerID=4701)
2021-01-05 09:15:08 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.howstat.com/cricket/Statistics/Matches/MatchScorecard.asp?MatchCode=2385>
None
2021-01-05 09:15:10 [scrapy.extensions.throttle] INFO: slot: www.howstat.com | conc: 1 | delay: 2595 ms (-1045) | latency: 1549 ms | size: 113603 bytes
2021-01-05 09:15:10 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.howstat.com/cricket/Statistics/Matches/MatchScorecard.asp?MatchCode=2384> (referer: http://www.howstat.com/cricket/Statistics/Players/PlayerProgressSummary.asp?PlayerID=4701)
2021-01-05 09:15:10 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.howstat.com/cricket/Statistics/Matches/MatchScorecard.asp?MatchCode=2384>
None
2021-01-05 09:15:10 [scrapy.core.engine] INFO: Closing spider (finished)
2021-01-05 09:15:10 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 35484,
 'downloader/request_count': 84,
 'downloader/request_method_count/GET': 84,
 'downloader/response_bytes': 9477753,
 'downloader/response_count': 84,
 'downloader/response_status_count/200': 79,
 'downloader/response_status_count/500': 5,
 'elapsed_time_seconds': 158.293703,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2021, 1, 5, 9, 15, 10, 241229),
 'httpcache/firsthand': 61,
 'httpcache/hit': 23,
 'httpcache/miss': 61,
 'httpcache/store': 56,
 'httpcache/uncacheable': 5,
 'httperror/response_ignored_count': 5,
 'httperror/response_ignored_status_count/500': 5,
 'item_scraped_count': 60,
 'log_count/DEBUG': 145,
 'log_count/INFO': 78,
 'memusage/max': 75239424,
 'memusage/startup': 56569856,
 'request_depth_max': 2,
 'response_received_count': 84,
 'robotstxt/request_count': 1,
 'robotstxt/response_count': 1,
 'robotstxt/response_status_count/200': 1,
 'scheduler/dequeued': 83,
 'scheduler/dequeued/memory': 83,
 'scheduler/enqueued': 83,
 'scheduler/enqueued/memory': 83,
 'start_time': datetime.datetime(2021, 1, 5, 9, 12, 31, 947526)}
2021-01-05 09:15:10 [scrapy.core.engine] INFO: Spider closed (finished)
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
BrokenPipeError: [Errno 32] Broken pipe
```
And after that I get a connection reset. Getting the actual file manually doesn't fail, but somehow the crawler exits and then the process crashes.
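On the `BrokenPipeError`: it typically means something wrote to stdout after the reader of the pipe had already gone away. A minimal sketch of how that error arises and how a write can be guarded (the `safe_write` helper is hypothetical, not from the repo):

```python
import os

def safe_write(fd, data):
    """Write bytes to fd; return False instead of raising on a broken pipe."""
    try:
        os.write(fd, data)
        return True
    except BrokenPipeError:  # [Errno 32] EPIPE: the reader side is gone
        return False

# Simulate the reader disappearing: close the read end of a pipe, then
# write to the write end. CPython ignores SIGPIPE at startup, so the
# failed write surfaces as BrokenPipeError instead of silently killing
# the process.
r, w = os.pipe()
os.close(r)
ok = safe_write(w, b"log line\n")
os.close(w)
print(ok)  # False: the write hit a pipe with no reader
```

This matches the symptom in the log: the spider finishes normally, then a final write to stdout fails because whatever was consuming the container's output has already closed the pipe.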
Also: https://github.com/scientes/Best11-Fantasycricket/pull/5/commits needs to be merged before merging this.
```
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
BrokenPipeError: [Errno 32] Broken pipe
```
I can't reproduce this error. Although I did get this error initially, that was because my scrapy pipeline had failed. Your scrapy pipeline seems like a success; possibly an error while reading the JSON file in results.
> Also: https://github.com/scientes/Best11-Fantasycricket/pull/5/commits needs to be merged before merging

Yes, but it's not in this patch.
oops!
Sorry!! Seems like I merged it by mistake. I have reverted it.
no problem
I'll just make a new merge request.
Description
Added dockerfiles.
Fixes #39