Closed jamiemoran2 closed 5 months ago
Hey @jamiemoran2. The issue was related to the cs-CZ locale. I've written the parser, and it should work now. Please run `pip install --upgrade volleystats` and test it again.
Hey, thanks for the quick fix. The match data is working fine, but the competition data is showing a date error.
volleystats --fed cvf --comp 37 --log
2024-01-08 03:00:51 [scrapy.utils.log] INFO: Scrapy 2.11.0 started (bot: scrapybot)
2024-01-08 03:00:51 [scrapy.utils.log] INFO: Versions: lxml 4.9.4.0, libxml2 2.10.3, cssselect 1.2.0, parsel 1.8.1, w3lib 2.1.2, Twisted 22.10.0, Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0], pyOpenSSL 23.3.0 (OpenSSL 3.1.4 24 Oct 2023), cryptography 41.0.7, Platform Linux-6.1.58+-x86_64-with-glibc2.35
volleystats: started
2024-01-08 03:00:51 [scrapy.addons] INFO: Enabled addons: []
/usr/local/lib/python3.10/dist-packages/scrapy/utils/request.py:254: ScrapyDeprecationWarning: '2.6' is a deprecated value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting.
It is also the default value. In other words, it is normal to get this warning if you have not defined a value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting. This is so for backward compatibility reasons, but it will change in a future version of Scrapy.
See the documentation of the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting for information on how to handle this deprecation.
return cls(crawler)
2024-01-08 03:00:51 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2024-01-08 03:00:51 [scrapy.extensions.telnet] INFO: Telnet Password: 7bd54a76b4e64de0
2024-01-08 03:00:51 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2024-01-08 03:00:51 [scrapy.crawler] INFO: Overridden settings:
{}
2024-01-08 03:00:51 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2024-01-08 03:00:51 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2024-01-08 03:00:51 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2024-01-08 03:00:51 [scrapy.core.engine] INFO: Spider opened
2024-01-08 03:00:51 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-01-08 03:00:51 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2024-01-08 03:00:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://cvf-web.dataproject.com/CompetitionMatches.aspx?ID=37> (referer: None)
2024-01-08 03:00:55 [scrapy.core.scraper] ERROR: Spider error processing <GET https://cvf-web.dataproject.com/CompetitionMatches.aspx?ID=37> (referer: None)
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/defer.py", line 279, in iter_errback
yield next(it)
File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/python.py", line 350, in next
return next(self.data)
File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/python.py", line 350, in next
return next(self.data)
File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
for r in iterable:
File "/usr/local/lib/python3.10/dist-packages/scrapy/spidermiddlewares/offsite.py", line 28, in
One suggestion, though I have no idea if or how it can be implemented (I'm a noob): these sites have an option to change the locale somewhere in the top-right corner, and most of them offer an English locale. What if the locale were switched to English first? Then the scraper would work for all sites.
Thank you for the suggestion. It was just a matter of setting the cookies to en-GB.
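To illustrate the idea, forcing the English locale boils down to sending a locale cookie with each request. A minimal framework-agnostic sketch with plain `urllib` — note that the cookie name `CurrentCulture` is an assumption for illustration, not necessarily the name the Data Project sites or volleystats actually use:

```python
from urllib.request import Request

# Build a request that asks the site for its en-GB locale via a cookie.
# NOTE: the cookie name "CurrentCulture" is a placeholder/assumption;
# inspect the site's own cookies for the real name.
req = Request(
    "https://cvf-web.dataproject.com/CompetitionMatches.aspx?ID=37",
    headers={"Cookie": "CurrentCulture=en-GB"},
)

print(req.get_header("Cookie"))  # -> CurrentCulture=en-GB
```

With the cookie in place, every page renders with English labels and en-GB dates, so a single parser covers all federations.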
The issue with the `comp` data was in date parsing (which I do to standardize the data): some places use `/` as the separator and others use `.`, as is the case with CVF. Run `pip install --upgrade volleystats` and test it again.
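To make the separator issue concrete, here is a rough sketch of the kind of normalization described above (illustrative only — `parse_match_date` and the `%d/%m/%Y` format are assumptions, not volleystats' actual parser):

```python
from datetime import datetime

def parse_match_date(raw: str) -> str:
    """Normalize a scraped match date to ISO format, accepting both
    the '/' separator and the '.' separator used by sites like CVF.
    Sketch only; the day-first format '%d/%m/%Y' is an assumption."""
    cleaned = raw.strip().replace(".", "/")  # unify separators first
    return datetime.strptime(cleaned, "%d/%m/%Y").date().isoformat()

print(parse_match_date("06/01/2024"))  # -> 2024-01-06
print(parse_match_date("06.01.2024"))  # CVF-style input -> 2024-01-06
```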
The next update will allow using the generated `comp` file to perform scraping on each match in the list ;)
When I run the command, it generates an empty CSV file:
volleystats -f cvf -m 2999 -l
2024-01-06 07:04:26 [scrapy.utils.log] INFO: Scrapy 2.11.0 started (bot: scrapybot)
2024-01-06 07:04:26 [scrapy.utils.log] INFO: Versions: lxml 4.9.4.0, libxml2 2.10.3, cssselect 1.2.0, parsel 1.8.1, w3lib 2.1.2, Twisted 22.10.0, Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0], pyOpenSSL 23.3.0 (OpenSSL 3.1.4 24 Oct 2023), cryptography 41.0.7, Platform Linux-6.1.58+-x86_64-with-glibc2.35
volleystats: started
2024-01-06 07:04:26 [scrapy.addons] INFO: Enabled addons: []
/usr/local/lib/python3.10/dist-packages/scrapy/utils/request.py:254: ScrapyDeprecationWarning: '2.6' is a deprecated value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting.
It is also the default value. In other words, it is normal to get this warning if you have not defined a value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting. This is so for backward compatibility reasons, but it will change in a future version of Scrapy.
See the documentation of the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting for information on how to handle this deprecation.
return cls(crawler)
2024-01-06 07:04:26 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2024-01-06 07:04:26 [scrapy.extensions.telnet] INFO: Telnet Password: efc0de33453964cb
2024-01-06 07:04:26 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.memusage.MemoryUsage', 'scrapy.extensions.feedexport.FeedExporter', 'scrapy.extensions.logstats.LogStats']
2024-01-06 07:04:26 [scrapy.crawler] INFO: Overridden settings: {}
2024-01-06 07:04:26 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2024-01-06 07:04:26 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2024-01-06 07:04:26 [scrapy.middleware] INFO: Enabled item pipelines: []
2024-01-06 07:04:26 [scrapy.core.engine] INFO: Spider opened
2024-01-06 07:04:26 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-01-06 07:04:26 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2024-01-06 07:04:26 [scrapy.addons] INFO: Enabled addons: []
/usr/local/lib/python3.10/dist-packages/scrapy/utils/request.py:254: ScrapyDeprecationWarning: '2.6' is a deprecated value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting.
It is also the default value. In other words, it is normal to get this warning if you have not defined a value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting. This is so for backward compatibility reasons, but it will change in a future version of Scrapy.
See the documentation of the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting for information on how to handle this deprecation.
return cls(crawler)
2024-01-06 07:04:26 [scrapy.extensions.telnet] INFO: Telnet Password: ad8747aaaecda7e6
2024-01-06 07:04:26 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.memusage.MemoryUsage', 'scrapy.extensions.feedexport.FeedExporter', 'scrapy.extensions.logstats.LogStats']
2024-01-06 07:04:26 [scrapy.crawler] INFO: Overridden settings: {}
2024-01-06 07:04:26 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2024-01-06 07:04:26 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2024-01-06 07:04:26 [scrapy.middleware] INFO: Enabled item pipelines: []
2024-01-06 07:04:26 [scrapy.core.engine] INFO: Spider opened
2024-01-06 07:04:26 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-01-06 07:04:26 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2024-01-06 07:04:27 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://cvf-web.dataproject.com/MatchStatistics.aspx?mID=2999&ID=34> from <GET https://cvf-web.dataproject.com/MatchStatistics.aspx?mID=2999>
2024-01-06 07:04:27 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://cvf-web.dataproject.com/MatchStatistics.aspx?mID=2999&ID=34> from <GET https://cvf-web.dataproject.com/MatchStatistics.aspx?mID=2999>
2024-01-06 07:04:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://cvf-web.dataproject.com/MatchStatistics.aspx?mID=2999&ID=34> (referer: None)
2024-01-06 07:04:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://cvf-web.dataproject.com/MatchStatistics.aspx?mID=2999&ID=34> (referer: None)
2024-01-06 07:04:28 [scrapy.core.scraper] ERROR: Spider error processing <GET https://cvf-web.dataproject.com/MatchStatistics.aspx?mID=2999&ID=34> (referer: None)
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/defer.py", line 279, in iter_errback
yield next(it)
File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/python.py", line 350, in __next__
return next(self.data)
File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/python.py", line 350, in __next__
return next(self.data)
File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
for r in iterable:
File "/usr/local/lib/python3.10/dist-packages/scrapy/spidermiddlewares/offsite.py", line 28, in <genexpr>
return (r for r in result or () if self._filter(r, spider))
File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
for r in iterable:
File "/usr/local/lib/python3.10/dist-packages/scrapy/spidermiddlewares/referer.py", line 352, in
return (self._set_referer(r, response) for r in result or ())
File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
for r in iterable:
File "/usr/local/lib/python3.10/dist-packages/scrapy/spidermiddlewares/urllength.py", line 27, in
return (r for r in result or () if self._filter(r, spider))
File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
for r in iterable:
File "/usr/local/lib/python3.10/dist-packages/scrapy/spidermiddlewares/depth.py", line 31, in
return (r for r in result or () if self._filter(r, response, spider))
File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
for r in iterable:
File "/usr/local/lib/python3.10/dist-packages/volleystats/spiders/match.py", line 103, in parse
'Match Date': match_date,
UnboundLocalError: local variable 'match_date' referenced before assignment
2024-01-06 07:04:28 [scrapy.core.scraper] ERROR: Spider error processing <GET https://cvf-web.dataproject.com/MatchStatistics.aspx?mID=2999&ID=34> (referer: None)
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/defer.py", line 279, in iter_errback
yield next(it)
File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/python.py", line 350, in next
return next(self.data)
File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/python.py", line 350, in next
return next(self.data)
File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
for r in iterable:
File "/usr/local/lib/python3.10/dist-packages/scrapy/spidermiddlewares/offsite.py", line 28, in
return (r for r in result or () if self._filter(r, spider))
File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
for r in iterable:
File "/usr/local/lib/python3.10/dist-packages/scrapy/spidermiddlewares/referer.py", line 352, in
return (self._set_referer(r, response) for r in result or ())
File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
for r in iterable:
File "/usr/local/lib/python3.10/dist-packages/scrapy/spidermiddlewares/urllength.py", line 27, in
return (r for r in result or () if self._filter(r, spider))
File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
for r in iterable:
File "/usr/local/lib/python3.10/dist-packages/scrapy/spidermiddlewares/depth.py", line 31, in
return (r for r in result or () if self._filter(r, response, spider))
File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
for r in iterable:
File "/usr/local/lib/python3.10/dist-packages/volleystats/spiders/match.py", line 44, in parse
'Match Date': match_date,
UnboundLocalError: local variable 'match_date' referenced before assignment
2024-01-06 07:04:28 [scrapy.core.engine] INFO: Closing spider (finished)
2024-01-06 07:04:28 [scrapy.utils.signal] ERROR: Error caught on signal handler: <function Spider.close at 0x7f62cd3b9900>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/defer.py", line 348, in maybeDeferred_coro
result = f(*args, **kw)
File "/usr/local/lib/python3.10/dist-packages/pydispatch/robustapply.py", line 55, in robustApply
return receiver(*arguments, **named)
File "/usr/local/lib/python3.10/dist-packages/scrapy/spiders/init.py", line 98, in close
return cast(Union[Deferred, None], closed(reason))
File "/usr/local/lib/python3.10/dist-packages/volleystats/spiders/match.py", line 117, in closed
dst = f'data/{spider.match_id}-{spider.match_date}-guest-{spider.guest_team}.csv'
AttributeError: 'GuestStatsSpider' object has no attribute 'match_date'
2024-01-06 07:04:28 [scrapy.extensions.feedexport] INFO: Stored csv feed (0 items) in: data/guest_stats.csv
2024-01-06 07:04:28 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 714,
'downloader/request_count': 2,
'downloader/request_method_count/GET': 2,
'downloader/response_bytes': 111820,
'downloader/response_count': 2,
'downloader/response_status_count/200': 1,
'downloader/response_status_count/302': 1,
'elapsed_time_seconds': 2.29563,
'feedexport/success_count/FileFeedStorage': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2024, 1, 6, 7, 4, 28, 773800, tzinfo=datetime.timezone.utc),
'httpcompression/response_bytes': 1563883,
'httpcompression/response_count': 1,
'log_count/DEBUG': 5,
'log_count/ERROR': 3,
'log_count/INFO': 21,
'memusage/max': 111345664,
'memusage/startup': 111345664,
'response_received_count': 1,
'scheduler/dequeued': 2,
'scheduler/dequeued/memory': 2,
'scheduler/enqueued': 2,
'scheduler/enqueued/memory': 2,
'spider_exceptions/UnboundLocalError': 1,
'start_time': datetime.datetime(2024, 1, 6, 7, 4, 26, 478170, tzinfo=datetime.timezone.utc)}
2024-01-06 07:04:28 [scrapy.core.engine] INFO: Spider closed (finished)
2024-01-06 07:04:28 [scrapy.core.engine] INFO: Closing spider (finished)
2024-01-06 07:04:28 [scrapy.utils.signal] ERROR: Error caught on signal handler: <function Spider.close at 0x7f62cd3b9900>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/defer.py", line 348, in maybeDeferred_coro
result = f(*args, **kw)
File "/usr/local/lib/python3.10/dist-packages/pydispatch/robustapply.py", line 55, in robustApply
return receiver(*arguments, **named)
File "/usr/local/lib/python3.10/dist-packages/scrapy/spiders/init.py", line 98, in close
return cast(Union[Deferred, None], closed(reason))
File "/usr/local/lib/python3.10/dist-packages/volleystats/spiders/match.py", line 58, in closed
dst = f'data/{spider.match_id}-{spider.match_date}-home-{spider.home_team}.csv'
AttributeError: 'HomeStatsSpider' object has no attribute 'match_date'
2024-01-06 07:04:28 [scrapy.extensions.feedexport] INFO: Stored csv feed (0 items) in: data/home_stats.csv
2024-01-06 07:04:28 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 714,
'downloader/request_count': 2,
'downloader/request_method_count/GET': 2,
'downloader/response_bytes': 111806,
'downloader/response_count': 2,
'downloader/response_status_count/200': 1,
'downloader/response_status_count/302': 1,
'elapsed_time_seconds': 2.291011,
'feedexport/success_count/FileFeedStorage': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2024, 1, 6, 7, 4, 28, 779753, tzinfo=datetime.timezone.utc),
'httpcompression/response_bytes': 1563883,
'httpcompression/response_count': 1,
'log_count/DEBUG': 4,
'log_count/ERROR': 4,
'log_count/INFO': 15,
'memusage/max': 111345664,
'memusage/startup': 111345664,
'response_received_count': 1,
'scheduler/dequeued': 2,
'scheduler/dequeued/memory': 2,
'scheduler/enqueued': 2,
'scheduler/enqueued/memory': 2,
'spider_exceptions/UnboundLocalError': 1,
'start_time': datetime.datetime(2024, 1, 6, 7, 4, 26, 488742, tzinfo=datetime.timezone.utc)}
2024-01-06 07:04:28 [scrapy.core.engine] INFO: Spider closed (finished)
volleystats: finished
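For what it's worth, the `UnboundLocalError`/`AttributeError` pair in the log above is the classic symptom of an attribute that is only assigned when parsing succeeds. One defensive pattern, shown here as a sketch only (`MatchSpiderSketch` and its attributes are hypothetical, not volleystats' real spider classes), is to initialize the attribute up front and fall back to a placeholder in `closed()`:

```python
class MatchSpiderSketch:
    """Illustrative sketch: guard against a match date that fails to
    parse, so the spider can still close cleanly instead of raising
    AttributeError when building the output filename."""

    match_id = 2999             # hypothetical values for illustration
    guest_team = "example-team"

    def parse(self, raw_date):
        # Initialize up front so later code never hits UnboundLocalError.
        self.match_date = None
        if raw_date:
            self.match_date = raw_date.strip()

    def closed(self, reason):
        # Fall back to a placeholder when the attribute was never set.
        date = getattr(self, "match_date", None) or "unknown-date"
        return f"data/{self.match_id}-{date}-guest-{self.guest_team}.csv"

s = MatchSpiderSketch()
s.parse(None)  # simulate a page where the date could not be scraped
print(s.closed("finished"))  # -> data/2999-unknown-date-guest-example-team.csv
```

The same guard in `closed()` would cover both the `HomeStatsSpider` and `GuestStatsSpider` failures shown in the traceback, at the cost of an occasionally odd filename instead of a crash.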