claromes / volleystats

🏐 Command-line tool to scrape volleyball statistics from Data Project Web Competition websites
https://pypi.org/project/volleystats
GNU General Public License v3.0

Unable to scrape data - generates an empty csv file #17

Closed: jamiemoran2 closed this issue 5 months ago

jamiemoran2 commented 6 months ago

When I run the command, it generates an empty CSV file:

volleystats -f cvf -m 2999 -l

```
2024-01-06 07:04:26 [scrapy.utils.log] INFO: Scrapy 2.11.0 started (bot: scrapybot)
2024-01-06 07:04:26 [scrapy.utils.log] INFO: Versions: lxml 4.9.4.0, libxml2 2.10.3, cssselect 1.2.0, parsel 1.8.1, w3lib 2.1.2, Twisted 22.10.0, Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0], pyOpenSSL 23.3.0 (OpenSSL 3.1.4 24 Oct 2023), cryptography 41.0.7, Platform Linux-6.1.58+-x86_64-with-glibc2.35
volleystats: started
2024-01-06 07:04:26 [scrapy.addons] INFO: Enabled addons: []
/usr/local/lib/python3.10/dist-packages/scrapy/utils/request.py:254: ScrapyDeprecationWarning: '2.6' is a deprecated value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting.

It is also the default value. In other words, it is normal to get this warning if you have not defined a value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting. This is so for backward compatibility reasons, but it will change in a future version of Scrapy.

See the documentation of the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting for information on how to handle this deprecation.
  return cls(crawler)
2024-01-06 07:04:26 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2024-01-06 07:04:26 [scrapy.extensions.telnet] INFO: Telnet Password: efc0de33453964cb
2024-01-06 07:04:26 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.memusage.MemoryUsage', 'scrapy.extensions.feedexport.FeedExporter', 'scrapy.extensions.logstats.LogStats']
2024-01-06 07:04:26 [scrapy.crawler] INFO: Overridden settings: {}
2024-01-06 07:04:26 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2024-01-06 07:04:26 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2024-01-06 07:04:26 [scrapy.middleware] INFO: Enabled item pipelines: []
2024-01-06 07:04:26 [scrapy.core.engine] INFO: Spider opened
2024-01-06 07:04:26 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-01-06 07:04:26 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2024-01-06 07:04:26 [scrapy.addons] INFO: Enabled addons: []
/usr/local/lib/python3.10/dist-packages/scrapy/utils/request.py:254: ScrapyDeprecationWarning: '2.6' is a deprecated value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting.

It is also the default value. In other words, it is normal to get this warning if you have not defined a value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting. This is so for backward compatibility reasons, but it will change in a future version of Scrapy.

See the documentation of the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting for information on how to handle this deprecation.
  return cls(crawler)
2024-01-06 07:04:26 [scrapy.extensions.telnet] INFO: Telnet Password: ad8747aaaecda7e6
2024-01-06 07:04:26 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.memusage.MemoryUsage', 'scrapy.extensions.feedexport.FeedExporter', 'scrapy.extensions.logstats.LogStats']
2024-01-06 07:04:26 [scrapy.crawler] INFO: Overridden settings: {}
2024-01-06 07:04:26 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2024-01-06 07:04:26 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2024-01-06 07:04:26 [scrapy.middleware] INFO: Enabled item pipelines: []
2024-01-06 07:04:26 [scrapy.core.engine] INFO: Spider opened
2024-01-06 07:04:26 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-01-06 07:04:26 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2024-01-06 07:04:27 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://cvf-web.dataproject.com/MatchStatistics.aspx?mID=2999&ID=34> from <GET https://cvf-web.dataproject.com/MatchStatistics.aspx?mID=2999>
2024-01-06 07:04:27 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://cvf-web.dataproject.com/MatchStatistics.aspx?mID=2999&ID=34> from <GET https://cvf-web.dataproject.com/MatchStatistics.aspx?mID=2999>
2024-01-06 07:04:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://cvf-web.dataproject.com/MatchStatistics.aspx?mID=2999&ID=34> (referer: None)
2024-01-06 07:04:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://cvf-web.dataproject.com/MatchStatistics.aspx?mID=2999&ID=34> (referer: None)
2024-01-06 07:04:28 [scrapy.core.scraper] ERROR: Spider error processing <GET https://cvf-web.dataproject.com/MatchStatistics.aspx?mID=2999&ID=34> (referer: None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/defer.py", line 279, in iter_errback
    yield next(it)
  File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/python.py", line 350, in __next__
    return next(self.data)
  File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/python.py", line 350, in __next__
    return next(self.data)
  File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
    for r in iterable:
  File "/usr/local/lib/python3.10/dist-packages/scrapy/spidermiddlewares/offsite.py", line 28, in <genexpr>
    return (r for r in result or () if self._filter(r, spider))
  File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
    for r in iterable:
  File "/usr/local/lib/python3.10/dist-packages/scrapy/spidermiddlewares/referer.py", line 352, in <genexpr>
    return (self._set_referer(r, response) for r in result or ())
  File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
    for r in iterable:
  File "/usr/local/lib/python3.10/dist-packages/scrapy/spidermiddlewares/urllength.py", line 27, in <genexpr>
    return (r for r in result or () if self._filter(r, spider))
  File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
    for r in iterable:
  File "/usr/local/lib/python3.10/dist-packages/scrapy/spidermiddlewares/depth.py", line 31, in <genexpr>
    return (r for r in result or () if self._filter(r, response, spider))
  File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
    for r in iterable:
  File "/usr/local/lib/python3.10/dist-packages/volleystats/spiders/match.py", line 103, in parse
    'Match Date': match_date,
UnboundLocalError: local variable 'match_date' referenced before assignment
2024-01-06 07:04:28 [scrapy.core.scraper] ERROR: Spider error processing <GET https://cvf-web.dataproject.com/MatchStatistics.aspx?mID=2999&ID=34> (referer: None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/defer.py", line 279, in iter_errback
    yield next(it)
  File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/python.py", line 350, in __next__
    return next(self.data)
  File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/python.py", line 350, in __next__
    return next(self.data)
  File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
    for r in iterable:
  File "/usr/local/lib/python3.10/dist-packages/scrapy/spidermiddlewares/offsite.py", line 28, in <genexpr>
    return (r for r in result or () if self._filter(r, spider))
  File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
    for r in iterable:
  File "/usr/local/lib/python3.10/dist-packages/scrapy/spidermiddlewares/referer.py", line 352, in <genexpr>
    return (self._set_referer(r, response) for r in result or ())
  File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
    for r in iterable:
  File "/usr/local/lib/python3.10/dist-packages/scrapy/spidermiddlewares/urllength.py", line 27, in <genexpr>
    return (r for r in result or () if self._filter(r, spider))
  File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
    for r in iterable:
  File "/usr/local/lib/python3.10/dist-packages/scrapy/spidermiddlewares/depth.py", line 31, in <genexpr>
    return (r for r in result or () if self._filter(r, response, spider))
  File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
    for r in iterable:
  File "/usr/local/lib/python3.10/dist-packages/volleystats/spiders/match.py", line 44, in parse
    'Match Date': match_date,
UnboundLocalError: local variable 'match_date' referenced before assignment
2024-01-06 07:04:28 [scrapy.core.engine] INFO: Closing spider (finished)
2024-01-06 07:04:28 [scrapy.utils.signal] ERROR: Error caught on signal handler: <function Spider.close at 0x7f62cd3b9900>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/defer.py", line 348, in maybeDeferred_coro
    result = f(*args, **kw)
  File "/usr/local/lib/python3.10/dist-packages/pydispatch/robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "/usr/local/lib/python3.10/dist-packages/scrapy/spiders/__init__.py", line 98, in close
    return cast(Union[Deferred, None], closed(reason))
  File "/usr/local/lib/python3.10/dist-packages/volleystats/spiders/match.py", line 117, in closed
    dst = f'data/{spider.match_id}-{spider.match_date}-guest-{spider.guest_team}.csv'
AttributeError: 'GuestStatsSpider' object has no attribute 'match_date'
2024-01-06 07:04:28 [scrapy.extensions.feedexport] INFO: Stored csv feed (0 items) in: data/guest_stats.csv
2024-01-06 07:04:28 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 714, 'downloader/request_count': 2, 'downloader/request_method_count/GET': 2, 'downloader/response_bytes': 111820, 'downloader/response_count': 2, 'downloader/response_status_count/200': 1, 'downloader/response_status_count/302': 1, 'elapsed_time_seconds': 2.29563, 'feedexport/success_count/FileFeedStorage': 1, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2024, 1, 6, 7, 4, 28, 773800, tzinfo=datetime.timezone.utc), 'httpcompression/response_bytes': 1563883, 'httpcompression/response_count': 1, 'log_count/DEBUG': 5, 'log_count/ERROR': 3, 'log_count/INFO': 21, 'memusage/max': 111345664, 'memusage/startup': 111345664, 'response_received_count': 1, 'scheduler/dequeued': 2, 'scheduler/dequeued/memory': 2, 'scheduler/enqueued': 2, 'scheduler/enqueued/memory': 2, 'spider_exceptions/UnboundLocalError': 1, 'start_time': datetime.datetime(2024, 1, 6, 7, 4, 26, 478170, tzinfo=datetime.timezone.utc)}
2024-01-06 07:04:28 [scrapy.core.engine] INFO: Spider closed (finished)
2024-01-06 07:04:28 [scrapy.core.engine] INFO: Closing spider (finished)
2024-01-06 07:04:28 [scrapy.utils.signal] ERROR: Error caught on signal handler: <function Spider.close at 0x7f62cd3b9900>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/defer.py", line 348, in maybeDeferred_coro
    result = f(*args, **kw)
  File "/usr/local/lib/python3.10/dist-packages/pydispatch/robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "/usr/local/lib/python3.10/dist-packages/scrapy/spiders/__init__.py", line 98, in close
    return cast(Union[Deferred, None], closed(reason))
  File "/usr/local/lib/python3.10/dist-packages/volleystats/spiders/match.py", line 58, in closed
    dst = f'data/{spider.match_id}-{spider.match_date}-home-{spider.home_team}.csv'
AttributeError: 'HomeStatsSpider' object has no attribute 'match_date'
2024-01-06 07:04:28 [scrapy.extensions.feedexport] INFO: Stored csv feed (0 items) in: data/home_stats.csv
2024-01-06 07:04:28 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 714, 'downloader/request_count': 2, 'downloader/request_method_count/GET': 2, 'downloader/response_bytes': 111806, 'downloader/response_count': 2, 'downloader/response_status_count/200': 1, 'downloader/response_status_count/302': 1, 'elapsed_time_seconds': 2.291011, 'feedexport/success_count/FileFeedStorage': 1, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2024, 1, 6, 7, 4, 28, 779753, tzinfo=datetime.timezone.utc), 'httpcompression/response_bytes': 1563883, 'httpcompression/response_count': 1, 'log_count/DEBUG': 4, 'log_count/ERROR': 4, 'log_count/INFO': 15, 'memusage/max': 111345664, 'memusage/startup': 111345664, 'response_received_count': 1, 'scheduler/dequeued': 2, 'scheduler/dequeued/memory': 2, 'scheduler/enqueued': 2, 'scheduler/enqueued/memory': 2, 'spider_exceptions/UnboundLocalError': 1, 'start_time': datetime.datetime(2024, 1, 6, 7, 4, 26, 488742, tzinfo=datetime.timezone.utc)}
2024-01-06 07:04:28 [scrapy.core.engine] INFO: Spider closed (finished)
volleystats: finished
```

claromes commented 5 months ago

Hey @jamiemoran2. The issue is related to the cs-CZ locale. I've written a parser for it, and it should work now. Please run pip install --upgrade volleystats and test again.
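
For anyone hitting the same traceback: it is the classic conditional-assignment pitfall. `match_date` is only bound when the page's date string parses in the expected locale, so a cs-CZ date like `24.10.2023 - 19:00` leaves it unbound, `parse` raises `UnboundLocalError`, and the CSV comes out empty. A minimal sketch of the pattern and the defensive fix (illustrative only, not the actual volleystats code):

```python
from datetime import datetime

def extract_match_date(date_text):
    # Buggy shape (what the traceback suggests): match_date is assigned only
    # when strptime succeeds, so an unexpected locale leaves it unbound and
    # the later `'Match Date': match_date` lookup raises UnboundLocalError.
    #
    # Defensive shape: bind a default first, then attempt the parse.
    match_date = None
    try:
        match_date = datetime.strptime(date_text, "%d/%m/%Y - %H:%M")
    except ValueError:
        pass  # unrecognized locale; the caller decides what to do with None
    return match_date

print(extract_match_date("06/01/2024 - 19:00"))  # parsed datetime
print(extract_match_date("24.10.2023 - 19:00"))  # None instead of a crash
```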

jamiemoran2 commented 5 months ago

Hey, thanks for the quick fix. The match data is working fine, but the competition data is showing a date error:

volleystats --fed cvf --comp 37 --log

```
2024-01-08 03:00:51 [scrapy.utils.log] INFO: Scrapy 2.11.0 started (bot: scrapybot)
2024-01-08 03:00:51 [scrapy.utils.log] INFO: Versions: lxml 4.9.4.0, libxml2 2.10.3, cssselect 1.2.0, parsel 1.8.1, w3lib 2.1.2, Twisted 22.10.0, Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0], pyOpenSSL 23.3.0 (OpenSSL 3.1.4 24 Oct 2023), cryptography 41.0.7, Platform Linux-6.1.58+-x86_64-with-glibc2.35
volleystats: started
2024-01-08 03:00:51 [scrapy.addons] INFO: Enabled addons: []
/usr/local/lib/python3.10/dist-packages/scrapy/utils/request.py:254: ScrapyDeprecationWarning: '2.6' is a deprecated value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting.

It is also the default value. In other words, it is normal to get this warning if you have not defined a value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting. This is so for backward compatibility reasons, but it will change in a future version of Scrapy.

See the documentation of the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting for information on how to handle this deprecation.
  return cls(crawler)
2024-01-08 03:00:51 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2024-01-08 03:00:51 [scrapy.extensions.telnet] INFO: Telnet Password: 7bd54a76b4e64de0
2024-01-08 03:00:51 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.memusage.MemoryUsage', 'scrapy.extensions.feedexport.FeedExporter', 'scrapy.extensions.logstats.LogStats']
2024-01-08 03:00:51 [scrapy.crawler] INFO: Overridden settings: {}
2024-01-08 03:00:51 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2024-01-08 03:00:51 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2024-01-08 03:00:51 [scrapy.middleware] INFO: Enabled item pipelines: []
2024-01-08 03:00:51 [scrapy.core.engine] INFO: Spider opened
2024-01-08 03:00:51 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-01-08 03:00:51 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2024-01-08 03:00:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://cvf-web.dataproject.com/CompetitionMatches.aspx?ID=37> (referer: None)
2024-01-08 03:00:55 [scrapy.core.scraper] ERROR: Spider error processing <GET https://cvf-web.dataproject.com/CompetitionMatches.aspx?ID=37> (referer: None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/defer.py", line 279, in iter_errback
    yield next(it)
  File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/python.py", line 350, in __next__
    return next(self.data)
  File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/python.py", line 350, in __next__
    return next(self.data)
  File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
    for r in iterable:
  File "/usr/local/lib/python3.10/dist-packages/scrapy/spidermiddlewares/offsite.py", line 28, in <genexpr>
    return (r for r in result or () if self._filter(r, spider))
  File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
    for r in iterable:
  File "/usr/local/lib/python3.10/dist-packages/scrapy/spidermiddlewares/referer.py", line 352, in <genexpr>
    return (self._set_referer(r, response) for r in result or ())
  File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
    for r in iterable:
  File "/usr/local/lib/python3.10/dist-packages/scrapy/spidermiddlewares/urllength.py", line 27, in <genexpr>
    return (r for r in result or () if self._filter(r, spider))
  File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
    for r in iterable:
  File "/usr/local/lib/python3.10/dist-packages/scrapy/spidermiddlewares/depth.py", line 31, in <genexpr>
    return (r for r in result or () if self._filter(r, response, spider))
  File "/usr/local/lib/python3.10/dist-packages/scrapy/core/spidermw.py", line 106, in process_sync
    for r in iterable:
  File "/usr/local/lib/python3.10/dist-packages/volleystats/spiders/competition.py", line 37, in parse
    match_date = parse_short_date(match_date_text)
  File "/usr/local/lib/python3.10/dist-packages/volleystats/utils.py", line 8, in parse_short_date
    short_date_obj = datetime.strptime(short_date_string, "%d/%m/%Y - %H:%M")
  File "/usr/lib/python3.10/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/usr/lib/python3.10/_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data '24.10.2023 - 19:00' does not match format '%d/%m/%Y - %H:%M'
2024-01-08 03:00:55 [scrapy.core.engine] INFO: Closing spider (finished)
volleystats: data/37-cvf---competition_matches.csv file was created
2024-01-08 03:00:55 [scrapy.extensions.feedexport] INFO: Stored csv feed (0 items) in: data/competition_matches.csv
2024-01-08 03:00:55 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 253, 'downloader/request_count': 1, 'downloader/request_method_count/GET': 1, 'downloader/response_bytes': 322597, 'downloader/response_count': 1, 'downloader/response_status_count/200': 1, 'elapsed_time_seconds': 4.058435, 'feedexport/success_count/FileFeedStorage': 1, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2024, 1, 8, 3, 0, 55, 412170, tzinfo=datetime.timezone.utc), 'httpcompression/response_bytes': 2122365, 'httpcompression/response_count': 1, 'log_count/DEBUG': 2, 'log_count/ERROR': 1, 'log_count/INFO': 11, 'memusage/max': 111181824, 'memusage/startup': 111181824, 'response_received_count': 1, 'scheduler/dequeued': 1, 'scheduler/dequeued/memory': 1, 'scheduler/enqueued': 1, 'scheduler/enqueued/memory': 1, 'spider_exceptions/ValueError': 1, 'start_time': datetime.datetime(2024, 1, 8, 3, 0, 51, 353735, tzinfo=datetime.timezone.utc)}
2024-01-08 03:00:55 [scrapy.core.engine] INFO: Spider closed (finished)
volleystats: finished
```

jamiemoran2 commented 5 months ago

One suggestion, though I have no idea if or how it can be implemented (I'm a noob): these sites have an option to change the locale, somewhere in the top right corner, and most of them offer an English locale. So what if the locale were switched to English first? Then the scraper would work for all sites.

claromes commented 5 months ago

> One suggestion, though I have no idea if or how it can be implemented (I'm a noob): these sites have an option to change the locale, somewhere in the top right corner, and most of them offer an English locale. So what if the locale were switched to English first? Then the scraper would work for all sites.

Thank you for the suggestion. It was just a matter of setting the cookies to en-GB.
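
In Scrapy terms, that means sending the locale cookie with each request so the Data Project site serves its English pages and dates come back in one predictable format. A rough sketch of the idea; the cookie name below is a placeholder, not necessarily the key the real sites use:

```python
import scrapy

class MatchStatsSketchSpider(scrapy.Spider):
    """Illustrative only: force an English locale before scraping."""
    name = "match_stats_sketch"

    def start_requests(self):
        url = "https://cvf-web.dataproject.com/MatchStatistics.aspx?mID=2999"
        # Scrapy's CookiesMiddleware (enabled in the logs above) keeps the
        # cookie in its jar and resends it on the follow-up redirect too.
        # "CurrentCulture" is a hypothetical cookie name.
        yield scrapy.Request(url, cookies={"CurrentCulture": "en-GB"})

    def parse(self, response):
        # With an English locale, date strings should match one known format.
        yield {"title": response.css("title::text").get()}
```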

The issue with the comp data was in date parsing (which I do to standardize the data): some federations use / as the separator and others use ., as is the case with CVF. Run pip install --upgrade volleystats and test again.
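
A sketch of what separator-tolerant parsing can look like, trying each known format in turn (the format list is inferred from the dates in this thread, not copied from volleystats):

```python
from datetime import datetime

# Separators seen in this thread: '/' (e.g. '06/01/2024 - 19:00')
# and '.' as used by CVF (e.g. '24.10.2023 - 19:00').
SHORT_DATE_FORMATS = ("%d/%m/%Y - %H:%M", "%d.%m.%Y - %H:%M")

def parse_short_date(short_date_string):
    for fmt in SHORT_DATE_FORMATS:
        try:
            return datetime.strptime(short_date_string, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized short date: {short_date_string!r}")

print(parse_short_date("24.10.2023 - 19:00"))  # 2023-10-24 19:00:00
```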

The next update will allow using the generated comp file to scrape each match in the list ;)
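
In the meantime, something like this works as a stopgap, shelling out to the CLI once per match ID from the competition file (the file path matches the log above; the "Match ID" column name is a guess about the CSV layout):

```python
import csv
import subprocess

# Hypothetical stopgap: loop over the competition matches file that
# volleystats already writes and scrape each match individually.
with open("data/37-cvf---competition_matches.csv", newline="") as fh:
    for row in csv.DictReader(fh):
        match_id = row["Match ID"]  # assumed column name
        subprocess.run(["volleystats", "-f", "cvf", "-m", match_id], check=True)
```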