Closed: roysti10 closed this issue 3 years ago.
Issue-Label Bot is automatically applying the label `bug` to this issue, with a confidence of 0.97. Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback!
Ehm, I might have to shed some light here as well: yielding items does not decrease performance; requesting pages does. I just found a neat hack for our problem:
http://www.howstat.com/cricket/Statistics/Matches/MatchList_T20.asp?Group=2017010130001231
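The URL encodes the date range of the matches we need. For 1 Jan 2017 to 31 Dec 3000, the `Group` parameter is the start date `20170101` followed by the end date `30001231`, giving `2017010130001231`.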
So we only need to crawl these links:
http://www.howstat.com/cricket/Statistics/Matches/MatchList_T20.asp?Group=2017010130001231
http://www.howstat.com/cricket/Statistics/Matches/MatchList_ODI.asp?Group=2017010130001231
http://www.howstat.com/cricket/Statistics/Matches/MatchList.asp?Group=2017010130001231
http://www.howstat.com/cricket/Statistics/IPL/MatchList.asp?Group=2017010130001231
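A minimal sketch of building these start URLs from a date range, assuming (as the comment above infers from the example URL) that `Group` is simply the start and end dates written as `YYYYMMDD` and concatenated. The helper names `group_param` and `match_list_urls` are hypothetical, not part of the existing spider.

```python
from __future__ import annotations

from datetime import date

# Match-list pages named in the thread; each is assumed to accept a Group query parameter.
MATCH_LIST_PAGES = [
    "http://www.howstat.com/cricket/Statistics/Matches/MatchList_T20.asp",
    "http://www.howstat.com/cricket/Statistics/Matches/MatchList_ODI.asp",
    "http://www.howstat.com/cricket/Statistics/Matches/MatchList.asp",
    "http://www.howstat.com/cricket/Statistics/IPL/MatchList.asp",
]


def group_param(start: date, end: date) -> str:
    """Concatenate start and end dates as YYYYMMDDYYYYMMDD (assumed encoding)."""
    if end < start:
        raise ValueError("end date must not be before start date")
    return start.strftime("%Y%m%d") + end.strftime("%Y%m%d")


def match_list_urls(start: date, end: date) -> list[str]:
    """Return one filtered match-list URL per page for the given date range."""
    group = group_param(start, end)
    return [f"{page}?Group={group}" for page in MATCH_LIST_PAGES]
```

In a Scrapy spider, the result could be assigned to the spider's `start_urls`, e.g. `start_urls = match_list_urls(date(2017, 1, 1), date(3000, 12, 31))`, so the crawler never requests the pre-2017 match lists at all.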
This is an amazing hack; it would reduce the crawling by a lot. Thanks, I'll implement this once I'm free.
I'm not too sure this will be needed, so I'm adding a `wontfix` label for now until it's figured out.
Describe the bug
The web crawler in `feature-crawler` takes in match records going back to the 1900s. This wastes a lot of time and reduces the crawler's efficiency.
Expected behavior
The solution would be to add a filter so that only match records from 2017 onwards are taken.
Possible solution
In `crawler/cricketcrawler/spiders/howstat.py`, in the function `parse_scorecard`.
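A minimal sketch of the proposed year filter. The exact date format on the scorecard pages isn't given in the issue, so this assumes only that the date text contains a four-digit year; `match_year`, `is_recent` and `MIN_YEAR` are hypothetical names.

```python
import re
from typing import Optional

# Cutoff from the issue: keep only matches from 2017 onwards.
MIN_YEAR = 2017

# A standalone four-digit year from 1800 to 2099 (assumes the date text contains one).
_YEAR_RE = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")


def match_year(date_text: str) -> Optional[int]:
    """Return the first year found in date_text, or None if there is none."""
    found = _YEAR_RE.search(date_text)
    return int(found.group(1)) if found else None


def is_recent(date_text: str, min_year: int = MIN_YEAR) -> bool:
    """True if the text contains a year >= min_year; text with no year is skipped."""
    year = match_year(date_text)
    return year is not None and year >= min_year
```

Inside `parse_scorecard`, one could return early when `is_recent(...)` is False. As noted in the thread, though, the page has already been requested by then. Applying the same check to match-list rows before following their scorecard links (assuming the list shows dates) would save the requests themselves.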
Additional context
The starting point for this might be `crawler/cricketcrawler/spiders/howstat.py`.