BruceJohnJennerLawso / scrap

Hockey stats analysis done by scraping the data to CSV files, then processing and analyzing them with more Python.

Make the scraper mimic a normal user to avoid an IP block #109

Open BruceJohnJennerLawso opened 7 years ago

BruceJohnJennerLawso commented 7 years ago

Turns out Hockey Reference doesn't like being looked at too much, so I need to be careful to avoid getting my IP blocked. This is mostly not an issue at the moment, since the team CSVs for the NHL and WHA are already stored in dataBackup, but it could become one in the future once I'm scraping current-season scores day to day.

This could hypothetically be avoided by modifying the scraper's loop to randomize both the time between requests and the order in which teams are requested. Even better, it could hypothetically mimic a user: start from the season page, jump to each team page, spend a plausible amount of time looking at that page, then move on to the next one. This would extend the scraper's runtime quite a bit, but that probably isn't a big deal for a server with nothing else to do. Of course, if anyone from Hockey Reference is reading this, this is all hypothetical; I would never do such a thing...
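A minimal sketch of what that loop might look like, assuming the scraper can be restructured around a single page-fetching function. `polite_crawl`, the injected `fetch` callable, and the 20 to 90 second dwell bounds are all hypothetical, not taken from the existing scraper. The fetcher and `sleep` are passed in so the pacing logic can be checked without hitting the site.

```python
import random
import time
from typing import Callable, Dict, List, Optional

# Hypothetical: fetch(url) -> page text. In the real scraper this would wrap
# whatever HTTP call it already makes; it is injected here for testability.
Fetcher = Callable[[str], str]


def polite_crawl(
    season_url: str,
    team_urls: List[str],
    fetch: Fetcher,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
    min_dwell: float = 20.0,  # assumed lower bound on "reading time", seconds
    max_dwell: float = 90.0,  # assumed upper bound on "reading time", seconds
) -> Dict[str, str]:
    """Visit the season page first, then every team page in a random order,
    pausing for a random interval after each request."""
    rng = rng or random.Random()
    pages: Dict[str, str] = {}

    # Start where a person would: the season overview page.
    pages[season_url] = fetch(season_url)
    sleep(rng.uniform(min_dwell, max_dwell))

    # Shuffle a copy so the request order changes from run to run.
    order = list(team_urls)
    rng.shuffle(order)

    for url in order:
        pages[url] = fetch(url)
        # "Look at" the team page for a random time before the next request.
        sleep(rng.uniform(min_dwell, max_dwell))

    return pages
```

With the default bounds, a season with 20 teams would take roughly 7 to 30 minutes (21 pauses of 20 to 90 seconds each), which fits the "server with nothing to do" trade-off above.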