BruceJohnJennerLawso / scrap

Hockey stats analysis done by scraping the data to a csv file, then processing/analyzing them with more python.

Write a script that identifies bad pages in strobe uwaterloo #124

Open BruceJohnJennerLawso opened 7 years ago

BruceJohnJennerLawso commented 7 years ago

A lot of the pages in strobe uwaterloo are in terrible shape. Case in point:

https://strobe.uwaterloo.ca/athletics/intramurals/teams.php?team=1745

That being said, most of the database is in wonderful shape (I would think at least 90-95% of the pages are exactly as expected), and the total number of "problem pages" is probably in the range of 50-200. Given that, it would be nice to have a script that rolls through the listed pages and flags any that have something unusual about them (fewer than 7 total games, missing data cells (a bad row flagged in the getTableInRows() function), etc.).

This can probably just be a modification of the generalized scraper once it's done.
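
A rough sketch of what the checking part could look like, separate from the scraping itself. It assumes each page has already been parsed into a list of rows (one list of cell strings per game), which may or may not match what getTableInRows() actually returns. The names find_page_problems and flag_bad_pages, the min_games default, and the "most common row width is the correct one" rule are all placeholders for illustration, not anything that exists in the repo.

```python
from collections import Counter


def find_page_problems(rows, min_games=7, expected_cells=None):
    """Return a list of reasons a page looks suspicious (empty list = looks fine).

    rows is assumed to be a list of lists of cell strings, one per game.
    """
    problems = []

    # Check 1: too few games listed on the page.
    if len(rows) < min_games:
        problems.append("only %d games (expected at least %d)" % (len(rows), min_games))

    # Assumption: if no width is given, the most common row length is the right one.
    if expected_cells is None and rows:
        expected_cells = Counter(len(row) for row in rows).most_common(1)[0][0]

    # Check 2: rows with the wrong number of cells, or with blank cells.
    for i, row in enumerate(rows):
        if len(row) != expected_cells:
            problems.append("row %d has %d cells (expected %d)" % (i, len(row), expected_cells))
        elif any(cell is None or str(cell).strip() == "" for cell in row):
            problems.append("row %d has an empty cell" % i)

    return problems


def flag_bad_pages(pages, min_games=7):
    """pages maps a team id to its parsed rows; returns only the flagged ids."""
    flagged = {}
    for team_id, rows in pages.items():
        problems = find_page_problems(rows, min_games)
        if problems:
            flagged[team_id] = problems
    return flagged
```

Keeping the checks as a function over already-parsed rows means they can be run on a handful of hand-made examples without hitting the site, and the generalized scraper would only need to supply the dict of team id to rows.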