BruceJohnJennerLawso / scrap

Hockey stats analysis done by scraping the data to a csv file, then processing/analyzing them with more python.

Hash player emails, & write to main csvs #141

Open BruceJohnJennerLawso opened 7 years ago

BruceJohnJennerLawso commented 7 years ago

Ok, so the script needs to pull down the listed emails (turns out it's not just uwaterloo emails; that part of the panel was customizable), hash them into an id, and place that id in the teams csv in a column before the player's name.
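A minimal sketch of the hashing step, assuming the emails have already been pulled down as (name, email) pairs. The function names, the 8-character id length, and the optional `salt` argument are all hypothetical choices for illustration, not something the script already does.

```python
import hashlib

def hash_email(email, salt="", length=8):
    """Return a short hex id derived from a normalized email address.

    `salt` and `length` are illustrative parameters (assumptions).
    """
    # Normalize so "A@b.com " and "a@b.com" produce the same id.
    normalized = email.strip().lower()
    digest = hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()
    return digest[:length]

def add_id_column(rows, salt=""):
    """Turn (name, email) pairs into [player_id, name] rows,
    with the id placed in the column before the player's name."""
    return [[hash_email(email, salt), name] for name, email in rows]
```

An unsalted hash of an email can be reversed by anyone who can guess the address, so passing a private `salt` may be worth it if the csvs are public.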

For players without a listed email, we still need to hash the name so it's not visible, but determining whether a player is the same person becomes much more complicated when we don't have exact confirmation that a given player name on team A really is the same person as the one playing on team B.

However, I can lean on the fact that two different people with the same name both having their email unlisted isn't super likely, and the total number of people with unlisted emails is relatively low (usually 1-2 per semester, sometimes less). I think it should be entirely reasonable to hash the full name, append the teamId, and then do an eye test for the repeats before dropping the teamId from the hash in the team csv.
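Roughly, the name-plus-teamId approach could look like the sketch below. The `-` separator, the helper names, and the 8-character length are assumptions made for illustration. `find_repeats` only lists name hashes that show up on more than one team, so the eye test still has to be done by hand.

```python
import hashlib
from collections import defaultdict

def hash_name_with_team(full_name, team_id, length=8):
    """Hash the full name, then append the teamId after a '-' separator."""
    normalized = full_name.strip().lower()
    name_hash = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:length]
    return f"{name_hash}-{team_id}"

def find_repeats(tagged_ids):
    """Return {name_hash: [team_ids]} for name hashes seen on 2+ teams."""
    teams_by_hash = defaultdict(set)
    for tagged in tagged_ids:
        # The hex hash never contains '-', so the first '-' is the separator.
        name_hash, team_id = tagged.split("-", 1)
        teams_by_hash[name_hash].add(team_id)
    return {h: sorted(t) for h, t in teams_by_hash.items() if len(t) > 1}

def drop_team_id(tagged):
    """Strip the teamId once the repeats have been checked by eye."""
    return tagged.split("-", 1)[0]
```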

For indies teams, we have an exasperating thing where the team roster as accessed from the strobe panel lists a fictional player called Free Agent (special free agent captain), obviously with no email. I think we can sidestep this problem by only populating the names that are in the roster in the main csv with the hashes generated in the teams RosterData csv (the free agent captains already have emailFound set to false, so they should be ignored).
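One way to picture the sidestep, as a sketch only: build a name-to-hash lookup from the RosterData rows, then look up only the names that actually appear in the main csv roster. The `playerName` and `playerId` keys are hypothetical column names, and the rows are assumed to be already parsed into dicts.

```python
def build_id_lookup(roster_rows):
    """Map player name -> hashed id from (hypothetical) RosterData rows."""
    return {row["playerName"]: row["playerId"] for row in roster_rows}

def anonymize_main_roster(main_names, lookup):
    """Replace each name in the main csv roster with its hashed id.

    Only names present in the main roster are looked up, so a
    roster-only placeholder such as the free agent captain is never
    written out. Names with no lookup entry come back as None so they
    can be reviewed by hand.
    """
    return [lookup.get(name) for name in main_names]
```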

The last issue that might throw a wrench into things is teams with empty rosters (need an example here), which I believe get populated with a single Free Agent (special free agent captain). Those are a lost cause anyway given that their rosters are gone completely, so they're going to screw things up no matter what I do.

Ideally I'd like the hashes to be somewhat human-readable/memorable, so instead of

"John Doe" -> "e7fx8yuo"

I'd prefer to find a method that produces something more like

"John Doe" -> "ExplosiveExplanation"
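
One possible method, sketched under assumptions: take a normal hash digest and use it to index into word lists, so the same input always maps to the same adjective + noun pair. The word lists and the `readable_id` name below are made up for illustration. With lists this small there are only 64 possible ids, so collisions would be common; real use would need much longer lists (or a third word).

```python
import hashlib

# Tiny illustrative word lists (assumptions); real lists should be far longer.
ADJECTIVES = ["Explosive", "Quiet", "Rapid", "Frozen",
              "Golden", "Clever", "Mighty", "Sneaky"]
NOUNS = ["Explanation", "Penguin", "Slapshot", "Glacier",
         "Comet", "Lantern", "Badger", "Anchor"]

def readable_id(text):
    """Deterministically map text to an AdjectiveNoun style id."""
    digest = hashlib.sha256(text.strip().lower().encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as one big integer.
    value = int.from_bytes(digest[:8], "big")
    adjective = ADJECTIVES[value % len(ADJECTIVES)]
    noun = NOUNS[(value // len(ADJECTIVES)) % len(NOUNS)]
    return adjective + noun
```

The mapping is one-way in the same sense as the underlying hash, but the number of distinct ids is capped by the product of the list lengths, so the lists have to be sized against the number of players.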