amol-desai / BoundaryLine_public

Mozilla Public License 2.0

retrieve and decode trajectory strings #12

Closed: bqkr closed this 2 years ago

bqkr commented 2 years ago

A new function, `get_candidate_urls_hawkeye_trajectories(match_id)`, builds the candidate URLs for the /uds/traj files, which contain ball-by-ball data including a Base64-encoded trajectory string. A second new function, `get_trajectories_df_from_matchid(match_id)`, retrieves the /uds/traj file, labels the columns, calculates the ball speed, and then decodes the Base64 string into 16 separate elements.

Example match IDs with trajectory data: 10204, 32242 and 521. Example match IDs without trajectory data: 100 and 10644 (10644 returns 'no data to fetch').
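Because some match IDs have no trajectory data, callers need a way to tell an empty result apart from a real /uds/traj payload. Below is a minimal sketch of such a check. It assumes the endpoint returns the literal text `no data to fetch` (as reported above for 10644) and that any other non-empty body is trajectory data. The helper name `has_trajectory_data` is hypothetical and is not part of the PR.

```python
# Hypothetical helper (not from the PR): the marker text is taken from the
# 'no data to fetch' response reported for match 10644.
NO_DATA_MARKER = "no data to fetch"


def has_trajectory_data(response_text: str) -> bool:
    """Return False for an empty body or the 'no data to fetch' marker.

    Surrounding whitespace and quote characters are ignored, and the
    comparison is case-insensitive.
    """
    body = response_text.strip().strip("'\"").strip()
    return bool(body) and body.lower() != NO_DATA_MARKER
```

A caller could then skip decoding whenever `has_trajectory_data(text)` is `False`, rather than trying to parse the marker as CSV.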

The bp.x column matches the 'length' column, and bp.y matches the 'line' column, of the uds/stats file. The only difference is that bp.x and bp.y carry more decimal places.
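Since the stats columns are coarser, an exact equality test between bp.x and 'length' would fail. A check within rounding tolerance works instead. The sketch below treats the two values as matching when they differ by no more than half a unit in the last reported decimal place. The function name is hypothetical, and the number of decimals the stats file uses is an assumption, so it is passed in as a parameter.

```python
def agrees_within_rounding(precise: float, reported: float, decimals: int) -> bool:
    """Hypothetical check: does a full-precision value (e.g. bp.x) agree with
    a rounded stats value (e.g. 'length') that was reported to `decimals`
    places? Allows half a unit in the last place plus a tiny float slack."""
    tolerance = 0.5 * 10 ** (-decimals) + 1e-9
    return abs(precise - reported) <= tolerance
```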

The x, y and z components are stored in separate columns, mainly so the data can be exported cleanly to CSV.
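To show why separate columns help, the sketch below flattens any 3-component value into `name.x`, `name.y` and `name.z` keys and then writes flat rows with the standard `csv` module. The PR builds a DataFrame. This version uses plain dicts only to avoid dependencies. The row layout (a `bp` key holding an (x, y, z) tuple) is an assumption for illustration, and so is the `bp.z` name; only bp.x and bp.y appear above.

```python
import csv
import io


def flatten_vectors(row: dict) -> dict:
    """Expand any 3-element tuple/list value into name.x, name.y, name.z keys.
    Other values are copied through unchanged."""
    flat = {}
    for key, value in row.items():
        if isinstance(value, (tuple, list)) and len(value) == 3:
            for axis, component in zip("xyz", value):
                flat[f"{key}.{axis}"] = component
        else:
            flat[key] = value
    return flat


def rows_to_csv(rows: list) -> str:
    """Flatten each row and render all of them as CSV text with a header.
    Assumes every row has the same keys as the first one."""
    if not rows:
        return ""
    flat_rows = [flatten_vectors(r) for r in rows]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(flat_rows[0].keys()))
    writer.writeheader()
    writer.writerows(flat_rows)
    return buf.getvalue()
```

With one scalar per column, every cell is a plain number, so the CSV opens directly in a spreadsheet without any tuple parsing.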

The process of decoding the trajectory string is:

  1. Start from the trajectory string: x = QQli8b5Ng0E/Bh+OvmS98r7iL3HBGhL+wehQ6j4o+7rA3Z1CwdWflz6406tAkr0hvlRnPAAAAADBKZmaPTrN6gDBIO81ADEB
  2. `base64.b64decode(x).hex()` = 410962f1be4d83413f061f8ebe64bdf2bee22f71c11a12fec1e850ea3e28fbbac0dd9d42c1d59f973eb8d3ab4092bd21be54673c00000000c129999a3d3acdea00c120ef35003101. This matches the output of the converter at https://base64.guru/converter/decode/hex
  3. bp.x occupies the first four bytes: `base64.b64decode(x)[0:4].hex()` = '410962f1'
  4. These four bytes encode an IEEE 754 big-endian 32-bit float. `struct.unpack` needs bytes rather than a hex string, so: `struct.unpack('>f', bytes.fromhex('410962f1'))[0]` ≈ 8.587. This matches the output of https://babbage.cs.qc.cuny.edu/IEEE-754/
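The steps above can be combined into a short runnable sketch that uses only the standard library. It decodes the example string, confirms the hex dump, and reads bp.x from offset 0 with `struct.unpack_from`, which works directly on the decoded bytes and needs no hex round-trip. The helper `read_be_float32` is hypothetical. Only the offset of bp.x (bytes 0 to 3) comes from the steps above, so the sketch does not assume the layout of the remaining elements.

```python
import base64
import struct

# Example trajectory string from step 1 (split across two literals for width).
TRAJ = ("QQli8b5Ng0E/Bh+OvmS98r7iL3HBGhL+wehQ6j4o+7rA3Z1CwdWflz6406tAkr0h"
        "vlRnPAAAAADBKZmaPTrN6gDBIO81ADEB")


def read_be_float32(raw: bytes, offset: int) -> float:
    """Hypothetical helper: unpack one IEEE 754 big-endian 32-bit float
    starting at `offset` in `raw`."""
    (value,) = struct.unpack_from(">f", raw, offset)
    return value


raw = base64.b64decode(TRAJ)     # step 2: 96 Base64 chars -> 72 bytes
print(raw.hex())                 # same hex string as in step 2
print(raw[0:4].hex())            # step 3: prints 410962f1
bp_x = read_be_float32(raw, 0)   # step 4: bp.x from the first four bytes
print(f"{bp_x:.4f}")             # prints 8.5867
```

The full single-precision value is about 8.58666, which is the 8.58 figure from the converter shown with more digits.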