chychen / BasketballGAN

Basketball coaches often sketch plays on a whiteboard to help players get the ball through the net. A new AI model predicts how opponents would respond to these tactics.
https://arxiv.org/abs/1909.07088

Dataset documentation #1

Closed by alexmonti19 4 years ago

alexmonti19 commented 4 years ago

Hi, is there any documentation regarding the four numpy arrays we can download from the dataset link? It would help me understand how the data are stored/organized.

chychen commented 4 years ago

BasketballGAN dataset

Data processing

The project dataset contains only offensive set plays, stored as the [x,y] positions of 1 ball and 10 players (5 on offence and 5 on defence). Each play starts when the ball is dribbled across half court or inbounded at half court, and ends when a shot is made or missed.

1) NBA play-by-play information is parsed alongside the tracking data to extract every made or missed shot in the player tracking data.
2) Recording starts when the offence brings the ball past half court or inbounds it at half court (scoring end).
3) The recorded segment ends when a player shoots the ball (made or missed).
4) The data is downsampled to 5 frames per second.
5) We use the Ramer-Douglas-Peucker algorithm to simplify the real offensive sequences into the conditions (sketches) fed into the generator.
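Steps 4 and 5 can be sketched roughly as follows. This is only an illustration, not the project's preprocessing code: the function names `downsample` and `rdp`, the source rate of 25 frames per second, and the tolerance `epsilon` are all assumptions that do not appear in this thread.

```python
import numpy as np


def downsample(frames: np.ndarray, src_fps: int = 25, dst_fps: int = 5) -> np.ndarray:
    """Keep every (src_fps // dst_fps)-th frame along the time axis.

    src_fps=25 is an assumed tracking rate, not a value stated in this thread.
    """
    step = src_fps // dst_fps
    return frames[::step]


def rdp(points: np.ndarray, epsilon: float) -> np.ndarray:
    """Ramer-Douglas-Peucker simplification of a 2D polyline of shape (n, 2).

    epsilon is a hypothetical distance tolerance in court units.
    """
    if len(points) < 3:
        return points
    start, end = points[0], points[-1]
    chord = end - start
    chord_len = np.hypot(chord[0], chord[1])
    rel = points[1:-1] - start
    if chord_len == 0.0:
        # Degenerate chord: fall back to the distance from the start point.
        dists = np.hypot(rel[:, 0], rel[:, 1])
    else:
        # Perpendicular distance to the chord via the 2D cross product.
        dists = np.abs(chord[0] * rel[:, 1] - chord[1] * rel[:, 0]) / chord_len
    idx = int(np.argmax(dists))
    if dists[idx] <= epsilon:
        # Every interior point is close enough: keep only the endpoints.
        return np.stack([start, end])
    split = idx + 1  # index of the farthest point within `points`
    left = rdp(points[: split + 1], epsilon)
    right = rdp(points[split:], epsilon)
    # Drop the shared split point from the left half before joining.
    return np.concatenate([left[:-1], right])


# Toy trajectory: nearly straight, then a sharp cut to (3, 5) and back.
path = np.array([[0.0, 0.0], [1.0, 0.01], [2.0, 0.0], [3.0, 5.0], [4.0, 0.0]])
sketch = rdp(path, epsilon=0.1)
```

On the toy path, the nearly collinear point at (1, 0.01) is dropped while the sharp cut is kept, which is the kind of coarse sketch a coach might draw.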

alexmonti19 commented 4 years ago

Hi, thanks for your response!

Project dataset only contains offensive set plays of 1 ball and 10 players (5 on offence and 5 on defence) [x,y] positions, starting when the ball is dribbled across or in-bounded from the half court, and ends when a shot is made or missed

Ok, now the shape of the first array makes sense :) '50Real.npy' -> 14k+ different sequences, 50 timesteps per sequence, and the position of the 10 players + the ball for every timestep

What about the other three arrays? In the Drive folder I also found '50Seq.npy' (14032x50x12), 'SeqReal.npy' (14032x50x6) and 'SeqCond.npy' (14032x50x6), and I'm struggling to understand what they may contain.

chychen commented 4 years ago

Sorry for the bad naming styles.
50Real.npy -> real play
50Seq.npy -> real offensive strategies
SeqReal.npy -> one-hot format, ball status for real play
SeqCond.npy -> one-hot format, ball status for real offensive strategies

One-hot format (6 dims): 0 -> offensive player 0 has the ball, 1 -> offensive player 1 has the ball, ... 5 -> no one has the ball (shooting or passing)
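As a minimal sketch of reading this encoding, the one-hot vectors can be turned back into integer indices with `np.argmax`. The helper name `ball_handler`, the constant `NO_POSSESSION`, and the synthetic `status` array are illustrative assumptions; with the real data you would pass the array loaded from SeqReal.npy or SeqCond.npy instead.

```python
import numpy as np

NO_POSSESSION = 5  # index 5: no one has the ball (shooting or passing)


def ball_handler(one_hot: np.ndarray) -> np.ndarray:
    """Convert a (..., 6) one-hot ball-status array into integer indices."""
    return np.argmax(one_hot, axis=-1)


# Synthetic stand-in for four timesteps of one sequence:
# player 0 holds the ball twice, the ball is in the air, then player 3 has it.
status = np.eye(6, dtype=np.float32)[[0, 0, 5, 3]]
handler = ball_handler(status)
in_air = handler == NO_POSSESSION  # boolean mask of passing/shooting frames
```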

alexmonti19 commented 4 years ago

SeqReal.npy -> one hot format, ball status for real play SeqCond.npy -> one hot format, ball status for real offensive strategies

one hot format: (6 dims) 0-> offense player0 has the ball, 1-> offense player1 has the ball, ... 5 -> no one has the ball (shooting or passing)

Perfect :)

50Seq.npy -> real offensive strategies

Mmh, so, for this one, what does the last dim (=12) represent?

chychen commented 4 years ago

ball (x,y) + 5 offensive players (x,y) = 1*2 + 5*2 = 12
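For illustration, the 12 features can be split into the ball and the five offensive players with a reshape. This sketch assumes the ball comes first and that each entity's x and y are stored next to each other; the thread gives the count but does not confirm the exact ordering, and `split_offense` is a hypothetical helper name.

```python
import numpy as np


def split_offense(seq: np.ndarray):
    """Split a (..., 12) array into ball (..., 2) and players (..., 5, 2).

    Assumes the layout [ball_x, ball_y, p0_x, p0_y, ..., p4_x, p4_y].
    """
    entities = seq.reshape(*seq.shape[:-1], 6, 2)
    return entities[..., 0, :], entities[..., 1:, :]


# Synthetic stand-in shaped like two sequences of 50 timesteps each.
demo = np.arange(2 * 50 * 12, dtype=np.float32).reshape(2, 50, 12)
ball, players = split_offense(demo)
```

With the real data, `demo` would be replaced by the array loaded from 50Seq.npy (14032x50x12).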

alexmonti19 commented 4 years ago

Thank you again :)