Open jbwilkes opened 4 years ago
Include the total size of the dataframe from one game (its number of turns), so that for any nth-turn slice we know how long the game went, and for the nth turn we know what percentage of the game has been completed. We wouldn't give that information to the model, but it might be interesting for analysis.
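A minimal sketch of the idea above, assuming each game's dataframe has exactly one row per turn, in turn order. The function name `add_game_progress` and the column names `total_turns` and `pct_complete` are made up for illustration and are not from the existing code:

```python
import pandas as pd

def add_game_progress(turns: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the game's length and the fraction completed per turn."""
    out = turns.copy()
    total = len(out)  # assumption: one row per turn, so row count == game length
    out["total_turns"] = total
    # Row i (0-based) is turn i + 1, so the fraction completed is (i + 1) / total.
    out["pct_complete"] = [(i + 1) / total for i in range(total)]
    return out

# Toy game with four turns (made-up data).
game = pd.DataFrame({"turn": [1, 2, 3, 4], "score": [0, 3, 5, 9]})
progress = add_game_progress(game)
```

These two columns would be dropped before training and kept only for analysis.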
By data descriptions, I mean just a few features that describe the game: the name of the file, the number of players, which AIs were in the game, how many turns the game lasted, and the winner.
This matters because it lets us see whether there are any biases in how we generated the data. It will be an easy way to say which AI is the best, and most importantly, we need a way to segment the data when we want to predict on the nth turn: we can select the files that have at least n turns and include only that data in the model (train/test split).
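As a rough sketch of how such a description table could be used, assuming hypothetical column names (`file_name`, `num_turns`, `winner`) and made-up rows, selecting the games long enough for an nth-turn prediction and counting wins per AI might look like this:

```python
import pandas as pd

# Made-up per-game descriptions; column names are assumptions for illustration.
summary = pd.DataFrame({
    "file_name": ["game_001.csv", "game_002.csv", "game_003.csv"],
    "num_turns": [40, 25, 60],
    "winner": ["ai_a", "ai_b", "ai_a"],
})

def files_with_at_least(summary: pd.DataFrame, n: int) -> list:
    """Names of the game files that lasted at least n turns."""
    return summary.loc[summary["num_turns"] >= n, "file_name"].tolist()

# Games usable when predicting on turn 30.
usable = files_with_at_least(summary, 30)

# Wins per AI, as a quick check for bias in the generated data.
wins = summary["winner"].value_counts().to_dict()
```

Only the files in `usable` would then go into the train/test split for that turn.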
Eventually all the data from games of similar type (same number of players) will be put into one dataset so that we can take a subset of it to have the model predict on.
So this will be a dataframe that just has one row per game.
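One possible way to build that one-row-per-game dataframe is sketched below. It assumes each game's turns are already loaded as a dataframe with one row per turn, and that the list of AIs and the winner are known for each file. The helper `describe_game`, the column names, and the sample games are all hypothetical:

```python
import pandas as pd

def describe_game(file_name, turns, ai_names, winner):
    """Summarize one game as a flat dict (one future row of the summary)."""
    return {
        "file_name": file_name,
        "num_players": len(ai_names),
        "ai_names": ", ".join(ai_names),
        "num_turns": len(turns),  # assumption: one row per turn
        "winner": winner,
    }

# Made-up games: (file name, per-turn dataframe, AIs in the game, winner).
games = [
    ("game_001.csv", pd.DataFrame({"turn": range(1, 41)}), ["ai_a", "ai_b"], "ai_a"),
    ("game_002.csv", pd.DataFrame({"turn": range(1, 26)}), ["ai_a", "ai_b", "ai_c"], "ai_c"),
]

# One row per game.
summary = pd.DataFrame([describe_game(*g) for g in games])
```

Storing the AIs as a single comma-separated string keeps each row flat; a list column would also work if per-AI filtering is needed later.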
I can do this if no one beats me to it.
Having this done might also be helpful for the "Data Visualization / Peer Review", so we can at least describe the main characteristics of our data.