Output format of prediction

PokemonGoers / PredictPokemon-2

In this project we will apply machine learning to establish the TLN (Time, Location and Name - that is, where Pokemon will appear, at what date and time, and which Pokemon it will be) prediction in Pokemon Go.

Apache License 2.0


Output format of prediction #61

Closed: MatthiasBaur closed this issue 7 years ago

MatthiasBaur commented 7 years ago

We will get a user location as input for our classifier. We will then build a grid around the user, for example 5*5 with an edge length of 250m, and make a prediction for each of the points. So one user request gets 25 predictions. We have to come up with the format for the Map Team. The returned data should contain 25 tuples of (latitude, longitude, PredictedPokemonID, confidence). Points to clear up:

[ ] Edge length
[ ] Number of points (probably 25)
[ ] Should we give the Map Team the prediction rates (confidence), or do we choose ourselves which rates are good enough and return fewer than 25 points?
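A minimal Python sketch of the grid and the per-point record described above, for illustration only. The names build_grid, GridPrediction and METERS_PER_DEG_LAT are hypothetical, and the sketch assumes a flat-Earth (equirectangular) approximation, which is adequate for a grid a few kilometres wide away from the poles. It also assumes an odd grid size, so that the user sits exactly on the centre point.

```python
import math
from dataclasses import dataclass

# Approximate metres per degree of latitude on a spherical Earth (assumption).
METERS_PER_DEG_LAT = 111_320.0


@dataclass
class GridPrediction:
    """One returned record: (latitude, longitude, PredictedPokemonID, confidence)."""
    latitude: float
    longitude: float
    predicted_pokemon_id: int
    confidence: float


def build_grid(lat: float, lon: float, n: int = 5, edge_m: float = 250.0):
    """Return n*n (lat, lon) points centred on the user, spaced edge_m metres apart."""
    if n % 2 == 0:
        raise ValueError("n must be odd so the user sits on the centre point")
    half = n // 2
    dlat = edge_m / METERS_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = edge_m / (METERS_PER_DEG_LAT * math.cos(math.radians(lat)))
    return [
        (lat + i * dlat, lon + j * dlon)
        for i in range(-half, half + 1)
        for j in range(-half, half + 1)
    ]
```

Each of the 25 points would then be fed to the classifier, and its output wrapped in a GridPrediction.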

goldbergtatyana commented 7 years ago

I'd suggest to extend the grid to 9*9 with an edge length of 250m. This way we will predict pokemons within a radius of 1km. The number of points we will provide to the Map Team will then be 81.
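The arithmetic behind the 9*9 proposal can be checked with a small Python helper (grid_extent is a hypothetical name). With n points per side there are n - 1 gaps, so the outermost row lies (n - 1)/2 edge lengths from the centre. Note that this "radius" is measured along the grid axes; the corner points are about 1.41 times farther away.

```python
def grid_extent(n: int, edge_m: float):
    """Point count, centre-to-edge distance and side length of an n*n grid."""
    half_width_m = (n - 1) / 2 * edge_m  # centre to the outermost row/column
    side_m = (n - 1) * edge_m            # between opposite outer rows/columns
    return n * n, half_width_m, side_m


print(grid_extent(9, 250.0))  # (81, 1000.0, 2000.0)
print(grid_extent(5, 250.0))  # (25, 500.0, 1000.0)
```

So a 9*9 grid with a 250m edge covers roughly 2km x 2km, with 1km from the centre to each side.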

Please note that once a user comes close to the edge of a grid, a new grid will need to be defined with a new center (the user's current position), and predictions for 81 new points will need to be made.
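One way to decide when to re-centre is sketched below in Python. This is an illustration under stated assumptions, not an agreed rule: needs_new_grid and margin_m are hypothetical, "close to the edge" is taken to mean within margin_m of the outer rows or columns, and distances use the same flat-Earth approximation as a small grid allows.

```python
import math

METERS_PER_DEG_LAT = 111_320.0  # spherical-Earth approximation (assumption)


def needs_new_grid(center, position, n=9, edge_m=250.0, margin_m=250.0):
    """True when the user is within margin_m of the grid's outer rows/columns.

    center and position are (lat, lon) tuples in degrees. margin_m is a
    hypothetical tuning parameter, not a value fixed in this discussion.
    """
    half_width_m = (n - 1) / 2 * edge_m
    dy = (position[0] - center[0]) * METERS_PER_DEG_LAT
    dx = (position[1] - center[1]) * METERS_PER_DEG_LAT * math.cos(math.radians(center[0]))
    # Compare the larger axis offset against the distance to the grid edge.
    return max(abs(dx), abs(dy)) >= half_width_m - margin_m
```

When this returns True, the client would build a new 9*9 grid around the current position and request 81 new predictions.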

bensLine commented 7 years ago

@goldbergtatyana can we work with S2 cell ids instead? Since we use cell ids internally, the 9*9 grid might not be consistent with the cells. E.g. 2 grid points could map to the same cell if we just create the grid from the user's location. If we use the center of the cell in which the user is located, we could still end up with duplicated cells, since a cell is not guaranteed to be 250x250m.

If we just use S2 cells instead, we won't have nice numbers like 9x9 with a 250m edge length, but we won't have overlapping cells (and possibly fewer duplicated predictions). We'd then return a prediction for every cell center, and the cell edge length would be about 365m or 182m, depending on the S2 cell level we choose.

MatthiasBaur commented 7 years ago

The S2 cells are not part of the features (although we might need them internally). I'm sorry, I didn't have time to write it down this weekend. I will post the feature list in a couple of minutes. The advantage of making 81 or so predictions is that we can apply a probability threshold. Say there are 81 points: if we throw out all points that have a low probability of being predicted correctly, we might be left with 5-10 points, which can be displayed nicely.
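The probability threshold could look like the following Python sketch. The function name filter_confident and the default threshold of 0.5 are placeholders, not values agreed on in this thread; predictions are assumed to be dicts with a numeric "confidence" key.

```python
def filter_confident(predictions, threshold=0.5):
    """Keep only predictions whose confidence reaches threshold, best first."""
    kept = [p for p in predictions if p["confidence"] >= threshold]
    return sorted(kept, key=lambda p: p["confidence"], reverse=True)


preds = [
    {"pokemonId": 16, "confidence": 0.9},
    {"pokemonId": 19, "confidence": 0.3},
    {"pokemonId": 41, "confidence": 0.6},
]
print(filter_confident(preds))
```

Tuning the threshold controls how many of the 81 points survive, so it could be chosen empirically to land in the 5-10 point range.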

goldbergtatyana commented 7 years ago

@bensLine what is the largest resolution of S2 cells, i.e. what is the minimum edge length of an S2 cell? In principle it doesn't matter whether we use a defined 9*9 grid or S2 cells; what is important is that we provide predictions for locations at a reasonable distance from each other and cover an area of around 2km x 2km.

bensLine commented 7 years ago

@goldbergtatyana Here is a slide from an official S2 presentation with the details

So, I guess we just play around with the cell level and see what works best for the 2km area.
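A rough way to compare cell levels for the 2km area is to count how many cells of a given edge length are needed to tile it. The Python sketch below uses the two edge lengths mentioned earlier (about 365m and 182m). It assumes square cells of fixed size, which real S2 cells are not (their shape and area vary across the globe), and it assumes the cell boundaries happen to line up with the area, so treat the results as lower-bound estimates.

```python
import math


def cells_to_cover(area_side_m: float, cell_edge_m: float) -> int:
    """Cells needed to tile a square area with square cells of a fixed edge."""
    per_side = math.ceil(area_side_m / cell_edge_m)
    return per_side * per_side


for edge in (365.0, 182.0):
    print(edge, cells_to_cover(2000.0, edge))
```

This gives 36 cells at roughly 365m and 121 cells at roughly 182m, bracketing the 81 points of the 9*9 grid.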

goldbergtatyana commented 7 years ago

I am all for it, thanks @bensLine ! 👍

semioniy commented 7 years ago

Sorry, I don't quite get it - for which time do we make predictions? I mean, which time do we use in queries? Are these supposed to be based on minutes? Then for the next 15-minute timeframe and ~80 cells around a player we would make 15*80=1200 queries, right?

bensLine commented 7 years ago

The way I understood it, a user on the website triggers a prediction query. We therefore use the current map location on the website (the center) and the current time. From that data we create the necessary features (like time of day (morning, evening, ...)), so that we basically have an entry of the data set that only misses the class label, which we'll predict. We'll then do this for around 80 cells, as you say, but only for one timestamp. The results will then be displayed on the map until the user refreshes the site or triggers a new request, or maybe the map team implements a timeout and requests new predictions after 5 min...
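The query construction described above can be sketched in Python as one unlabeled row per cell centre, all sharing the same timestamp. This is illustrative only: the time-of-day bucket boundaries and the field names are hypothetical (the thread only names "morning, evening, ..." as examples), and the real feature list is defined elsewhere in the project.

```python
from datetime import datetime


def time_of_day(ts: datetime) -> str:
    """Bucket a timestamp into a coarse time-of-day label (hypothetical boundaries)."""
    h = ts.hour
    if 5 <= h < 12:
        return "morning"
    if 12 <= h < 17:
        return "afternoon"
    if 17 <= h < 22:
        return "evening"
    return "night"


def build_query_rows(cell_centers, now: datetime):
    """One feature row per cell centre; pokemonId is the missing class label."""
    tod = time_of_day(now)
    return [
        {"latitude": lat, "longitude": lon, "time_of_day": tod, "pokemonId": None}
        for lat, lon in cell_centers
    ]
```

Because every row shares the single request timestamp, 80 cells yield 80 rows for the classifier, not 80 per minute.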

semioniy commented 7 years ago

I think for, say, the 5 upcoming minutes, 5 queries have to be made - one for each minute, right?

goldbergtatyana commented 7 years ago

Nah, if we stay with a 5-minute interval, then we make a prediction just once, i.e. for the next five minutes. I am ok with increasing the interval to 10 or even 15 minutes. The choice is yours :)
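The difference between the two readings of the interval comes down to simple arithmetic, sketched here in Python (query_count is a hypothetical helper). With one query per minute the count grows with the interval length, as in the 15*80=1200 estimate; with one prediction per interval it stays at one query per cell.

```python
def query_count(num_cells: int, interval_min: int, per_minute: bool) -> int:
    """Classifier queries for one user request.

    per_minute=True: one query per cell per minute of the interval.
    per_minute=False: one query per cell covering the whole interval.
    """
    return num_cells * interval_min if per_minute else num_cells


print(query_count(80, 15, per_minute=True))   # 1200
print(query_count(80, 15, per_minute=False))  # 80
```

Lengthening the interval to 10 or 15 minutes therefore does not increase the number of queries per request under the one-prediction-per-interval approach.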

semioniy commented 7 years ago

Hey @goldbergtatyana, did we already decide which 10k points we use to create the ARFF file and how often we do this? I can't find the info. Thanks.

goldbergtatyana commented 7 years ago

Hey @semioniy, I discussed that issue with @MatthiasBaur and we agreed that the 10K points will be taken randomly from the data points collected in the last 24 hours. We will generate a new training set every 15 minutes.
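The sampling rule can be sketched in Python as follows. The function name sample_training_set and the "timestamp" field are hypothetical, and converting the sample to ARFF is left out. If fewer than 10K points were collected in the window, the sketch simply returns all of them.

```python
import random
from datetime import datetime, timedelta


def sample_training_set(points, now, k=10_000, window=timedelta(hours=24), seed=None):
    """Randomly draw up to k points observed within `window` before `now`.

    points is a list of dicts with a datetime under "timestamp".
    """
    recent = [p for p in points if now - window <= p["timestamp"] <= now]
    rng = random.Random(seed)  # seed only for reproducible runs
    return rng.sample(recent, min(k, len(recent)))
```

A scheduler (cron or similar) would call this every 15 minutes and write the result out as the new training set.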

bensLine commented 7 years ago

Predictions are provided as an array of objects of the following form: {"pokemonId":"16","confidence":"0.242","latitude":11.6088567,"longitude":48.1679286}
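A minimal Python sketch of how a consumer might read this array, assuming only the shape of the sample object: pokemonId and confidence arrive as strings while latitude and longitude arrive as numbers, so the parser converts all four to numeric types. The function name parse_predictions is hypothetical. Note also that the sample's coordinate values look swapped relative to their keys: 48.17 N, 11.61 E lies in Munich, whereas latitude 11.61, longitude 48.17 would place the point near the Horn of Africa. The parser does not attempt to correct this.

```python
import json


def parse_predictions(payload: str):
    """Parse the prediction array, converting string-typed fields to numbers."""
    return [
        {
            "pokemonId": int(obj["pokemonId"]),
            "confidence": float(obj["confidence"]),
            "latitude": float(obj["latitude"]),
            "longitude": float(obj["longitude"]),
        }
        for obj in json.loads(payload)
    ]


sample = '[{"pokemonId":"16","confidence":"0.242","latitude":11.6088567,"longitude":48.1679286}]'
print(parse_predictions(sample))
```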