MatthiasBaur closed this 7 years ago
I'd suggest extending the grid to 9*9 with an edge length of 250m. That way we will predict Pokémon within a radius of 1km of the user. The number of points we provide to the Map Team will then be 81.
Please note that once a user comes close to the edge of a grid, a new grid will need to be defined with a new center (the user's current position), and predictions will need to be made for the 81 new points.
@goldbergtatyana can we work with S2 cell ids instead? Since we use cell ids internally, the 9*9 grid might not be consistent with the cells. E.g. 2 grid points could map to the same cell if we just create the grid from the user's location. If we use the center of the cell in which the user is located, we could still end up with duplicated cells, since the edge length of a cell is not guaranteed to be 250m.
If we just use S2 cells instead, we won't have nice numbers like 9x9 with a 250m edge length, but we won't have overlapping cells (and possibly fewer duplicated predictions). We'd then return a prediction for every cell center, and the cell edge length would be about 365m or 182m, depending on the S2 cell level we choose.
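A minimal sketch of the grid idea above, assuming a spherical Earth and a flat local approximation (fine at the 2km scale). The function names, the `margin_m` parameter and the 250m "close to the edge" margin are made up for illustration; nothing in this thread fixes them.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def build_grid(center_lat, center_lon, size=9, spacing_m=250.0):
    """Return size*size (lat, lon) points centred on the given location."""
    half = (size - 1) / 2
    # Convert the spacing from metres to degrees; a degree of longitude
    # shrinks with cos(latitude).
    dlat = math.degrees(spacing_m / EARTH_RADIUS_M)
    dlon = math.degrees(
        spacing_m / (EARTH_RADIUS_M * math.cos(math.radians(center_lat)))
    )
    return [
        (center_lat + (row - half) * dlat, center_lon + (col - half) * dlon)
        for row in range(size)
        for col in range(size)
    ]


def needs_new_grid(user_lat, user_lon, center_lat, center_lon,
                   size=9, spacing_m=250.0, margin_m=250.0):
    """True once the user is within margin_m of the grid edge (assumed rule)."""
    half_extent_m = (size - 1) / 2 * spacing_m  # 1000m for a 9*9 grid at 250m
    dy = math.radians(user_lat - center_lat) * EARTH_RADIUS_M
    dx = (math.radians(user_lon - center_lon) * EARTH_RADIUS_M
          * math.cos(math.radians(center_lat)))
    return max(abs(dx), abs(dy)) > half_extent_m - margin_m
```

With the defaults, `build_grid` yields 81 points spanning 2km x 2km, and `needs_new_grid` tells the caller when to rebuild the grid around the user's new position.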
The S2 cells have been dropped from the features (although we might need them internally). I'm sorry, I didn't have time to write it down this weekend. I will post the feature list in a couple of minutes. The advantage of making 81 or so predictions is that we can implement a probability threshold. So say there are 81 points. If we throw out all points that have a low probability of being predicted correctly, we are left with 5-10 points, which can be displayed nicely.
@bensLine what is the largest resolution of S2 cells, i.e. what is the minimum edge length of an S2 cell? In principle it doesn't matter whether we use a defined 9*9 grid or S2 cells; what matters is that we provide predictions for locations at a reasonable distance from each other and cover an area of around 2km x 2km.
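A rough sketch of the probability threshold described above. The threshold value of 0.5 and the function name are assumptions; the `confidence` field follows the prediction object format used in this thread, where it is a string.

```python
def filter_confident(predictions, threshold=0.5):
    """Keep predictions at or above the threshold, most confident first."""
    kept = [p for p in predictions if float(p["confidence"]) >= threshold]
    return sorted(kept, key=lambda p: float(p["confidence"]), reverse=True)
```

The actual threshold would have to be tuned so that roughly 5-10 of the 81 points survive.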
@goldbergtatyana Here is a slide from an official S2 presentation with the details
So, I guess we just play around with the cell level and see what works best for the 2km area.
I am all for it, thanks @bensLine ! 👍
Sorry, I don't quite get it: for which time do we make predictions? I mean, which time do we use in the queries? Are these supposed to be per minute? Then for the next 15 min timeframe and about 80 cells around a player we make 15*80=1200 queries, right?
The way I understood it, a user on the website triggers a prediction query. We therefore use the current map location on the website (the center) and the current time. From that data we create the necessary features (like time of day: morning, evening, ...), so that we basically have an entry of the data set that is only missing the class label, which we'll predict. We'll then do this for around 80 cells, as you say, but only for one timestamp. The results will then be displayed on the map until the user refreshes the site or triggers a new request, or maybe the map team implements a timeout and requests new predictions after 5 min...
I think for, say, the 5 upcoming minutes, 5 queries have to be made, one for each minute, right?
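To illustrate the flow described above, here is a small sketch that turns grid points plus a single timestamp into unlabeled feature rows. The hour boundaries of the time-of-day buckets and the field names are assumptions; the real feature list is not given in this thread.

```python
from datetime import datetime


def time_of_day(ts):
    """Map a timestamp to a coarse bucket (boundaries are assumed)."""
    hour = ts.hour
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 22:
        return "evening"
    return "night"


def build_query_rows(points, ts):
    """One unlabeled row per (lat, lon) point, all sharing the same timestamp."""
    bucket = time_of_day(ts)
    return [
        {"latitude": lat, "longitude": lon, "timeOfDay": bucket}
        for lat, lon in points
    ]
```

For about 80 points this gives about 80 rows per user request, i.e. one classifier query per cell rather than one per cell per minute.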
Nah, if we stay with a 5 minutes interval, then we make a prediction just once, i.e. for the next five minutes. I am ok with increasing the interval to 10 or even 15 minutes. The choice is yours :)
Hey, @goldbergtatyana, did we already decide which 10k we use to create the ARFF and how often we do this? I can't find the info. Thanks.
Hey @semioniy , I discussed that issue with @MatthiasBaur and we said that the 10K points will be taken randomly from the data points collected from the last 24 hours. We will generate a new training set every 15 minutes.
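A sketch of that sampling step, assuming each collected data point is a dict with a `timestamp` key holding a `datetime` (a made-up record shape). It would be run every 15 minutes by whatever scheduler is in place.

```python
import random
from datetime import datetime, timedelta


def sample_training_points(records, now, k=10_000,
                           window=timedelta(hours=24), rng=random):
    """Pick up to k random records collected within the last `window`."""
    recent = [r for r in records if now - window <= r["timestamp"] <= now]
    if len(recent) <= k:
        return recent  # fewer than k points available: use them all
    return rng.sample(recent, k)
```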
Predictions are provided as an array of the following objects:
{"pokemonId":"16","confidence":"0.242","latitude":11.6088567,"longitude":48.1679286}
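A minimal sketch of producing that array, assuming the classifier output arrives as `(latitude, longitude, pokemon_id, confidence)` tuples (a made-up input shape). It mirrors the object above: id and confidence as strings, coordinates as numbers.

```python
import json


def to_prediction_json(predictions):
    """Serialize (latitude, longitude, pokemon_id, confidence) tuples."""
    payload = [
        {
            "pokemonId": str(pokemon_id),
            "confidence": f"{confidence:.3f}",  # three decimals, as in the example
            "latitude": latitude,
            "longitude": longitude,
        }
        for latitude, longitude, pokemon_id, confidence in predictions
    ]
    return json.dumps(payload, separators=(",", ":"))
```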
We will get a user location as input for our classifier. We will then build a grid around the user, for example 5*5 with an edge length of 250m, and make a prediction for each of the points. So one user request gets 25 predictions. We have to come up with the format for the Map Team. The returned data should contain: 25 times (latitude, longitude, PredictedPokemonID, confidence). Points to clear up: