PokemonGoers / PokeData

In this project you will scrape as much data as you can get about *actual* sightings of Pokémon. As it turns out, players all around the world started reporting sightings of Pokémon and are logging them into a central repository (i.e. a database). We want to get this data so we can train our machine learning models. You will of course need to come up with other data sources, not only for sightings but also for other relevant details that can be used later on as features for our machine learning algorithm (see Project B). Additional features could be the air temperature at the timestamp of a sighting, or a location's proximity to water, buildings or parks. Consult with a Pokémon Go expert if you have one around you and come up with as many features as possible that describe the place, time and name of a sighted Pokémon.

Another feature that you will implement is a Twitter listener: you will use the Twitter streaming API (https://dev.twitter.com/streaming/public) to listen on a specific topic (for example, the #foundPokemon hashtag). When a new tweet with that hashtag is written, an event will be fired in your application checking the details of the tweet, e.g. location, user, timestamp. Additionally, you will try to parse formatted text from the tweets to construct a new “seen” record that will then be added to the database. Some of the attributes of the record will be the Pokémon's name, location and timestamp.

Additional data sources (here is one: https://pkmngowiki.com/wiki/Pok%C3%A9mon) will also need to be integrated to give us more information about Pokémon, e.g. what they are, what their relationships are, what they can transform into, which attacks they can perform, etc.
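The tweet-parsing step could be sketched roughly as follows. Note that the tweet text format and the field names used here are illustrative assumptions, not a spec from this project:

```javascript
// Minimal sketch: turn a tweet into a "seen" record.
// Assumed (hypothetical) tweet format: "#foundPokemon <name> @ <lat>,<lng>"
function parseSightingTweet(tweet) {
  const match = tweet.text.match(
    /#foundPokemon\s+(\w+)\s+@\s+(-?\d+\.?\d*)\s*,\s*(-?\d+\.?\d*)/i
  );
  if (!match) return null; // tweet does not follow the expected format

  return {
    name: match[1],
    location: { lat: parseFloat(match[2]), lng: parseFloat(match[3]) },
    timestamp: tweet.created_at, // Twitter's tweet creation time
  };
}
```

A real listener would feed tweets from the streaming API into such a parser and insert the non-null results into the database; tweets that don't match the expected format would simply be dropped.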
Apache License 2.0

Cache pokemon predictions #180

Closed AlexanderLill closed 7 years ago

AlexanderLill commented 7 years ago

Hello everyone,

sorry for bringing this up, but we are currently thinking about how the caching of pokemon predictions could be implemented. We are doing this because currently the prediction team triggers a new round of predictions every time data is requested:

> I'm from the predictions team and we would like to know if you cache our predictions when there is an equal request (location and time)? We are asking because predictions need processing power and therefore time. Have you already implemented something, or are you planning to?

It would therefore make sense for the API to cache already-predicted pokemon for a given location and time.

We started a discussion here (https://github.com/PokemonGoers/Catch-em-all/issues/93) with @marwage and @bensLine, and we concluded that it would make a lot of sense if the @PokemonGoers/pokedata team could cache those predictions.

@sacdallago mentioned some possibilities here https://github.com/PokemonGoers/Catch-em-all/issues/93#issuecomment-251894037.

If nobody complains I will close the issue https://github.com/PokemonGoers/Catch-em-all/issues/93 in the @PokemonGoers/catch-em-all group.

jonas-he commented 7 years ago

@sacdallago referring to your comment from the closed issue: I don't know if Express does some magic caching on its own, but if that's not the case then we don't do caching. What do you think of https://www.npmjs.com/package/apicache? Shouldn't be too much work to integrate it, right?
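For anyone curious what this kind of middleware does under the hood, here is a minimal hand-rolled sketch of the idea (not apicache itself; all names are illustrative): responses are cached per URL with a TTL, so repeated identical requests skip the prediction backend entirely.

```javascript
// Sketch of URL-keyed response caching as Express-style middleware.
// A hit short-circuits the chain; a miss intercepts res.send to fill the cache.
function makeCacheMiddleware(ttlMs) {
  const store = new Map(); // url -> { body, expires }

  return function cache(req, res, next) {
    const hit = store.get(req.url);
    if (hit && hit.expires > Date.now()) {
      res.send(hit.body); // serve the cached copy, skip the backend
      return;
    }

    const originalSend = res.send.bind(res);
    res.send = (body) => {
      // intercept the outgoing response and remember it for ttlMs
      store.set(req.url, { body, expires: Date.now() + ttlMs });
      originalSend(body);
    };
    next();
  };
}
```

With apicache the equivalent wiring would be a one-liner middleware on the route; the main open question either way is what the cache key should be (see the discussion below about nearby coordinates).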

johartl commented 7 years ago

I assume this will only work for exactly identical requests? If we have two requests coming from two users, their latitude/longitude parameters will most likely differ even if the two users are in fact very close. So if we decide to use this, we will probably need to truncate the latitude/longitude parameters to a few decimals. Or we could use some concept of cells and retrieve the predictions by cell id?
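A minimal sketch of the truncation/cell idea (the decimal count and time-bucket parameter here are illustrative assumptions, to be tuned): round coordinates to a coarse grid and bucket time, so nearby requests in the same window share one cache key.

```javascript
// Bucket lat/lng (and time) into a coarse cache key so nearby requests hit
// the same cache entry. Two decimals is roughly a 1 km cell at the equator.
const CELL_DECIMALS = 2;

function cellKey(lat, lng, timeBucketMs, now = Date.now()) {
  const q = (x) => x.toFixed(CELL_DECIMALS);     // snap coordinate to the grid
  const t = Math.floor(now / timeBucketMs);       // coarse time bucket so entries expire
  return `${q(lat)}:${q(lng)}:${t}`;
}
```

Proper spatial cells (e.g. a geohash or S2-style cell id) would avoid the edge effect where two users sit just either side of a grid boundary, but simple rounding is probably enough for a first version.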

jonas-he commented 7 years ago

@johartl yes, that is true. Currently I don't have much time to implement a super fancy caching thing. If anyone from my project or the other teams wants to do this, go for it. I would also suggest that once we get to a high number of users and we run into issues, we can start optimizing performance (and thus implement an efficient and smart cache).

sacdallago commented 7 years ago

@jonas-he so, for now, let's go for the not-so-fancy caching variant.

It would definitely make sense to check how much this is actually needed, by measuring how long it takes to generate 10 predictions at the same time.
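A rough harness for that measurement could look like this; `fetchPrediction` is a placeholder for the real prediction call, not an existing function in this project:

```javascript
// Fire n prediction requests concurrently and report the wall-clock time.
// fetchPrediction(i) is assumed to return a Promise for one prediction.
async function timeConcurrent(n, fetchPrediction) {
  const start = Date.now();
  await Promise.all(Array.from({ length: n }, (_, i) => fetchPrediction(i)));
  return Date.now() - start;
}
```

If 10 concurrent requests take barely longer than one, caching is less urgent; if the total time scales with the number of requests, a cache in front of the prediction endpoint pays off quickly.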

jonas-he commented 7 years ago

see #181