PokemonGoers / PokeData

In this project you will scrape as much data as you can about the *actual* sightings of Pokemon. Players all around the world have started reporting Pokemon sightings and logging them in a central repository (i.e. a database). We want this data so we can train our machine learning models. You will also need to find other data sources, not only for sightings but also for other relevant details that can later be used as features for our machine learning algorithm (see Project B). Additional features could be the air temperature at the timestamp of a sighting, or whether the location is close to water, buildings or parks. Consult a Pokemon Go expert if you have one around, and come up with as many features as possible that describe the place, the time and the name of a sighted Pokemon.

Another feature that you will implement is a Twitter listener: you will use the Twitter streaming API (https://dev.twitter.com/streaming/public) to listen to a specific topic (for example, the #foundPokemon hashtag). When a new tweet with that hashtag is posted, an event will be fired in your application that checks the details of the tweet, e.g. location, user and timestamp. Additionally, you will try to parse formatted text from the tweets to construct a new “seen” record, which will then be added to the database. The record's attributes will include the Pokemon's name, location and timestamp.

Additional data sources (here is one: https://pkmngowiki.com/wiki/Pok%C3%A9mon) will also need to be integrated to give us more information about Pokemon, e.g. what they are, how they are related, what they can transform into, which attacks they can perform, etc.
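The tweet-parsing step described above could look roughly like the following minimal sketch. The project text does not specify a tweet format, so the sketch assumes a hypothetical one ("#foundPokemon <name> at <lat>,<lng>"); `SeenRecord`, `TWEET_PATTERN` and `parse_tweet` are illustrative names, not part of PokeData. The timestamp is taken from the tweet's creation time, passed in by the caller, rather than from the text.

```python
import re
from datetime import datetime, timezone
from typing import Optional, TypedDict


class SeenRecord(TypedDict):
    # Hypothetical shape of a "seen" record: name, location and timestamp.
    name: str
    latitude: float
    longitude: float
    timestamp: datetime


# Hypothetical tweet format, e.g. "#foundPokemon Pikachu at 48.1351,11.5820".
TWEET_PATTERN = re.compile(
    r"#foundPokemon\s+(?P<name>[A-Za-z][\w.'-]*)\s+at\s+"
    r"(?P<lat>-?\d+(?:\.\d+)?)\s*,\s*(?P<lng>-?\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def parse_tweet(text: str, created_at: datetime) -> Optional[SeenRecord]:
    """Build a "seen" record from a tweet, or return None if it does not match."""
    match = TWEET_PATTERN.search(text)
    if match is None:
        return None
    lat, lng = float(match["lat"]), float(match["lng"])
    # Reject coordinates that cannot exist on Earth.
    if not (-90 <= lat <= 90 and -180 <= lng <= 180):
        return None
    return SeenRecord(name=match["name"], latitude=lat, longitude=lng, timestamp=created_at)


# Usage: parse one well-formed tweet.
example = parse_tweet(
    "Just caught one! #foundPokemon Pikachu at 48.1351, 11.5820",
    datetime(2016, 8, 1, 14, 30, tzinfo=timezone.utc),
)
print(example)
```

Tweets that do not follow the assumed format are simply skipped (`None`), so the listener can ignore them instead of writing partial records to the database.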
Apache License 2.0

Limiting the number of Pokemon in the response #164

Closed: gqinami closed this issue 8 years ago

gqinami commented 8 years ago

Hi @PokemonGoers/pokedata,

after the meeting today with @PokemonGoers/catch-em-all and @PokemonGoers/pokemap-2, we decided that the response should contain no more than 1000 Pokemon (sightings or predictions), because a higher number is not really useful to the user and cannot be displayed legibly on the map.

Also, when the response is cut, we would like an indication in the response that the data has been cut (for example, a flag).

If you have any questions, please let us know.

Thanks, PokeMap1 team

jonas-he commented 8 years ago

@gqinami We already implemented a limit of 2500 sightings per API request, but that can easily be changed to 1000. @swathi-ssunder @vivek-sethia as for the flag indicating whether the data has been limited: MongoDB's limit function does not give us this information. If it returns 1000 documents, we would not know whether the query matched exactly 1000 by chance or whether the result was cut down. To work around that, we could limit to 1001, and if we receive 1001 documents, drop the last one and set the "limit flag" to true; otherwise it is false. Do you think this is the right approach?
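The "limit by one more" trick proposed above can be sketched as a small helper. This is a minimal illustration, not the PokeData implementation: `fetch_with_limit_flag` is a hypothetical name, and `fetch(n)` stands in for any query that returns at most `n` already-sorted results (for example a pymongo cursor built with `find(query).sort(<timestamp field>, -1).limit(n)`, where the field name is an assumption).

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def fetch_with_limit_flag(fetch: Callable[[int], List[T]], limit: int) -> Tuple[List[T], bool]:
    """Return at most `limit` items plus a flag telling whether more existed.

    `fetch(n)` is assumed to return at most n items, already in the desired order
    (a hypothetical stand-in for a database query with a limit).
    """
    items = fetch(limit + 1)  # ask for one extra item as a sentinel
    truncated = len(items) > limit  # the sentinel exists only if the result was cut
    return items[:limit], truncated  # never return the sentinel itself
```

For example, with 1500 matching documents and `limit=1000`, the helper returns 1000 items and `True`; with exactly 1000 matching documents it returns all 1000 and `False`. That distinction is exactly the one a plain `limit(1000)` cannot make.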

MajorBreakfast commented 8 years ago

Here's more from our discussion yesterday:

Since the app always shows a time range of sightings, it is best to cut off by time. You should keep the 1000 Pokemon closest to now and discard those further in the past. In other words: sort by time (most recent first) and cut off at 1000 (or at 1001 for your trick :)
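The cut-off-by-time rule above can be illustrated in memory. This is a sketch under stated assumptions: the `Sighting` fields (`pokemon_id`, `appeared_on`) and the `most_recent` helper are hypothetical and may not match the real PokeData schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class Sighting:
    # Hypothetical fields for illustration; the real PokeData schema may differ.
    pokemon_id: int
    appeared_on: datetime


def most_recent(sightings: List[Sighting], limit: int = 1000) -> Tuple[List[Sighting], bool]:
    """Return the `limit` sightings closest to now (newest first) and a cut-off flag."""
    ordered = sorted(sightings, key=lambda s: s.appeared_on, reverse=True)
    return ordered[:limit], len(ordered) > limit


# Usage: 1200 sightings, one per minute going back from `now`.
now = datetime(2016, 9, 1, 12, 0)
sightings = [
    Sighting(pokemon_id=i % 151 + 1, appeared_on=now - timedelta(minutes=i))
    for i in range(1200)
]
kept, was_cut = most_recent(sightings)
print(len(kept), was_cut)  # → 1000 True
```

Sorting in memory is only for illustration. In the database, the same effect comes from sorting descending on the timestamp before applying the limit, so that the limit discards the oldest sightings rather than arbitrary ones.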

The idea is that if the user filters for Pokemon that are very rare, she can see them all. Should the user filter for all the Pokemon and also have the map zoomed out very far, she might not see them all, but she gets a warning and the shown Pokemon are more or less evenly distributed on the map.

MajorBreakfast (Project E)

swathi-ssunder commented 8 years ago

@jonas-he - Right. If we wanted to know the total number of records, we would have to use either a separate count query or an aggregate query with the limit. Limiting by 1001 and inferring the flag from the result seems better.

jonas-he commented 8 years ago

@MajorBreakfast yes, we already sort by time first (descending) and then limit :)

jonas-he commented 8 years ago

done, see #166