PokemonGoers / PokeData

In this project you will scrape as much data as you can about *actual* Pokemon sightings. Players all around the world have started reporting Pokemon sightings and logging them in a central repository (i.e. a database). We want this data so we can train our machine learning models. You will, of course, also need to find other data sources, not only for sightings but also for other relevant details that can later be used as features for our machine learning algorithm (see Project B). Additional features could be the air temperature at the timestamp of a sighting, or whether the location is close to water, buildings or parks. Consult a Pokemon Go expert if you have one around, and come up with as many features as possible that describe the place, time and name of a sighted Pokemon.

Another feature you will implement is a Twitter listener: you will use the Twitter Streaming API (https://dev.twitter.com/streaming/public) to listen on a specific topic (for example, the #foundPokemon hashtag). When a new tweet with that hashtag is posted, an event fires in your application that checks the details of the tweet, e.g. location, user and timestamp. Additionally, you will try to parse formatted text from the tweets to construct a new “seen” record, which is then added to the database. Attributes of the record will include the Pokemon's name, location and timestamp.

Additional data sources (here is one: https://pkmngowiki.com/wiki/Pok%C3%A9mon) will also need to be integrated to give us more information about Pokemon, e.g. what they are, how they are related, what they can transform into, which attacks they can perform, etc.
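
A minimal sketch of the tweet-parsing step, assuming tweets follow a hypothetical format such as `#foundPokemon Pikachu at 48.1374, 11.5755`. The pattern, the `parse_seen_record` function and the record field names are illustrative assumptions, not part of the project; fetching tweets from the streaming API is not shown, so the caller is assumed to pass in the tweet text and its creation time.

```python
import re
from datetime import datetime, timezone

# Hypothetical tweet format: "#foundPokemon <name> at <lat>,<lng>"
SEEN_PATTERN = re.compile(
    r"#foundPokemon\s+(?P<name>[A-Za-z.'\- ]+?)\s+at\s+"
    r"(?P<lat>-?\d+(?:\.\d+)?)\s*,\s*(?P<lng>-?\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def parse_seen_record(text, created_at):
    """Build a "seen" record from a tweet's text and creation time.

    Returns None if the text does not match the assumed format or the
    coordinates are out of range.
    """
    match = SEEN_PATTERN.search(text)
    if match is None:
        return None
    lat, lng = float(match["lat"]), float(match["lng"])
    # Reject coordinates that cannot be a real location.
    if not (-90 <= lat <= 90 and -180 <= lng <= 180):
        return None
    return {
        "name": match["name"].strip(),
        "location": {"latitude": lat, "longitude": lng},
        "timestamp": created_at.isoformat(),
    }


if __name__ == "__main__":
    tweet_time = datetime(2016, 8, 1, 12, 0, tzinfo=timezone.utc)
    print(parse_seen_record("#foundPokemon Pikachu at 48.1374, 11.5755", tweet_time))
```

Tweets that do not match simply yield `None`, so the listener can drop them instead of writing partial records to the database.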
Apache License 2.0

Additional data: Rarity of pokemon #163

AlexanderLill closed this issue 8 years ago

AlexanderLill commented 8 years ago

Hello everyone,

it would be very helpful to also get data about the rarity of each pokemon. We want to use it for sorting pokemon in different places and also to show a rarity indicator in our PokeDex.

The following fields would therefore be useful:

- ranking: the position of this pokemon in a ranking from 1 (rarest pokemon) to 151 (most common pokemon). E.g. as far as I remember, Pidgey would be no. 151 here.
- appearance_likelihood: the share of all appearing pokemon that are this pokemon, e.g. 20% of all appearing pokemon are Pidgey.
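
A minimal sketch of how both fields could be derived from raw sightings, assuming the input is simply one pokemonId per recorded sighting. The function name `rarity_stats`, the input format and the tie-breaking by pokemonId are assumptions for illustration. Pokemon listed in `all_ids` but never sighted are kept with a count of 0, so they end up as the rarest.

```python
from collections import Counter


def rarity_stats(sighting_ids, all_ids=range(1, 152)):
    """Compute `ranking` and `appearance_likelihood` for every pokemonId.

    sighting_ids: one pokemonId per recorded sighting (hypothetical input format).
    all_ids: IDs that should be ranked even with zero sightings (assumed 1-151).
    Ranking 1 is the rarest pokemon; ties are broken by pokemonId.
    """
    # Start every known ID at 0 so unseen pokemon are still ranked.
    counts = Counter({pid: 0 for pid in all_ids})
    counts.update(sighting_ids)
    total = sum(counts.values())
    # Sort by ascending sighting count so the rarest pokemon gets ranking 1.
    ordered = sorted(counts.items(), key=lambda item: (item[1], item[0]))
    return {
        pid: {
            "ranking": rank,
            "appearance_likelihood": count / total if total else 0.0,
        }
        for rank, (pid, count) in enumerate(ordered, start=1)
    }
```

For example, `rarity_stats([16, 16, 16, 19, 19, 25], all_ids=[16, 19, 25])` ranks 25 first (rarest) and 16 third, with an appearance likelihood of 0.5 for 16. Since the counts come from a sample of sightings, these values are only an approximation of the true rarity.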

If there are any questions, please let us know :) (@johartl)

Best, Alex

jonas-he commented 8 years ago

@AlexanderLill There are already a lot of such rankings floating around the web, e.g. http://imgur.com/gallery/ZTgIu or https://docs.google.com/document/d/1iZeW3wt-h-L7v97FaDLSjqxFysTLQef92c1nNOqt3Tk, but we could also use our own analysis data. I will check what makes more sense. Do you also want an API route for that, or shall we provide it to you as an Excel sheet, JSON or something else?

johartl commented 8 years ago

It would be nice if you could add the data to the pokemon object itself.

{
  "pokemonId": 25,
  "name": "Pikachu",
  ...
  "ranking": 42,
  "appearanceLikelihood": 0.04
}
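
A minimal sketch of adding the two fields to existing pokemon objects in the requested shape. It assumes the rarity values were already computed elsewhere and are available as a hypothetical mapping `{pokemonId: (ranking, likelihood)}`; `attach_rarity` is an illustrative name, not an existing API function.

```python
import json


def attach_rarity(pokemon_list, rarity):
    """Return copies of the pokemon objects with rarity fields added.

    pokemon_list: dicts shaped like the API's pokemon objects; only the
        "pokemonId" key is assumed to exist.
    rarity: hypothetical precomputed mapping {pokemonId: (ranking, likelihood)}.
    Pokemon missing from `rarity` get None (serialized as JSON null).
    """
    enriched = []
    for pokemon in pokemon_list:
        ranking, likelihood = rarity.get(pokemon["pokemonId"], (None, None))
        # Copy instead of mutating the caller's objects.
        enriched.append(
            {**pokemon, "ranking": ranking, "appearanceLikelihood": likelihood}
        )
    return enriched


if __name__ == "__main__":
    pokemon = [{"pokemonId": 25, "name": "Pikachu"}]
    print(json.dumps(attach_rarity(pokemon, {25: (42, 0.04)}), indent=2))
```

Keeping the camelCase keys `ranking` and `appearanceLikelihood` matches the object shape requested above, so clients can read them alongside the existing fields.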

sacdallago commented 8 years ago

Mh. This data should come from the @PokemonGoers/predictpokemon-1 and @PokemonGoers/predictpokemon-2 groups, though I wonder how. @goldbergtatyana, can you get this kind of information out of the ML device itself? And how? 💭

semioniy commented 8 years ago

@sacdallago We have the distribution of pokemon seen; it is only an approximation of rarity, but still.

[Figures: distr, distr2 (distribution of pokemon sightings)]

But with the 500k dataset, it seems pretty legit to me. (The pictures are only from the 50k dataset, though.)

sacdallago commented 8 years ago

yeah :) I get that, but eventually the numbers will adjust?

semioniy commented 8 years ago

I think so.

bensLine commented 8 years ago

@sacdallago from the 600k data dump we get those numbers for the different pokemon ids

I guess we could calculate representative percentages from that. However, I thought there should also be a pokemon grouping like rare, common, not-you-again. Am I mixing things up, or is that the same thing?

jonas-he commented 8 years ago

done, see #171

AlexanderLill commented 8 years ago

@jonas-he Thanks, great! :) @bensLine Yes, I think we will have to come up with a visualisation in the user interface. We will probably map the rarity ranks to a star system (1 to 3 stars) or something similar; let's see. If anyone is interested in brainstorming: https://github.com/PokemonGoers/Catch-em-all/issues/86

PS: It would be great if the API documentation could be updated too, so we have a quick overview of the available attributes.
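
A minimal sketch of mapping the 1 to 151 rarity ranking onto 1 to 3 stars. It assumes the ranking range is split into three roughly equal bands and that more stars means rarer; the thread leaves both choices open. The function `rarity_stars` and the `TIER_LABELS` mapping (which reuses the rare / common / not-you-again grouping suggested in the thread) are hypothetical.

```python
# Hypothetical labels, borrowed from the grouping suggested in this thread.
TIER_LABELS = {3: "rare", 2: "common", 1: "not-you-again"}


def rarity_stars(ranking, total=151, max_stars=3):
    """Map a rarity ranking (1 = rarest, `total` = most common) to stars.

    Assumption: the ranking range is split into `max_stars` roughly equal
    bands, and the rarest band gets the most stars.
    """
    if not 1 <= ranking <= total:
        raise ValueError(f"ranking must be between 1 and {total}")
    band = (ranking - 1) * max_stars // total  # 0 for the rarest band
    return max_stars - band
```

With the defaults, rankings 1 to 51 get 3 stars, 52 to 101 get 2 stars and 102 to 151 get 1 star, so Pidgey at no. 151 would land in the "not-you-again" tier. Equal-size bands are only one option; thresholds on `appearanceLikelihood` would instead reflect how skewed the sighting distribution actually is.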