PokemonGoers / PokeData

In this project you will scrape as much data as you can get about *actual* Pokemon sightings. As it turns out, players all around the world have started reporting Pokemon sightings and are logging them in a central repository (i.e. a database). We want to get this data so we can train our machine learning models. You will of course need to come up with other data sources, not only for sightings but also for other relevant details that can be used later on as features for our machine learning algorithm (see Project B). Additional features could be the air temperature at the timestamp of the sighting, or proximity to water, buildings or parks. Consult a Pokemon Go expert if you have one around you, and come up with as many features as possible that describe the place, time and name of a sighted Pokemon. Another feature that you will implement is a Twitter listener: you will use the Twitter streaming API (https://dev.twitter.com/streaming/public) to listen on a specific topic (for example, the #foundPokemon hashtag). When a new tweet with that hashtag is written, an event will be fired in your application that checks the details of the tweet, e.g. location, user and timestamp. Additionally, you will try to parse formatted text from the tweets to construct a new “seen” record that will then be added to the database. Some of the attributes of the record will be the Pokemon's name, the location and the timestamp. Additional data sources (here is one: https://pkmngowiki.com/wiki/Pok%C3%A9mon) will also need to be integrated to give us more information about Pokemon, e.g. what they are, how they relate to each other, what they can transform into, which attacks they can perform, etc.
Apache License 2.0

Statistics on aggregated data #121

Closed gyachdav closed 8 years ago

gyachdav commented 8 years ago

I played around with the data and started a Google spreadsheet:

https://docs.google.com/spreadsheets/d/1EFkw-_pS6G4rajmR85zqM1CBku2xxUQVoeECFU3cbWM/edit?usp=sharing

The spreadsheet shows a distinct correlation between CP and HP, but no pattern emerges when plotting CP, weight and height.
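A correlation like the one between CP and HP can be quantified with a Pearson coefficient. The following is only a minimal sketch in Python, not the calculation used in the spreadsheet; the function name `pearson` and the sample values are hypothetical placeholders, not data from this project.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up placeholder values, NOT taken from the spreadsheet.
cp = [100, 200, 300, 400]
hp = [20, 38, 61, 80]
print(round(pearson(cp, hp), 3))
```

A value close to 1 or -1 indicates a strong linear relationship, while a value near 0 would match the "no pattern" observation for weight and height.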

It would be interesting to see some statistics for the following (not an exhaustive list; more ideas are encouraged):

For sighting info:

gyachdav commented 8 years ago

Assigning to @vivek-sethia so he can re-assign to appropriate team member.

samitsv commented 8 years ago
fabe85 commented 8 years ago

Google Sheet with a histogram of the number of attacks and a plot of the number of attacks vs. maxCP: https://docs.google.com/spreadsheets/d/1n93UaFEdUqtC7_nF-oK1hAPO2BJx-413PXAHC1uY2bU/edit?usp=sharing

vivek-sethia commented 8 years ago

@fabe85 The plot says that the x-axis represents the name, but the entries are listed as numbers. What do they stand for? Does the y-axis stand for the count? If so, why are there only 10 numbers on the x-axis?

fabe85 commented 8 years ago

@vivek-sethia Oh sorry, the axis labels are wrong. I will change that.

fabe85 commented 8 years ago

Google Sheet with the top 25% of Pokemon sightings, the major sighting sources, and the distribution of types, resistance and weakness: https://docs.google.com/spreadsheets/d/1mmy382caE5mCpKq021AhKawqvJCKB7UkBNfmSPJAsnQ/edit?usp=sharing
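For reference, one possible reading of "top 25% of sightings" is the quarter of species with the most sighting records. The sketch below, in Python, illustrates that reading only; it is not how the sheet was produced, and the function name `top_quarter` and the sample names are assumptions made for illustration.

```python
from collections import Counter
from math import ceil


def top_quarter(sighted_names):
    """Return (name, count) pairs for the most-sighted 25% of species."""
    counts = Counter(sighted_names)
    # Round up so a non-empty input always yields at least one species.
    keep = ceil(len(counts) * 0.25)
    return counts.most_common(keep)


# Made-up sighting records, one Pokemon name per sighting (hypothetical data).
sample = ["Pidgey"] * 5 + ["Rattata"] * 3 + ["Zubat"] * 2 + ["Eevee"]
print(top_quarter(sample))
```

The same `Counter` could also be built over a type or source field to get the distribution of types or the major sighting sources.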

jonas-he commented 8 years ago

Added the diagrams from our presentation to Guy's original spreadsheet (click on "Sheet-2").

gyachdav commented 8 years ago

is this done?

fabe85 commented 8 years ago

@gyachdav Basically it is done, except for the 10 major urban and exotic centers with their respective daily average sightings. That part is causing me some problems. I will ask some other team members to help me with it, but I am not sure we can solve it. However, I think the main statistics are provided and they give a clear overview.

fabe85 commented 8 years ago

Result: The remaining statistics cannot be generated using the mlab databases alone. Therefore, I will close this issue now.