PokemonGoers / PokeData

In this project you will scrape as much data as you can about *actual* Pokemon sightings. Players all around the world have started reporting sightings and logging them in a central repository (i.e. a database). We want this data so we can train our machine learning models. You will also need to come up with other data sources, not only for sightings but also for other relevant details that can later be used as features for our machine learning algorithm (see Project B). Additional features could be the air temperature at the time of the sighting, or proximity to water, buildings or parks. Consult a Pokemon Go expert if you have one around, and come up with as many features as possible that describe the place, time and name of a sighted Pokemon. Another feature that you will implement is a Twitter listener: you will use the Twitter streaming API (https://dev.twitter.com/streaming/public) to listen on a specific topic (for example, the #foundPokemon hashtag). When a new tweet with that hashtag is written, an event will be fired in your application that checks the details of the tweet, e.g. location, user and time stamp. Additionally, you will try to parse formatted text from the tweets to construct a new “seen” record that will then be added to the database. The attributes of the record will include the Pokemon's name, the location and the time stamp. Additional data sources (here is one: https://pkmngowiki.com/wiki/Pok%C3%A9mon) will also need to be integrated to give us more information about Pokemon, e.g. what they are, how they relate to one another, what they can transform into, which attacks they can perform, etc.
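As a rough illustration of the tweet-parsing step described above, here is a minimal Python sketch. The description does not specify a tweet format, so the pattern `#foundPokemon <name> at <lat>,<lon>` is purely an assumption, as are the `Sighting` record and the `parse_sighting` helper; none of these names come from the project itself.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# ASSUMPTION: tweets look like "#foundPokemon Pikachu at 48.1351,11.5820".
# The real format is not defined in the project description.
TWEET_PATTERN = re.compile(
    r"#foundPokemon\s+(?P<name>[A-Za-z.' -]+?)\s+at\s+"
    r"(?P<lat>-?\d+(?:\.\d+)?)\s*,\s*(?P<lon>-?\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


@dataclass
class Sighting:
    """Hypothetical "seen" record: Pokemon name, location and time stamp."""
    pokemon: str
    latitude: float
    longitude: float
    seen_at: datetime
    reported_by: str


def parse_sighting(text: str, user: str, created_at: datetime) -> Optional[Sighting]:
    """Return a Sighting if the tweet text matches the assumed format, else None."""
    match = TWEET_PATTERN.search(text)
    if match is None:
        return None
    lat = float(match.group("lat"))
    lon = float(match.group("lon"))
    # Reject coordinates that cannot be a valid latitude/longitude pair.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return Sighting(match.group("name").strip(), lat, lon, created_at, user)


if __name__ == "__main__":
    when = datetime(2016, 8, 20, 12, 0, tzinfo=timezone.utc)
    print(parse_sighting("Wow! #foundPokemon Pikachu at 48.1351,11.5820", "ash", when))
```

A streaming listener would call something like `parse_sighting` once per incoming tweet and store only the non-`None` results; tweets that do not follow the assumed format are simply skipped.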
Apache License 2.0

Requirements for second phase #120

Closed by swathi-ssunder 8 years ago

swathi-ssunder commented 8 years ago

@sacdallago, @gyachdav - We are working on the issues raised. Meanwhile, we would like to know if there are any other specific requirements/expectations for the deadline on 7th September.

sacdallago commented 8 years ago

You can always come up with new data sources, but I believe there will be requests from other groups, as well as adjustments, bugs and improvements to handle in what you have so far.

I will talk to the other mentors over the next few days and see if there's anything more specific :) but you are doing an excellent job, guys ;)

gyachdav commented 8 years ago

Excellent job indeed!

I would like to see massive and continuous read-in from all sources. I saw today that PokeRadar has accumulated over 10M data points. How many sightings are currently stored in our API? Of course, the continuous scraping should only take place once the rostlab server is set up.

I would also like to see some statistics about the data you have aggregated. This requirement is not in the original project description, but having them would benefit our project immensely! I will specify some questions in a separate issue, which I will reference here later.

gyachdav commented 8 years ago

#121