PokemonGoers / PokeData

In this project you will scrape as much data as you can about *actual* Pokemon sightings. As it turns out, players all around the world have started reporting Pokemon sightings and logging them into a central repository (i.e. a database). We want this data so we can train our machine learning models. You will also need to find other data sources, not only for sightings but also for other relevant details that can later be used as features for our machine learning algorithm (see Project B). Additional features could include the air temperature at the time of the sighting, or proximity to water, buildings, or parks. Consult a Pokemon Go expert if you have one around, and come up with as many features as possible that describe the place, the time, and the name of a sighted Pokemon.

Another feature you will implement is a Twitter listener: you will use the Twitter Streaming API (https://dev.twitter.com/streaming/public) to listen on a specific topic (for example, the #foundPokemon hashtag). When a new tweet with that hashtag is posted, your application fires an event that checks the tweet's details, e.g. location, user, and timestamp. Additionally, you will try to parse formatted text from the tweets to construct a new “seen” record, which will then be added to the database. The record's attributes will include the Pokemon's name, location, and timestamp.

Additional data sources (here is one: https://pkmngowiki.com/wiki/Pok%C3%A9mon) will also need to be integrated to give us more information about Pokemon, e.g. what they are, how they are related, what they can transform into, which attacks they can perform, etc.
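A minimal sketch of the tweet-to-record step, assuming tweets arrive as Twitter v1.1 streaming API objects (with `text`, `created_at`, `id_str`, and a GeoJSON `coordinates` field) and that the Pokemon name is found by matching the text against a list of known names. The list `KNOWN_POKEMON`, the function `tweetToSighting`, and the record field names are hypothetical, loosely modeled on the sighting record quoted later in this thread; the actual PokeData schema may differ.

```javascript
// Hypothetical name list; in practice it would come from a Pokemon data source.
const KNOWN_POKEMON = ["Bulbasaur", "Charmander", "Squirtle", "Pikachu"];

// Escape regex metacharacters so names like "Mr. Mime" match literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a "seen" record from a tweet object, or return null if the tweet
// text does not mention any known Pokemon name (case-insensitive, whole word).
function tweetToSighting(tweet, knownNames = KNOWN_POKEMON) {
  const text = tweet.text || "";
  const name = knownNames.find((n) =>
    new RegExp(`\\b${escapeRegExp(n)}\\b`, "i").test(text)
  );
  if (!name) return null;

  // Twitter's `coordinates` field is GeoJSON: { type: "Point", coordinates: [lng, lat] }.
  // When a tweet is not geotagged, it is null.
  const coords = tweet.coordinates && tweet.coordinates.coordinates;
  const location = Array.isArray(coords)
    ? { type: "Point", coordinates: coords }
    : null;

  return {
    source: "TWITTER",
    pokemonName: name, // hypothetical field; the record quoted below uses a numeric pokemonId
    location,
    appearedOn: new Date(tweet.created_at).toISOString(),
    tweetId: tweet.id_str,
  };
}
```

Plain name matching is the simplest possible parser; tweets that use a fixed format (for example, a name followed by coordinates) could be parsed more strictly.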
Apache License 2.0

Preferred way of importing? #140

Closed: phdowling closed this issue 8 years ago

phdowling commented 8 years ago

Hey guys,

I use the Twitter stream that your project opens. Currently I import it by cloning your repo to a certain location and importing the file in which the stream is opened:

var PokemonTwitter = require("../PokeData/app/controllers/filler/twitter");

However, for production, this of course needs to change - is your project on npm already? How should I expect to import the module I need? Also pinging @PokemonGoers/catch-em-all for this.

samitsv commented 8 years ago

We do not plan to ship an npm package. If you want to get the data generated by Twitter, you can do so using http://pokedata.c4e3f8c7.svc.dockerapp.io:65014/doc/#api-PokemonSighting-GetSightingBySource or http://pokedata.c4e3f8c7.svc.dockerapp.io:65014/api/pokemon/sighting/search?source=twitter
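When stored sightings are enough (rather than a live feed), the search endpoint above can be queried with a plain HTTP GET. A minimal sketch, assuming Node.js 18+ for the global `fetch` and assuming the endpoint returns a JSON body; the service URL is the one quoted in the comment and may no longer be online. `sightingSearchUrl` and `fetchSightingsBySource` are hypothetical helper names.

```javascript
// Base URL as quoted in the comment above; the deployment may no longer exist.
const POKEDATA_BASE = "http://pokedata.c4e3f8c7.svc.dockerapp.io:65014";

// Build the sighting search URL for a given source, e.g. "twitter".
// URLSearchParams takes care of encoding the query value.
function sightingSearchUrl(source, base = POKEDATA_BASE) {
  const url = new URL("/api/pokemon/sighting/search", base);
  url.searchParams.set("source", source);
  return url.toString();
}

// Fetch sightings for a source (requires a global fetch, i.e. Node.js 18+).
async function fetchSightingsBySource(source) {
  const res = await fetch(sightingSearchUrl(source));
  if (!res.ok) {
    throw new Error(`PokeData request failed: HTTP ${res.status}`);
  }
  return res.json(); // assumed to be a JSON body
}
```

Usage: `fetchSightingsBySource("twitter").then(console.log)`. As the next comment points out, this only returns sightings already stored, not the raw live tweets.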

phdowling commented 8 years ago

The REST API won't do in this case; we need live tweets, i.e. the raw feed. I guess we can just start a separate feed in our own module and be fairly independent from yours there. Another option would be for us to open a PR with the code we wrote against your repo; then we could access all data sources freely. Not sure what the best way to go is here. @gyachdav or @sacdallago, any suggestions?

gyachdav commented 8 years ago

I recommend you stick with analyzing the Streaming API on your own, separated from project A.

sacdallago commented 8 years ago

@phdowling yup, I would suggest you create an npm package that the guys from A can use on the tweets to perform the sentiment analysis. They are listening to the tweets anyway, so I imagine it as adding a function (from your package) that calculates the score, deferring a write to a dedicated collection ({tweetId: xyz, sentiment: +1.2}), and that's it.
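A minimal sketch of that flow: a scoring function is applied to each incoming tweet, and the `{tweetId, sentiment}` document is written asynchronously to a dedicated collection. The word-list scorer (`LEXICON`, `scoreSentiment`), the handler `onTweet`, and the in-memory `sentimentCollection` are hypothetical stand-ins for the real sentiment package and database collection, which this thread does not specify.

```javascript
// Hypothetical stand-in for the sentiment package: a tiny word lexicon.
// A Map avoids accidental hits on Object.prototype keys like "constructor".
const LEXICON = new Map([
  ["love", 1], ["great", 1], ["awesome", 1.5],
  ["hate", -1], ["boring", -1], ["bad", -1],
]);

// Sum the lexicon scores of all words in the text (unknown words score 0).
function scoreSentiment(text) {
  return text
    .toLowerCase()
    .split(/\W+/)
    .reduce((sum, word) => sum + (LEXICON.get(word) || 0), 0);
}

// In-memory stand-in for a dedicated sentiment collection.
const sentimentCollection = [];

// Called for every incoming tweet: score it now, but defer the write so the
// stream handler returns immediately. Resolves with the stored document.
function onTweet(tweet) {
  const doc = { tweetId: tweet.id_str, sentiment: scoreSentiment(tweet.text) };
  return new Promise((resolve) => {
    setImmediate(() => {
      sentimentCollection.push(doc);
      resolve(doc);
    });
  });
}
```

Keeping the scorer a pure function of the tweet text is what would let it ship as a separate npm package while project A keeps owning the stream and the database write.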

Or, eventually, the guys from A can implement an npm runner to perform the score analysis for all the tweets.

@samitsv are you storing the RAW tweets somewhere? I can't remember.

samitsv commented 8 years ago

@sacdallago raw tweets are not being stored

sacdallago commented 8 years ago

@samitsv Hmm. It might make sense to store them :) @gyachdav we did this last semester, but I'm not entirely sure it makes sense.

Taking the idea from https://github.com/PokemonGoers/HashPokemonGo/issues/12#issuecomment-246899760 maybe extend that object ({twitterId: xyz, sentiment: 1.23}) with:

and save that?

Also @samitsv, checking out the data coming from Twitter: why store null lat/lng values? E.g.:

{"_id":"57c936554e3bd9e1024717fc","source":"TWITTER","appearedOn":"2016-09-02T08:20:37.452Z","__v":0,"pokemonId":7,"location":null}

It makes sense in the collection mentioned above, but not in the sightings collection... just knowing that they spawned somewhere on the globe seems like the least informative feature ever to me :laughing: @goldbergtatyana do you agree?

samitsv commented 8 years ago

@sacdallago about null lat/lng: they may be important if someone wants to see the number of Pokemon appearances, or wants to know when a Pokemon appears, e.g. whether a Pokemon appears mostly during the day and not at night.
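To illustrate the point, records without a location can still feed time-based statistics. A minimal sketch, assuming sighting records shaped like the one quoted above (an ISO-8601 `appearedOn` string): it counts sightings per hour of day and splits them into day and night buckets. Without a location, the local time at the sighting cannot be determined, so the sketch uses UTC hours; the 06:00 to 17:59 "day" window and the function names `countByHourOfDay` and `dayNightSplit` are hypothetical choices.

```javascript
// Count sightings per UTC hour of day (index 0..23); location is not needed.
function countByHourOfDay(sightings) {
  const counts = new Array(24).fill(0);
  for (const s of sightings) {
    const hour = new Date(s.appearedOn).getUTCHours();
    if (!Number.isNaN(hour)) counts[hour] += 1; // skip unparseable timestamps
  }
  return counts;
}

// Split hourly counts into "day" (06:00 to 17:59 UTC) and "night" (the rest).
// The window is an arbitrary assumption and ignores local time zones.
function dayNightSplit(counts) {
  let day = 0;
  let night = 0;
  counts.forEach((n, hour) => {
    if (hour >= 6 && hour < 18) day += n;
    else night += n;
  });
  return { day, night };
}
```

The UTC limitation is also an argument for the other side of the thread: without coordinates, "day" and "night" can only be approximated.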

sacdallago commented 8 years ago

hmm.. @goldbergtatyana @juanmirocks @gyachdav opinions?

goldbergtatyana commented 8 years ago

Valid points from both of you regarding null long/lat values.

If we have an issue with storage space, then I would recommend not storing sightings with empty locations. If there is no issue, then yes, store them :smile:

swathi-ssunder commented 8 years ago

@goldbergtatyana @sacdallago To add to the discussion: we had to store null values for location even when there is no data for it (rather than omitting the location key altogether), since the data is indexed on the location field for geospatial queries. So if we would rather not have these entries, we could skip the record altogether.

goldbergtatyana commented 8 years ago

Thanks @swathi-ssunder! Again, for predictions these records are useless. For statistics on sightings they are nice. However, these data will most likely be used for statistics just once. Therefore, I would suggest getting rid of entries with empty locations altogether.
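If entries with empty locations are dropped, the check can run right before a record is written to the sightings collection. A minimal sketch, assuming a non-null `location` is stored as a GeoJSON `Point` with `[lng, lat]` coordinates, as MongoDB `2dsphere` indexes expect (the `_id`/`__v` fields in the record above suggest MongoDB, but the thread only shows the null case). `hasValidLocation` and `partitionSightings` are hypothetical names; the dropped records could still go to the separate collection discussed above.

```javascript
// True if the record has a usable GeoJSON Point location.
function hasValidLocation(record) {
  const loc = record.location;
  if (!loc || loc.type !== "Point" || !Array.isArray(loc.coordinates)) {
    return false;
  }
  const [lng, lat] = loc.coordinates;
  return (
    Number.isFinite(lng) && Number.isFinite(lat) &&
    lng >= -180 && lng <= 180 &&
    lat >= -90 && lat <= 90
  );
}

// Split incoming records into those to insert into the sightings collection
// and those without a usable location, which are skipped.
function partitionSightings(records) {
  const keep = [];
  const dropped = [];
  for (const r of records) {
    (hasValidLocation(r) ? keep : dropped).push(r);
  }
  return { keep, dropped };
}
```

Filtering before the insert also sidesteps the indexing concern raised above, since no record with a null location reaches the indexed collection.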

samitsv commented 8 years ago

@phdowling @sacdallago I think I forgot about the part about storing tweet texts. So do we store them or not? I see some sentiment API is already implemented; @phdowling, could you let me know how I could access it?

sacdallago commented 8 years ago

@samitsv I would store it. Better to have this data than not. @gyachdav @goldbergtatyana @juanmirocks opinions?

samitsv commented 8 years ago

Being worked on in https://github.com/PokemonGoers/PokeData/pull/162