PokemonGoers / PokeData

In this project you will scrape as much data as you can get about the *actual* sightings of Pokemon. As it turns out, players all around the world have started reporting Pokemon sightings and logging them into a central repository (i.e. a database). We want to get this data so we can train our machine learning models. You will of course need to come up with other data sources, not only for sightings but also for other relevant details that can be used later on as features for our machine learning algorithm (see Project B). Additional features could be the air temperature at the timestamp of the sighting, or proximity to water, buildings or parks. Consult a Pokemon Go expert if you have one around, and come up with as many features as possible that describe the place, time and name of a sighted Pokemon. Another feature that you will implement is a Twitter listener: you will use the Twitter streaming API (https://dev.twitter.com/streaming/public) to listen on a specific topic (for example, the #foundPokemon hashtag). When a new tweet with that hashtag is written, an event will be fired in your application that checks the details of the tweet, e.g. location, user and time stamp. Additionally, you will try to parse formatted text from the tweets to construct a new "seen" record that will then be added to the database. Some of the attributes of the record will be the Pokemon's name, location and the time stamp. Additional data sources (here is one: https://pkmngowiki.com/wiki/Pok%C3%A9mon) will also need to be integrated to give us more information about Pokemon, e.g. what they are, how they are related, what they can transform into, which attacks they can perform, etc.
Apache License 2.0

Fastpokemap extraction #133

Closed jonas-he closed 8 years ago

jonas-he commented 8 years ago

run with "npm run listen --collection=fastpokemap"

swathi-ssunder commented 8 years ago

@jonas-he, I pulled and executed the code.

The initial logs showed

INFO searcher active! latitude: 20 to 20.5, longitude: -159.5 to -159 
INFO Using proxy number 0: 0.0.0.0 
INFO Finished!

No data got inserted into the db, probably because there is no data at these coordinates. I let it continue.

Then there was this error

INFO searcher active! latitude: 30 to 35, longitude: -120 to -115 
INFO Using proxy number 0: 0.0.0.0 
INFO 
 length 2
INFO 2 Pokemon in this box! 
INFO 
 length 1
INFO 1 Pokemon in this box! 
ERROR SyntaxError: Unexpected token } in JSON at position 1223
    at Object.parse (native)
    at IncomingMessage.<anonymous> (/Users/swathissunder/workspace/rostlab/PokeData/app/services/mapService.js:141:41)
    at emitNone (events.js:91:20)
    at IncomingMessage.emit (events.js:185:7)
    at endReadableNT (_stream_readable.js:926:12)
    at _combinedTickCallback (internal/process/next_tick.js:74:11)
    at process._tickCallback (internal/process/next_tick.js:98:9) 
INFO Finished!

After continuing further, there was a series of errors:

ERROR SyntaxError: Unexpected token < in JSON at position 0
    at Object.parse (native)
    at IncomingMessage.<anonymous> (/Users/swathissunder/workspace/rostlab/PokeData/app/services/mapService.js:137:45)
    at emitNone (events.js:91:20)
    at IncomingMessage.emit (events.js:185:7)
    at endReadableNT (_stream_readable.js:926:12)
    at _combinedTickCallback (internal/process/next_tick.js:74:11)
    at process._tickCallback (internal/process/next_tick.js:98:9) 

And finally

ERROR Timeout/Connection Reset occured! 
INFO Trying again!

jonas-he commented 8 years ago

@swathi-ssunder Timeout/Connection Reset is normal behaviour: the server blocks your IP after a certain number of requests, then after some time you get unblocked and it starts working again.
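A minimal sketch of how a scraper could wait out such a temporary block and try again. The names `withRetry`, `requestFn`, `retries` and `delayMs` are hypothetical and not taken from the PokeData code, and the doubling delay is an assumption rather than the project's actual retry policy.

```javascript
// Hypothetical helper: resolves after `ms` milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper: calls `requestFn` until it succeeds or the retry
// budget is used up. The wait doubles after every failed attempt so that
// a temporarily blocked IP gets time to be unblocked.
async function withRetry(requestFn, { retries = 5, delayMs = 60000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await requestFn();
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        await sleep(delayMs * 2 ** attempt);
      }
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}
```

A caller would wrap the actual HTTP request in `requestFn`; the default delay of one minute is only a placeholder.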

I forgot to remove a logger.error call that I had added for debugging, so these errors are not important: if the server doesn't return valid JSON, there is nothing I can do but ignore the response.
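Ignoring a response that is not valid JSON could look roughly like this. It is a sketch, not the code in mapService.js, and `parseJsonOrNull` is a hypothetical name.

```javascript
// Hypothetical helper: returns the parsed body, or null when the body is
// not valid JSON (for example an HTML error page starting with "<", or a
// truncated or malformed payload).
function parseJsonOrNull(body) {
  try {
    return JSON.parse(body);
  } catch (err) {
    if (err instanceof SyntaxError) {
      return null; // invalid JSON: the caller skips this response
    }
    throw err; // anything else is unexpected and should not be hidden
  }
}
```

The caller can then skip the response whenever the result is `null`, without logging an error.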

However, I am wondering how you got the message "latitude: 30 to 35, longitude: -120 to -115", because for fastpokemap I use a scan size of 0.5 x 0.5. Did you really run with --collection=fastpokemap?
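For illustration, splitting a larger area into 0.5 x 0.5 scan boxes could be sketched as follows. The function name `splitIntoBoxes` and the shape of the box objects are assumptions, not the project's actual data structures.

```javascript
// Hypothetical helper: tiles the area {latMin, latMax, lngMin, lngMax}
// with square boxes of edge length `size` (in degrees). Boxes on the upper
// edges are clipped so they never extend beyond the requested area.
function splitIntoBoxes(area, size = 0.5) {
  const rows = Math.ceil((area.latMax - area.latMin) / size);
  const cols = Math.ceil((area.lngMax - area.lngMin) / size);
  const boxes = [];
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      // Multiply by the index instead of accumulating to limit rounding drift.
      const latMin = area.latMin + r * size;
      const lngMin = area.lngMin + c * size;
      boxes.push({
        latMin,
        latMax: Math.min(latMin + size, area.latMax),
        lngMin,
        lngMax: Math.min(lngMin + size, area.lngMax),
      });
    }
  }
  return boxes;
}
```

With a size of 0.5, the area from the log above (latitude 30 to 35, longitude -120 to -115) would be covered by 100 boxes.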

swathi-ssunder commented 8 years ago

@jonas-he Yes, I understand the logic behind the timeout error, and I did execute npm run listen --collection=fastpokemap. Errors aside, even after letting it run for about 10-15 minutes, I still had no data inserted into the db.

jonas-he commented 8 years ago

@swathi-ssunder I have run it for about 1 hour now and got around 5000 new Pokemon, so yes, it is a bit slow, but I can't do anything about it since the server is limiting requests. I will optimize the scan areas in the future to make it faster.

gyachdav commented 8 years ago

Guys, I don't have access to your sightings db. Can you report the number of sightings you have already collected? Preferably, it would be great to have a breakdown of sightings per source.

On Sep 7, 2016, at 9:37 PM, Swathi S Sunder notifications@github.com wrote:

Merged #133.

jonas-he commented 8 years ago

@gyachdav On my local MongoDB instance there are about 435k sightings overall:

PokeRadar: 180k
Skiplagged: 20k
Pokecrew: 225k
fastpokemap: 12k

The data was generated while I was testing my scripts, so the sighting times are not very uniformly distributed. On the shared instance over at mlab there are about 2.5k sightings.
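A per-source breakdown like the one above could be computed along these lines. This is only a sketch over an in-memory array; the field name `source` and the function name `countBySource` are assumptions about the sighting records, not confirmed by the thread.

```javascript
// Hypothetical helper: counts sightings per data source.
// Assumes every sighting record carries a `source` string; records
// without one are grouped under "unknown".
function countBySource(sightings) {
  const counts = {};
  for (const sighting of sightings) {
    const key = sighting.source || "unknown";
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}
```

Against MongoDB itself, the same breakdown would normally be done on the server with a `$group` aggregation stage rather than by loading all documents into memory.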