PokemonGoers / PokeData

In this project you will scrape as much data as you can get about the *actual* sightings of Pokemons. As it turns out, players all around the world started reporting sightings of Pokemons and are logging them into a central repository (i.e. a database). We want to get this data so we can train our machine learning models. You will of course need to come up with other data sources not only for sightings but also for other relevant details that can be used later on as features for our machine learning algorithm (see Project B). Additional features could be air temperature during the given timestamp of sighting, location close to water, buildings or parks. Consult with Pokemon Go expert if you have such around you and come up with as many features as possible that describe a place, time and name of a sighted Pokemon. Another feature that you will implement is a twitter listener: You will use the twitter streaming API (https://dev.twitter.com/streaming/public) to listen on a specific topic (for example, the #foundPokemon hashtag). When a new tweet with that hashtag is written, an event will be fired in your application checking the details of the tweet, e.g. location, user, time stamp. Additionally, you will try to parse formatted text from the tweets to construct a new “seen” record that consequently will be added to the database. Some of the attributes of the record will be the Pokemon's name, location and the time stamp. Additional data sources (here is one: https://pkmngowiki.com/wiki/Pok%C3%A9mon) will also need to be integrated to give us more information about Pokemons e.g. what they are, what’s their relationship, what they can transform into, which attacks they can perform etc.
Apache License 2.0

Good news everyone #145

Closed sacdallago closed 8 years ago

sacdallago commented 8 years ago

Good news everyone

image

The log with pokemap is:

pokemap-1 | 2016-09-14T05:55:45.709323913Z 
pokemap-1 | 2016-09-14T05:55:45.709358013Z > pokemongo-api@0.0.1 listen /usr/src/app
pokemap-1 | 2016-09-14T05:55:45.709366646Z > NODE_ENV=production node scripts/listen.js
pokemap-1 | 2016-09-14T05:55:45.709372421Z 
pokemap-1 | 2016-09-14T05:55:52.727090677Z /usr/src/app/node_modules/mongoose/node_modules/mongodb/lib/utils.js:98
pokemap-1 | 2016-09-14T05:55:52.727162646Z     process.nextTick(function() { throw err; });
pokemap-1 | 2016-09-14T05:55:52.727172372Z                                   ^
pokemap-1 | 2016-09-14T05:55:52.727178763Z 
pokemap-1 | 2016-09-14T05:55:52.727184434Z TypeError: this is not a typed array.
pokemap-1 | 2016-09-14T05:55:52.727190184Z     at Function.from (native)
pokemap-1 | 2016-09-14T05:55:52.727195830Z     at encode (/usr/src/app/app/services/mapService.js:16:19)
pokemap-1 | 2016-09-14T05:55:52.727258079Z     at baseLink (/usr/src/app/app/services/mapService.js:81:38)
pokemap-1 | 2016-09-14T05:55:52.727267178Z     at searcher (/usr/src/app/app/services/mapService.js:117:27)
pokemap-1 | 2016-09-14T05:55:52.727273162Z     at /usr/src/app/app/services/mapService.js:227:33
pokemap-1 | 2016-09-14T05:55:52.727278839Z     at /usr/src/app/node_modules/async/dist/async.js:3671:13
pokemap-1 | 2016-09-14T05:55:52.727284689Z     at replenish (/usr/src/app/node_modules/async/dist/async.js:884:21)
pokemap-1 | 2016-09-14T05:55:52.727520579Z     at /usr/src/app/node_modules/async/dist/async.js:888:13
pokemap-1 | 2016-09-14T05:55:52.727528036Z     at eachOfLimit (/usr/src/app/node_modules/async/dist/async.js:915:26)
pokemap-1 | 2016-09-14T05:55:52.727534043Z     at /usr/src/app/node_modules/async/dist/async.js:920:20
pokemap-1 | 2016-09-14T05:55:52.727539847Z     at _parallel (/usr/src/app/node_modules/async/dist/async.js:3670:9)
pokemap-1 | 2016-09-14T05:55:52.727588521Z     at Object.series (/usr/src/app/node_modules/async/dist/async.js:4496:7)
pokemap-1 | 2016-09-14T05:55:52.727601458Z     at Object.module.exports.search (/usr/src/app/app/services/mapService.js:283:15)
pokemap-1 | 2016-09-14T05:55:52.727608365Z     at Object.module.exports.insertToDb (/usr/src/app/app/controllers/filler/mapService.js:10:17)
pokemap-1 | 2016-09-14T05:55:52.727629964Z     at NativeConnection.<anonymous> (/usr/src/app/scripts/listen.js:32:18)
pokemap-1 | 2016-09-14T05:55:52.727639112Z     at emitNone (events.js:67:13)
pokemap-1 | 2016-09-14T05:55:52.746655828Z 
pokemap-1 | 2016-09-14T05:55:52.752137372Z npm info pokemongo-api@0.0.1 Failed to exec listen script
pokemap-1 | 2016-09-14T05:55:52.752422939Z npm ERR! Linux 3.13.0-85-generic
pokemap-1 | 2016-09-14T05:55:52.752760937Z npm ERR! argv "/usr/local/bin/node" "/usr/local/bin/npm" "run" "listen" "--collection=pokemap"
pokemap-1 | 2016-09-14T05:55:52.753344020Z npm ERR! node v4.0.0
pokemap-1 | 2016-09-14T05:55:52.753641354Z npm ERR! npm  v2.14.2
pokemap-1 | 2016-09-14T05:55:52.754082576Z npm ERR! code ELIFECYCLE
pokemap-1 | 2016-09-14T05:55:52.754317265Z npm ERR! pokemongo-api@0.0.1 listen: `NODE_ENV=production node scripts/listen.js`
pokemap-1 | 2016-09-14T05:55:52.754537603Z npm ERR! Exit status 1
pokemap-1 | 2016-09-14T05:55:52.754842607Z npm ERR! 
pokemap-1 | 2016-09-14T05:55:52.755057355Z npm ERR! Failed at the pokemongo-api@0.0.1 listen script 'NODE_ENV=production node scripts/listen.js'.
pokemap-1 | 2016-09-14T05:55:52.755279482Z npm ERR! This is most likely a problem with the pokemongo-api package,
pokemap-1 | 2016-09-14T05:55:52.755527862Z npm ERR! not with npm itself.
pokemap-1 | 2016-09-14T05:55:52.755979261Z npm ERR! Tell the author that this fails on your system:
pokemap-1 | 2016-09-14T05:55:52.756221235Z npm ERR!     NODE_ENV=production node scripts/listen.js
pokemap-1 | 2016-09-14T05:55:52.756492669Z npm ERR! You can get their info via:
pokemap-1 | 2016-09-14T05:55:52.756781737Z npm ERR!     npm owner ls pokemongo-api
pokemap-1 | 2016-09-14T05:55:52.757059957Z npm ERR! There is likely additional logging output above.
pokemap-1 | 2016-09-14T05:55:52.760379812Z 
pokemap-1 | 2016-09-14T05:55:52.760955332Z npm ERR! Please include the following file with any support request:
pokemap-1 | 2016-09-14T05:55:52.766135857Z npm ERR!     /usr/src/app/npm-debug.log

Could someone look into that?

Everything is still running on my server. But at least the endpoint will not change now, and we are running the listeners for some time.

Please also let me know if you actually see the new data in the database; logging on production is not very exhaustive.

jonas-he commented 8 years ago

@sacdallago the cause seems to be the Buffer.from() method. It was added in Node.js 5.10.0 and we're running 4.0.0. So either I change it to new Buffer() or we update Node.js. Yes, there is new data: about 1.1 million sightings, around 350 MB ... so hitting the limit soon :smile:
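A minimal sketch of the first option, assuming the failing call in `encode` (mapService.js:16) turns a string into a Buffer; the actual source is not shown in this thread. The helper name `toBuffer` is hypothetical. It uses `Buffer.from` only where the runtime really implements it and otherwise falls back to the older `new Buffer()` constructor:

```javascript
// Hypothetical compatibility helper; not taken from mapService.js.
function toBuffer(input, encoding) {
  // On old runtimes such as Node 4.0.0, Buffer.from appears to be only the
  // method inherited from Uint8Array, which would explain the
  // "this is not a typed array" error in the stack trace.
  var hasRealFrom = typeof Buffer.from === 'function' &&
    Buffer.from !== Uint8Array.from;

  if (hasRealFrom) {
    return Buffer.from(input, encoding);
  }

  // Fallback for old Node versions (this constructor is deprecated on current ones).
  return new Buffer(input, encoding);
}

// Example: base64-encode a string on either runtime.
console.log(toBuffer('pikachu', 'utf8').toString('base64'));
```

Updating Node.js in the Docker image would make the fallback branch unnecessary; the sketch only matters if the image has to stay on 4.0.0.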

jonas-he commented 8 years ago

570 MB as of now ... 500 MB was the limit right?

sacdallago commented 8 years ago

oh shit :D @gyachdav @goldbergtatyana @juanmirocks can someone with more authority than I ping Tim on the mongo issue?

juanmirocks commented 8 years ago

@sacdallago I will ask

juanmirocks commented 8 years ago

What size limit do we expect for the mongodb?

sacdallago commented 8 years ago

thanks @juanmirocks , also @goldbergtatyana will participate in the quest :)

I asked for a 500GB instance, but considering the growth of this db and the size of the old one, I would almost be tempted to go for a 1TB, if Tim has space left somewhere!

@PokemonGoers/pokedata please fix #143 and #146 ASAP

samitsv commented 8 years ago

@sacdallago it seems to me that the new Twitter Pokemon sightings data is somehow not being added to the pokemonsightings collection. Are the Twitter credentials added somewhere, in a config file or so? Compared to other sources, Twitter requires the credentials to be added as well:

MLAB_USERNAME=<MLAB_USERNAME> MLAB_PASSWORD=<MLAB_PASSWORD> MLAB_URI=<MLAB_URI> MLAB_COLLECTION=<MLAB_COLLECTION> CONSUMER_KEY=<CONSUMER_KEY> CONSUMER_SECRET=<CONSUMER_SECRET> ACCESS_TOKEN=<ACCESS_TOKEN> ACCESS_TOKEN_SECRET=<ACCESS_TOKEN_SECRET> NODE_ENV=<NODE_ENV> npm run listen -collection=twitter
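One way to rule out missing credentials would be a startup check like the following. This is only a sketch: the variable names come from the command above, while the function `missingEnvVars` and the idea of running it before the listener starts are assumptions, not part of the existing code.

```javascript
// Variable names taken from the command above; the check itself is illustrative.
var REQUIRED_TWITTER_VARS = [
  'CONSUMER_KEY',
  'CONSUMER_SECRET',
  'ACCESS_TOKEN',
  'ACCESS_TOKEN_SECRET'
];

// Returns the names that are unset or empty in the given environment object.
function missingEnvVars(env, names) {
  return names.filter(function (name) {
    return !env[name];
  });
}

var missing = missingEnvVars(process.env, REQUIRED_TWITTER_VARS);
if (missing.length > 0) {
  // Log loudly instead of letting the listener fail silently.
  console.error('Missing Twitter credentials: ' + missing.join(', '));
}
```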

sacdallago commented 8 years ago

@samitsv the Twitter listener is indeed running, and it is running with the env variables. This needs some further digging. Although: we maxed out the mLab space, right? I got the mongo instance from the lab today, but there is another issue to solve first, otherwise it won't work. It might still need some time, which is bad.

sacdallago commented 8 years ago

Running manually on RostLab now:

$ docker ps
CONTAINER ID        IMAGE                   COMMAND                  CREATED             STATUS              PORTS               NAMES
ea097ae5da09        pokemongoers/pokedata   "npm run listen -coll"   18 seconds ago      Up 17 seconds       8080/tcp            determined_payne
f093ddaf3d93        pokemongoers/pokedata   "npm run listen -coll"   19 seconds ago      Up 18 seconds       8080/tcp            happy_brown
1a59e093e4fc        pokemongoers/pokedata   "npm run listen -coll"   20 seconds ago      Up 18 seconds       8080/tcp            zen_easley
25f585ba9e6b        pokemongoers/pokedata   "npm run listen -coll"   21 seconds ago      Up 19 seconds       8080/tcp            cocky_shannon
3f8ab8191c86        pokemongoers/pokedata   "npm run listen -coll"   21 seconds ago      Up 20 seconds       8080/tcp            gigantic_jang
fe7a5eb4d7b0        pokemongoers/pokedata   "npm run listen -coll"   22 seconds ago      Up 21 seconds       8080/tcp            big_chandrasekhar
663b7c4ecd54        pokemongoers/pokedata   "npm run listen -coll"   23 seconds ago      Up 21 seconds       8080/tcp            agitated_mcclintock
5e2dd9f8c2c4        pokemongoers/pokedata   "npm run listen -coll"   24 seconds ago      Up 22 seconds       8080/tcp            serene_curie
f64f040c5430        pokemongoers/pokedata   "npm run listen -coll"   24 seconds ago      Up 23 seconds       8080/tcp            drunk_leakey

the database is populating fast (after 5 minutes)

> show dbs
local      0.078GB
pokemongo  0.203GB

Calls of count(), 1 second apart:

> db.pokemonsightings.count()
43635
> db.pokemonsightings.count()
43726
> db.pokemonsightings.count()
43858
> db.pokemonsightings.count()
43859
> db.pokemonsightings.count()
43909
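As a rough worked example, the five counts above give the insert rate at that moment. The sketch assumes the calls were exactly one second apart, which is only approximate for commands typed by hand; `averageRate` is a hypothetical helper.

```javascript
// Counts copied from the five count() calls above, assumed to be ~1 s apart.
var counts = [43635, 43726, 43858, 43859, 43909];

// Average number of new documents per second between first and last sample.
function averageRate(samples, intervalSeconds) {
  var elapsed = (samples.length - 1) * intervalSeconds;
  return (samples[samples.length - 1] - samples[0]) / elapsed;
}

console.log(averageRate(counts, 1)); // 68.5
```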

If nothing goes wrong, we will have some nice data for the kaggle, @gyachdav @goldbergtatyana. Leaving for a 32h-awake trip now 😪 🌴 have fun guys!

jonas-he commented 8 years ago

@sacdallago how's the DB doing 😄 ?

sacdallago commented 8 years ago

It took me about 10 minutes to type this via ssh 😷 Bali is good for surfing but not on the web!

> show dbs
local      0.078GB
pokemongo  5.951GB
> db.pokemonsightings.count()
6858914

not bad for three days, guys!
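For a back-of-the-envelope feel of these numbers, here is a small sketch. It assumes "three days" means exactly 72 hours and that the GB figure from `show dbs` is in units of 1024^3 bytes; that figure is the on-disk size and may include indexes and preallocated space, so the per-sighting value is an upper bound at best. Both function names are hypothetical.

```javascript
var sightings = 6858914; // from db.pokemonsightings.count()
var sizeGB = 5.951;      // pokemongo entry in "show dbs"
var days = 3;            // "three days", taken as exactly 72 hours

// Average sightings stored per second over the whole period.
function sightingsPerSecond(count, periodDays) {
  return count / (periodDays * 24 * 60 * 60);
}

// Approximate on-disk bytes used per stored sighting.
function bytesPerSighting(gigabytes, count) {
  return (gigabytes * Math.pow(1024, 3)) / count;
}

console.log(sightingsPerSecond(sightings, days).toFixed(1) + ' sightings/s');
console.log(Math.round(bytesPerSighting(sizeGB, sightings)) + ' bytes/sighting');
```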

@goldbergtatyana @gyachdav this data should make it to the kaggle. I have now started a data dump onto the external drive connected to the old got virtual machine. From there I should somehow be able to copy it to somewhere else in rostlab or expose it on the web (or you can ask Tim to unplug the hard drive from the VM so you can copy the data the old-fashioned way).