Closed by samitsv 8 years ago
@juanmirocks
I would first collect a relatively reliable sample of 100 to 1000 tweets that contain any of your suggested hashtags or keywords and also contain geolocation data. Based on that, you can manually study how those tweets are written and design a better algorithm for extracting the information. For example, it may well be that many of those tweets do not contain keywords such as 'caught' and just write down the pokemon's name.
I would also advise storing the tweets' images, if any. I manually checked a small sample and found that most of the information is often contained in the image.
@juanmirocks thank you for your feedback. The above strategy was designed after looking at many tweets about catching or sighting pokemon, so these keywords capture the notion of a pokemon sighting in Pokemon Go. Of course, other synonymous keywords could be added, but I don't think a tweet containing only a pokemon's name is relevant as a sighting. Tweets such as "I saw pikachu", "caught pikachu", or "was attacked by pikachu" are relevant, and they are covered by the keywords above. Regarding the images: unless we plan to add image processing here, we can't be sure an image contains useful details unless the tweet itself is relevant, because someone can post images completely unrelated to the tweet.
I'm playing around with your twitter stream at the moment, and I would actually suggest tracking all pokemon names first, and then checking whether the tweet is Pokemon Go related. I think you are currently missing a lot of relevant tweets.
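A minimal sketch of that two-stage idea, written in Python purely for illustration: match on a pokemon name first, then keep the tweet only if it also looks Pokemon Go related. The abbreviated name list, the marker list, and the function name `classify_tweet` are assumptions made for this example, not the filter actually used in the twitter module.

```python
import re

# Hypothetical, abbreviated lists; a real filter would load every pokemon name.
POKEMON_NAMES = ["Pikachu", "Squirtle", "Gastly", "Omanyte",
                 "Tangela", "Exeggcute", "Jigglypuff"]
# Assumed hints that a tweet is about Pokemon Go rather than pokemon in general.
GO_MARKERS = ["pokemongo", "pokemon go", "a wild", "appeared"]

# Stage 1 pattern: any known name as a whole word, case-insensitive.
NAME_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, POKEMON_NAMES)) + r")\b",
    re.IGNORECASE,
)

def classify_tweet(text):
    """Return the matched pokemon name if the tweet passes both stages, else None."""
    match = NAME_RE.search(text)
    if match is None:
        return None  # stage 1 failed: no pokemon name at all
    lowered = text.lower()
    if not any(marker in lowered for marker in GO_MARKERS):
        return None  # stage 2 failed: name present, but no Pokemon Go hint
    return match.group(1)
```

With these lists, the "A wild Pikachu has appeared!" tweets pass both stages, while a tweet that only carries the #PokemonGO hashtag without a pokemon name is dropped at stage 1.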
I'm locally changing some stuff in the twitter module since we are dependent on a twitter stream as well, so I will probably just open a PR on this.
Here's an example of just a few seconds of tweets with the new filter:
got tweet: A wild Exeggcute appeared! It will be near OFFICE area until 5:56 PM. https://t.co/ssDVp1z4yf #Exeggcute #OFFICE #PokemonGo #NMK
got tweet: A wild Squirtle has appeared! Available until 06:58:01 (13m 54s). https://t.co/qFhgZU9uYy
got tweet: Gastly: A wild Gastly has appeared! Available until 04:58:25 (14m 15s). https://t.co/fm7f8VUmyV
got tweet: Dropped down a lure for Pokemon GO at the #PGCHelsinki venue. Need to catch me a Jigglypuff...
got tweet: A wild Tangela appeared! It will be near Sangenjaya Station until 6:54 AM. https://t.co/a4uWRe9gsQ #PokemonGo
got tweet: A wild Squirtle appeared! It will be near Blaine Hill BBQ until 6:52 AM. https://t.co/n0nFIDtWxm
got tweet: Let's have some fun? ! I'm there- https://t.co/VScPyKKJwG https://t.co/WPkudNuMh5
got tweet: Omanyte has appeared near 3864 Wilson Ave, 48906! Available until 06:59:13 (15m 0s). https://t.co/ItUm9hn24x
got tweet: This is the worst thing imaginable for #PokemonGO Players out there. https://t.co/0Y6hi7gxIm
got tweet: A wild Pikachu has appeared! Available until 12:54:03 (9m 45s). https://t.co/tlXg37ZEae
As you can see, a lot of bots already tweet pokemon sightings, but they only geo-tag them in encoded URLs. Maybe we can still find a good way to leverage this, though.
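Since the bot tweets follow a couple of fixed templates, one possible way to leverage them is a pair of regular expressions. The sketch below is in Python and covers only the two templates visible in the sample above; the pattern names and `parse_sighting` are made up for this example, and other bots may well use different wording.

```python
import re

# Template 1: "A wild X appeared! It will be near PLACE until H:MM AM/PM."
NEAR_RE = re.compile(
    r"A wild (?P<name>\w+) appeared! It will be near (?P<place>.+?) "
    r"until (?P<until>\d{1,2}:\d{2} [AP]M)\."
)
# Template 2: "A wild X has appeared! Available until HH:MM:SS (Mm Ss)."
TIMER_RE = re.compile(
    r"A wild (?P<name>\w+) has appeared! Available until "
    r"(?P<until>\d{2}:\d{2}:\d{2}) \((?P<minutes>\d+)m (?P<seconds>\d+)s\)"
)
# Shortened link that, per the discussion, hides the location.
URL_RE = re.compile(r"https://t\.co/\w+")

def parse_sighting(text):
    """Return a dict of string fields for a recognised bot tweet, else None."""
    for pattern in (NEAR_RE, TIMER_RE):
        match = pattern.search(text)
        if match:
            info = match.groupdict()
            url = URL_RE.search(text)
            info["url"] = url.group(0) if url else None
            return info
    return None
```

Note that `\w+` only matches single-word names, so names containing spaces or punctuation would need a looser pattern.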
Seems like you're getting the lon/lat on the gmap. It should be straightforward to use the gmap API to pull out those coordinates.
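As a rough sketch of pulling coordinates out of a map link (Python, illustration only): this assumes the shortened link has already been expanded and that the expanded URL is a Google Maps style link carrying the position either in a `q=lat,lng` / `ll=lat,lng` query parameter or in an `@lat,lng` path segment. Whether the bots' links actually expand to that form has not been checked here, and `coords_from_map_url` is a hypothetical helper name.

```python
import re
from urllib.parse import urlparse, parse_qs

# A "lat,lng" pair of decimal numbers.
COORD_RE = re.compile(r"(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?)")
# The "@lat,lng" segment that can appear in a maps URL path.
AT_RE = re.compile(r"@(-?\d+(?:\.\d+)?,-?\d+(?:\.\d+)?)")

def coords_from_map_url(url):
    """Return (lat, lng) if the URL embeds a coordinate pair, else None."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    # Candidate strings from the assumed query parameters.
    candidates = query.get("q", []) + query.get("ll", [])
    at_match = AT_RE.search(parsed.path)
    if at_match:
        candidates.append(at_match.group(1))
    for candidate in candidates:
        match = COORD_RE.fullmatch(candidate.strip())
        if match:
            lat, lng = float(match.group(1)), float(match.group(2))
            # Reject pairs that cannot be valid coordinates.
            if -90 <= lat <= 90 and -180 <= lng <= 180:
                return lat, lng
    return None
```

If the `q` parameter holds a street address instead of numbers, the function returns `None`, and the address would have to be geocoded instead.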
@gyachdav not so straightforward, but doable! If you are at 31 something Ave, you might be listed at 37 something Ave, but that's good enough. It's called reverse geo-coding, someone implemented it yesterday somewhere... BAM: https://github.com/PokemonGoers/PredictPokemon-2/pull/19
Oh, and if on the other hand the problem is converting an address to lat/lng, then it's geo-coding:
https://maps.googleapis.com/maps/api/geocode/json?address={query}
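For illustration, a small Python sketch of building that request URL and reading the coordinates out of a response. It performs no network call; the helper names `build_geocode_url` and `first_location` are made up for this example, the optional `key` parameter is included on the assumption that an API key is needed, and the response handling assumes the documented shape with `status` and `results[0].geometry.location`.

```python
import json
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"

def build_geocode_url(address, api_key=None):
    """Build the geocoding request URL with the address safely URL-encoded."""
    params = {"address": address}
    if api_key:
        params["key"] = api_key  # assumption: a key is passed as `key`
    return GEOCODE_ENDPOINT + "?" + urlencode(params)

def first_location(response_body):
    """Return (lat, lng) of the first result in a JSON response, else None."""
    data = json.loads(response_body)
    if data.get("status") != "OK" or not data.get("results"):
        return None
    location = data["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]
```

For example, `build_geocode_url("3864 Wilson Ave, 48906")` yields the endpoint followed by `?address=3864+Wilson+Ave%2C+48906`. The result can be fetched with any HTTP client and the body passed to `first_location`.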
:D man, working for a company without a cent to spend on fancy APIs to query locations does pay out at some stage :D :D
@sacdallago @gyachdav @phdowling I have implemented it this way here: https://github.com/PokemonGoers/PokeData/pull/127