jonas-he closed this 8 years ago
This sounds promising, thanks for pointing it out! We can use it to explore the learning algorithms before the data team provides more data. The next step is to create a corresponding .arff file.
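In case it helps with that next step, here is a minimal sketch of writing Weka's ARFF format by hand. The relation name, attribute names, nominal values and rows are hypothetical placeholders, not our actual feature list, and the sketch does no quoting, so names and values must not contain spaces or commas.

```python
# Minimal hand-rolled ARFF writer (hypothetical attributes and rows).
def to_arff(relation, attributes, rows):
    """Return ARFF text.

    attributes: list of (name, type) pairs, where type is "NUMERIC"
    or a list of allowed nominal values.
    rows: list of tuples with one value per attribute (None = missing).
    """
    lines = [f"@RELATION {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, list):
            # Nominal attributes list their allowed values in braces.
            kind = "{" + ",".join(kind) + "}"
        lines.append(f"@ATTRIBUTE {name} {kind}")
    lines += ["", "@DATA"]
    for row in rows:
        # ARFF writes missing values as '?'.
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines) + "\n"


attributes = [
    ("latitude", "NUMERIC"),
    ("longitude", "NUMERIC"),
    ("weather", ["clear", "rain", "snow"]),
    ("pokemonId", ["1", "16", "19"]),
]
rows = [
    (48.2626, 11.6679, "clear", "16"),
    (48.1374, 11.5755, None, "19"),
]
arff = to_arff("pokemon_sightings", attributes, rows)
print(arff)
```

The resulting text can be saved with a `.arff` extension and opened in the Weka Explorer.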
@MatthiasBaur: Yes, the weather features are up, but the API is too slow (~1 query per second) to collect enough data for every instance. @sacdallago has a suggestion to speed things up: divide the planet into chunks of 250x250 km (or bigger), query the weather every 6 h (or at longer intervals) at the four corners and the center of each square, and then either average the five weather values (rain, sun, etc.) or select the most frequent one.
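The aggregation step could look roughly like this: average the numeric fields over the five samples and take the most frequent value for categorical ones. The field names (`temperature`, `condition`) and readings are made up for illustration; the real weather API will return its own fields.

```python
from collections import Counter
from statistics import mean

# Hypothetical readings for one cell at one 6-hour time step:
# four corners plus the center.
samples = [
    {"temperature": 14.0, "condition": "rain"},
    {"temperature": 15.5, "condition": "rain"},
    {"temperature": 13.0, "condition": "cloudy"},
    {"temperature": 16.0, "condition": "rain"},
    {"temperature": 14.5, "condition": "clear"},
]


def aggregate_cell(samples):
    """Average numeric fields; take the most frequent value of categorical ones."""
    result = {}
    for key in samples[0]:
        values = [s[key] for s in samples]
        if all(isinstance(v, (int, float)) for v in values):
            result[key] = mean(values)
        else:
            # most_common(1) breaks ties in favour of the value seen first
            result[key] = Counter(values).most_common(1)[0][0]
    return result


print(aggregate_cell(samples))
```

Averaging suits continuous values like temperature, while the mode avoids meaningless "averages" of categories like rain vs. clear.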
The S2 library can tell you the corners and the center of each cell.
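To show the idea without depending on S2, here is a sketch that swaps in a plain latitude/longitude grid with cells roughly 250 km tall. This is not S2: the cells get narrower toward the poles, whereas S2 cells are much closer to equal-area, so S2 is still the better choice in practice. Sightings that fall into the same cell and the same 6-hour window share one key, so they can reuse the same five weather queries. All names and the sample sightings are hypothetical.

```python
import math

# Hypothetical stand-in for S2 cells: a plain lat/lng grid whose cells are
# ~250 km tall. Their east-west width shrinks toward the poles.
KM_PER_DEG_LAT = 111.32          # approximate length of one degree of latitude
CELL_DEG = 250 / KM_PER_DEG_LAT  # ~2.25 degrees per cell side
BUCKET_SECONDS = 6 * 3600        # one weather snapshot per 6-hour window


def cell_of(lat, lng):
    """Integer (row, col) index of the grid cell containing a point."""
    return (math.floor((lat + 90) / CELL_DEG), math.floor((lng + 180) / CELL_DEG))


def query_points(cell):
    """Four corners plus the center of a cell, as (lat, lng) pairs."""
    row, col = cell
    south = row * CELL_DEG - 90
    west = col * CELL_DEG - 180
    # Clamp the last row/column so points stay inside valid coordinates.
    north = min(south + CELL_DEG, 90.0)
    east = min(west + CELL_DEG, 180.0)
    corners = [(south, west), (south, east), (north, west), (north, east)]
    center = ((south + north) / 2, (west + east) / 2)
    return corners + [center]


def weather_key(lat, lng, unix_time):
    """Sightings sharing this key can reuse the same five weather queries."""
    return cell_of(lat, lng), unix_time // BUCKET_SECONDS


# Hypothetical sightings: (lat, lng, unix timestamp)
sightings = [(48.2626, 11.6679, 1472731200), (48.1374, 11.5755, 1472734800)]
keys = {weather_key(lat, lng, t) for lat, lng, t in sightings}
print(len(keys), "cell/time buckets ->", 5 * len(keys), "API queries")
```

The number of API calls then scales with the number of distinct cell/time buckets instead of the number of sightings.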
btw, @MatthiasBaur @Aurel-Roci @bensLine @marwage @semioniy, can you come to the Rostlab today (01.09.059), or could we all get on a Skype call to go through the feature list and talk about the prediction strategy?
@goldbergtatyana I can Skype today any time before 5 pm.
@goldbergtatyana btw, here are plots of the data dump (~600k entries) showing the Pokémon distribution and the percentage of Pokémon in the data set with more than n entries.
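For reference, the second curve (share of species with more than n entries) can be computed along these lines. The sightings list is a made-up sample with one Pokémon id per sighting, not the actual dump.

```python
from collections import Counter


def share_with_more_than(sightings, n):
    """Fraction of distinct Pokémon species with more than n sightings."""
    counts = Counter(sightings)
    return sum(1 for c in counts.values() if c > n) / len(counts)


# Hypothetical sample: one Pokémon id per sighting.
sightings = [16] * 50 + [19] * 30 + [41] * 25 + [1] * 5 + [150]
for n in (0, 10, 25):
    print(n, share_with_more_than(sightings, n))
```

Evaluating this for a range of n values gives the points of the curve.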
Thanks @bensLine for the very nice plots! If we ignore Pokémon with fewer than 20 sightings, we lose only 5% of the species (i.e. only 8 Pokémon). We should go for it.
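Dropping the rare species could be done roughly like this, assuming each row carries a species id under a hypothetical `pokemonId` key; the threshold of 20 is the one agreed on above.

```python
from collections import Counter

MIN_SIGHTINGS = 20  # threshold from the discussion above


def drop_rare_species(rows, min_sightings=MIN_SIGHTINGS):
    """Keep only rows whose species has at least min_sightings rows.

    rows: list of dicts with a hypothetical 'pokemonId' key.
    Returns (kept_rows, set_of_dropped_species).
    """
    counts = Counter(r["pokemonId"] for r in rows)
    rare = {pid for pid, c in counts.items() if c < min_sightings}
    kept = [r for r in rows if r["pokemonId"] not in rare]
    return kept, rare


# Hypothetical rows
rows = [{"pokemonId": 16}] * 40 + [{"pokemonId": 19}] * 25 + [{"pokemonId": 147}] * 3
kept, rare = drop_rare_species(rows)
print(len(kept), sorted(rare))
```

Returning the dropped species as well makes it easy to double-check that only the expected handful of species disappears.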
Yesterday your group suggested balancing the data set for training. I think it is a great idea! Check out Weka's SMOTE filter (http://stackoverflow.com/questions/22632932/how-to-set-parameters-in-weka-to-balance-data-with-smote-filter), which is designed for exactly that. Please use your 50K-data-point subsample to see whether SMOTE improves performance. Let me know if you need help!
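To illustrate what the filter does under the hood, here is a minimal Python sketch of the SMOTE idea: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbours. This is not Weka's implementation; for the actual experiment, Weka's filter is the way to go. The function name and the sample data are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Minimal SMOTE sketch: create n_synthetic new minority-class samples.

    minority: list of equal-length numeric tuples (minority class only).
    Each new sample lies between a randomly chosen minority sample and one
    of its k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of base, excluding base itself
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # random position in [0, 1) along the segment
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical 2-feature minority class
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6, k=2)
print(new_points)
```

One thing to keep in mind when comparing performance: oversample only the training portion of the 50K subsample, never the test data, otherwise synthetic points derived from training samples leak into the evaluation.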
The Big Data set was used to select the best classifier. It will also be uploaded to Kaggle, enriched with our features (see #35). I will shortly reference an issue with exhaustive information about the classifiers.
May be of interest: http://pokemongohub.net/data-mining-500-000-pokemon-spawns-encounters/ (original Reddit thread: https://www.reddit.com/r/pokemongodev/comments/51pfvh/large_pokemon_spawn_dump/)