PokemonGoers / PredictPokemon-2

In this project we will apply machine learning to establish TLN (Time, Location and Name) prediction in Pokemon Go: where Pokemon will appear, at what date and time, and which Pokemon it will be.
Apache License 2.0

Weather features #43

Closed MatthiasBaur closed 8 years ago

MatthiasBaur commented 8 years ago

We need to circumvent the API restrictions. The plan is to save coarse information, e.g. per 100 km² cell or even coarser, and map it to the smaller cells. The same goes for sunrise/sunset.

goldbergtatyana commented 8 years ago

We need the weather features as soon as possible, as we need them for the predictions that we are working on now :)

As for the project in general, the planned deadline is the end of the week after the presentation week.

semioniy commented 8 years ago

Hey, @goldbergtatyana, @gyachdav, I just discovered that 95% of the corrupt responses from the API are caused by missing humidity. Maybe I should skip humidity as a feature? Because of it we lose about 25-30% of the data.
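One way to avoid dropping those records is to treat humidity as optional rather than skipping it entirely. This is only a minimal sketch: the field names (`temperature`, `pressure`, `wind_speed`, `humidity`) and the flat dict shape of the response are assumptions for illustration, not the actual API format.

```python
from typing import Optional

# Hypothetical field names; the real API response may be structured differently.
REQUIRED_FIELDS = ("temperature", "pressure", "wind_speed")
OPTIONAL_FIELDS = ("humidity",)


def extract_weather(response: dict) -> Optional[dict]:
    """Return a feature dict, or None if a required field is missing."""
    if any(response.get(name) is None for name in REQUIRED_FIELDS):
        return None  # genuinely corrupt: a required value is absent
    features = {name: response[name] for name in REQUIRED_FIELDS}
    for name in OPTIONAL_FIELDS:
        # Stays None when the API omits the value, so the record is kept.
        features[name] = response.get(name)
    return features
```

With this, a response without humidity still yields a usable row, and the model side can decide later whether to impute the missing value or ignore the column.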

semioniy commented 8 years ago

BTW, I tried to optimize the weather feature as much as I could, but it remains too slow even after I added caching by S2 cell (about 10*10km) and timeframe (2 hours). Yeah, it definitely got faster, like, 3-5 times, but we'll still have to split our dataset into pieces, run the processing on several computers, and probably every computer will need from 5 to 20 API keys, or even more. Or, as an alternative to that many API keys, we'll have to pay $50 for the dataset of 500k entries (it costs $0.01 per 100 requests)....
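For anyone following along, the caching idea can be sketched roughly like this. It is not the project's code: a plain latitude/longitude grid stands in for the S2 cell (0.1 degree of latitude is roughly 11 km), and `fetch` is a hypothetical callable that performs the real API request.

```python
import math
from typing import Callable, Dict, Tuple

CELL_DEG = 0.1           # assumed grid step in degrees, a stand-in for an S2 cell
TIMEFRAME_S = 2 * 3600   # 2-hour timeframe, as described above

CacheKey = Tuple[int, int, int]


def cache_key(lat: float, lon: float, timestamp: float) -> CacheKey:
    """Map a point in space and time to its (cell row, cell column, timeframe) bucket."""
    return (
        math.floor(lat / CELL_DEG),
        math.floor(lon / CELL_DEG),
        int(timestamp // TIMEFRAME_S),
    )


class WeatherCache:
    """Calls `fetch` at most once per (cell, timeframe) bucket."""

    def __init__(self, fetch: Callable[[float, float, float], dict]):
        self._fetch = fetch                      # hypothetical API call
        self._store: Dict[CacheKey, dict] = {}
        self.api_calls = 0

    def get(self, lat: float, lon: float, timestamp: float) -> dict:
        key = cache_key(lat, lon, timestamp)
        if key not in self._store:
            self.api_calls += 1
            self._store[key] = self._fetch(lat, lon, timestamp)
        return self._store[key]
```

Every entry that falls into an already seen bucket costs no request, so making the cells or the timeframe larger trades weather accuracy for fewer API calls.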

semioniy commented 8 years ago

About performance: 500 entries took 100 seconds. So 500,000 on one computer would take about 28 hours, if we exclude any errors that require a restart of the process (though from the last saved state, not from the beginning). Also, I still don't know what happens when they block my API key for the rest of the day. For some reason they haven't yet, though I've exceeded the free API request limit multiple times already. I have an error handler in the code that switches to the next API key from an array of keys if the response is... unusual, but I don't know how it'll work in a real situation.

P.S. It takes 1925 seconds for 2400 entries. No idea why the speed differs so much, but on average it's 0.8 sec/request, and that's a problem.

P.P.S. With larger S2 cells (level 8 instead of 10, which has a radius of 48km instead of 10) and a bigger timeframe (4 hours instead of 2, so 1 day is 6 timeframes instead of 12) I got a time of 570 seconds for 2400 entries.
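The key-switching handler could look roughly like the sketch below. The names are all hypothetical: `request` stands for whatever function calls the API with a given key, and `is_valid` for whatever check decides that a response is "unusual". How the real API signals a blocked key is not known here.

```python
from typing import Any, Callable, Sequence, Tuple


class AllKeysExhausted(Exception):
    """Raised when no key in the list produced a valid response."""


def fetch_with_rotation(
    request: Callable[[str], Any],     # hypothetical: performs one API call with a key
    keys: Sequence[str],
    is_valid: Callable[[Any], bool],   # hypothetical: False for "unusual" responses
    start: int = 0,
) -> Tuple[Any, int]:
    """Try each key once, beginning at `start`.

    Returns the first valid response and the index of the key that produced it,
    so the caller can continue from that key on the next request.
    """
    for offset in range(len(keys)):
        index = (start + offset) % len(keys)
        response = request(keys[index])
        if is_valid(response):
            return response, index
    raise AllKeysExhausted(f"none of the {len(keys)} keys returned a valid response")
```

Raising once every key has failed lets the caller save its state and stop, instead of looping over blocked keys forever.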
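To compare the three measurements above on the same scale, here is a small back-of-the-envelope sketch. It extrapolates each run linearly to 500,000 entries and, for the cost, assumes the worst case of one paid request per entry (no cache hits) at the $0.01 per 100 requests mentioned earlier.

```python
DATASET_SIZE = 500_000
PRICE_PER_100_REQUESTS = 0.01  # USD, from the pricing quoted in this thread


def seconds_per_entry(total_seconds: float, entries: int) -> float:
    return total_seconds / entries


def hours_for(entries: int, sec_per_entry: float) -> float:
    """Linear extrapolation of the runtime, ignoring restarts and errors."""
    return entries * sec_per_entry / 3600


# (total seconds, entries) for each run reported in this thread
runs = {
    "first run": (100, 500),
    "level 10, 2 h timeframe": (1925, 2400),
    "level 8, 4 h timeframe": (570, 2400),
}

for name, (seconds, entries) in runs.items():
    rate = seconds_per_entry(seconds, entries)
    print(f"{name}: {rate:.2f} s/entry, {hours_for(DATASET_SIZE, rate):.1f} h for the full dataset")

# Worst case: every entry triggers its own paid request.
worst_case_cost = DATASET_SIZE / 100 * PRICE_PER_100_REQUESTS
print(f"worst-case cost: ${worst_case_cost:.2f}")
```

This gives roughly 28, 111 and 33 hours respectively for the full dataset on one machine, which is why the coarser cells and longer timeframe matter so much.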

semioniy commented 8 years ago

And, probably, if we still plan to use this API, I'll need everyone here to register for it with his/her email, so that I have about 50 keys in the end. I don't have that many email addresses.

P.S. Made 18712 requests overnight - they still didn't block me...

semioniy commented 8 years ago

@sacdallago, @juanmirocks your opinions are welcome as well)

semioniy commented 8 years ago

Well, for now it works. 50k entries already have the weather feature. Currently it gathers weather for S2 cells of 23*23km size and timeframes of 3 hours.