anebz / boulder

Occupancy tracker for bouldering gyms
http://boulder.anebz.eu

Algorithm exercise: fill missing values #19

Closed anebz closed 2 years ago

anebz commented 3 years ago

While I was experimenting, some scrapes failed, so data is missing for certain time ranges. When I plot the data for that day, there is a point every 15 minutes until 15:30, and then it jumps to 17:00 because no data was gathered in between. I would like the values between those two times to be filled in so that the x-axis always shows 15-minute intervals.

Simplified: given a list such as [2, 3, 4, 0, 2, 3, 0, 0, 1, 0], there are four 0s, i.e. four missing values. I would like each 0 to be filled with the average of the previous value and the next value. For the first 0, the previous value is 4 and the next is 2, so the average is (4+2)/2=3, and that 0 gets replaced with 3. Next there are two 0s in succession. The value before them is 3 and the one after is 1, so both 0s are replaced by (3+1)/2=2. The last 0 is at the end of the list, so we could just average with 0: that 0 would be replaced by (1+0)/2=0.5. After the algorithm, the list would look like this: [2, 3, 4, 3, 2, 3, 2, 2, 1, 0.5].
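A minimal sketch of the averaging rule described above, in plain Python. The function name `fill_missing` and the `missing` parameter are made up for illustration; it assumes 0 marks a missing value, as in the simplified example, so a genuine reading of 0 would also be overwritten (in real data a sentinel such as None is probably safer).

```python
def fill_missing(values, missing=0):
    """Replace each run of `missing` with the mean of its two neighbours."""
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] != missing:
            i += 1
            continue
        # Find the end of this run of consecutive missing values.
        j = i
        while j < n and filled[j] == missing:
            j += 1
        # At the start or end of the list there is no neighbour: average with 0.
        prev_val = filled[i - 1] if i > 0 else 0
        next_val = filled[j] if j < n else 0
        avg = (prev_val + next_val) / 2
        for k in range(i, j):
            filled[k] = avg
        i = j
    return filled


print(fill_missing([2, 3, 4, 0, 2, 3, 0, 0, 1, 0]))
# [2, 3, 4, 3.0, 2, 3, 2.0, 2.0, 1, 0.5]
```

Every value in a run gets the same average, which matches the example; a linear interpolation would instead give a ramp between the two neighbours.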

In our situation, we would have entries every 15 minutes, and sometimes there would be no entry. A missing entry corresponds to a 0 in the previous example, and it is the one that should be filled. Sometimes just one entry is missing, sometimes several in succession. The missing entry might also be the first or last of the day, at 7:00 or at 23:00. I put all three scenarios in the previous example. We already know the times that need to be filled: all the 15-minute intervals. The occupancy, waiting, temperature and weather status should be filled with the averages of the previous and next entries. All are numbers and easy to average except the weather status 😉 I look forward to your idea of how to average categorical data.
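One possible way to apply this to timestamped entries, as a rough sketch only. The record layout (a dict mapping a datetime to an `(occupancy, weather)` tuple), the function name `fill_day` and the sample values are all hypothetical, not taken from the project. For the categorical weather status it simply carries the previous label forward, which is one assumption among several reasonable choices (nearest neighbour or most frequent label would also work).

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=15)


def fill_day(records, start, end):
    """Return one (occupancy, weather) entry per 15-minute slot in [start, end].

    `records` maps datetime -> (occupancy, weather); slots absent from it
    are treated as missing and filled from their neighbours.
    """
    known = sorted(records)
    filled = {}
    slot = start
    while slot <= end:
        if slot in records:
            filled[slot] = records[slot]
        else:
            before = [t for t in known if t < slot]
            after = [t for t in known if t > slot]
            prev_rec = records[before[-1]] if before else None
            next_rec = records[after[0]] if after else None
            # Numeric field: mean of the neighbours, using 0 at the day's edges.
            prev_occ = prev_rec[0] if prev_rec else 0
            next_occ = next_rec[0] if next_rec else 0
            occupancy = (prev_occ + next_occ) / 2
            # Categorical field: carry the previous label forward (assumption),
            # falling back to the next label at the start of the day.
            if prev_rec:
                weather = prev_rec[1]
            elif next_rec:
                weather = next_rec[1]
            else:
                weather = None
            filled[slot] = (occupancy, weather)
        slot += STEP
    return filled


# Made-up sample: data at 15:30 and 17:00, nothing in between.
day = datetime(2021, 1, 1)
records = {
    day.replace(hour=15, minute=30): (40, "Clouds"),
    day.replace(hour=17, minute=0): (60, "Rain"),
}
result = fill_day(records, day.replace(hour=15, minute=30), day.replace(hour=17, minute=0))
for slot, (occupancy, weather) in result.items():
    print(slot.strftime("%H:%M"), occupancy, weather)
```

The waiting and temperature fields could be handled the same way as occupancy; they are left out here to keep the sketch short.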

Usually I write a step-by-step guide on how to solve things; this time I'm giving you a higher-level overview of the task 😄 Feel free to ask me if you have questions.

Note: use SciPy's 1-D interpolation (scipy.interpolate.interp1d).

AnjaTRPES commented 2 years ago

I'll take this one, if you're okay with it :)

AnjaTRPES commented 2 years ago

aaand I already have the first question: would it be possible to get some sample data (or ideally the whole dataset at some specific timepoint)?

anebz commented 2 years ago

> aaand I already have the first question: would it be possible to get some sample data (or ideally the whole dataset at some specific timepoint)?

yeah I already texted you in Telegram :)

AnjaTRPES commented 2 years ago

pull request sent #27