FriskByBergen / friskby


Midnight data received from BG sensors are interpreted +24h wrong #162

Closed pgdr closed 7 years ago

pgdr commented 7 years ago

When we are given data from BG sensors at midnight (say 00:00, the midnight between day 1 and day 2), we interpret the data as belonging to the next midnight (the end of day 2).

pgdr commented 7 years ago

I'm quite sure, if air_in_bergen.py is still in use, that the offending line is:

current_date = date.today().isoformat()
# ...
ts = current_date + "T" + df[i][tid] + ":00Z"

See air_in_bergen.py#L51.

In any case, in the database, the following entry can be found:

BG_1_PM25";"2017-02-25 00:01:39.195074+01";"2017-02-26 00:00:00+01";14.6;f

The first time stamp is time_recieved (sic) and the latter is the timestamp_data.
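A quick way to spot rows like this one is to check whether the data timestamp lies after the time it was received. The following is only a sketch: the two values are copied from the row above, while the helper name `is_from_future` and the fixed `+01` offset are assumptions made for illustration and are not taken from the friskby code.

```python
from datetime import datetime, timedelta, timezone

# Fixed +01:00 offset, matching the "+01" suffix in the quoted row (assumption:
# no daylight saving time is in effect on these dates).
CET = timezone(timedelta(hours=1))

# Values copied from the database row quoted above.
time_received = datetime(2017, 2, 25, 0, 1, 39, 195074, tzinfo=CET)
timestamp_data = datetime(2017, 2, 26, 0, 0, 0, tzinfo=CET)


def is_from_future(received, data):
    """Return True when a measurement claims to be newer than its arrival time."""
    return data > received


# The measurement is stamped just under 24 hours after it was received.
offset = timestamp_data - time_received
suspicious = is_from_future(time_received, timestamp_data)
```

Subtracting one day from `timestamp_data` gives 2017-02-25 00:00:00+01, which is before `time_received` and consistent with the bug described in this issue.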

If air_in_bergen.py is still in use, we should use datetime.strftime correctly.

oyta commented 7 years ago

Looks good. It would be nice to get correct timestamps on these sensors :)

pgdr commented 7 years ago

180 is a simple fix for now

pgdr commented 7 years ago

I think I understand what's wrong now. It's actually in the line

current_date = date.today().isoformat()

that completely ignores timezones, which once a day gives the wrong date compared to the date displayed at luftkvalitet.info.
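To illustrate the timezone point, here is a minimal sketch of deriving the date from a timezone-aware instant instead of `date.today()`. It assumes the host clock is in UTC, which this thread does not confirm, and `local_date` is a hypothetical helper, not a function from air_in_bergen.py.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+1 offset, used here because Norway is on CET on the example date.
# Assumption for the sketch only; see the note below about daylight saving time.
CET = timezone(timedelta(hours=1))


def local_date(instant, tz):
    """Return the calendar date of a timezone-aware instant as seen in tz."""
    return instant.astimezone(tz).date()


# 23:30 UTC on 12 March 2017 is already 00:30 on 13 March in Norway.
instant = datetime(2017, 3, 12, 23, 30, tzinfo=timezone.utc)

utc_date = instant.date().isoformat()               # "2017-03-12"
norway_date = local_date(instant, CET).isoformat()  # "2017-03-13"
```

A fixed offset is wrong during summer time. On Python 3.9 or later, passing `zoneinfo.ZoneInfo("Europe/Oslo")` as `tz` would handle the daylight saving change as well.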

pgdr commented 7 years ago

Okay, this will still be wrong; we need to change the URIs. If you look at one sensor, you see that we get a full timestamp, that is, date+time.

If we parse the table with several sensors, we only get the clock, and not the date to which the clock belongs.

That means that if we read the website at 2017-03-13T00:01:00 and the website hasn't updated yet, we will read clock 23:00 and a value, and interpret that as 2017-03-13T23:00:00. If we received the full timestamp, we would get 2017-03-12T23:00:00.
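As long as the table only exposes the clock, one possible workaround is to roll the date back a day whenever the clock would otherwise lie in the future. This is a sketch under two assumptions that the thread does not guarantee: readings are never newer than the scrape time, and never older than 24 hours. `resolve_timestamp` is a hypothetical name, and both arguments are taken to be naive local times.

```python
from datetime import datetime, time, timedelta


def resolve_timestamp(clock, now):
    """Attach a date to a clock-only reading such as "23:00".

    clock: "HH:MM" string as scraped from the table.
    now:   naive local datetime at which the page was scraped.
    """
    hour, minute = map(int, clock.split(":"))
    candidate = datetime.combine(now.date(), time(hour, minute))
    # A reading cannot come from the future, so a clock later than "now"
    # must belong to the previous day.
    if candidate > now:
        candidate -= timedelta(days=1)
    return candidate


# Scraping at 00:01 on 13 March and reading clock 23:00:
just_after_midnight = resolve_timestamp("23:00", datetime(2017, 3, 13, 0, 1))
# Scraping at 09:11:40 and reading clock 09:00 stays on the same day:
mid_morning = resolve_timestamp("09:00", datetime(2017, 3, 13, 9, 11, 40))
```

Here `just_after_midnight` is 2017-03-12T23:00:00, the timestamp the full-date source would have given. The heuristic cannot detect a reading that is more than a day old, so a source with the full timestamp would still be preferable.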

Can we change the scraping to a site which exposes the full timestamp?

oyta commented 7 years ago

Hehe, this wasn't easy. I haven't found any other sites that we can use. The best option would be to get access to their API, if they have one.

@njberland You have some contacts in NILU - can you ask them how we can access their data in a better way?

oyta commented 7 years ago

I changed the cron-job (or whatever) in the Friskby admin to run at 10 past each hour instead of on the hour. It is not the solution(!), but it may solve the issue in many cases.

pgdr commented 7 years ago

> I changed the cron-job (or whatever) in the Friskby admin to run 10-past each hour instead of o-clock. It is not the solution(!), but it may solve the issue in many cases.

Very good! I thought about something like that yesterday (but actually around 15 or even 30 minutes past the hour), but couldn't figure out how or where to do it. I think 10 past works; we can even verify that in our rawdata table, e.g. timestamp received: 09:11:40 and timestamp data: 09:00:00.

Compare that with an entry from before the cron update, e.g. timestamp received: 09:01:50 and timestamp data: 08:00:00.
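The comparison can be made explicit by computing the lag between the two clocks. This is a small sketch using the example values from this comment; the `lag` helper is hypothetical, and it assumes both clocks fall on the same day.

```python
from datetime import datetime, timedelta

CLOCK_FORMAT = "%H:%M:%S"


def lag(received_clock, data_clock):
    """Return how long after the data timestamp the reading was received.

    Both arguments are "HH:MM:SS" strings assumed to be on the same day.
    """
    received = datetime.strptime(received_clock, CLOCK_FORMAT)
    data = datetime.strptime(data_clock, CLOCK_FORMAT)
    return received - data


after_cron_change = lag("09:11:40", "09:00:00")   # 11 min 40 s
before_cron_change = lag("09:01:50", "08:00:00")  # 1 h 1 min 50 s

# With hourly measurements, a lag of more than one hour suggests the page
# had not yet been updated when it was scraped.
stale_before = before_cron_change > timedelta(hours=1)
stale_after = after_cron_change > timedelta(hours=1)
```

With these values `stale_before` is true and `stale_after` is false, which matches the observation that running at 10 past the hour picks up the current measurement.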

@njberland requested changes to luftkvalitet.info: it's enough if they simply add the date stamp somewhere (preferably in the table) on their webpage. As it stands today, it is impossible to know the date of the previous measurement on Danmarksplass.

pgdr commented 7 years ago

Seems fixed now, thanks @oyta