KWB-R / wasserportal

R Package with Functions for Scraping Data of Wasserportal Berlin (https://wasserportal.berlin.de)
https://kwb-r.github.io/wasserportal/
MIT License

How to interpret the timestamps returned by wasserportal.berlin.de? #5

Closed hsonne closed 3 years ago

hsonne commented 4 years ago

Timestamps returned for the day of switch CEST -> CET:

                Datum Einzelwert
1725 28.10.2018 01:00       -777
1726 28.10.2018 01:15       -777
1727 28.10.2018 01:30       -777
1728 28.10.2018 01:45       -777
1729 28.10.2018 02:00       -777
1730 28.10.2018 02:15       -777
1731 28.10.2018 02:30       -777
1732 28.10.2018 02:45       -777
1733 28.10.2018 03:00       -777
1734 28.10.2018 03:15       -777
1735 28.10.2018 03:30       -777
1736 28.10.2018 03:45       -777

The shift back from 03:00 to 02:00 cannot be found in the data: the timestamps 02:xx appear only once!

Timestamps returned for the day of switch CET -> CEST:

                Datum Einzelwert
8549 31.03.2019 01:00       -777
8550 31.03.2019 01:15       -777
8551 31.03.2019 01:30       -777
8552 31.03.2019 01:45       -777
8553 31.03.2019 03:00       -777
8554 31.03.2019 03:15       -777
8555 31.03.2019 03:30       -777
8556 31.03.2019 03:45       -777
8557 31.03.2019 03:00       -777
8558 31.03.2019 03:15       -777
8559 31.03.2019 03:30       -777
8560 31.03.2019 03:45       -777

The shift forward from 02:00 to 03:00 can be found in the data, but the timestamps 03:xx appear twice!

hsonne commented 4 years ago

I just found this hint:

"All data are published in European winter time." (https://www.berlin.de/senuvk/umwelt/wasser/ogewaesser/de/wasserportal.shtml)
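To illustrate what "winter time all year" means in R terms, here is a small sketch that formats the same instants once as local time in "Europe/Berlin" and once with a fixed UTC+1 offset ("Etc/GMT-1"). The UTC start time and the 30-minute step are chosen only for illustration; they are not taken from the portal.

```r
# Five instants around the CEST -> CET switch on 28.10.2018 (01:00 UTC)
utc <- seq(
  as.POSIXct("2018-10-28 00:00", tz = "UTC"),
  by = "30 min",
  length.out = 5
)

# Local time in Berlin: the hour 02:xx occurs twice
berlin <- format(utc, "%H:%M", tz = "Europe/Berlin")
print(berlin)
# "02:00" "02:30" "02:00" "02:30" "03:00"

# Fixed offset UTC+1 (winter time all year): every time occurs once
winter <- format(utc, "%H:%M", tz = "Etc/GMT-1")
print(winter)
# "01:00" "01:30" "02:00" "02:30" "03:00"
```

Note the inverted sign in the zone name: "Etc/GMT-1" is one hour ahead of UTC.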

With this information, the case CEST -> CET above is OK. However, the case CET -> CEST above is not OK: the timestamps "03:00", "03:15", "03:30", and "03:45" occur twice each.

It seems that the system that prepares the data for download expects, at some point, the timestamps to be given in time zone "Europe/Berlin". In this time zone, the timestamps "02:00", "02:15", "02:30", and "02:45" do not exist on that day. I assume that these timestamps are simply replaced with their existing counterparts, one hour later.

We could probably correct the data by replacing the first occurrences of "03:xx" with "02:xx". Then, we could convert these timestamps to time objects by using the time zone "Etc/GMT-1", see https://kwb-r.github.io/kwb.datetime/dev/articles/timezones.html.
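A minimal sketch of the proposed correction, assuming the duplicates only ever show up as repeated "03:xx" values on the switch day. The helper name `fix_spring_duplicates()` is hypothetical and this is not necessarily how the package implements it:

```r
# Hypothetical helper: relabel the first occurrence of each duplicated
# "03:xx" timestamp as "02:xx"
fix_spring_duplicates <- function(timestamps) {
  # TRUE for elements that have an identical twin later in the vector
  has_later_twin <- duplicated(timestamps, fromLast = TRUE)
  is_hour_three <- grepl(" 03:", timestamps, fixed = TRUE)
  to_fix <- has_later_twin & is_hour_three
  timestamps[to_fix] <- sub(" 03:", " 02:", timestamps[to_fix], fixed = TRUE)
  timestamps
}

# Timestamps as reported above for 31.03.2019
raw <- c(
  "31.03.2019 01:45",
  "31.03.2019 03:00", "31.03.2019 03:15", "31.03.2019 03:30", "31.03.2019 03:45",
  "31.03.2019 03:00", "31.03.2019 03:15", "31.03.2019 03:30", "31.03.2019 03:45"
)

fixed <- fix_spring_duplicates(raw)

# Interpret the corrected text as winter time (UTC+1 all year)
times <- as.POSIXct(fixed, format = "%d.%m.%Y %H:%M", tz = "Etc/GMT-1")

# Differences between consecutive timestamps, in minutes
print(diff(as.numeric(times)) / 60)
```

After the correction, all consecutive timestamps in this example are 15 minutes apart, which is what a series in winter time should look like.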

hsonne commented 3 years ago

The above-mentioned correction is done within read_wasserportal_raw(). However, I think that the error has been fixed by the providers of "wasserportal" in the meantime. Would someone like to check this?

https://github.com/KWB-R/wasserportal/blob/3ae9de4ffeaf82c0817edb1df674d45e5f8374bb/R/read_wasserportal_raw.R#L94
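One possible way to check this on freshly downloaded, uncorrected timestamps is sketched below. The function `check_timestamps()` and its `step_minutes` argument are hypothetical and not part of the package; the 15-minute step is assumed from the data shown above.

```r
# Hypothetical check: report duplicated timestamps and irregular steps
check_timestamps <- function(timestamps, step_minutes = 15) {
  times <- as.POSIXct(timestamps, format = "%d.%m.%Y %H:%M", tz = "Etc/GMT-1")
  steps <- diff(as.numeric(times)) / 60
  list(
    # Second and later occurrences of a timestamp
    duplicated = timestamps[duplicated(timestamps)],
    # Timestamps that are followed by a step other than step_minutes
    irregular_after = timestamps[which(steps != step_minutes)]
  )
}

# Pattern as originally reported for 31.03.2019
reported <- c(
  "31.03.2019 01:30", "31.03.2019 01:45",
  "31.03.2019 03:00", "31.03.2019 03:15", "31.03.2019 03:30", "31.03.2019 03:45",
  "31.03.2019 03:00", "31.03.2019 03:15"
)
result_reported <- check_timestamps(reported)
print(result_reported)

# A series without the problem yields two empty vectors
clean <- c("31.03.2019 01:45", "31.03.2019 02:00", "31.03.2019 02:15")
result_clean <- check_timestamps(clean)
print(result_clean)
```

If both elements of the result are empty for a switch day, the provider has presumably fixed the issue for that station and variable.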