fluves / pywaterinfo

Python package to download time series data from waterinfo.be
https://fluves.github.io/pywaterinfo/
MIT License

Wrong date retrieved for daily flow #54

Closed olivierbonte closed 2 months ago

olivierbonte commented 1 year ago

Description

When using pywaterinfo to get daily flow averages from station L06_342, the values are assigned to the wrong date: values belonging to day x are assigned to day x-1 at 23:00. An example is given below.

What I Did

import datetime

from pywaterinfo import Waterinfo

vmm = Waterinfo('vmm')
station = 'Zwalm'
station_id = 'L06_342'

# List all time series of the station, then keep the daily average ('DagGem') flow series in m³/s
stationsinfo_OG = vmm.get_timeseries_list(station_no=station_id)
stationsinfo = stationsinfo_OG[stationsinfo_OG['ts_name'] == 'DagGem']
stationsinfo = stationsinfo[stationsinfo['ts_unitsymbol'] == 'm³/s']

# Parse the naive (timezone-unaware) start and end of the requested period
dateparse_waterinfo = lambda x: datetime.datetime.strptime(x, "%Y-%m-%d %H:%M:%S")
t_start = dateparse_waterinfo("2012-01-01 00:00:00")
t_end = dateparse_waterinfo("2012-01-06 23:00:00")

# Exactly one series is expected to remain after filtering; take its ts_id
flowdf = vmm.get_timeseries_values(
    ts_id=str(int(stationsinfo['ts_id'].iloc[0])),
    start=t_start,
    end=t_end,
)
flowdf

This gives the following dataframe (flowdf) as a result:

image

Here the peak of 7 m^3/s is assigned to 2012-01-04 23:00. However, when the data is downloaded directly from waterinfo.be as CSV, the value of 7 m^3/s is assigned to 2012-01-05 (cf. picture below).

image

binomaiheu commented 1 year ago

Hi Olivier,

Thanks for reporting this. Without much in-depth investigation, I suspect the issue is related to the handling of time zones in the KIWIS API. As you are requesting daily values, the fact that KIWIS returns a timestamp of 23:00 probably means the API thinks you're requesting the values in UTC. Internally, the waterinfo back-end uses GMT+1 (Belgian winter time) throughout the year by convention. If you try querying explicitly with:

vmm.request_kiwis({"request": "getTimeseriesValues", "ts_id": 68033042, "from": "2012-01-01", "to": "2012-01-06", "timezone": "GMT+1"})

then it yields the correct timestamp (I think you can add that as a kwarg to get_timeseries_values as well)

([{'ts_id': '68033042', 'rows': '6', 'columns': 'Timestamp,Value', 'data': [['2012-01-01T00:00:00.000+01:00', 1.18], ['2012-01-02T00:00:00.000+01:00', 3.38], ['2012-01-03T00:00:00.000+01:00', 2.95], ['2012-01-04T00:00:00.000+01:00', 2.17], ['2012-01-05T00:00:00.000+01:00', 7], ['2012-01-06T00:00:00.000+01:00', 3]]}],

Play around with GMT and GMT+2 as well; you'll see the timestamps are converted.
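The one-hour shift can be reproduced without calling the API. This is a minimal sketch using only the Python standard library, assuming the daily value is stamped at midnight in the fixed GMT+1 convention described above; the variable names are illustrative and not part of pywaterinfo.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+1 offset without daylight saving (assumed back-end convention).
utc_plus_1 = timezone(timedelta(hours=1))

# Daily value stamped at midnight of 2012-01-05 in UTC+1.
local_stamp = datetime(2012, 1, 5, 0, 0, tzinfo=utc_plus_1)

# The same instant expressed in UTC falls on the previous day at 23:00.
utc_stamp = local_stamp.astimezone(timezone.utc)

print(local_stamp.isoformat())  # 2012-01-05T00:00:00+01:00
print(utc_stamp.isoformat())    # 2012-01-04T23:00:00+00:00
```

Both timestamps denote the same instant; only the calendar date that is displayed differs.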

Hope this helps already. Cheers, b.

olivierbonte commented 1 year ago

Thank you for the help, this was indeed the problem! Unfortunately, "GMT+1" is only recognised as a valid timezone by vmm.request_kiwis(), not by vmm.get_timeseries_values(). vmm.get_timeseries_values() does accept 'Europe/Brussels' as a timezone, but then the timestamps switch between GMT+1 and GMT+2 with daylight saving time, which is undesired. For now, I have worked around the issue by passing the start and end dates to the API in 'Europe/Brussels' during winter time, which returns the timestamps in GMT, and then manually adding one hour to get GMT+1 all year round.
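The difference between 'Europe/Brussels' and a fixed GMT+1 offset can be illustrated with the standard library alone. This is a sketch, assuming Python 3.9+ with the `zoneinfo` module and an available time zone database; the two example dates are arbitrary.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

brussels = ZoneInfo("Europe/Brussels")
fixed_utc_plus_1 = timezone(timedelta(hours=1))

# Two midnight-UTC instants, one in winter and one in summer.
winter = datetime(2012, 1, 5, tzinfo=timezone.utc)
summer = datetime(2012, 7, 5, tzinfo=timezone.utc)

for instant in (winter, summer):
    # Europe/Brussels follows daylight saving time; the fixed offset does not.
    print(
        instant.astimezone(brussels).isoformat(),
        instant.astimezone(fixed_utc_plus_1).isoformat(),
    )
```

In winter both conversions give an offset of +01:00, while in summer 'Europe/Brussels' moves to +02:00 and the fixed offset stays at +01:00.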

stijnvanhoey commented 2 months ago

As discussed in https://github.com/fluves/pywaterinfo/issues/67 and https://github.com/fluves/pywaterinfo/pull/68, we rely on pandas (i.e. pytz) for the moment for the time zone conversion and "GMT+1" is indeed not supported by pytz.

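Note that pytz (and also the stdlib Python timezone support) does actually support Etc/GMT+1 as a timezone definition. Be aware that the Etc/ names use the inverted POSIX sign convention: Etc/GMT+1 is one hour behind UTC, and the fixed UTC+1 offset discussed here is called Etc/GMT-1. You could try using this timezone, or request the data in UTC and do the conversion afterwards yourself.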
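Converting afterwards could look like the sketch below. It assumes the values come back as a DataFrame with a timezone-aware UTC `Timestamp` column and a `Value` column; the frame is built by hand from the values quoted earlier in this thread rather than fetched from the API, and the `Timestamp_utc1` column name is made up for illustration.

```python
from datetime import timedelta, timezone

import pandas as pd

# Hand-built stand-in for the API result: daily values stamped in UTC.
flowdf = pd.DataFrame(
    {
        "Timestamp": pd.to_datetime(
            ["2012-01-03T23:00:00Z", "2012-01-04T23:00:00Z", "2012-01-05T23:00:00Z"],
            utc=True,
        ),
        "Value": [2.17, 7.0, 3.0],
    }
)

# Fixed UTC+1 offset, valid all year round (same offset as "Etc/GMT-1").
fixed_utc_plus_1 = timezone(timedelta(hours=1))

# tz_convert changes only the representation, not the underlying instant.
flowdf["Timestamp_utc1"] = flowdf["Timestamp"].dt.tz_convert(fixed_utc_plus_1)

print(flowdf[["Timestamp_utc1", "Value"]])
```

With this conversion the peak of 7 m³/s is stamped 2012-01-05 00:00 at +01:00, matching the date in the CSV download.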