Deltares / ddlpy

API to Dutch Rijkswaterstaat archive (DDL, waterinfo.rws.nl) of monitoring water data
https://deltares.github.io/ddlpy/
GNU General Public License v3.0

Empty response instead of error if request is too large #96

Closed veenstrajelmer closed 4 months ago

veenstrajelmer commented 4 months ago

When requesting five years of water levels at 10-minute frequency at once, we get an empty response from the DDL. It makes sense that the request fails, since the number of measurements (6*24*365*5=262800) is higher than the maximum number returned by the DDL (157681). However, no error is raised; we just get an empty response.
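As a rough check of the estimate above, the expected number of 10-minute measurements can be computed and compared with the limit quoted in the DDL error message. This is only a sketch: `expected_measurements` and `DDL_MAX_MEASUREMENTS` are hypothetical names, not part of ddlpy, and it assumes a complete, gap-free 10-minute series.

```python
from datetime import datetime, timedelta

# Limit quoted in the DDL error message; hypothetical constant, not a ddlpy attribute.
DDL_MAX_MEASUREMENTS = 157681


def expected_measurements(start, end, interval=timedelta(minutes=10)):
    """Estimate the number of measurements between two ISO dates at a fixed interval."""
    span = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return span // interval  # timedelta // timedelta gives an int


# Five full years at 10-minute frequency, as in the estimate above.
n_five_years = 6 * 24 * 365 * 5

# The failing request from the example (2017-01-01 to 2020-02-01).
n_request = expected_measurements("2017-01-01", "2020-02-01")

print(n_five_years, n_five_years > DDL_MAX_MEASUREMENTS)
print(n_request, n_request > DDL_MAX_MEASUREMENTS)
```

Both the five-year estimate and the roughly three-year request in the example exceed the limit, so either would be expected to fail.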

import ddlpy
import logging
logging.basicConfig()
# show log messages of ddlpy
ddlpy.ddlpy.logger.setLevel(logging.DEBUG)

locations = ddlpy.locations()
bool_hoedanigheid = locations['Hoedanigheid.Code'].isin(['NAP'])
bool_stations = locations.index.isin(['HOEKVHLD', 'IJMDBTHVN','SCHEVNGN'])
bool_grootheid = locations['Grootheid.Code'].isin(['WATHTE'])
bool_groepering = locations['Groepering.Code'].isin(['NVT'])
selected = locations.loc[bool_grootheid & bool_hoedanigheid & bool_groepering & bool_stations]

# successful query
start_date = "2020-01-01"
end_date = "2020-02-01"
print(f"retrieving {start_date} to {end_date}")
measurements = ddlpy.measurements(selected.iloc[0], start_date, end_date)
print(measurements) # filled dataframe

# no data in this period, expected log message ('Geen gegevens gevonden!' = 'No data found!'):
# DEBUG:ddlpy.ddlpy:Got  invalid response: {'Succesvol': False, 'Foutmelding': 'Geen gegevens gevonden!'}
start_date = "2080-01-01"
end_date = "2080-01-02"
print(f"retrieving {start_date} to {end_date}")
measurements = ddlpy.measurements(selected.iloc[0], start_date, end_date)
print(measurements) # empty dataframe

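# request too large, expected error message
# ('The max number of observations (157681) has been exceeded, limit your request.'):
# Foutmelding: 'Het max aantal waarnemingen (157681) is overschreven, beperk uw request.'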
start_date = "2017-01-01"
end_date = "2020-02-01"
print(f"retrieving {start_date} to {end_date}")
measurements = ddlpy.measurements(selected.iloc[0], start_date, end_date, freq=None)
print(measurements) # empty dataframe

Selection of prints:

retrieving 2020-01-01 to 2020-02-01
DEBUG:ddlpy.ddlpy:0 duplicated values dropped
                          WaarnemingMetadata.StatuswaardeLijst  ...             Y
time                                                            ...              
2020-01-01 01:00:00+01:00                        Gecontroleerd  ...  5.759136e+06
2020-01-01 01:10:00+01:00                        Gecontroleerd  ...  5.759136e+06
2020-01-01 01:20:00+01:00                        Gecontroleerd  ...  5.759136e+06
2020-01-01 01:30:00+01:00                        Gecontroleerd  ...  5.759136e+06
2020-01-01 01:40:00+01:00                        Gecontroleerd  ...  5.759136e+06
                                                       ...  ...           ...
2020-02-01 00:20:00+01:00                        Gecontroleerd  ...  5.759136e+06
2020-02-01 00:30:00+01:00                        Gecontroleerd  ...  5.759136e+06
2020-02-01 00:40:00+01:00                        Gecontroleerd  ...  5.759136e+06
2020-02-01 00:50:00+01:00                        Gecontroleerd  ...  5.759136e+06
2020-02-01 01:00:00+01:00                        Gecontroleerd  ...  5.759136e+06
[4465 rows x 53 columns]

retrieving 2080-01-01 to 2080-01-02
DEBUG:ddlpy.ddlpy:Got  invalid response: {'Succesvol': False, 'Foutmelding': 'Geen gegevens gevonden!'}
DEBUG:ddlpy.ddlpy:No data availble for 2080-01-01 00:00:00 2080-01-02 00:00:00
100%|██████████| 1/1 [00:00<00:00,  1.77it/s]
DEBUG:ddlpy.ddlpy:no data found for this station and time extent
Empty DataFrame
Columns: []
Index: []

retrieving 2017-01-01 to 2020-02-01
DEBUG:ddlpy.ddlpy:Got  invalid response: {'Succesvol': False, 'Foutmelding': 'Het max aantal waarnemingen (157681) is overschreven, beperk uw request.'}
DEBUG:ddlpy.ddlpy:No data availble for 2017-01-01 00:00:00 2020-02-01 00:00:00
100%|██████████| 1/1 [01:39<00:00, 99.30s/it]
DEBUG:ddlpy.ddlpy:no data found for this station and time extent
Empty DataFrame
Columns: []
Index: []

The last query fails with the 'max aantal waarnemingen' message (maximum number of observations exceeded), but this error is caught instead of raised, which results in an empty dataframe instead of an exception. Since ddlpy allows larger chunks to be retrieved since https://github.com/Deltares/ddlpy/issues/94, it is important to raise this exception properly so the user is aware that the request is too large.
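One possible way to separate the two failure cases is sketched below. This is not ddlpy's actual implementation: `NoDataError` and `raise_on_unsuccessful` are hypothetical names, and the only things taken from the logs above are the `Succesvol` and `Foutmelding` keys and the two message strings.

```python
class NoDataError(Exception):
    """Hypothetical exception for the 'no data in this period' case."""


def raise_on_unsuccessful(result):
    """Raise for an unsuccessful DDL response (hypothetical helper, not ddlpy API).

    `result` is assumed to be the parsed response dict shown in the debug log.
    """
    if result.get("Succesvol", False):
        return
    message = result.get("Foutmelding", "")
    if message == "Geen gegevens gevonden!":
        # Expected case: callers may catch this and return an empty dataframe.
        raise NoDataError(message)
    # Any other failure (e.g. request too large) should reach the user.
    raise RuntimeError(f"DDL request failed: {message}")


too_large = {
    "Succesvol": False,
    "Foutmelding": "Het max aantal waarnemingen (157681) is overschreven, beperk uw request.",
}
try:
    raise_on_unsuccessful(too_large)
except NoDataError:
    print("no data, returning an empty dataframe")
except RuntimeError as e:
    print(f"raised to the user: {e}")
```

With this split, only `NoDataError` would be caught when collecting chunks, so a too-large request surfaces as an exception rather than an empty dataframe.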

Todo: