Deltares / ddlpy

API to Dutch Rijkswaterstaat archive (DDL, waterinfo.rws.nl) of monitoring water data
https://deltares.github.io/ddlpy/
GNU General Public License v3.0

add sorting of returned timeseries #27

Closed veenstrajelmer closed 6 months ago

veenstrajelmer commented 6 months ago

Description

When retrieving data that spans multiple months, the returned results are not sorted by time. The same happens when querying ddl/waterwebservices directly, so the cause is not in ddlpy. Still, it would be convenient for ddlpy to sort the returned dataframe by time anyway.
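As a minimal sketch of the requested behaviour (not ddlpy's actual implementation), the snippet below builds a small synthetic dataframe that mimics two monthly chunks arriving in the wrong order and sorts it on the time column. The helper name `sort_on_time` and the sample values are hypothetical; only the column names `t` and `Meetwaarde.Waarde_Numeriek` are taken from the scripts in this issue.

```python
import pandas as pd

# Synthetic stand-in for a ddlpy result (made-up values):
# two chunks concatenated in non-chronological order.
chunk_nov = pd.DataFrame({
    "t": pd.date_range("2019-11-01", periods=3, freq="D"),
    "Meetwaarde.Waarde_Numeriek": [10.0, 12.0, 11.0],
})
chunk_oct = pd.DataFrame({
    "t": pd.date_range("2019-10-29", periods=3, freq="D"),
    "Meetwaarde.Waarde_Numeriek": [7.0, 8.0, 9.0],
})
meas = pd.concat([chunk_nov, chunk_oct], ignore_index=True)


def sort_on_time(df, time_column="t"):
    """Hypothetical helper: return a copy sorted chronologically.

    mergesort is a stable sort, so rows with identical timestamps
    keep their original relative order.
    """
    return df.sort_values(time_column, kind="mergesort").reset_index(drop=True)


meas_sorted = sort_on_time(meas)
print(meas["t"].is_monotonic_increasing)         # unsorted input
print(meas_sorted["t"].is_monotonic_increasing)  # sorted output
```

Resetting the index is a design choice: it avoids leaving the shuffled pre-sort index on the result, which could otherwise confuse positional lookups later on.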

What I Did

import ddlpy # TODO: we require ddlpy from main/master branch (>0.1.0) >> pip install git+https://github.com/openearth/ddlpy
import datetime as dt
import matplotlib.pyplot as plt
plt.close("all")

# input parameters
start_date  = dt.datetime(2019,10,24)
end_date = dt.datetime(2019,12,5)

locations = ddlpy.locations()
bool_hoedanigheid = locations['Hoedanigheid.Code'].isin(['NAP'])
bool_stations = locations.index.isin(['HOEKVHLD'])
bool_grootheid = locations['Grootheid.Code'].isin(['WATHTE'])
locs_wathte = locations.loc[bool_grootheid & bool_hoedanigheid & bool_stations]

# retrieve with ddlpy
meas_wathte = ddlpy.measurements(locs_wathte.iloc[0], start_date=start_date, end_date=end_date)
# filter measured waterlevels (drop waterlevel extremes)
meas_wathte_ts = meas_wathte.loc[meas_wathte['Groepering.code'].isin(['NVT'])]
# sort on time values # TODO: do this in ddlpy or in ddl
# meas_wathte_ts = meas_wathte_ts.sort_values("t")
fig, ax = plt.subplots()
ax.plot(meas_wathte_ts['t'], meas_wathte_ts['Meetwaarde.Waarde_Numeriek'])
fig.tight_layout()

Gives: image

Calling waterwebservices directly gives the same unsorted result, albeit with the cuts in different places (probably because ddlpy splits the request into monthly subsets):

import ddlpy # TODO: we require ddlpy from main/master branch (>0.1.0) >> pip install git+https://github.com/openearth/ddlpy
import datetime as dt
import matplotlib.pyplot as plt
plt.close("all")
import requests
import pandas as pd

# input parameters
start_date  = dt.datetime(2019,10,24)
end_date = dt.datetime(2019,12,5)

locations = ddlpy.locations()
bool_hoedanigheid = locations['Hoedanigheid.Code'].isin(['NAP'])
bool_stations = locations.index.isin(['HOEKVHLD'])
bool_grootheid = locations['Grootheid.Code'].isin(['WATHTE'])
locs_wathte = locations.loc[bool_grootheid & bool_hoedanigheid & bool_stations]

# direct retrieve
url_ddl = 'https://waterwebservices.rijkswaterstaat.nl/ONLINEWAARNEMINGENSERVICES_DBO/OphalenWaarnemingen'
request_ddl = {"Locatie":{"Code":locs_wathte.iloc[0].name, "X":locs_wathte.iloc[0]["X"], "Y":locs_wathte.iloc[0]["Y"]},
 "AquoPlusWaarnemingMetadata":{
   "AquoMetadata":{"Grootheid":{"Code":"WATHTE"},
                   "Hoedanigheid":{"Code":"NAP"},
                   "Groepering":{"Code":"NVT"}}},
 "Periode":{
   "Begindatumtijd":"2019-10-24T00:00:00.000+01:00",
   "Einddatumtijd":"2019-12-05T00:00:00.000+01:00"}}

resp = requests.post(url_ddl, json=request_ddl)
if not resp.ok:
    raise Exception('%s for %s: %s'%(resp.reason, resp.url, str(resp.text)))
result = resp.json()
if not result['Succesvol']:
    raise Exception('query not successful, Foutmelding: %s from %s'%(result['Foutmelding'],url_ddl))
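# collect every returned series; assigning data_ddl inside the loop would keep only the last one
data_ddl_list = []
for one in result['WaarnemingenLijst']:
    # print(one['AquoMetadata']['Grootheid'])
    # print(one['AquoMetadata']['Hoedanigheid'])
    # print(one['AquoMetadata']['Groepering'])
    data_ddl_list.append(pd.json_normalize(one['MetingenLijst']))
data_ddl = pd.concat(data_ddl_list, ignore_index=True)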

data_ddl["t"] = pd.DatetimeIndex(data_ddl['Tijdstip'])
# sort on time values # TODO: do this in ddlpy or in ddl
# data_ddl = data_ddl.sort_values("t")
fig, ax = plt.subplots()
ax.plot(data_ddl["t"], data_ddl['Meetwaarde.Waarde_Numeriek'])
ax.set_title("data from waterwebservices")
fig.tight_layout()

Gives: image
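
To locate where the "cuts" occur without relying on a plot, one option is to look for rows whose timestamp is earlier than that of the preceding row. The sketch below does this on made-up timestamps; the function name `find_time_jumps` is hypothetical and is not part of ddlpy or waterwebservices.

```python
import pandas as pd


def find_time_jumps(times):
    """Hypothetical helper: positions where time jumps backwards.

    Returns the integer positions of all timestamps that are earlier
    than their predecessor. An empty list means the series is sorted.
    """
    times = pd.Series(pd.to_datetime(times)).reset_index(drop=True)
    # diff() yields NaT for the first row, which compares as False
    backwards = times.diff() < pd.Timedelta(0)
    return backwards[backwards].index.tolist()


# Made-up timestamps: a November block followed by an October block
t = pd.to_datetime([
    "2019-11-01", "2019-11-02",
    "2019-10-24", "2019-10-25",
    "2019-12-01",
])
jumps = find_time_jumps(t)
print(jumps)
```

Applied to `meas_wathte_ts['t']` or `data_ddl['t']` from the scripts above, this would presumably show that the jumps fall at different positions for ddlpy and for the direct request, matching the different cuts seen in the two plots.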