Deltares / ddlpy

API to Dutch Rijkswaterstaat archive (DDL, waterinfo.rws.nl) of monitoring water data
https://deltares.github.io/ddlpy/
GNU General Public License v3.0
20 stars 6 forks source link

Sum up duplicated periods in `ddlpy.measurements_amount()` #92

Closed veenstrajelmer closed 6 months ago

veenstrajelmer commented 6 months ago

Description

When retrieving the measurement amount we get duplicated periods for some stations. For instance for BATH 1993, but also for CADZD, IJMDBTHVN, NIEUWSTZL, DENOVBTN, WESTKPLE. This is because OphalenAantalWaarnemingen returns multiple amounts for that year, since the WaardeBepalingsmethode changed in that year. These should be summed up so the period index has no duplicated values.

What I Did


import ddlpy
import requests
import pandas as pd

start_date = "1990-01-01"
end_date = "1995-01-01"

print('retrieving DDL locations catalog with ddlpy')
locations = ddlpy.locations()
print('...done')

# select locations on grootheid/groepering/exttypes
bool_grootheid = locations["Grootheid.Code"].isin(["WATHTE"])
bool_groepering_ts = locations["Groepering.Code"].isin(["NVT"])
bool_station_ts = locations.index.isin(["BATH"])
locs_meas_ts = locations.loc[bool_grootheid & bool_groepering_ts & bool_station_ts]

print("ddlpy retrieve amount of measurements")
amount_ts = ddlpy.measurements_amount(location=locs_meas_ts.iloc[0], start_date=start_date, end_date=end_date)
print(amount_ts)

Gives multiple 1993 in the index:

  Groeperingsperiode  AantalMetingen
0               1990           52554
1               1991           52560
2               1992           52704
3               1993           16704
4               1993           35856
5               1994           52560
6               1995               7

We can look into more detail of the metadata when using ddl api directly


print("direct retrieve amount of measurements")
url = "https://waterwebservices.rijkswaterstaat.nl/ONLINEWAARNEMINGENSERVICES_DBO/OphalenAantalWaarnemingen"

request = {'AquoMetadataLijst': [{'Compartiment': {'Code': 'OW'}, 
                                  'Eenheid': {'Code': 'cm'}, 
                                  'Grootheid': {'Code': 'WATHTE'}, 
                                  'Hoedanigheid': {'Code': 'NAP'}, 
                                  'Parameter': {'Code': 'NVT'}, 
                                  'Groepering': {'Code': 'NVT'}}], 
           'LocatieLijst': [{'X': 584172.789970818, 'Y': 5694908.14241489, 'Code': 'BATH'}], 
           'Groeperingsperiode': 'Jaar', 
           'Periode': {'Begindatumtijd': '1990-01-01T00:00:00.000+00:00', 'Einddatumtijd': '1995-01-01T00:00:00.000+00:00'}}

resp = requests.post(url, json=request)
result = resp.json()
if not result['Succesvol']:
    raise Exception(result.get('Foutmelding', 'No error returned'))

if result['Succesvol']:
    df_list = []
    for one in result['AantalWaarnemingenPerPeriodeLijst']:
        df = pd.json_normalize(one['AantalMetingenPerPeriodeLijst'])
        # combine columns to a period string
        df["Groeperingsperiode"] = df["Groeperingsperiode.Jaarnummer"].apply(lambda x: f"{x:04d}")
        # select columns from dataframe and append to list
        df = df[["Groeperingsperiode","AantalMetingen"]]
        df_list.append(df)

    # concatenate
    amount_all = pd.concat(df_list).sort_values("Groeperingsperiode").reset_index(drop=True)
print(amount_all)

Adding this to ddlpy.measurements_amount() will solve this issue:

amount_ts = amount_ts.set_index("Groeperingsperiode")
amount_ts = amount_ts.groupby(amount_ts.index).sum()

Also test this for month/daily groeperingsperiode.