Deltares / ddlpy

API to Dutch Rijkswaterstaat archive (DDL, waterinfo.rws.nl) of monitoring water data
https://deltares.github.io/ddlpy/
GNU General Public License v3.0

Make duplicate dropping optional #24

veenstrajelmer closed 6 months ago

veenstrajelmer commented 6 months ago

Description

ddlpy drops duplicate measurements. It would be good to make this optional for data-inspection purposes.

What I Did

Duplicate values for WALSODN 2010 (and others):

import pandas as pd
import requests

url_ddl = 'https://waterwebservices.rijkswaterstaat.nl/ONLINEWAARNEMINGENSERVICES_DBO/OphalenWaarnemingen'
# Request WATHTE relative to NAP at Walsoorden for the first 10 minutes of 2010 (UTC)
request_ddl = {
    'AquoPlusWaarnemingMetadata': {
        'AquoMetadata': {
            'Grootheid': {'Code': 'WATHTE'},
            'Groepering': {'Code': 'NVT'},
            'Hoedanigheid': {'Code': 'NAP'},
            'MeetApparaat': {'Code': '127'},
        }
    },
    'Locatie': {'Locatie_MessageID': 10716, 'X': 571389.152745295, 'Y': 5694632.62008149, 'Naam': 'Walsoorden', 'Code': 'WALSODN'},
    'Periode': {'Begindatumtijd': '2010-01-01T00:00:00.000+00:00', 'Einddatumtijd': '2010-01-01T00:10:00.000+00:00'},
}

resp = requests.post(url_ddl, json=request_ddl)
if not resp.ok:
    raise Exception('%s for %s: %s'%(resp.reason, resp.url, str(resp.text)))
result = resp.json()
# The service reports query-level failures in the JSON body
if not result['Succesvol']:
    raise Exception('query not successful, Foutmelding: %s from %s'%(result['Foutmelding'],url_ddl))

# Flatten the list of measurements into a DataFrame
result_pd = pd.json_normalize(result['WaarnemingenLijst'][0]["MetingenLijst"])
print(result_pd[["Tijdstip","Meetwaarde.Waarde_Numeriek"]]) # each timestamp appears 3 times

This gives the following output, with every measurement appearing three times:

                        Tijdstip  Meetwaarde.Waarde_Numeriek
0  2010-01-01T01:00:00.000+01:00                        63.0
1  2010-01-01T01:00:00.000+01:00                        63.0
2  2010-01-01T01:00:00.000+01:00                        63.0
3  2010-01-01T01:10:00.000+01:00                        83.0
4  2010-01-01T01:10:00.000+01:00                        83.0
5  2010-01-01T01:10:00.000+01:00                        83.0
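
For data inspection, the repeats in a table like this can be counted with pandas before anything is dropped. The snippet below is a minimal sketch that rebuilds the six rows shown above by hand instead of querying the service; the variable names counts, is_repeat and n_repeats are only illustrative.

```python
import pandas as pd

# Rows copied by hand from the output above (no request is made here)
result_pd = pd.DataFrame({
    "Tijdstip": [
        "2010-01-01T01:00:00.000+01:00",
        "2010-01-01T01:00:00.000+01:00",
        "2010-01-01T01:00:00.000+01:00",
        "2010-01-01T01:10:00.000+01:00",
        "2010-01-01T01:10:00.000+01:00",
        "2010-01-01T01:10:00.000+01:00",
    ],
    "Meetwaarde.Waarde_Numeriek": [63.0, 63.0, 63.0, 83.0, 83.0, 83.0],
})

# Number of rows per (timestamp, value) combination
counts = result_pd.groupby(["Tijdstip", "Meetwaarde.Waarde_Numeriek"]).size()

# Boolean mask that is True for every repeat after the first occurrence
is_repeat = result_pd.duplicated(keep="first")
n_repeats = int(is_repeat.sum())

# The same table with fully identical rows removed
deduplicated = result_pd.drop_duplicates()

print(counts)
print(n_repeats, len(deduplicated))
```

Here duplicated() compares all columns, so only rows that are identical in both timestamp and value count as repeats.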

When retrieving NAP/127/WALSODN for 2010 directly, we get >157681 observations (waarnemingen); with ddlpy we get 52562 values because duplicates are dropped. This is a nice default, but it would be good to make it optional. The line to adjust is measurements = measurements.drop_duplicates() in ddlpy.py
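
One possible shape for this change is a keyword argument that defaults to the current behaviour. The sketch below is only an illustration: postprocess_measurements and its drop_duplicates argument are hypothetical names and not part of the ddlpy API; the only thing taken from ddlpy.py is the measurements.drop_duplicates() call.

```python
import pandas as pd


def postprocess_measurements(measurements: pd.DataFrame,
                             drop_duplicates: bool = True) -> pd.DataFrame:
    """Hypothetical helper: drop duplicate rows only when requested.

    Defaulting to True keeps the existing behaviour for current users.
    """
    if drop_duplicates:
        measurements = measurements.drop_duplicates()
    return measurements


# Small hand-made example with one measurement repeated three times
raw = pd.DataFrame({
    "Tijdstip": ["2010-01-01T01:00:00.000+01:00"] * 3,
    "Meetwaarde.Waarde_Numeriek": [63.0, 63.0, 63.0],
})

cleaned = postprocess_measurements(raw)                            # default: duplicates dropped
inspected = postprocess_measurements(raw, drop_duplicates=False)   # raw rows kept for inspection
print(len(cleaned), len(inspected))
```

Keeping the default at True means existing scripts would see no change, while the raw rows stay available to anyone who wants to inspect them.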

The same happens for NORTHCMRT, but the period is unknown.