Rijkswaterstaat / wm-ws-dl

wm-ws-dl documentation
https://rijkswaterstaatdata.nl/waterdata

Prevent duplicated DDL station codes #12

Closed: veenstrajelmer closed this issue 3 weeks ago

veenstrajelmer commented 7 months ago

Duplicate station names/codes are present in the DDL. Minimal reproducible example:

import pandas as pd
import requests

# the webservice endpoint
url_catalog = 'https://waterwebservices.rijkswaterstaat.nl/METADATASERVICES_DBO/OphalenCatalogus'

# request body for OphalenCatalogus
catalog_filter = ['Compartimenten']
request_cat = {"CatalogusFilter": {x:True for x in catalog_filter}}

# pull catalog from the API and store in json format
resp = requests.post(url_catalog, json=request_cat) # DDL IMPROVEMENT: retrieving the catalog takes a long time; it would be valuable if this were instantaneous (e.g. via server-side caching).
if not resp.ok:
    raise Exception('%s for %s: %s'%(resp.reason, resp.url, str(resp.text)))
result_cat = resp.json()
if not result_cat['Succesvol']:
    raise Exception('catalog query not successful, DDL error message (Foutmelding): "%s"'%(result_cat['Foutmelding']))

cat_locatielijst = pd.json_normalize(result_cat['LocatieLijst']).set_index('Locatie_MessageID',drop=True)
bool_dupl_code = cat_locatielijst[['Code']].duplicated(keep=False) # DDL IMPROVEMENT: the catalog LocatieLijst contains duplicate station Codes, and sometimes even the Naam+Code combination is duplicated. Is it possible to merge these stations?
if bool_dupl_code.any():
    print(f'WARNING: {bool_dupl_code.sum()} duplicate station Codes present in cat_locatielijst.')
    print(cat_locatielijst.loc[bool_dupl_code,['Naam','Code']].sort_values('Code'))

Gives:

WARNING: 34 duplicate station Codes present in cat_locatielijst.
                                  Naam    Code
Locatie_MessageID                             
11003                     Platform A12     A12
7292                      A12 platform     A12
11320                           Aadorp    AADP
8295                            Aadorp    AADP
10518                             Bath    BATH
13615                             Bath    BATH
10968                   Platform D15-A     D15
6876                      D15 platform     D15
14714                             Echt    ECHT
12498                             Echt    ECHT
22906                          Eemdijk   EEMDK
11124                          Eemdijk   EEMDK
19097                           IJgeul    IJGL
9811                            IJgeul    IJGL
10982                      Platform J6      J6
5377                       J6 platform      J6
11630                           Lobith    LOBH
12817                     Lobith Haven    LOBH
252970329                    Maassluis  MAASSS
10488                        Maassluis  MAASSS
12830                             Mook    MOOK
17373                             Mook    MOOK
17378                             Neer    NEER
15556                             Neer    NEER
5391                               Nes     NES
10309                              Nes     NES
11718                             Olst    OLST
9959                              Olst    OLST
5601               Sint Pieter Noord 2    PIET
17234                          De Piet    PIET
22990                         t Kooike    TKKE
13469                         T_Kooike    TKKE
3999                              Well    WELL
12912                        Well Dorp    WELL

This is inconvenient, since it makes it impossible to distinguish between historic and realtime stations.

Related to https://github.com/Rijkswaterstaat/wm-ws-dl/issues/20

TvLoon-RWS commented 5 months ago

This is a known issue. There are two types of location codes: historic and live. This will be added to the documentation. In short, the following URL shows which location codes and coordinates are historic (HWS_DONAR_SGG) and which are live (HWS_LMW_SGG): https://geo.rijkswaterstaat.nl/services/ogc/hws/wmdc15/ows?SERVICE=WFS&VERSION=1.1.0&REQUEST=GetFeature&TYPENAME=locaties&outputFormat=text/csv
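A minimal sketch of how the CSV from that WFS URL could be split into historic and live code sets. The column names `locatie_code` and `type` are assumptions (hypothetical, not checked against the actual response); inspect the CSV header and adjust `CODE_COL` and `TYPE_COL` before relying on this.

```python
import io

import pandas as pd

# ASSUMPTION: hypothetical column names; check the real WFS CSV header
CODE_COL = "locatie_code"
TYPE_COL = "type"
HISTORIC = "HWS_DONAR_SGG"
LIVE = "HWS_LMW_SGG"

def split_codes_by_type(csv_text: str) -> dict[str, set[str]]:
    """Return the sets of historic and live location codes in the CSV text."""
    df = pd.read_csv(io.StringIO(csv_text))
    return {
        "historic": set(df.loc[df[TYPE_COL] == HISTORIC, CODE_COL]),
        "live": set(df.loc[df[TYPE_COL] == LIVE, CODE_COL]),
    }
```

With network access, the input could be fetched with `requests.get(url, timeout=60).text`. Codes found in both sets are candidates for the duplicates reported in the catalog.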

TvLoon-RWS commented 5 months ago

A second action will be to exclude location types other than HWS_DONAR_SGG and HWS_LMW_SGG.
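That restriction is meant to happen on the DDL side. Until it does, a client could apply the same filter itself. A minimal sketch, assuming the location table has a column holding the location type (its name is passed in as `type_col`, since the real column name is not stated in this thread):

```python
import pandas as pd

# The two location types named in this thread; all other types are dropped
ALLOWED_TYPES = {"HWS_DONAR_SGG", "HWS_LMW_SGG"}

def keep_allowed_types(locaties: pd.DataFrame, type_col: str) -> pd.DataFrame:
    """Return only the locations whose type is in ALLOWED_TYPES.

    `type_col` is the (hypothetical) name of the location-type column.
    """
    return locaties[locaties[type_col].isin(ALLOWED_TYPES)]
```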

veenstrajelmer commented 3 weeks ago

This will be resolved in the new Wadar WaterWebservices, where all location codes are unique and historic and realtime data are available in a single timeseries.