Deltares-research / kenmerkendewaarden

Derive indicators from waterlevel measurements
https://deltares-research.github.io/kenmerkendewaarden/
GNU General Public License v3.0

Fix numbering of extremes for havengetallen #101

Open veenstrajelmer opened 3 months ago

veenstrajelmer commented 3 months ago

Fix "Exception: tidal wave numbering: HW numbers not always increasing", which occurs at least for HANSWT, BROUWHVSGT08, PETTZD and DORDT. This might no longer be relevant if we remove the moonculminations dependency, since the error probably comes from matching culminations to extremes. Note that havengetallen are also computed in kw.calc_gemiddeldgetij() when scaling is applied.

import os
import hatyan
import kenmerkendewaarden as kw

dir_base = r'p:\11210325-005-kenmerkende-waarden\work'
dir_meas = os.path.join(dir_base,'measurements_wl_18700101_20240101')

data_pd_HWLW_all = kw.read_measurements(dir_output=dir_meas, station='HANSWT', extremes=True)
data_pd_HWLW_all_12 = hatyan.calc_HWLW12345to12(data_pd_HWLW_all) # convert 12345 to 12 by taking the minimum of 345 as 2 (laagste laagwater, i.e. lowest low water)

df_havengetallen = kw.calc_havengetallen(df_ext=data_pd_HWLW_all_12.loc["2011":"2020"], return_df_ext=False)

#same error with
extended = hatyan.calc_HWLWnumbering(data_pd_HWLW_all_12.loc["2011":"2020"])

Check for other stations

Code to check the issue on a larger scale:

import os
import hatyan
import kenmerkendewaarden as kw

dir_base = r'p:\11210325-005-kenmerkende-waarden\work'
# dir_base = r'p:\11210325-005-kenmerkende-waarden\work\_backup_20240823'
dir_meas = os.path.join(dir_base,'measurements_wl_18700101_20240101')

station_list = ['A12','AWGPFM','BAALHK','BATH','BERGSDSWT','BROUWHVSGT02','BROUWHVSGT08','GATVBSLE','BRESKVHVN','CADZD',
                'D15','DELFZL','DENHDR','EEMSHVN','EURPFM','F16','F3PFM','HARVT10','HANSWT','HARLGN','HOEKVHLD','HOLWD','HUIBGT',
                'IJMDBTHVN','IJMDSMPL','J6','K13APFM','K14PFM','KATSBTN','KORNWDZBTN','KRAMMSZWT','L9PFM','LAUWOG','LICHTELGRE',
                'MARLGT','NES','NIEUWSTZL','NORTHCMRT','DENOVBTN','OOSTSDE04','OOSTSDE11','OOSTSDE14','OUDSD','OVLVHWT','Q1',
                'ROOMPBNN','ROOMPBTN','SCHAARVDND','SCHEVNGN','SCHIERMNOG','SINTANLHVSGR','STAVNSE','STELLDBTN','TERNZN','TERSLNZE','TEXNZE',
                'VLAKTVDRN','VLIELHVN','VLISSGN','WALSODN','WESTKPLE','WESTTSLG','WIERMGDN','YERSKE']
# station_list = ['DORDT', 'MAASMSMPL', 'PETTZD', 'ROTTDM']
tidalwavenumbers = []
for station in station_list:
    print(f"processing {station}")

    data_pd_HWLW_all = kw.read_measurements(dir_output=dir_meas, station=station, extremes=True, drop_duplicates=True)
    if data_pd_HWLW_all is None:
        print('no data file')
        continue
    data_pd_HWLW_all_12 = hatyan.calc_HWLW12345to12(data_pd_HWLW_all) # convert 12345 to 12 by taking the minimum of 345 as 2 (laagste laagwater, i.e. lowest low water)

    df_ext = data_pd_HWLW_all_12.loc["2000":"2020"]
    if len(df_ext) == 0:
        print('no data in selected period')
        continue

    try:
        df_havengetallen = kw.calc_havengetallen(df_ext=df_ext, return_df_ext=False)
    except Exception as e:
        print(e)
        tidalwavenumbers.append(station)

    #same error with calc_HWLWnumbering() (after converting the timezone first)
    # extended = hatyan.calc_HWLWnumbering(data_pd_HWLW_all_12.loc["2011":"2020"])
print(tidalwavenumbers)

For the period 2000-2020 this prints: ['BROUWHVSGT08', 'HANSWT', 'IJMDBTHVN', 'DENOVBTN']. Previously it also failed for ['DORDT', 'PETTZD'], but the data for these stations is no longer included in the download.

Todo:

Also happens with valid-only data

The error also occurs for IJMDBTHVN, for which the data is clean but the high water is asymmetric. It only becomes an issue after shifting the timezone, but it already originates in hatyan.calc_HWLWnumbering(). So even if kenmerkendewaarden moves to another way of matching culminations and extremes, we will probably want to solve this in hatyan. A follow-up issue was created there: https://github.com/Deltares/hatyan/issues/329

veenstrajelmer commented 3 months ago

HANSWT contains an almost-duplicate timestep on 2020-04-01 (two high waters three minutes apart):

                           values qualitycode         status  HWLWcode
time                                                                  
2020-03-30 00:05:00+01:00   -2.35          00  Gecontroleerd         2
2020-03-30 06:11:00+01:00    2.20          00  Gecontroleerd         1
2020-03-30 12:15:00+01:00   -2.28          00  Gecontroleerd         2
2020-03-30 18:36:00+01:00    2.25          00  Gecontroleerd         1
2020-03-31 00:45:00+01:00   -2.03          00  Gecontroleerd         2
2020-03-31 06:45:00+01:00    2.05          00  Gecontroleerd         1
2020-03-31 12:46:00+01:00   -2.33          00  Gecontroleerd         2
2020-03-31 19:15:00+01:00    1.86          00  Gecontroleerd         1
2020-04-01 01:20:00+01:00   -2.11          00  Gecontroleerd         2
2020-04-01 07:32:00+01:00    1.96          00  Gecontroleerd         1
2020-04-01 07:35:00+01:00    1.96          00  Gecontroleerd         1
2020-04-01 13:45:00+01:00   -1.90          00  Gecontroleerd         2
2020-04-01 20:02:00+01:00    1.94          00  Gecontroleerd         1
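
To illustrate why an almost-duplicate high water can trip a "numbers not always increasing" check, here is a minimal sketch. It is not hatyan's implementation: it assumes, purely for illustration, that each high water gets a number by rounding the elapsed time since a reference high water to the nearest M2 period (about 12 h 25 min). The names `naive_hw_numbers` and `is_strictly_increasing` are hypothetical.

```python
import pandas as pd

# Approximate M2 tidal period (12.42 hours); used only for this illustration.
M2_PERIOD = pd.Timedelta(hours=12, minutes=25, seconds=14)


def naive_hw_numbers(hw_times: pd.DatetimeIndex, reference: pd.Timestamp) -> pd.Series:
    """Hypothetical numbering: elapsed time rounded to whole M2 periods."""
    elapsed = hw_times - reference
    return pd.Series(elapsed / M2_PERIOD, index=hw_times).round().astype(int)


def is_strictly_increasing(numbers: pd.Series) -> bool:
    """True if every number is larger than the one before it."""
    return bool((numbers.diff().dropna() > 0).all())


# High waters (HWLWcode 1) copied from the HANSWT table above.
hw_times = pd.to_datetime([
    "2020-03-30 06:11:00+01:00",
    "2020-03-30 18:36:00+01:00",
    "2020-03-31 06:45:00+01:00",
    "2020-03-31 19:15:00+01:00",
    "2020-04-01 07:32:00+01:00",
    "2020-04-01 07:35:00+01:00",  # almost-duplicate high water
    "2020-04-01 20:02:00+01:00",
])

numbers = naive_hw_numbers(hw_times, reference=hw_times[0])
print(numbers)
print("strictly increasing:", is_strictly_increasing(numbers))

# Without the almost-duplicate row, every high water gets its own number.
numbers_clean = naive_hw_numbers(hw_times.delete(5), reference=hw_times[0])
print("strictly increasing after drop:", is_strictly_increasing(numbers_clean))
```

Under this assumed scheme the 07:32 and 07:35 high waters fall in the same tidal cycle and receive the same number, so the sequence is no longer strictly increasing.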

BROUWHVSGT08 has several almost-duplicate timesteps on 2015-01-01:

                           values qualitycode           status  HWLWcode
time                                                                    
2015-01-01 04:11:00+01:00   -0.86          00  Ongecontroleerd         2
2015-01-01 05:33:00+01:00   -0.84          00    Gecontroleerd         2
2015-01-01 10:48:00+01:00    1.08          00    Gecontroleerd         1
2015-01-01 10:51:00+01:00    1.05          00    Gecontroleerd         1
2015-01-01 17:18:00+01:00   -1.07          00    Gecontroleerd         2
2015-01-01 17:30:00+01:00   -1.03          00    Gecontroleerd         2
2015-01-01 23:15:00+01:00    1.12          00    Gecontroleerd         1

After removing these almost-duplicate rows, the algorithm runs successfully. These issues were already reported in https://github.com/Rijkswaterstaat/wm-ws-dl/issues/43
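
Almost-duplicates like the ones above could be located with a small helper instead of by eye. This is a minimal sketch using only pandas; `find_repeated_extremes` is a hypothetical name, not part of hatyan or kenmerkendewaarden. It assumes the extremes were already reduced to codes 1 (high water) and 2 (low water), so two neighbouring rows with the same HWLWcode are suspicious.

```python
import pandas as pd


def find_repeated_extremes(df_ext: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose HWLWcode equals that of the previous or next row."""
    code = df_ext["HWLWcode"]
    same_as_prev = code.eq(code.shift(1))
    same_as_next = code.eq(code.shift(-1))
    return df_ext.loc[same_as_prev | same_as_next]


# Rows copied from the HANSWT table above (2020-04-01 only).
times = pd.to_datetime([
    "2020-04-01 01:20:00+01:00",
    "2020-04-01 07:32:00+01:00",
    "2020-04-01 07:35:00+01:00",
    "2020-04-01 13:45:00+01:00",
    "2020-04-01 20:02:00+01:00",
])
sample = pd.DataFrame(
    {"values": [-2.11, 1.96, 1.96, -1.90, 1.94], "HWLWcode": [2, 1, 1, 2, 1]},
    index=times,
)

repeated = find_repeated_extremes(sample)
print(repeated)
```

For real data the helper would be applied to the output of hatyan.calc_HWLW12345to12(); on the raw 12345 codes, consecutive low-water codes are expected and would be flagged incorrectly.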

More stations:


import os
import hatyan
hatyan.close("all")
import kenmerkendewaarden as kw # pip install git+https://github.com/Deltares-research/kenmerkendewaarden

dir_base = r'p:\11210325-005-kenmerkende-waarden\work'
dir_meas = os.path.join(dir_base,'measurements_wl_18700101_20240101')
# dir_meas = r"c:\Users\veenstra\Downloads\measurements_wl_18700101_20240101"

# station_list = ['A12','AWGPFM','BAALHK','BATH','BERGSDSWT','BROUWHVSGT02','BROUWHVSGT08','GATVBSLE','BRESKVHVN','CADZD',
#                 'D15','DELFZL','DENHDR','EEMSHVN','EURPFM','F16','F3PFM','HARVT10','HANSWT','HARLGN','HOEKVHLD','HOLWD','HUIBGT',
#                 'IJMDBTHVN','IJMDSMPL','J6','K13APFM','K14PFM','KATSBTN','KORNWDZBTN','KRAMMSZWT','L9PFM','LAUWOG','LICHTELGRE',
#                 'MARLGT','NES','NIEUWSTZL','NORTHCMRT','DENOVBTN','OOSTSDE04','OOSTSDE11','OOSTSDE14','OUDSD','OVLVHWT','Q1',
#                 'ROOMPBNN','ROOMPBTN','SCHAARVDND','SCHEVNGN','SCHIERMNOG','SINTANLHVSGR','STAVNSE','STELLDBTN','TERNZN','TERSLNZE','TEXNZE',
#                 'VLAKTVDRN','VLIELHVN','VLISSGN','WALSODN','WESTKPLE','WESTTSLG','WIERMGDN','YERSKE']

# the almost-duplicate timesteps still have to be identified for all of these stations
station_list = [
                # 'CADZD',
                'DELFZL',
                # 'DENHDR',
                # 'HOLWD',
                # 'K13APFM',
                # 'KORNWDZBTN',
                # 'KRAMMSZWT',
                # 'NIEUWSTZL', # duplicate timesteps
                # 'DENOVBTN',
                # 'ROOMPBNN', # duplicate timesteps
                # 'STAVNSE',
                # 'STELLDBTN', # duplicate timesteps
                # 'TERNZN',
                # 'VLAKTVDRN',
                # 'VLIELHVN',
                # 'VLISSGN', # indexError
                # 'WESTKPLE',
                ]

# TODO: also for DORDT and PETTZD, but data was not downloaded

list_fails = []
for station in station_list:
    print(f"processing {station}")
    data_pd_HWLW_all = kw.read_measurements(dir_output=dir_meas, station=station, extremes=True)
    if data_pd_HWLW_all is None:
        print("no measurement found for this station")
        continue

    if station=="HANSWT":
        drop_list = ["2020-04-01 07:35:00+01:00"]
    elif station=="BROUWHVSGT08":
        drop_list = ["2015-01-01 04:11:00+01:00",
                     "2015-01-01 10:51:00+01:00",
                     "2015-01-01 17:30:00+01:00"]
    elif station == "BERGSDSWT":
        drop_list = ["1996-07-05 01:09:00+01:00",
                     "1992-01-01 00:50:00+01:00"]
    elif station=="CADZD":
        drop_list = ["1993-01-01 00:30:00+01:00"]
        # TODO: more issues
    else:
        drop_list = []

    data_pd_HWLW_all = data_pd_HWLW_all.drop(drop_list)

    # data_pd_HWLW_all = data_pd_HWLW_all.loc["1960":"1965"]

    # convert 12345 to 12 by taking the minimum of 345 as 2 (laagste laagwater, i.e. lowest low water)
    data_pd_HWLW_all_12 = hatyan.calc_HWLW12345to12(data_pd_HWLW_all)

    # hatyan.plot_timeseries(ts=data_pd_HWLW_all_12, ts_ext=data_pd_HWLW_all_12)

    # df_havengetallen = kw.calc_havengetallen(df_ext=data_pd_HWLW_all_12.loc["2011":"2020"], return_df_ext=False)
    extended = hatyan.calc_HWLWnumbering(data_pd_HWLW_all_12)

These data issues are reported in https://github.com/Rijkswaterstaat/wm-ws-dl/issues/43
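
Until the source data is corrected, a temporary workaround could replace the hand-written drop lists with an automatic rule. The sketch below is one possible rule, not the approach used in the script above: for every run of consecutive extremes with the same HWLWcode it keeps only the most extreme row (highest high water, lowest low water, first row on a tie). `keep_most_extreme_per_run` is a hypothetical helper; it assumes codes 1 and 2 only and unique timestamps.

```python
import pandas as pd


def keep_most_extreme_per_run(df_ext: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per run of consecutive rows sharing the same HWLWcode."""
    code = df_ext["HWLWcode"]
    # A new run starts whenever the code differs from the previous row.
    run_id = code.ne(code.shift()).cumsum()
    # Score so that a larger score always means "more extreme":
    # the value itself for high water (1), the negated value for low water (2).
    score = df_ext["values"].where(code.eq(1), -df_ext["values"])
    # idxmax returns the first label with the highest score within each run.
    keep_labels = score.groupby(run_id).idxmax()
    return df_ext.loc[keep_labels.to_list()]


# Rows copied from the BROUWHVSGT08 table above.
times = pd.to_datetime([
    "2015-01-01 04:11:00+01:00",
    "2015-01-01 05:33:00+01:00",
    "2015-01-01 10:48:00+01:00",
    "2015-01-01 10:51:00+01:00",
    "2015-01-01 17:18:00+01:00",
    "2015-01-01 17:30:00+01:00",
    "2015-01-01 23:15:00+01:00",
])
sample = pd.DataFrame(
    {
        "values": [-0.86, -0.84, 1.08, 1.05, -1.07, -1.03, 1.12],
        "HWLWcode": [2, 2, 1, 1, 2, 2, 1],
    },
    index=times,
)

cleaned = keep_most_extreme_per_run(sample)
print(cleaned)
```

Note that this rule does not reproduce the manual drop list exactly: for BROUWHVSGT08 it keeps the 04:11 low water (-0.86, Ongecontroleerd), whereas the drop list above removes that row and keeps 05:33 (-0.84, Gecontroleerd). Whether the status column should take precedence over the value is a choice that would have to be made explicitly.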