Open veenstrajelmer opened 3 months ago
HANSWT contains an almost-duplicate timestep on 2020-04-01:
values qualitycode status HWLWcode
time
2020-03-30 00:05:00+01:00 -2.35 00 Gecontroleerd 2
2020-03-30 06:11:00+01:00 2.20 00 Gecontroleerd 1
2020-03-30 12:15:00+01:00 -2.28 00 Gecontroleerd 2
2020-03-30 18:36:00+01:00 2.25 00 Gecontroleerd 1
2020-03-31 00:45:00+01:00 -2.03 00 Gecontroleerd 2
2020-03-31 06:45:00+01:00 2.05 00 Gecontroleerd 1
2020-03-31 12:46:00+01:00 -2.33 00 Gecontroleerd 2
2020-03-31 19:15:00+01:00 1.86 00 Gecontroleerd 1
2020-04-01 01:20:00+01:00 -2.11 00 Gecontroleerd 2
2020-04-01 07:32:00+01:00 1.96 00 Gecontroleerd 1
2020-04-01 07:35:00+01:00 1.96 00 Gecontroleerd 1
2020-04-01 13:45:00+01:00 -1.90 00 Gecontroleerd 2
2020-04-01 20:02:00+01:00 1.94 00 Gecontroleerd 1
BROUWHVSGT08 has almost-duplicate timesteps on 2015-01-01:
values qualitycode status HWLWcode
time
2015-01-01 04:11:00+01:00 -0.86 00 Ongecontroleerd 2
2015-01-01 05:33:00+01:00 -0.84 00 Gecontroleerd 2
2015-01-01 10:48:00+01:00 1.08 00 Gecontroleerd 1
2015-01-01 10:51:00+01:00 1.05 00 Gecontroleerd 1
2015-01-01 17:18:00+01:00 -1.07 00 Gecontroleerd 2
2015-01-01 17:30:00+01:00 -1.03 00 Gecontroleerd 2
2015-01-01 23:15:00+01:00 1.12 00 Gecontroleerd 1
After removing these timesteps, the algorithm works successfully. These issues were already reported in https://github.com/Rijkswaterstaat/wm-ws-dl/issues/43
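In the listings above, each almost-duplicate is a pair of consecutive extremes with the same HWLWcode only a few minutes to hours apart. A minimal sketch of flagging such pairs is given below. It assumes the series contains only codes 1 (high water) and 2 (low water), uses plain lists instead of the DataFrame, and the function name and the 3-hour threshold are hypothetical choices, not part of hatyan or kenmerkendewaarden.

```python
from datetime import datetime, timedelta


def find_same_type_neighbours(times, codes, max_gap=timedelta(hours=3)):
    """Return (previous, current) time pairs where two consecutive extremes
    share the same HWLWcode and lie within max_gap of each other."""
    suspects = []
    for i in range(1, len(times)):
        same_type = codes[i] == codes[i - 1]
        close = times[i] - times[i - 1] <= max_gap
        if same_type and close:
            suspects.append((times[i - 1], times[i]))
    return suspects


# A few HANSWT rows from the listing above: (timestamp, HWLWcode)
rows = [
    ("2020-04-01 01:20:00+01:00", 2),
    ("2020-04-01 07:32:00+01:00", 1),
    ("2020-04-01 07:35:00+01:00", 1),
    ("2020-04-01 13:45:00+01:00", 2),
]
times = [datetime.fromisoformat(t) for t, _ in rows]
codes = [c for _, c in rows]

suspects = find_same_type_neighbours(times, codes)
for previous, current in suspects:
    print(f"suspect pair: {previous} / {current}")
```

The sketch only reports the pairs. Which of the two timesteps to drop remains a manual decision, as in the hardcoded drop lists.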
More stations:
import os
import hatyan
hatyan.close("all")
import kenmerkendewaarden as kw # pip install git+https://github.com/Deltares-research/kenmerkendewaarden
dir_base = r'p:\11210325-005-kenmerkende-waarden\work'
dir_meas = os.path.join(dir_base,'measurements_wl_18700101_20240101')
# dir_meas = r"c:\Users\veenstra\Downloads\measurements_wl_18700101_20240101"
# station_list = ['A12','AWGPFM','BAALHK','BATH','BERGSDSWT','BROUWHVSGT02','BROUWHVSGT08','GATVBSLE','BRESKVHVN','CADZD',
# 'D15','DELFZL','DENHDR','EEMSHVN','EURPFM','F16','F3PFM','HARVT10','HANSWT','HARLGN','HOEKVHLD','HOLWD','HUIBGT',
# 'IJMDBTHVN','IJMDSMPL','J6','K13APFM','K14PFM','KATSBTN','KORNWDZBTN','KRAMMSZWT','L9PFM','LAUWOG','LICHTELGRE',
# 'MARLGT','NES','NIEUWSTZL','NORTHCMRT','DENOVBTN','OOSTSDE04','OOSTSDE11','OOSTSDE14','OUDSD','OVLVHWT','Q1',
# 'ROOMPBNN','ROOMPBTN','SCHAARVDND','SCHEVNGN','SCHIERMNOG','SINTANLHVSGR','STAVNSE','STELLDBTN','TERNZN','TERSLNZE','TEXNZE',
# 'VLAKTVDRN','VLIELHVN','VLISSGN','WALSODN','WESTKPLE','WESTTSLG','WIERMGDN','YERSKE']
# almost-duplicate timesteps should still be defined for all of these stations
station_list = [
    # 'CADZD',
    'DELFZL',
    # 'DENHDR',
    # 'HOLWD',
    # 'K13APFM',
    # 'KORNWDZBTN',
    # 'KRAMMSZWT',
    # 'NIEUWSTZL', # duplicate timesteps
    # 'DENOVBTN',
    # 'ROOMPBNN', # duplicate timesteps
    # 'STAVNSE',
    # 'STELLDBTN', # duplicate timesteps
    # 'TERNZN',
    # 'VLAKTVDRN',
    # 'VLIELHVN',
    # 'VLISSGN', # indexError
    # 'WESTKPLE',
    ]
# TODO: also for DORDT and PETTZD, but data was not downloaded
list_fails = []
for station in station_list:
    print(f"processing {station}")
    data_pd_HWLW_all = kw.read_measurements(dir_output=dir_meas, station=station, extremes=True)
    if data_pd_HWLW_all is None:
        print("no measurement found for this station")
        continue

    # almost-duplicate timesteps to drop per station
    if station == "HANSWT":
        drop_list = ["2020-04-01 07:35:00+01:00"]
    elif station == "BROUWHVSGT08":
        drop_list = ["2015-01-01 04:11:00+01:00",
                     "2015-01-01 10:51:00+01:00",
                     "2015-01-01 17:30:00+01:00"]
    elif station == "BERGSDSWT":
        drop_list = ["1996-07-05 01:09:00+01:00",
                     "1992-01-01 00:50:00+01:00"]
    elif station == "CADZD":
        drop_list = ["1993-01-01 00:30:00+01:00"]
        # TODO: more issues
    else:
        drop_list = []
    data_pd_HWLW_all = data_pd_HWLW_all.drop(drop_list)
    # data_pd_HWLW_all = data_pd_HWLW_all.loc["1960":"1965"]

    # convert 12345 to 12 by taking minimum of 345 as 2 (lowest low water, "laagste laagwater")
    data_pd_HWLW_all_12 = hatyan.calc_HWLW12345to12(data_pd_HWLW_all)
    # hatyan.plot_timeseries(ts=data_pd_HWLW_all_12, ts_ext=data_pd_HWLW_all_12)
    # df_havengetallen = kw.calc_havengetallen(df_ext=data_pd_HWLW_all_12.loc["2011":"2020"], return_df_ext=False)
    extended = hatyan.calc_HWLWnumbering(data_pd_HWLW_all_12)
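The loop above stops at the first station for which the numbering raises, and `list_fails` is never filled. A minimal sketch of collecting the failing stations instead is given below. `collect_failures` and `fake_process` are hypothetical names: `fake_process` stands in for the read, drop and numbering steps so that the example runs without hatyan or the measurement files.

```python
def collect_failures(stations, process_station):
    """Run process_station for each station and return (station, message)
    pairs for the ones that raise, instead of stopping at the first error."""
    failures = []
    for station in stations:
        try:
            process_station(station)
        except Exception as e:
            # broad catch on purpose: the reported error is a plain Exception
            failures.append((station, str(e)))
    return failures


# Hypothetical stand-in for the read/drop/numbering steps of the loop above
def fake_process(station):
    if station in {"HANSWT", "BROUWHVSGT08"}:
        raise Exception("tidal wave numbering: HW numbers not always increasing")


list_fails = collect_failures(["DELFZL", "HANSWT", "BROUWHVSGT08"], fake_process)
print([station for station, _ in list_fails])
```

Keeping the message next to the station name makes it possible to separate the numbering exception from other failures, such as the IndexError noted for VLISSGN.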
These data issues are reported in https://github.com/Rijkswaterstaat/wm-ws-dl/issues/43

Fix "Exception: tidal wave numbering: HW numbers not always increasing", at least for HANSWT, BROUWHVSGT08, PETTZD and DORDT. This might no longer be relevant if we remove the dependency on moon culminations, since the exception probably comes from matching culminations to extremes. havengetallen are also called in kw.calc_gemiddeldgetij() in case of scaling.

Check for other stations
Code to check issues on a larger scale:
For the period from 2000-2020 this prints: ['BROUWHVSGT08', 'HANSWT', 'IJMDBTHVN', 'DENOVBTN']. Before, it also printed ['DORDT', 'PETTZD'], but the data for these stations is no longer included in the download.

Todo:
KWK_process.py again (was implemented as a workaround in https://github.com/Deltares-research/kenmerkendewaarden/issues/130)

Also happens with valid-only data
This also happens for IJMDBTHVN, for which the data is clean but the high water is asymmetric. It is only an issue after shifting the timezone, but it already originates in hatyan.calc_HWLWnumbering(). So even if kenmerkendewaarden moves to another way of matching culminations and extremes, we will probably want to solve this in hatyan. A follow-up issue has been created there: https://github.com/Deltares/hatyan/issues/329