Deltares-research / kenmerkendewaarden

Derive indicators from waterlevel measurements
https://deltares-research.github.io/kenmerkendewaarden/
GNU General Public License v3.0
2 stars 0 forks source link

Align havengetallen and tidalindicators #79

Closed veenstrajelmer closed 3 months ago

veenstrajelmer commented 3 months ago

The spring/mean/neap values from havengetallen should in theory compare well to the spring/mean/neap values from tidalindicators. This seems not to be the case.

import os
import hatyan
import kenmerkendewaarden as kw
import matplotlib.pyplot as plt
plt.close("all")
import pandas as pd

dir_testdata = r"c:\DATA\kenmerkendewaarden\tests\testdata"

file_dia_ext = os.path.join(dir_testdata, "HOEKVHLD_ext.dia")
df_ext = hatyan.read_dia(file_dia_ext, station="HOEKVHLD", block_ids="allstation")
df_ext_sel = df_ext.loc["2009-12-28":"2021-01-03"]
df_ext_12 = hatyan.calc_HWLW12345to12(df_ext_sel)
df_ext_12_2010_2014 = df_ext_12.loc["2010":"2020"]

ext_stats = kw.calc_HWLWtidalindicators(df_ext_12_2010_2014)
df_havengetallen, data_pd_HWLW = kw.calc_havengetallen(df_ext_12_2010_2014, return_df_ext=True)

# re-compute spring/neap/mean values from df_havengetallen with a more transparent workflow
bool_HW = data_pd_HWLW['HWLWcode']==1
bool_LW = data_pd_HWLW['HWLWcode']==2
bool_spring = data_pd_HWLW['culm_hr']==0
bool_neap = data_pd_HWLW['culm_hr']==6

data_pd_HW_spring = data_pd_HWLW.loc[bool_HW & bool_spring]
data_pd_HW_mean = data_pd_HWLW.loc[bool_HW]
data_pd_HW_neap = data_pd_HWLW.loc[bool_HW & bool_neap]
# data_pd_LW_spring = data_pd_HWLW.loc[bool_LW & bool_spring]
# data_pd_LW_mean = data_pd_HWLW.loc[bool_LW]
# data_pd_LW_neap = data_pd_HWLW.loc[bool_LW & bool_neap]

data_pd_HW_spring_median_peryear = data_pd_HW_spring.groupby(pd.PeriodIndex(data_pd_HW_spring.index, freq="Y"))["values"].median()
data_pd_HW_mean_median_peryear = data_pd_HW_mean.groupby(pd.PeriodIndex(data_pd_HW_mean.index, freq="Y"))["values"].median()
data_pd_HW_neap_median_peryear = data_pd_HW_neap.groupby(pd.PeriodIndex(data_pd_HW_neap.index, freq="Y"))["values"].median()

fig,ax = plt.subplots()
cmap = plt.get_cmap("tab10")
ext_stats['HW_monthmax_mean_peryear'].plot(ax=ax)
ext_stats['HW_mean_peryear'].plot(ax=ax)
ext_stats['HW_monthmin_mean_peryear'].plot(ax=ax)
xlims = ax.get_xlim()
# havengetallen per year
data_pd_HW_spring_median_peryear.plot(color=cmap(0))
data_pd_HW_mean_median_peryear.plot(color=cmap(1))
data_pd_HW_neap_median_peryear.plot(color=cmap(2))
# re-computed havengetallen
ax.hlines(data_pd_HW_spring["values"].median(), xlims[0], xlims[1], color=cmap(0), linestyle='--')
ax.hlines(data_pd_HW_mean["values"].median(), xlims[0], xlims[1], color=cmap(1), linestyle='--')
ax.hlines(data_pd_HW_neap["values"].median(), xlims[0], xlims[1], color=cmap(2), linestyle='--')
# validation from df_havengetallen (compares very well to re-computed values, mean is slightly different)
ax.hlines(df_havengetallen["HW_values_median"].loc[0], xlims[0], xlims[1], color=cmap(0), linestyle='-.')
ax.hlines(df_havengetallen["HW_values_median"].loc["mean"], xlims[0], xlims[1], color=cmap(1), linestyle='-.')
ax.hlines(df_havengetallen["HW_values_median"].loc[6], xlims[0], xlims[1], color=cmap(2), linestyle='-.')

the different methods in havengetallen/tidalindicators result in significant (~40 cm) differences in mean high water values for spring and neap tide: image

veenstrajelmer commented 3 months ago

It seems that storm-induced extremes strongly affect the mean values in case of monthly maxima (tidalindicators), while for havengetallen we purely look at the culmnation hour (a proxy for spring/neap tide or anything in between). When we plot the high waters from culmination hour 0/6/all (spring/neap/mean tide), we clearly see that springtide (culmination hour 0) does not include the most extreme maxima. These are included in the yearly means derived within the tidalindicators module. This could explain these different values.

# plot spring/mean/neap values based on culmination hours, this clearly shows that not all highest extremes are in the spring-tide class, so might explain why these methods return different results.
fig,ax = plt.subplots()
data_pd_HW_mean["values"].plot(ax=ax, linewidth=1)
data_pd_HW_spring["values"].plot(ax=ax)
data_pd_HW_neap["values"].plot(ax=ax)
ax.set_xlim(pd.Timestamp("2010"), pd.Timestamp("2011"))
ax.set_ylim(0.2,2.2)

Gives image

Since this is now explained, this issue will be closed.