Deltares-research / kenmerkendewaarden

Derive indicators from waterlevel measurements
https://deltares-research.github.io/kenmerkendewaarden/
GNU General Public License v3.0
2 stars 0 forks source link

Investigate removing hard-coded moonculmination/extreme offset #103

Closed veenstrajelmer closed 1 month ago

veenstrajelmer commented 4 months ago

Consider:

It appears that the moon culminations are quite important to get decent hour groups. If we even only use the wrong moon culmination we already see a distortion (and rotation) in the aardappelgrafiek (varying offset was implemented with a moonculm_offset argument in https://github.com/Deltares-research/kenmerkendewaarden/issues/102). Furthermore, without the moonculmination there is no time reference with which we can compute time delays.

When relating all extremes to the moonculmination two days before (offset of 4): image

When relating it to the previous moonculmination (no offset): image

So removing that hard-coded offset might sometimes lead to unexpected results when using idxmax to get the springtide (because the aardappelgrafiek is not so smooth anymore). This seems not super robust.

Alternatively, when using the extreme hour instead of the moonculmination hour, we do get quite smooth (albeit rotated) figures:

import os
import hatyan
import kenmerkendewaarden as kw
import matplotlib.pyplot as plt
plt.close("all")
import pandas as pd
import numpy as np

dir_testdata = r"c:\DATA\kenmerkendewaarden\tests\testdata"
dir_testdata = r"c:\Users\veenstra\Downloads\ext_tk_dia"
station = "HOEKVHLD"

file_dia_ext = os.path.join(dir_testdata, f"{station}_ext.dia")
df_ext = hatyan.read_dia(file_dia_ext, station=station, block_ids="allstation")

df_ext["ext_hr"] = df_ext.index.round("h").hour
# remove ext_hr for non-high waters
df_ext.loc[df_ext["HWLWcode"]!=1,"ext_hr"] = np.nan
if df_ext["HWLWcode"].iloc[0] != 1:
    df_ext["ext_hr"].iloc[0] = 0 #fill in dummy value if first extreme is not HW, e.g. for DENHDR 2010
# fill them with preceding high waters
# TODO: if first extreme is not HW, this will not be filled
df_ext["ext_hr"] = df_ext["ext_hr"].ffill()
df_ext["ext_hr"] = df_ext["ext_hr"].astype(int).mod(12)

df_ext_sel = df_ext.loc["2009-12-28":"2021-01-03"]
df_ext_12 = hatyan.calc_HWLW12345to12(df_ext_sel)
df_ext_12_2010_2014 = df_ext_12.loc["2010":"2010"]

ext_stats = kw.calc_HWLWtidalindicators(df_ext_12_2010_2014)
df_havengetallen, data_pd_HWLW = kw.calc_havengetallen(df_ext_12_2010_2014, return_df_ext=True, moonculm_offset=4)

# fig, ax = plt.subplots()
# data_pd_HWLW["culm_hr"].plot(ax=ax)
# data_pd_HWLW["ext_hr"].plot(ax=ax)
# fig, ax = plt.subplots()
# data_pd_HWLW["HWLW_delay"].plot(ax=ax)
# data_pd_HWLW["ext_delay"].plot(ax=ax)

data_pd_HW = data_pd_HWLW.loc[data_pd_HWLW['HWLWcode']==1]
data_pd_LW = data_pd_HWLW.loc[data_pd_HWLW['HWLWcode']==2]

HWLW_culmhr_summary = pd.DataFrame()
HWLW_culmhr_summary['HW_values_median'] = data_pd_HW.groupby(data_pd_HW['culm_hr'])['values'].median()
HWLW_culmhr_summary['HW_delay_median'] = data_pd_HW.groupby(data_pd_HW['culm_hr'])['HWLW_delay'].median()
HWLW_culmhr_summary['LW_values_median'] = data_pd_LW.groupby(data_pd_LW['culm_hr'])['values'].median()
HWLW_culmhr_summary['LW_delay_median'] = data_pd_LW.groupby(data_pd_LW['culm_hr'])['HWLW_delay'].median()

HWLW_culmhr_summary_ext = pd.DataFrame()
HWLW_culmhr_summary_ext['HW_values_median'] = data_pd_HW.groupby(data_pd_HW['ext_hr'])['values'].median()
HWLW_culmhr_summary_ext['HW_delay_median'] = data_pd_HW.groupby(data_pd_HW['ext_hr'])['HWLW_delay'].median()
HWLW_culmhr_summary_ext['LW_values_median'] = data_pd_LW.groupby(data_pd_LW['ext_hr'])['values'].median()
HWLW_culmhr_summary_ext['LW_delay_median'] = data_pd_LW.groupby(data_pd_LW['ext_hr'])['HWLW_delay'].median()

kw.plot_aardappelgrafiek(df_havengetallen)
kw.plot_aardappelgrafiek(HWLW_culmhr_summary_ext)

The second figure looks like: image

This is quite nice, and we could get the values with idxmax easily. However, this can only be visualized properly since we use the HWLW_delay an that is based on the moon culmination. However, also with a moonculm_offset=0, this results in quite a smooth figure: image

This would mean we can remove the hardcoded correction and get decent values for any stations with idxmax. However, the method is slightly more complex to explain. Also first research impact for several stations. In comparison to the original method, we see a difference in MV1/MV2 (width of the aardappelgrafiek), not sure if that is an issue. Also, with the original method culm_hr=0 does not have the highest value, this is 0.5cm lower than the value for culm_hr=11.

The issues with the rotation and 0 that is not at max can be resolved by fitting an ellipse trough culmination hour points: https://stackoverflow.com/questions/47873759/how-to-fit-a-2d-ellipse-to-given-points. But might be overly complex.

veenstrajelmer commented 1 month ago

This is a useful reference, but changing the methodology first has to be discussed with users. For now, first focus on other priorities, so won't do for now and reopen in case it is relevant again.