Investigate removing hard-coded moonculmination/extreme offset

Consider:

Instead of coupling to moon culminations, group the extremes in HW-time hour groups (each following LW belongs to the same hour group as the preceding HW, which can be achieved with ffill), then take the min and max from these groups.
When taking idxmax/idxmin, the result can also be used to derive the tidal period and other group parameters.
Beware that there has to be a six-hour difference between the groups. idxmax for HW can be used to get the LW hour group (the hour group for spring tide has to be the same for HW and LW).
We can still couple it to the moonculminations to derive the delay after the moon culmination (but without the two-day delay). Alternatively, we could derive some sort of delay from the HW time with respect to noon or with respect to the average tide. >> This alternative gives very scattered results, so moonculminations are required if we need delays.
Keep in mind that the timing of water-level extremes can be offset with respect to the tide, so this approach might not be fully valid, or at least it will give slightly different values compared to the moonculmination reference. >> This does not have a large impact in general, but has some impact during storms.
Think about culm_addtime and hwlwno+4: they should be consistent (or not be used at all), since this hard-coded difference is valid for HOEKVHLD, but not necessarily for DELFZL and certainly not for the UK or Germany/Denmark.
Consider calc_HWLW_culmhr_summary_tidalcoeff() to group the extremes instead, although this also does not give time differences with the moonculminations, so pandas.groupby might be better. If it is not used, remove it from the code (including its test).
If moonculminations are no longer required, we can drop the astrog dependency.
Whatever is decided, consider phasing out moonculm_offset, which was implemented in https://github.com/Deltares-research/kenmerkendewaarden/issues/102

It appears that the moon culminations are quite important to get decent hour groups. Even using only a wrong moon culmination already causes a distortion (and rotation) of the aardappelgrafiek (a varying offset was implemented with a moonculm_offset argument in https://github.com/Deltares-research/kenmerkendewaarden/issues/102). Furthermore, without the moonculmination there is no time reference from which we can compute time delays.

When relating all extremes to the moonculmination two days before (offset of 4):

When relating it to the previous moonculmination (no offset):

So removing that hard-coded offset might sometimes lead to unexpected results when using idxmax to find the spring tide, because the aardappelgrafiek is then less smooth. This does not seem very robust.
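To make the offset discussion concrete, here is a sketch that relates extremes to a moon culmination a configurable number of culminations back, instead of a hard-coded 4. Successive (upper and lower) moon culminations are on average roughly 12h25m apart, so offset=4 corresponds to about two days (4 × 12h25m = 49h40m). The culmination times below are a synthetic regular series for illustration only (real ones would come from an astronomical computation, e.g. via astrog); relate_to_culmination and its output columns are hypothetical and not the kenmerkendewaarden implementation.

```python
import numpy as np
import pandas as pd


def relate_to_culmination(ext_times: pd.DatetimeIndex, culm_times: pd.DatetimeIndex,
                          offset: int = 4) -> pd.DataFrame:
    """Relate each extreme to the culmination `offset` steps before the most
    recent one (offset=0 means the previous culmination). Hypothetical helper.
    """
    culm_times = culm_times.sort_values()
    # position of the most recent culmination at or before each extreme
    idx_prev = culm_times.searchsorted(ext_times, side="right") - 1
    idx_ref = idx_prev - offset
    if (idx_ref < 0).any():
        raise ValueError("not enough moon culminations before the first extreme")
    ref = culm_times[idx_ref]
    return pd.DataFrame(
        {
            "culm_time": ref,
            # hour group of the reference culmination (0..11)
            "culm_hr": np.asarray(ref.round("h").hour) % 12,
            "delay": ext_times - ref,
        },
        index=ext_times,
    )


# synthetic culminations every 745 min and extremes 2 h after culminations 5..9
culm = pd.date_range("2010-01-01 00:00", periods=40, freq="745min")
ext = culm[5:10] + pd.Timedelta(hours=2)
# offset=0: every delay is 2 h; offset=4: 2 h + 4 * 745 min = 51 h 40 min
print(relate_to_culmination(ext, culm, offset=0)["delay"].unique())
print(relate_to_culmination(ext, culm, offset=4)["delay"].unique())
```

Making the offset an explicit parameter is one way to keep culm_addtime and hwlwno+4 consistent per station, but which offset suits which station would still have to be researched.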

Alternatively, when using the extreme hour instead of the moonculmination hour, we do get quite smooth (albeit rotated) figures:

import os
import hatyan
import kenmerkendewaarden as kw
import matplotlib.pyplot as plt
plt.close("all")
import pandas as pd
import numpy as np

# local data directories; the active assignment wins, so keep only the one you use
# dir_testdata = r"c:\DATA\kenmerkendewaarden\tests\testdata"
dir_testdata = r"c:\Users\veenstra\Downloads\ext_tk_dia"
station = "HOEKVHLD"

file_dia_ext = os.path.join(dir_testdata, f"{station}_ext.dia")
df_ext = hatyan.read_dia(file_dia_ext, station=station, block_ids="allstation")

df_ext["ext_hr"] = df_ext.index.round("h").hour
# remove ext_hr for non-high waters
df_ext.loc[df_ext["HWLWcode"]!=1,"ext_hr"] = np.nan
if df_ext["HWLWcode"].iloc[0] != 1:
    # fill in a dummy value if the first extreme is not a HW (e.g. for DENHDR 2010),
    # since ffill cannot fill it; iloc avoids chained assignment, which does not
    # update df_ext under pandas copy-on-write
    df_ext.iloc[0, df_ext.columns.get_loc("ext_hr")] = 0
# fill the remaining extremes with the hour of the preceding high water
df_ext["ext_hr"] = df_ext["ext_hr"].ffill()
df_ext["ext_hr"] = df_ext["ext_hr"].astype(int).mod(12)

df_ext_sel = df_ext.loc["2009-12-28":"2021-01-03"]
df_ext_12 = hatyan.calc_HWLW12345to12(df_ext_sel)
# NOTE: despite the variable name, only the year 2010 is selected here
df_ext_12_2010_2014 = df_ext_12.loc["2010":"2010"]

ext_stats = kw.calc_HWLWtidalindicators(df_ext_12_2010_2014)
df_havengetallen, data_pd_HWLW = kw.calc_havengetallen(df_ext_12_2010_2014, return_df_ext=True, moonculm_offset=4)

# fig, ax = plt.subplots()
# data_pd_HWLW["culm_hr"].plot(ax=ax)
# data_pd_HWLW["ext_hr"].plot(ax=ax)
# fig, ax = plt.subplots()
# data_pd_HWLW["HWLW_delay"].plot(ax=ax)
# data_pd_HWLW["ext_delay"].plot(ax=ax)

data_pd_HW = data_pd_HWLW.loc[data_pd_HWLW['HWLWcode']==1]
data_pd_LW = data_pd_HWLW.loc[data_pd_HWLW['HWLWcode']==2]

HWLW_culmhr_summary = pd.DataFrame()
HWLW_culmhr_summary['HW_values_median'] = data_pd_HW.groupby(data_pd_HW['culm_hr'])['values'].median()
HWLW_culmhr_summary['HW_delay_median'] = data_pd_HW.groupby(data_pd_HW['culm_hr'])['HWLW_delay'].median()
HWLW_culmhr_summary['LW_values_median'] = data_pd_LW.groupby(data_pd_LW['culm_hr'])['values'].median()
HWLW_culmhr_summary['LW_delay_median'] = data_pd_LW.groupby(data_pd_LW['culm_hr'])['HWLW_delay'].median()

HWLW_culmhr_summary_ext = pd.DataFrame()
HWLW_culmhr_summary_ext['HW_values_median'] = data_pd_HW.groupby(data_pd_HW['ext_hr'])['values'].median()
HWLW_culmhr_summary_ext['HW_delay_median'] = data_pd_HW.groupby(data_pd_HW['ext_hr'])['HWLW_delay'].median()
HWLW_culmhr_summary_ext['LW_values_median'] = data_pd_LW.groupby(data_pd_LW['ext_hr'])['values'].median()
HWLW_culmhr_summary_ext['LW_delay_median'] = data_pd_LW.groupby(data_pd_LW['ext_hr'])['HWLW_delay'].median()

kw.plot_aardappelgrafiek(df_havengetallen)
kw.plot_aardappelgrafiek(HWLW_culmhr_summary_ext)
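The eight groupby lines in the script above could be condensed into one groupby with named aggregation per extreme type. This is a sketch assuming the columns used in the script (HWLWcode, values, HWLW_delay and a grouping column such as culm_hr or ext_hr); the helper name culmhr_summary is hypothetical, and the example uses plain numeric delays. One small difference: pd.concat aligns on the union of the HW and LW hour groups, whereas the column assignments in the script keep only the HW groups.

```python
import pandas as pd


def culmhr_summary(data_pd_HWLW: pd.DataFrame, group_col: str = "culm_hr") -> pd.DataFrame:
    """Median HW/LW values and delays per hour group (hypothetical helper)."""
    parts = []
    for code, label in [(1, "HW"), (2, "LW")]:
        sel = data_pd_HWLW.loc[data_pd_HWLW["HWLWcode"] == code]
        # one groupby per extreme type, two named median aggregations
        parts.append(
            sel.groupby(group_col).agg(
                **{
                    f"{label}_values_median": ("values", "median"),
                    f"{label}_delay_median": ("HWLW_delay", "median"),
                }
            )
        )
    # align HW and LW columns on the hour group
    return pd.concat(parts, axis=1)


# small synthetic example with numeric delays (hours)
df = pd.DataFrame({
    "HWLWcode": [1, 2, 1, 2, 1, 2],
    "culm_hr": [0, 0, 0, 0, 6, 6],
    "values": [110.0, -70.0, 120.0, -80.0, 90.0, -60.0],
    "HWLW_delay": [2.0, 8.0, 2.5, 8.5, 4.0, 10.0],
})
print(culmhr_summary(df))
```

With the script's data, culmhr_summary(data_pd_HWLW, "ext_hr") would play the role of HWLW_culmhr_summary_ext.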

The second figure looks like:

This is quite nice, and we could get the values easily with idxmax. However, this can only be visualized properly because we use HWLW_delay, which is based on the moon culmination. That said, even with moonculm_offset=0 this results in quite a smooth figure:

This would mean we can remove the hard-coded correction and get decent values for any station with idxmax. However, the method is slightly more complex to explain, and its impact should first be researched for several stations. Compared to the original method, we see a difference in MV1/MV2 (the width of the aardappelgrafiek); it is not yet clear whether that is an issue. Also, with the original method culm_hr=0 does not have the highest value: it is 0.5 cm lower than the value for culm_hr=11.
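Picking the spring and neap values with idxmax could look like the sketch below. It assumes a summary indexed by hour group 0 to 11 with the columns of HWLW_culmhr_summary_ext from the script, takes the spring hour group from the HW idxmax and reuses it for LW (as suggested at the top of this issue), and assumes neap lies six hour groups later. The function name spring_neap_from_summary and the synthetic numbers are hypothetical.

```python
import numpy as np
import pandas as pd


def spring_neap_from_summary(summary: pd.DataFrame) -> dict:
    """Select spring/neap hour groups and values from a 12-row summary (hypothetical)."""
    # spring: hour group with the highest median HW, reused for LW
    spring_hr = int(summary["HW_values_median"].idxmax())
    # neap: assumed to be six hour groups after spring
    neap_hr = (spring_hr + 6) % 12
    return {
        "spring_hr": spring_hr,
        "neap_hr": neap_hr,
        "HW_spring": summary.at[spring_hr, "HW_values_median"],
        "LW_spring": summary.at[spring_hr, "LW_values_median"],
        "HW_neap": summary.at[neap_hr, "HW_values_median"],
        "LW_neap": summary.at[neap_hr, "LW_values_median"],
    }


# synthetic summary with spring tide in hour group 2 (illustrative values in cm)
hr = np.arange(12)
phase = np.cos(2 * np.pi * (hr - 2) / 12)
summary = pd.DataFrame(
    {"HW_values_median": 100 + 20 * phase, "LW_values_median": -80 - 15 * phase},
    index=pd.Index(hr, name="ext_hr"),
)
print(spring_neap_from_summary(summary))
```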

The issues with the rotation, and with culm_hr=0 not being at the maximum, could be resolved by fitting an ellipse through the culmination hour points: https://stackoverflow.com/questions/47873759/how-to-fit-a-2d-ellipse-to-given-points. However, this might be overly complex.
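To get a feel for that complexity, here is a minimal sketch of an unconstrained algebraic least-squares conic fit in the spirit of the linked Stack Overflow question (A x² + B xy + C y² + D x + E y = 1, solved with numpy.linalg.lstsq), plus recovery of the center, semi-axes and major-axis rotation from the coefficients. The function name fit_ellipse and the synthetic points are hypothetical. The "= 1" normalisation cannot represent an ellipse through the origin, and for real aardappelgrafiek points (delay in hours vs. water level in cm) rescaling the axes first would probably be wise.

```python
import numpy as np


def fit_ellipse(x: np.ndarray, y: np.ndarray) -> dict:
    """Fit A x^2 + B xy + C y^2 + D x + E y = 1 and return the ellipse geometry."""
    design = np.column_stack([x**2, x * y, y**2, x, y])
    (a, b, c, d, e), *_ = np.linalg.lstsq(design, np.ones_like(x), rcond=None)
    quad = np.array([[a, b / 2], [b / 2, c]])  # quadratic-form matrix
    lin = np.array([d, e])
    # center: the gradient of the conic vanishes, 2 * quad @ center + lin = 0
    center = np.linalg.solve(2 * quad, -lin)
    # in centered coordinates u the conic reads u^T quad u = scale
    scale = 1 + center @ quad @ center
    if scale < 0:  # the "= 1" normalisation may flip the sign of the conic
        quad, scale = -quad, -scale
    eigvals, eigvecs = np.linalg.eigh(quad)  # eigenvalues in ascending order
    if np.any(eigvals <= 0):
        raise ValueError("fitted conic is not an ellipse")
    semi_axes = np.sqrt(scale / eigvals)  # smallest eigenvalue -> major axis
    major = eigvecs[:, 0]
    angle = np.arctan2(major[1], major[0]) % np.pi  # major-axis angle in [0, pi)
    return {"center": center, "semi_axes": semi_axes, "angle": angle}


# synthetic points: one per hour group on an ellipse rotated by 30 degrees
theta = np.pi / 6
t = np.linspace(0, 2 * np.pi, 12, endpoint=False)
xp, yp = 3 * np.cos(t), 1 * np.sin(t)
x = 10 + xp * np.cos(theta) - yp * np.sin(theta)
y = 5 + xp * np.sin(theta) + yp * np.cos(theta)
fit = fit_ellipse(x, y)
print(fit["center"], fit["semi_axes"], np.degrees(fit["angle"]))
```

The recovered angle would quantify the rotation of the aardappelgrafiek; whether that is worth the extra machinery is exactly the open question above.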

Deltares-research / kenmerkendewaarden, issue #103