Rijkswaterstaat / wm-ws-dl

wm-ws-dl documentation
https://rijkswaterstaatdata.nl/waterdata

phase difference VLISSGN due to middentijd/eindtijd #41

Closed veenstrajelmer closed 2 months ago

veenstrajelmer commented 4 months ago

In a study by FZ, the comparison between VLISSGN phases for measured and modelled timeseries shows a period with an unexpected offset of ~3 degrees (1987 to 1993): image

During this period, the water levels were time-referenced differently (begintijd instead of middentijd), which might be the cause of this difference. It would be useful if this time shift were corrected in the data. The time difference per degree of M2 phase is (12*60+25)/360 = 2.07 min/degree, so a difference of ~3 degrees corresponds to approximately 6 minutes.
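The conversion between phase and time can be sketched as below. This is only an illustration of the arithmetic in this comment; the function names are hypothetical and the M2 period is the 12 h 25 min approximation used above, not the exact constituent period.

```python
# M2 period in minutes, using the 12 h 25 min approximation from the comment
M2_PERIOD_MIN = 12 * 60 + 25


def phase_to_minutes(phase_deg, period_min=M2_PERIOD_MIN):
    """Convert a phase difference in degrees to a time difference in minutes."""
    return phase_deg * period_min / 360


def minutes_to_phase(minutes, period_min=M2_PERIOD_MIN):
    """Convert a time difference in minutes to a phase difference in degrees."""
    return minutes * 360 / period_min


print(round(phase_to_minutes(1), 2))  # minutes per degree of M2 phase
print(round(phase_to_minutes(3), 1))  # time shift for a 3-degree offset
```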

To reproduce this, the VLISSGN data is retrieved from the DDL and tidal analysis is performed with hatyan. There is no VLISSGN data available on the DDL for 1978-1986 (https://github.com/Rijkswaterstaat/wm-ws-dl/issues/39), so there are missing values there (and even more years are missing, why?). In the resulting figure, we also see this temporary shift in the tidal phase for the period 1987 to 1993:


import ddlpy
import hatyan
import matplotlib.pyplot as plt
plt.close("all")

locations = ddlpy.locations()
bool_hoedanigheid = locations['Hoedanigheid.Code'].isin(['NAP'])
bool_stations = locations.index.isin(['VLISSGN'])
bool_grootheid = locations['Grootheid.Code'].isin(['WATHTE'])
bool_groepering = locations['Groepering.Code'].isin(['NVT'])
selected = locations.loc[bool_grootheid & bool_hoedanigheid & bool_groepering & bool_stations]

# start_date = "2018-01-01 00:00 +01:00"
# end_date = "2020-01-01 00:00 +01:00"
start_date = "1950-01-01 00:00 +01:00"
end_date = "2020-01-01 00:00 +01:00"

# pass a single row of the locations dataframe to the measurements function to get the measurements for that location
measurements = ddlpy.measurements(selected.iloc[0], start_date, end_date)

ts_meas = hatyan.ddlpy_to_hatyan(measurements.iloc[:-1]) # skip the last (1 Jan) value
ts_meas = ts_meas[~ts_meas.index.duplicated(keep="first")]
# hatyan.plot_timeseries(ts_meas)
comp_avg, comp_all = hatyan.analysis(ts_meas, const_list=["M2"], analysis_perperiod="Y", return_allperiods=True)

fig, ax = plt.subplots(figsize=(10,5))
data_plot = comp_all.loc["M2","phi_deg"]
ax.plot(data_plot.index.to_timestamp(), data_plot.values, marker="o")
ax.set_title("M2 phase VLISSGN over time")
ax.grid()
fig.tight_layout()

Gives: image

TODO: This also happens at many other stations, make an overview.

TvLoon-RWS commented 4 months ago

@veenstrajelmer This should be described in the waardebewerkingsmethode. Is this different in this period of phase change?

KDoekes-RWS commented 4 months ago

This holds for all MSW locations and is well known; it is not an error. In the text, begintijd should read eindtijd. When the automatic processing of water levels started in MSW (Monitoring Systeem Water, one of the predecessors of LMW) on January 1st, 1987, the water level was averaged over the last 10 minutes. Several people involved objected to this, and in a new version, operational from September 7th, 1993, the water level was averaged over the previous 5 minutes and the next 5 minutes. As to the data stored until then, the following alternatives were considered:

veenstrajelmer commented 4 months ago

@TvLoon-RWS this is indeed present in the metadata:

print(measurements["WaardeBepalingsmethode.Omschrijving"].drop_duplicates())

Prints:

time
1950-01-01 02:40:00+01:00                            Visuele aflezing van blad
1987-01-01 00:00:00+01:00    Rekenkundig gemiddelde waarde over vorige 10 m...
1993-09-07 00:00:00+01:00    Rekenkundig gemiddelde waarde over vorige 5 en...
Name: WaardeBepalingsmethode.Omschrijving, dtype: object

This implies eindtijd from 1987-01-01 to 1993-09-07 as @KDoekes-RWS also mentions. @KDoekes-RWS again thanks a lot for this detailed explanation. It is valuable to have these things documented for us.
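To illustrate why eindtijd stamping shows up as a phase offset, here is a minimal synthetic sketch (not the MSW processing itself). It averages a pure M2-like cosine over 10 minutes, stamps the result either at the end or at the middle of the window, and fits the phase in both cases. The one-minute sampling, the 11-sample window and the helper `fitted_phase_deg` are assumptions for illustration only.

```python
import numpy as np

PERIOD_MIN = 12 * 60 + 25            # M2 period approximation used earlier in this thread
OMEGA = 2 * np.pi / PERIOD_MIN       # angular frequency in rad/min

t = np.arange(0, 10 * PERIOD_MIN)    # one sample per minute over 10 periods
signal = np.cos(OMEGA * t)           # synthetic water level, zero phase


def fitted_phase_deg(times, values):
    """Least-squares fit of a*cos(wt) + b*sin(wt); returns the phase lag in degrees."""
    design = np.column_stack([np.cos(OMEGA * times), np.sin(OMEGA * times)])
    a, b = np.linalg.lstsq(design, values, rcond=None)[0]
    return np.degrees(np.arctan2(b, a))


window = 11                          # samples t-10 .. t inclusive, spanning 10 minutes
kernel = np.ones(window) / window
means = np.convolve(signal, kernel, mode="valid")  # element k is the mean of signal[k:k+11]

t_end = t[window - 1:]               # eindtijd: stamp at the end of the window
t_mid = t[5:5 + len(means)]          # middentijd: stamp at the centre of the window

lag_end = fitted_phase_deg(t_end, means)
lag_mid = fitted_phase_deg(t_mid, means)
print(f"eindtijd phase lag:   {lag_end:.2f} deg")
print(f"middentijd phase lag: {lag_mid:.2f} deg")
```

In this idealised setup the eindtijd series lags by the phase equivalent of 5 minutes (about 2.4 degrees for M2), which is of the same order as the ~3 degrees reported at the top of this issue.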

To me it would have made more sense to go for option 2 (different time stamps) to keep the series constant, but I understand the objections to this. I think in most analyses it will be more pragmatic to drop this period instead of correcting the times, but I think all users of the data will have to decide what to do with this based on their application.
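Both choices (correcting the time stamps or dropping the period) can be sketched with pandas as below. This is a hedged illustration on a small synthetic series, not part of ddlpy or hatyan: the function names are hypothetical, the period boundaries are taken from the metadata printed above, and the 5-minute shift assumes that the centre of a trailing 10-minute window lies 5 minutes before its end time.

```python
import numpy as np
import pandas as pd

# Period with averaging over the previous 10 minutes (eindtijd), per the metadata above
EINDTIJD_START = pd.Timestamp("1987-01-01T00:00:00+01:00")
EINDTIJD_END = pd.Timestamp("1993-09-07T00:00:00+01:00")


def in_eindtijd_period(index):
    """Boolean mask for time stamps inside the eindtijd period (end exclusive)."""
    return np.asarray((index >= EINDTIJD_START) & (index < EINDTIJD_END))


def shift_eindtijd_to_middentijd(ts):
    """Move eindtijd stamps back by 5 minutes so they refer to the window centre."""
    mask = in_eindtijd_period(ts.index)
    offsets = pd.to_timedelta(np.where(mask, -5, 0), unit="min")
    out = ts.copy()
    out.index = ts.index + offsets
    return out


def drop_eindtijd_period(ts):
    """Remove all values inside the eindtijd period."""
    return ts.loc[~in_eindtijd_period(ts.index)]


# Synthetic 10-minute series straddling the 1993-09-07 transition
index = pd.date_range(pd.Timestamp("1993-09-06T23:40:00+01:00"), periods=4, freq="10min")
ts = pd.Series([1.0, 1.1, 1.2, 1.3], index=index)

shifted = shift_eindtijd_to_middentijd(ts)
dropped = drop_eindtijd_period(ts)
print(shifted)
print(dropped)
```

Note that shifting produces time stamps at 5 minutes past the 10-minute marks, so the corrected series is no longer on the same regular grid as the rest of the data; whether that matters depends on the analysis.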

TvLoon-RWS commented 2 months ago

With the correct documentation provided in the comments, this issue is closed.