USGS-R / drb-estuary-salinity-ml

Creative Commons Zero v1.0 Universal

Butterworth filter not working on all sites #58

amsnyder closed this issue 2 years ago

amsnyder commented 2 years ago

Some sites return no data when they are put through the Butterworth filter in the munge step (e.g., site 8551762). The problem appears to affect sites that have missing data in their time series.

amsnyder commented 2 years ago

We are currently using scipy.signal.butter.

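To illustrate why gappy sites come back empty, here is a minimal sketch. It assumes the munge step designs the filter with scipy.signal.butter and applies it with scipy.signal.filtfilt; the actual call was only shared as a screenshot, so filtfilt, the filter order, the cutoff, and the synthetic hourly series are all assumptions. Because a Butterworth filter is recursive (IIR), one NaN contaminates every later output of the forward pass, and the backward pass then carries it back to the start.

```python
import numpy as np
from scipy import signal

# Hypothetical hourly water-level series: a 12.42 h (M2-like) tide plus a slow signal
t = np.arange(0, 24 * 30)  # 30 days of hourly samples
x = np.sin(2 * np.pi * t / 12.42) + 0.2 * np.sin(2 * np.pi * t / (24 * 15))

x_gappy = x.copy()
x_gappy[300:310] = np.nan  # a 10-hour data gap

# Assumed design: 4th-order low-pass Butterworth, cutoff 1/40 cycles per hour
b, a = signal.butter(4, 1 / 40, btype="low", fs=1.0)

y_clean = signal.filtfilt(b, a, x)        # forward-backward (zero-phase) filtering
y_gappy = signal.filtfilt(b, a, x_gappy)  # same filter, series with a gap

print(np.isnan(y_clean).any())  # False
print(np.isnan(y_gappy).all())  # True: the NaNs spread to every output sample
```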

Salme applies the Butterworth filter in MATLAB using T_Tide and has not run into this issue. There is also a Python version of T_Tide, but she has not worked with it. I am looking into whether this package fills data gaps somehow.

amsnyder commented 2 years ago

Working alternative 1: use ttide instead of scipy. We would need help translating the input parameters. I am storing the working function call here for later reference, with the parameters I have already translated or still need to ask Salme about:

t_tide(x, dt=10, out_style='classic', corr_fs=[0, 1e6], corr_fac=[1, 1], secular='mean', ray=1, errcalc='cboot', synth=2, lsq='best')

amsnyder commented 2 years ago

Alternative 2: handle the missing data before passing the series to the scipy filter.

This is what the ttide documentation says about handling missing data:

Although missing data can be handled with NaN, it is wise not to have too many of them. If your time series has a lot of missing data at the beginning and/or end, then truncate the input time series. The Rayleigh criterion is applied to frequency intervals calculated as the inverse of the input series length.

They appear to fill the gaps here using simple linear interpolation: https://github.com/moflaher/ttide_py/blob/58010f71b0ce074425038bb0dc312bc3d23617e6/ttide/t_tide.py#L493 They take the input data (xin), compute a least-squares fitted prediction (xout), and then compute the residuals between the two (xres). The gaps are filled in xres.
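For reference, the gap-filling step can be sketched as below. This is a re-implementation for illustration, not the ttide source: fill_interior_gaps is a hypothetical name, and the behaviour (interpolate interior NaNs linearly, leave leading and trailing NaNs alone) follows the description of the original MATLAB fixgaps, assumed here to carry over to the Python port. It handles real-valued input only.

```python
import numpy as np


def fill_interior_gaps(y):
    """Linearly interpolate interior NaN gaps in a 1-D series.

    Leading and trailing NaNs are left untouched, since there is no
    data point on both sides to interpolate between.
    """
    y = np.asarray(y, dtype=float)
    out = y.copy()
    good = ~np.isnan(y)
    if good.sum() < 2:
        return out  # nothing to interpolate between
    idx = np.arange(y.size)
    first, last = idx[good][0], idx[good][-1]
    # Only NaNs strictly between the first and last valid samples are filled
    interior = (idx > first) & (idx < last) & ~good
    out[interior] = np.interp(idx[interior], idx[good], y[good])
    return out


print(fill_interior_gaps([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan]).tolist())
# [nan, 1.0, 2.0, 3.0, 4.0, nan]
```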

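Because we want to fill the gaps before any of the Butterworth filter calculations, we need to fill gaps in the original data (xin) rather than in the residuals (xres). This is close to, but not exactly, what ttide does: since xres = xin - xout, linearly interpolating xres across a gap is the same as interpolating xin and subtracting the interpolated xout, so it only matches filling xin directly where xout is roughly linear across the gap. For short gaps (much shorter than a tidal cycle) the results should be very similar; over longer gaps, filling xin linearly flattens out the tidal signal.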
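Whether filling xin and filling xres agree depends on how close to linear the prediction xout is across the gap. A quick numerical check on a synthetic hourly series compares the two for a 2 h and an 8 h gap. Assumptions: a cosine with a 12.42 h period stands in for the fitted prediction xout, the residual is a linear trend, and fill_gap is a hypothetical helper.

```python
import numpy as np

# Synthetic hourly series (assumed): tidal prediction plus a slowly varying residual
t = np.arange(0.0, 24 * 10)             # 10 days, hourly
xout = np.cos(2 * np.pi * t / 12.42)    # stand-in for ttide's fitted prediction
xin = xout + 0.01 * t                   # "observations" = prediction + linear residual


def fill_gap(y, start, stop):
    """Blank y[start:stop], then refill it by linear interpolation."""
    y = y.copy()
    y[start:stop] = np.nan
    idx = np.arange(y.size)
    good = ~np.isnan(y)
    y[~good] = np.interp(idx[~good], idx[good], y[good])
    return y


results = {}
for gap in (2, 8):  # gap lengths in hours
    start = 100
    filled_input = fill_gap(xin, start, start + gap)                # fill xin directly
    filled_resid = xout + fill_gap(xin - xout, start, start + gap)  # ttide-style: fill xres
    results[gap] = np.abs(filled_input - filled_resid).max()
    print(f"{gap} h gap: max |difference| = {results[gap]:.3f}")
```

In this contrived case the residual is exactly linear, so filling xres recovers xin perfectly; real residuals are not linear, but the way the difference grows with gap length is the point.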

amsnyder commented 2 years ago

We will use the fixgaps function from ttide to fill gaps by linear interpolation before passing the data to the Butterworth filter designed with scipy.signal.butter.
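A minimal sketch of the agreed approach follows. It uses a hypothetical numpy-based stand-in for fixgaps (linear interpolation over interior NaNs) rather than importing ttide, and assumes filtfilt is used for zero-phase filtering. Following the ttide advice on missing data at the beginning and end of a series, leading and trailing NaNs are trimmed rather than filled. The 40-hour cutoff, 4th order, and hourly sampling are illustrative assumptions, not the project's settings.

```python
import numpy as np
from scipy import signal


def fill_interior_gaps(y):
    """Linear interpolation over interior NaNs (stand-in for ttide's fixgaps)."""
    y = np.asarray(y, dtype=float).copy()
    good = ~np.isnan(y)
    if good.sum() < 2:
        return y
    idx = np.arange(y.size)
    inside = (idx > idx[good][0]) & (idx < idx[good][-1]) & ~good
    y[inside] = np.interp(idx[inside], idx[good], y[good])
    return y


def lowpass(x, cutoff_hours=40.0, order=4, dt_hours=1.0):
    """Fill interior gaps, skip leading/trailing NaNs, apply a zero-phase low-pass."""
    x = fill_interior_gaps(x)
    good = ~np.isnan(x)  # after filling, this is one contiguous block
    out = np.full_like(x, np.nan)
    b, a = signal.butter(order, 1.0 / cutoff_hours, btype="low", fs=1.0 / dt_hours)
    out[good] = signal.filtfilt(b, a, x[good])
    return out


t = np.arange(24 * 30)
x = np.sin(2 * np.pi * t / 12.42) + 0.2 * np.sin(2 * np.pi * t / 360)
x[300:310] = np.nan  # interior gap: filled
x[:5] = np.nan       # leading gap: left as NaN
y = lowpass(x)
print(np.isnan(y[:5]).all(), np.isnan(y[5:]).any())  # True False
```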

salme146 commented 2 years ago

After chatting with Amelia, I agree that filling the gaps with linear interpolation would work fine. If there are very long gaps, you could use the predicted water level as a guesstimate of what the time series would have been at that time. FYI, some NOAA stations do not report predicted water levels.
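The suggestion above (linear interpolation for short gaps, predicted water level for long ones) could be sketched as follows. fill_with_prediction and its max_gap threshold are hypothetical, and the prediction array is assumed to already be on the same time grid as the observations; gaps at the very start or end of the series also fall back to the prediction, since there is nothing on one side to interpolate from.

```python
import numpy as np


def fill_with_prediction(obs, pred, max_gap=6):
    """Fill NaN gaps: short ones by linear interpolation, long ones from a prediction.

    obs, pred : equal-length 1-D arrays on the same time grid
    max_gap   : hypothetical threshold (in samples) above which a gap is
                considered too long to interpolate linearly
    """
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    out = obs.copy()
    missing = np.isnan(obs)
    good = ~missing
    idx = np.arange(obs.size)
    # Locate runs of consecutive NaNs as [start, stop) index pairs
    edges = np.diff(np.concatenate(([0], missing.astype(int), [0])))
    starts, stops = np.flatnonzero(edges == 1), np.flatnonzero(edges == -1)
    for start, stop in zip(starts, stops):
        is_interior = start > 0 and stop < obs.size
        if is_interior and stop - start <= max_gap:
            out[start:stop] = np.interp(idx[start:stop], idx[good], obs[good])
        else:
            out[start:stop] = pred[start:stop]  # "guesstimate" from the prediction
    return out


obs = np.array([1.0, 2.0, np.nan, 4.0, np.nan, np.nan, np.nan, 8.0])
pred = np.array([1.1, 2.1, 3.1, 4.1, 5.1, 6.1, 7.1, 8.1])
print(fill_with_prediction(obs, pred, max_gap=2).tolist())
# [1.0, 2.0, 3.0, 4.0, 5.1, 6.1, 7.1, 8.0]
```

Filling with the raw prediction can leave small jumps at the gap edges if observations and predictions are offset; interpolating the residual (observed minus predicted) and adding the prediction back would avoid that and is closer to what ttide does internally.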