aodn / python-aodntools

Repository for templates and code relating to generating standard NetCDF files for the Australia Ocean Data Network
GNU Lesser General Public License v3.0
10 stars 3 forks source link

Query about interpolation in gridded_timeseries.py #140

Open sspagnol opened 3 years ago

sspagnol commented 3 years ago

Have a query about the interpolation done in grid_variable function in gridded_timeseries.py

Had a situation where in some timesteps the profile had nan data, which when interpolated seems to produce more nans that I expected. Consider the toy problem

print("np.interp test")
depth_bins = np.array([10, 20, 30, 40, 50])

depths = np.array([12, 27, 35, 46])
values = np.array([1, 2, np.nan, 5])

print(depth_bins)
interp_values = np.interp(depth_bins, depths, values, left=np.nan, right=np.nan)
print(interp_values)

# interpolate non-nan
nmask = np.logical_or(np.isnan(depths), np.isnan(values))
interp_values = np.interp(depth_bins, depths[~nmask], values[~nmask], left=np.nan, right=np.nan)
print(interp_values)

which produces

np.interp test
[10 20 30 40 50]
[       nan 1.53333333        nan        nan        nan]
[       nan 1.53333333 2.47368421 4.05263158        nan]

What is the reasoning to want depth_bins 30 and 40 to be set to nan?

mphemming commented 2 years ago

We are also having issues with these stray nans. See attached plot of missing gridded data in late 2021 between 16-24 m depth. I have detailed my debugging steps below and added a quick fix which seems to solve the problem.

figure

When running the for loop on line 190 in 'grid_variable' function in 'gridded_timeseries.py' for a date when there is missing gridded data, I get the following:

VoI_values = [16.7461, 18.769634, 17.33436, nan, 15.591092, 14.718737, 17.733799, 19.18304, 16.496675, 15.4011135, 15.960257, 13.950939]
depth = [48.674, 24.185463, 40.511154, nan, 72.88726, 96.549965, 32.34831, 16.022615, 56.836845, 80.774826, 64.999695, 109.65999]

then, after sorting on line 198 using function 'sort_depths' I get the following:

VoI_values = [18.769634, 17.33436, 16.7461, nan, 19.18304, 17.733799, 16.496675, 15.960257, 15.591092, 15.4011135, 14.718737, 13.950939]
depth = [24.185463, 40.511154, 48.674, nan, 16.022615, 32.34831, 56.836845, 64.999695, 72.88726, 80.774826, 96.549965, 109.65999]

The variable 'depth_mask' assigns the near surface data as False and the interpolated data is removed. Although I don't fully understand why this happens yet using the above VoI_values and depth as input. My guess is because of the nan and the incorrect sorting.

As a quick fix I added the following on lines 197-201 before the depth and VoI are sorted using function 'sort_depths':

# remove nans if present
if np.sum(np.logical_and(np.isnan(depth), np.isnan(VoI_values))) > 0:
    c = np.logical_and(np.isfinite(depth), np.isfinite(VoI_values)).transpose()
    VoI_values = list(np.array(VoI_values)[c])
    depth = list(np.array(depth)[c])

The hourly data is now interpolated throughout the water column during this time, with the 16-24 m white space in the plot attached now interpolated over.

mhidas commented 1 year ago

@sspagnol @mphemming Apologies for the long delay in responding to this query. I have finally had a brief look, and I agree that we could do better in the case illustrated by Simon's toy example. I'll try to spend some more time on this in the next month or two!