Open sspagnol opened 3 years ago
We are also having issues with these stray nans. See attached plot of missing gridded data in late 2021 between 16-24 m depth. I have detailed my debugging steps below and added a quick fix which seems to solve the problem.
When running the for loop on line 190 in 'grid_variable' function in 'gridded_timeseries.py' for a date when there is missing gridded data, I get the following:
VoI_values = [16.7461, 18.769634, 17.33436, nan, 15.591092, 14.718737, 17.733799, 19.18304, 16.496675, 15.4011135, 15.960257, 13.950939]
depth = [48.674, 24.185463, 40.511154, nan, 72.88726, 96.549965, 32.34831, 16.022615, 56.836845, 80.774826, 64.999695, 109.65999]
then, after sorting on line 198 using function 'sort_depths' I get the following:
VoI_values = [18.769634, 17.33436, 16.7461, nan, 19.18304, 17.733799, 16.496675, 15.960257, 15.591092, 15.4011135, 14.718737, 13.950939]
depth = [24.185463, 40.511154, 48.674, nan, 16.022615, 32.34831, 56.836845, 64.999695, 72.88726, 80.774826, 96.549965, 109.65999]
The variable 'depth_mask' assigns the near surface data as False and the interpolated data is removed. Although I don't fully understand why this happens yet using the above VoI_values and depth as input. My guess is because of the nan and the incorrect sorting.
As a quick fix I added the following on lines 197-201 before the depth and VoI are sorted using function 'sort_depths':
# remove nans if present
if np.sum(np.logical_and(np.isnan(depth), np.isnan(VoI_values))) > 0:
c = np.logical_and(np.isfinite(depth), np.isfinite(VoI_values)).transpose()
VoI_values = list(np.array(VoI_values)[c])
depth = list(np.array(depth)[c])
The hourly data is now interpolated throughout the water column during this time, with the 16-24 m white space in the plot attached now interpolated over.
@sspagnol @mphemming Apologies for the long delay in responding to this query. I have finally had a brief look, and I agree that we could do better in the case illustrated by Simon's toy example. I'll try to spend some more time on this in the next month or two!
Have a query about the interpolation done in grid_variable function in gridded_timeseries.py
Had a situation where in some timesteps the profile had nan data, which when interpolated seems to produce more nans that I expected. Consider the toy problem
which produces
What is the reasoning to want depth_bins 30 and 40 to be set to nan?