Closed callumrollo closed 1 year ago
We can make the interpolate kwarg True by default, or control this behavior from the yaml
Base: 78.48% // Head: 79.47% // Increases project coverage by +0.98%
:tada:
Coverage data is based on head (
6205c43
) compared to base (827f7f7
). Patch coverage: 94.59% of modified lines in pull request are covered.
:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
Hi @callumrollo, I'd recommend testing on this delayed mode dataset: https://cproof.uvic.ca/gliderdata/deployments/dfo-bb046/dfo-bb046-20200908/delayed_raw/ I would suggest controlling the behaviour from the yaml, so users really know what they are getting out of the processing, but I defer to @jklymak on this one.
Thanks @hvdosser. Can you send me a yaml file for the dataset you linked? I will then incorporate the nan interpolation with control from the yaml file.
Sorry for the delay on this, just back from vacation
Hi @callumrollo, no worries, thanks for figuring this out! Here's the link to the yaml: https://cproof.uvic.ca/gliderdata/deployments/dfo-bb046/dfo-bb046-20200908/deployment.yml
@hvdosser do these tests look sufficient? I've added some of the data you provided and tested it in default mode and with nan interpolation enabled. We're using GPCTD as the timebase.The nan interpolation adds non-nan values for the other sensors (oxygen and cdom) where they did not sample at the same time as the CTD.
Getting a nice increase isn test coverage from this too :)
Ping @hvdosser for a review on this
@hvdosser if you had a chance to make sure this works it would be appreciated...
@hvdosser do you have a timeframe for reviewing this? It's quite a substantial change, so I'd like to get it merged before addressing other Issues
Hi @callumrollo and @jklymak , sorry for the delay on this. I'll have time on Monday Feb 13th to review and merge.
I've merged in the latest changes from main. Is this good to go @hvdosser?
@callumrollo and @jklymak, I'll merge this pull request, but I do think we need to consider what happens when we have large gaps in the data and we interpolate over them. Particularly for sensors like the ECOpuck, which only samples a small portion of the water column. Am I correct in thinking that setting interpolate=True
will result in new interpolated points everywhere the CTD has data? If that's the case, we'll need to figure something out to deal with these larger gaps before we can move forward.
OK, I think xarray provides this functionality:
note the max_gap parameter, which should be perfect for this. Just set to some reasonable value (that probably changes depending on the sampling frequency) and big gaps would remain NaN...
Hi @hvdosser. The potential problem with slow sampling sensors was raised in the original discussion of Issue #128. Linear interpolation will result in an apparent measurement at every timestamp of the specified timebase. using keep_vars
will ensure these measurements are retained. An alternative would be to implement a max_gap as Jody suggests, or to have the option of nearest neighbout interpolation of some kind. Either way, I feel this is best addressed in a separate PR, preferably with the use of a dataset which demonstrates the issue you want to solve as test data
This will fix Issue #128
This is a first draft. If
interpolate
is set to True inraw_to_timeseries
, nans from variables will be interpolated over as the sensors are aligned with the timestamps of the variable designated as the timebase.This should be tested with some delayed mode data, as the interpolation has no effect on nrt data, as these typically have no nans. @hvdosser @jklymak, do you have a delayed mode dataset you would like to use to test this method?