NOAA-PMEL / Ferret

The Ferret program from NOAA/PMEL
https://ferret.pmel.noaa.gov/Ferret/
The Unlicense

Incorrect detection of time overlap in time aggregation with a given axis #1979

Closed karlmsmith closed 4 years ago

karlmsmith commented 4 years ago

Wei is aggregating datasets that each contain a varying number of time steps (1 to 10) of weekly data. Only one time step is duplicated across the set of 54 files; apart from that, joining all the time coordinates would form a perfectly regular weekly time axis. To ignore the duplicated time step, she tried time aggregation onto a given time axis with overlaps allowed.

LET files=SPAWN("ls -1 *.nc")
DEFINE AXIS /T="18-DEC-2011 12":"19-APR-2020 12":7 /UNITS="day" /T0="15-DEC-2011 00" aggtime
LET filelist = files[I=1:25]
DEFINE DATA/AGG/T/TAXIS=aggtime@XACT:0.01/TOVERLAP all = filelist

If only the first 25 files are used, the aggregation works but gives an erroneous warning about ignoring the last time step from each previous dataset (and it does ignore that step, which it should not). If the 26th dataset is added, it fails because it ignores the only time step in the 25th dataset. Only one time step should have been ignored when aggregating all 54 datasets.

karlmsmith commented 4 years ago

The detection of overlaps was finding an overlap if the maximum cell boundary of the previous dataset was greater than or equal to the minimum cell boundary of the current dataset. These boundaries will typically be equal for consecutive coordinate cells, so this bug will show up whenever datasets align next to each other on a provided aggregation time axis.
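
The effect can be reproduced outside Ferret. The Python sketch below is not the Ferret source: the function names, the half-step cell boundaries, and the coordinate values are assumptions chosen for illustration. It builds two datasets whose weekly cells sit next to each other with no shared time step and applies a "greater than or equal" test to the cell boundaries.

```python
def cell_bounds(coords, step):
    """Return (low, high) cell boundaries for evenly spaced coordinates.

    Assumes each cell extends half a step on either side of its coordinate.
    """
    half = step / 2
    return [(c - half, c + half) for c in coords]


def overlaps_by_bounds(prev_coords, curr_coords, step):
    """Boundary test: previous max cell bound >= current min cell bound."""
    prev_max = cell_bounds(prev_coords, step)[-1][1]
    curr_min = cell_bounds(curr_coords, step)[0][0]
    return prev_max >= curr_min


# Days since an arbitrary origin, weekly spacing (hypothetical values).
first = [3.5, 10.5, 17.5]
second = [24.5, 31.5]  # starts exactly one week after `first` ends

adjacent_flagged = overlaps_by_bounds(first, second, 7)
print(adjacent_flagged)  # True: the shared boundary at 21.0 counts as an overlap
```

In this sketch the two datasets share no time step, yet the test reports an overlap because the upper boundary of the last cell in `first` equals the lower boundary of the first cell in `second`.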

karlmsmith commented 4 years ago

There is an appropriate test of this in the benchmarks, and it was working correctly before any changes. Changing the comparison to "greater than" still detects the single overlap in the test case but misses the first overlap of a multiple-overlap case in the test. So I am still figuring out the bug and its effects.

karlmsmith commented 4 years ago

Fixed by comparing the integer indices of the datasets' time coordinates on the given time axis. The problem arose from the common complications of comparing floating-point values. Since the time axis is provided, every dataset time coordinate must be assigned to one of the given axis coordinates, so integer comparison of indices is possible, easier, and cleaner.
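
As a rough illustration of the index-based idea (a Python sketch, not the actual Ferret change), each dataset time can be mapped to the index of the nearest coordinate on the given axis, and overlap can then be decided by comparing integers. The helper names and the nearest-coordinate matching rule are assumptions; the sketch only counts overlapping steps and does not model which dataset's steps get dropped.

```python
def nearest_index(axis, value):
    """Index of the axis coordinate closest to `value` (assumed matching rule)."""
    return min(range(len(axis)), key=lambda i: abs(axis[i] - value))


def count_overlapping_steps(axis, prev_coords, curr_coords):
    """Count steps of the current dataset whose axis index is at or before
    the axis index of the previous dataset's last step."""
    prev_last = nearest_index(axis, prev_coords[-1])
    curr_idx = [nearest_index(axis, t) for t in curr_coords]
    return sum(1 for i in curr_idx if i <= prev_last)


# Hypothetical weekly axis: 3.5, 10.5, ..., 38.5 days since some origin.
axis = [3.5 + 7 * k for k in range(6)]

adjacent = count_overlapping_steps(axis, [3.5, 10.5, 17.5], [24.5, 31.5])
duplicated = count_overlapping_steps(axis, [3.5, 10.5, 17.5], [17.5, 24.5])
print(adjacent, duplicated)  # 0 1
```

Adjacent datasets map to consecutive indices (2 then 3), so no overlap is reported, while a genuinely duplicated time step maps to the same index in both datasets and is counted once.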