Unidata / netcdf-c

Official GitHub repository for netCDF-C libraries and utilities.
BSD 3-Clause "New" or "Revised" License
512 stars 262 forks source link

change in behavior in 4.7.4 in handling in nc_def_var_deflate() #1713

Closed edwardhartnett closed 4 years ago

edwardhartnett commented 4 years ago

Another change that accompanied the recent filter changes was a change in the way nc_def_var_deflate() behaves when called more than once for the same var before enddef.

It used to be that nc_def_var_deflate() could be called repeatedly, and the settings could be changed, until enddef.

Now, the first call to nc_def_var_deflate() sets the deflate settings. Subsequent calls are ignored.

This was never tested in netCDF tests, but it does come up in PIO tests, so I'm sorry I didn't notice this before - I've been spending my time on the GFS weather model instead. ;-)

I'm putting together a test now to demonstrate the problem. @WardF if there is going to soon be a new release, with a reverse of some of the other changes in this code, then we probably want to get this fix in as well before that.

PIO counts on this behavior,. and its not hard to see how other programs may as well, especially fortran codes, which may set all vars to a default deflate, and then allow individual vars to be set to different levels as needed. This worked before, but with 4.7.4 all vars will receive the default deflate settings, and subsequent changes will be ignored.

edwardhartnett commented 4 years ago

@DennisHeimbigner in the old code, setting the deflate setting just changed values in the NC_VAR_INFO_T struct, and then, at enddef, whatever values were present were used.

But now the filter is put on a list as soon as def_var_deflate() is called(?). I think you want to hold off until enddef. That's how everything else works, and there are subtle side-effects when it comes to renaming.

Alternatively, have subsequent calls to nc_def_var_deflate() find the item in the list and change the settings. Then, like the old code, when enddef occurs you will be using the most recent settings, not the first settings.

DennisHeimbigner commented 4 years ago

Doh! It did not occur to me to consider the case where nc_def_var_filter was called multiple times with different filter values before nc_enddef was called. I will fix. Thanks for point this out.

DennisHeimbigner commented 4 years ago

There is a potential issue though. THe order in which nc_def_var is called determines the order in which the filters are applied. If one calls nc_def_var_filter twice, should the filter application order change? I would say no. That you need to remove the filter first before adding it back in order to change the application order.

edwardhartnett commented 4 years ago

Prior to your filter work, the only two-filter combo was zlib + shuffle. So that's the only legacy case we have to worry about. And those two filters are set by the same function call, so no ordering is needed.

I think you need to handle what happens if they turn shuffle on then off again. The answer in all cases is that the most recent call to nc_def_var_deflate() overrides all previous calls, until enddef.

For other filters, I think you can define whatever rules you want; I suggest you document them in the code and in the doxygen docs, so it will be clear and others working on the filter code in the future can bear it in mind.

DennisHeimbigner commented 4 years ago

ok, Then the rules I propose are as follows:

Order of Invocation for Multiple Filters

When multiple filters are defined on a variable, the order of application, when writing data to the file, is same as the order in which _nc_def_varfilter is called. When reading a file the order of application is of necessity the reverse.

There are some special cases.

  1. If _nc_def_varfilter or _nc_def_vardeflate or _nc_def_varszip is called multiple times with the same filter id, but possibly with different sets of parameters, then the position of that filter in the sequence of applictions does not change. However the last set of parameters specified is used when actually writing the dataset.
  2. Deflate and shuffle -- these two are inextricably linked in the current API. If you call _nc_def_vardeflate multiple times, then the previous rule applies with respect to deflate. However, the shuffle filter, if enabled, is ''always'' applied just before applying deflate.
  3. If you want to move the location of a filter in the application sequence, then you must remove it using _nc_var_filterremove or using _nc_def_vardeflate. The next time you add the filter back, its position will be at the end of the current sequence.
WardF commented 4 years ago

Sounds good to me, @DennisHeimbigner

DennisHeimbigner commented 4 years ago

Minor changes.

  1. fletcher32 is defined before any other filters
  2. shuffle is defined before any filters but after fletcher32 I think fletcher always needs to be first since it is a checksum. It is possible that shuffle should only be used with deflate and no other filters, but I am not sure.
edwardhartnett commented 4 years ago

Shuffle may be used with any compression filter, so deflate, szip, or with a third party compression filter.

I forgot about fletcher! ;-)

edwardhartnett commented 4 years ago

I believe this is fixed now, I will close this issue.