Open DennisHeimbigner opened 6 years ago
These are all good, except the last one.
There are lots of cases for user defined formats, where the user defines only the get/put_vara functions, and leaves the vars functions to the NCDEFAULT version, which just calls the vara function over and over again to make up the vars call. In fact, this is how the HDF4 layer works, and also the AB Dispatch layer (part of a separate library, the first of the external user-defined formats).
So the vara functions are very useful in the dispatch layer, IMO. They will be easier to write for user-defined formats than the vars functions, for most formats.
Good point about var/vars. We might also want to consider some more structured way to pass a void* parameter; perhaps something like the HDF5 properties mechanism.
There is a fundamental difference between netcdf properties and HDF5 property lists, which is that HDF5 property lists are exposed to the cold, brutal hands of the user, whereas the netcdf properties will only be touched by dispatch-layer netcdf programmers, never the end user.
Right now we have void *parameters in open and create. According to your proposed changes, we would also have them in a few other places. These amount to defacto properties lists for these functions.
How about defined structs?
So for example a struct NC_OPEN_PROPERTIES, which contained all the fields that would otherwise end up in a void *parameter in the dispatch nc_open().
New parameters can just be added to the struct without breaking any existing code.
Seems like nc_inq_format_extended() could be moved out of the dispatch table and implemented in libdispatch in a way that would work for all dispatch layers.
I suggest we make all dispatch functions that take ncid as a parameter instead get a pointer to the NC object for the file.
Currently we look up the file 3 times for a call to nc_put_vara_int().
1 - We look it up (and don't use it) in nc_put_vara_int() (and this lookup is removed in a pending PR I have up.) 2 - We look it up in NC_put_vara() (dvarput.c) so we can find the correct dispatch layer. 3 - We look it up in NC4_put_vars() (hdf5var.c) when the dispatch layer is called.
The reason is that we always pass ncid, but we already have to look up NC from ncid in order to identify the dispatch layer. So we should just pass *NC (which contains the ncid), instead of passing the ncid and looking up NC again.
As we have seen, some users open thousands of files at a time. So looking up the file twice for each operation is something we should avoid.
Unfortunately, this is a big change that will touch many, many code files at once.
I remember that I did this for a reason, but I no longer remember the reason. It may have had something to do with recursive calls into the API. Libdap2 code uses the netcdf-3 API and does not have the dispatch table.
Let me see if I can recall the rationale.
I ran into the recursion problem when working on PIO and netCDF. The answer is to call the nc3_* version of the functions, and not invoke the dispatch layer at all.
With the changes I am proposing, the libdap2 code should look up the NC and call the function in libsrc directly (passing pointer to NC and not ncid). The libdap2 code is only every going to want the classic version of the function, so there is no need to use the dispatch layer.
No doubt that is some effort but I don't think that looking up every file twice, for every operation, is a good solution. I would be happy to make the changes, under your supervision, if that would be helpful.
Proposed changes/cleanup of the NC_Dispatch structure
add a version number as the first field. For back compatibility purposes, the version number should be detectable as different from the current first field, the model.
Make it a requirement that all fields be provided, even if netcdf-4 is not enabled. That is, the set of fields is not to be conditional on any flag.
Hide the initialsz, basepe, chunksizehintp parameters in the (*create) field by moving them into the parameters argument.
Hide the basepe, chunksizehintp parameters in the (*open) field by moving them into the parameters argument.
The (enddef) field has extra parameters solely for the use of MPI: h_minfree,v_align,v_minfree,r_align. Convert these to a single parameter block (like in, say, (close)).
Remove (inq_base_pe) and merge into (inq).
Create a new field called (*set) that sets various parameters at the top-level ncid level.
Merge (inq_format), (inq_format_extended), (set_fill), & (set_base_pe) into the new (*set) operation.
Remove (get_varm) and (put_varm) and always use NCDEFAULT_get/put_varm
Modify (*inq_var_all) to remove the deflate arguments since they can be handled by the idp+nparams+params filter arguments.
Merge (def_var_deflate) into (def_var_filter)
Merge (var_par_access) into (def_var)
Merge (def_var_fill) into (def_var)
Remove (*show_metadata) since this is actually a debugging function [This actually should be removed from netcdf.h, but probably not now]
Merge (inq_unlimdim) and (inq_unlimdims)
(Optional) create a (inq_grp_all) similar to (inq) and merge (inq) into (inq_grp_all)
(Optional) remove (get_vara) and (put_vara) and convert them to calls to (*get/put_vars).