Unidata / netcdf-c

Official GitHub repository for netCDF-C libraries and utilities.
BSD 3-Clause "New" or "Revised" License

Netcdf4Dimid Writing or Not Based on Variable Write Order? #1466

Closed PaulHuwe closed 5 years ago

PaulHuwe commented 5 years ago

Environment Information

CentOS 6 (64-bit), Anaconda Python 2.7, library version: netCDF4 (1.5.1.2)

Summary of Issue

The internal attribute _Netcdf4Dimid is inconsistently showing up in output files for our tools, and it appears to be due to variable write order (with data).

Steps to reproduce the behavior

Using the following input file: https://tropomi.gesdisc.eosdis.nasa.gov/data/S5P_TROPOMI_Level2/S5P_L2__AER_AI.1/2019/025/S5P_OFFL_L2__AER_AI_20190125T105654_20190125T123824_06655_01_010202_20190131T102415.nc

I try to read the following two variables in, and write a subset of them to an output file:

METADATA/QA_STATISTICS/aerosol_index_340_380_histogram_axis
METADATA/QA_STATISTICS/aerosol_index_340_380_histogram

METADATA/QA_STATISTICS/aerosol_index_340_380_histogram_axis is a Dimension scale variable.

If I read & write aerosol_index_340_380_histogram_axis first and assign it data, and then read & write aerosol_index_340_380_histogram and assign it data (scratch_6.py [txt]), then aerosol_index_340_380_histogram does NOT get a _Netcdf4Dimid attribute.

However, if I do the same steps with the variables reversed, that is, read & write aerosol_index_340_380_histogram first and assign it data, and then read & write aerosol_index_340_380_histogram_axis and assign it data (scratch_5.py [txt]), then aerosol_index_340_380_histogram DOES get a _Netcdf4Dimid attribute.

This behavior is quite confusing, unexpected, and not consistent with the other dimension scale variables in the file (except aerosol_index_354_388_histogram_axis which behaves the same). Is this a bug, or is there a design reason this is occurring that I need to code towards?

Code: scratch_5.txt scratch_6.txt

WardF commented 5 years ago

Thanks, will take a look at this!

WardF commented 5 years ago

Unfortunately, I'm getting a '404 file not found' when I try to access this file. Can you confirm that it is still present? I can try to debug this with a different file, but would like to duplicate the issue as closely as possible.

PaulHuwe commented 5 years ago

If I click the link, I get a file. We do require a (free) account to download from NASA now. If you would prefer, I could put it here as an attachment, but it is large (~125MB).

edhartnett commented 5 years ago

The _Netcdf4Dimid attribute is used when needed to retain dimension ordering that would otherwise be lost.

Since it is a hidden attribute, why is it a problem for you?

PaulHuwe commented 5 years ago

Because I am trying to preserve the original file as closely as possible, and the _Netcdf4Dimid attribute exists in the original but not in some of our subsets (it is inconsistent). The inconsistency also affects our ability to test our services.

edhartnett commented 5 years ago

If you create the file with netcdf or any of the netcdf utilities, you don't have to worry about whether it is present or not.

If you use netcdf for testing (i.e. to open the file and check it) then once again it should not matter.

In fact, the only way to see it is to use netcdf-c to create the file, and then use hdf5 to read the file. Is that what you are doing?

PaulHuwe commented 5 years ago

Yes, we are using hdf5 tools in addition to netcdf tools to read the file.

edhartnett commented 5 years ago

OK, well if I recall correctly, the _Netcdf4Dimid attribute is used whenever the creation order of the dimension scale dataset is not the same as the dimid order. This occurs in the following situation:

    nc_def_dim "dim_a"
    nc_def_dim "dim_b"
    nc_def_var "dim_a"

So if I define two dimensions dim_a and dim_b they will be created in that order and have dimids 0 and 1.

Then I do an nc_def_var for dim_a (because I am going to write coordinate data). This causes the dimension scale dataset for dim_a to be re-created, which changes the creation order, and would then break the dimids. So a _Netcdf4Dimid attribute will be attached to the dimension scale datasets, to indicate the correct dimids.

Does that make sense?
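The ordering effect described above can be illustrated with a small toy model in Python. This is only a sketch of the explanation in this comment, not the netcdf-c implementation: the names `simulate` and `needs_dimid_attribute` are hypothetical, and the rule "defining a coordinate variable moves its dimension scale dataset to the end of the creation order" is an assumption drawn from the description, not from the library source.

```python
def simulate(calls):
    """Toy model: replay ("def_dim" | "def_var", name) calls.

    Returns (dimids, creation_order). This mimics the description in the
    comment above; it is NOT how netcdf-c is actually implemented.
    """
    dimids = {}          # dimension name -> dimid, assigned in definition order
    creation_order = []  # dataset names in the order the model (re)creates them
    for op, name in calls:
        if op == "def_dim":
            dimids[name] = len(dimids)
            creation_order.append(name)
        elif op == "def_var":
            if name in creation_order:
                # Assumption: a coordinate variable re-creates the dimension
                # scale dataset, so it moves to the end of the creation order.
                creation_order.remove(name)
            creation_order.append(name)
        else:
            raise ValueError("unknown call: %r" % (op,))
    return dimids, creation_order


def needs_dimid_attribute(calls):
    """True when creation order of the dimension scales no longer matches dimid order."""
    dimids, creation_order = simulate(calls)
    scales = [name for name in creation_order if name in dimids]
    by_dimid = sorted(dimids, key=dimids.get)
    return scales != by_dimid


calls = [("def_dim", "dim_a"), ("def_dim", "dim_b"), ("def_var", "dim_a")]
print(needs_dimid_attribute(calls))
```

In this model, defining the coordinate variable for dim_b instead of dim_a leaves the creation order equal to the dimid order, which is one way to picture why the attribute shows up for some write orders and not others.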

edhartnett commented 5 years ago

I would suggest you write your hdf5-based tests so that they ignore the netcdf-4 hidden attributes. They are (from libhdf5/hdf5file.c):

static const NC_reservedatt NC_reserved[NRESERVED] = {
    {NC_ATT_CLASS, READONLYFLAG|DIMSCALEFLAG},            /*CLASS*/
    {NC_ATT_DIMENSION_LIST, READONLYFLAG|DIMSCALEFLAG},   /*DIMENSION_LIST*/
    {NC_ATT_NAME, READONLYFLAG|DIMSCALEFLAG},             /*NAME*/
    {NC_ATT_REFERENCE_LIST, READONLYFLAG|DIMSCALEFLAG},   /*REFERENCE_LIST*/
    {NC_ATT_FORMAT, READONLYFLAG},                        /*_Format*/
    {ISNETCDF4ATT, READONLYFLAG|NAMEONLYFLAG},            /*_IsNetcdf4*/
    {NCPROPS, READONLYFLAG|NAMEONLYFLAG|MATERIALIZEDFLAG},/*_NCProperties*/
    {NC_ATT_COORDINATES, READONLYFLAG|DIMSCALEFLAG|MATERIALIZEDFLAG},/*_Netcdf4Coordinates*/
    {NC_DIMID_ATT_NAME, READONLYFLAG|DIMSCALEFLAG|MATERIALIZEDFLAG},/*_Netcdf4Dimid*/
    {SUPERBLOCKATT, READONLYFLAG|NAMEONLYFLAG},/*_SuperblockVersion*/
    {NC3_STRICT_ATT_NAME, READONLYFLAG|MATERIALIZEDFLAG},  /*_nc3_strict*/
};

PaulHuwe commented 5 years ago

Your description did make sense. Do you know where I can find good documentation on _Netcdf4Dimid?

edhartnett commented 5 years ago

That was it! ;-)

You can grep the code and see what you find. It will be documented there as well.

PaulHuwe commented 5 years ago

Okay - so long as it is only intra-library, and does not affect anything else, I can modify my tests to ignore it.
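A minimal sketch of such a test helper in Python, assuming the attributes of each object have already been read (for example with an HDF5 tool) into plain dictionaries of ordinary Python values. The attribute names come from the comments in the NC_reserved table quoted earlier in this thread; the function names `visible_attrs` and `attrs_match` are hypothetical.

```python
# Names taken from the comments in the NC_reserved table quoted in this thread.
NETCDF4_HIDDEN_ATTRS = frozenset({
    "CLASS",
    "DIMENSION_LIST",
    "NAME",
    "REFERENCE_LIST",
    "_Format",
    "_IsNetcdf4",
    "_NCProperties",
    "_Netcdf4Coordinates",
    "_Netcdf4Dimid",
    "_SuperblockVersion",
    "_nc3_strict",
})


def visible_attrs(attrs):
    """Return a copy of attrs without the netCDF-4 reserved attributes."""
    return {k: v for k, v in attrs.items() if k not in NETCDF4_HIDDEN_ATTRS}


def attrs_match(original, subset):
    """Compare two attribute dictionaries, ignoring the reserved attributes."""
    return visible_attrs(original) == visible_attrs(subset)


# Hypothetical attribute sets: the original carries _Netcdf4Dimid, the subset does not.
original = {"units": "1", "_Netcdf4Dimid": 3}
subset = {"units": "1"}
print(attrs_match(original, subset))  # prints True
```

If attribute values are arrays rather than scalars or strings, the equality check would need an element-wise comparison instead of a plain `==`.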

edhartnett commented 5 years ago

Probably this issue should be closed, since this is not a bug. ;-)