Closed PaulHuwe closed 5 years ago
Thanks, will take a look at this!
Unfortunately, I'm getting a '404 file not found' when I try to access this file. Can you confirm that it is still present? I can try to debug this with a different file, but would like to duplicate the issue as closely as possible.
If I click the link, I get a file. We do require a (free) account to download from NASA now. If you would prefer, I could put it here as an attachment, but it is large (~125MB).
The _Netcdf4Dimid attribute is used when needed to retain dimension ordering that would otherwise be lost.
Since it is a hidden attribute, why is it a problem for you?
Because I am trying to preserve the original file as closely as possible, and the _Netcdf4Dimid attribute exists in the original but not in some of our subsets (it is inconsistent). The inconsistency also affects our ability to test our services.
If you create the file with netcdf or any of the netcdf utilities, you don't have to worry about whether it is present or not.
If you use netcdf for testing (i.e. to open the file and check it) then once again it should not matter.
In fact, the only way to see it is to use netcdf-c to create the file, and then use hdf5 to read the file. Is that what you are doing?
Yes, we are using hdf5 tools in addition to netcdf tools to read the file.
OK, well if I recall correctly, the _Netcdf4Dimid attribute is used whenever the creation order of the dimension scale dataset is not the same as the dimid order. This occurs in the following situation:
    nc_def_dim "dim_a"
    nc_def_dim "dim_b"
    nc_def_var "dim_a"
So if I define two dimensions dim_a and dim_b they will be created in that order and have dimids 0 and 1.
Then I do an nc_def_var for dim_a (because I am going to write coordinate data). This causes the dimension scale dataset for dim_a to be re-created, which changes the creation order and would otherwise break the dimids. So a _Netcdf4Dimid attribute is attached to the dimension scale datasets to record the correct dimids.
Does that make sense?
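The situation described above can be sketched as a toy model. This is only an illustration of the explanation in this thread, not netcdf-c code: the class `ToyFile` and its methods are hypothetical names, and the real library's bookkeeping is more involved.

```python
class ToyFile:
    """Toy model of dimid order vs. dataset creation order (hypothetical, not netcdf-c)."""

    def __init__(self):
        self.dimids = {}          # dimension name -> dimid, assigned in definition order
        self.creation_order = []  # dimension scale datasets, oldest first

    def def_dim(self, name):
        # A new dimension gets the next dimid and a dimension scale dataset.
        self.dimids[name] = len(self.dimids)
        self.creation_order.append(name)

    def def_var(self, name):
        # Defining a coordinate variable for an existing dimension re-creates
        # that dimension's dataset, so it becomes the most recently created one.
        if name in self.dimids:
            self.creation_order.remove(name)
            self.creation_order.append(name)

    def needs_dimid_attribute(self):
        # The attribute is needed only when creation order no longer
        # reproduces the dimid order.
        by_dimid = sorted(self.dimids, key=self.dimids.get)
        return self.creation_order != by_dimid


f = ToyFile()
f.def_dim("dim_a")
f.def_dim("dim_b")
f.def_var("dim_a")  # re-creates dim_a after dim_b
print(f.needs_dimid_attribute())
```

In this model, defining the coordinate variable for the last-defined dimension (dim_b) instead leaves the creation order unchanged, so no attribute would be needed. That is consistent with the attribute appearing only for some write orders.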
I would suggest you write your hdf5-based tests so that they ignore the netcdf-4 hidden attributes. They are (from libhdf5/hdf5file.c):
static const NC_reservedatt NC_reserved[NRESERVED] = {
    {NC_ATT_CLASS, READONLYFLAG|DIMSCALEFLAG}, /*CLASS*/
    {NC_ATT_DIMENSION_LIST, READONLYFLAG|DIMSCALEFLAG}, /*DIMENSION_LIST*/
    {NC_ATT_NAME, READONLYFLAG|DIMSCALEFLAG}, /*NAME*/
    {NC_ATT_REFERENCE_LIST, READONLYFLAG|DIMSCALEFLAG}, /*REFERENCE_LIST*/
    {NC_ATT_FORMAT, READONLYFLAG}, /*_Format*/
    {ISNETCDF4ATT, READONLYFLAG|NAMEONLYFLAG}, /*_IsNetcdf4*/
    {NCPROPS, READONLYFLAG|NAMEONLYFLAG|MATERIALIZEDFLAG}, /*_NCProperties*/
    {NC_ATT_COORDINATES, READONLYFLAG|DIMSCALEFLAG|MATERIALIZEDFLAG}, /*_Netcdf4Coordinates*/
    {NC_DIMID_ATT_NAME, READONLYFLAG|DIMSCALEFLAG|MATERIALIZEDFLAG}, /*_Netcdf4Dimid*/
    {SUPERBLOCKATT, READONLYFLAG|NAMEONLYFLAG}, /*_SuperblockVersion*/
    {NC3_STRICT_ATT_NAME, READONLYFLAG|MATERIALIZEDFLAG}, /*_nc3_strict*/
};
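One way an HDF5-based test could ignore these attributes is sketched below. The attribute names come from the comments in the table above. The helper names `visible_attrs` and `attrs_match` are hypothetical, and the sketch assumes each object's attributes have already been read into a plain dict of simple Python values (array-valued attributes would need an element-wise comparison instead of `==`).

```python
# Names taken from the comments in the reserved-attribute table above.
NETCDF4_HIDDEN_ATTRS = frozenset({
    "CLASS",
    "DIMENSION_LIST",
    "NAME",
    "REFERENCE_LIST",
    "_Format",
    "_IsNetcdf4",
    "_NCProperties",
    "_Netcdf4Coordinates",
    "_Netcdf4Dimid",
    "_SuperblockVersion",
    "_nc3_strict",
})


def visible_attrs(attrs):
    """Return a copy of attrs without the netCDF-4 hidden attributes."""
    return {k: v for k, v in attrs.items() if k not in NETCDF4_HIDDEN_ATTRS}


def attrs_match(expected, actual):
    """Compare two attribute dicts, ignoring the hidden attributes."""
    return visible_attrs(expected) == visible_attrs(actual)


# Hypothetical attribute dicts for the same variable in two files.
original = {"units": "1", "_Netcdf4Dimid": 0}
subset = {"units": "1"}
print(attrs_match(original, subset))
```

Filtering by name alone also hides a user attribute that happens to be called NAME or CLASS, so a stricter test might apply the filter only to dimension scale datasets.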
Your description did make sense. Do you know where I can find good documentation on _Netcdf4Dimid?
That was it! ;-)
You can grep the code and see what you find. It will be documented there as well.
Okay - so long as it is only intra-library, and does not affect anything else, I can modify my tests to ignore it.
Probably this issue should be closed, since this is not a bug. ;-)
Environment Information
CentOS 6, 64-bit
Anaconda Python 2.7
Library version: netCDF4 (1.5.1.2)
Summary of Issue
The internal attribute _Netcdf4Dimid is inconsistently showing up in output files for our tools, and it appears to be due to variable write order (with data).
Steps to reproduce the behavior
Using the following input file: https://tropomi.gesdisc.eosdis.nasa.gov/data/S5P_TROPOMI_Level2/S5P_L2__AER_AI.1/2019/025/S5P_OFFL_L2__AER_AI_20190125T105654_20190125T123824_06655_01_010202_20190131T102415.nc
I try to read the following two variables in, and write a subset of them to an output file:
    METADATA/QA_STATISTICS/aerosol_index_340_380_histogram_axis
    METADATA/QA_STATISTICS/aerosol_index_340_380_histogram
METADATA/QA_STATISTICS/aerosol_index_340_380_histogram_axis is a Dimension scale variable.
If I read and write aerosol_index_340_380_histogram_axis first and assign it data, then read and write aerosol_index_340_380_histogram and assign it data (scratch_6.py), aerosol_index_340_380_histogram does NOT get a _Netcdf4Dimid attribute.
However, if I do the same steps with the variables reversed, that is, read and write aerosol_index_340_380_histogram first and assign it data, then read and write aerosol_index_340_380_histogram_axis and assign it data (scratch_5.py), aerosol_index_340_380_histogram DOES get a _Netcdf4Dimid attribute.
This behavior is quite confusing, unexpected, and not consistent with the other dimension scale variables in the file (except aerosol_index_354_388_histogram_axis which behaves the same). Is this a bug, or is there a design reason this is occurring that I need to code towards?
Code: scratch_5.txt scratch_6.txt