Unidata / netcdf-cxx4

Official GitHub repository for netCDF-C++ libraries and utilities.
Other
124 stars 49 forks source link

NetCDF-CXX4 Fails a Unit Test #30

Closed citibeth closed 7 months ago

citibeth commented 8 years ago

The unit test cxx4_test_type is failing, I believe on the type ncInt64. I've build things here with HDF5 enabled. As you can see below in the examples, this NetCDF had no problem writing nc4 files.

In real life (outside of unit tests), I'm not able to create any int64 variables with netcdf-cxx4. (I AM able to do so via ncgen, or via the Python interface).

Attached are build logs for netcdf and netcdf-cxx4.

netcdf-build.txt netcdf-cxx4-build.txt

Help... any ideas?

Thank you, -- Elizabeth

[me@ankeli cxx4]$ ./cxx4_test_type 
Opening file "firstFile.cdf" with NcFile::replace
Testing addGroup("groupName")                                -----------   passed
Testing getName()                                        unknown error
NetCDF: Not a valid data type or _FillValue type mismatch
file: /home/me/tmp/netcdf-cxx4/cxx4/ncType.cpp  line:97[me@ankeli cxx4]$ 
[me@ankeli cxx4]$ ../examples/
CMakeFiles/                    examples_sfc_pres_temp_rd      examples_simple_xy_wr
examples_pres_temp_4D_rd       examples_sfc_pres_temp_wr      examples_simple_xy_wr_formats
examples_pres_temp_4D_wr       examples_simple_xy_rd          
[me@ankeli cxx4]$ ../examples/examples_simple_xy_wr_formats 
*** SUCCESS creating nc4 file
*** SUCCESS creating nc4classic file
*** SUCCESS creating classic file
*** SUCCESS creating classic64 file
[me@ankeli cxx4]$ git info
== Remote URL: origin   https://github.com/Unidata/netcdf-cxx4.git (fetch)
origin  https://github.com/Unidata/netcdf-cxx4.git (push)

== Local Branches:
* master

== Configuration (.git/config)
[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
[remote "origin"]
    url = https://github.com/Unidata/netcdf-cxx4.git
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
    remote = origin
    merge = refs/heads/master
    rebase = true

== Most Recent Commit
commit f42eb9dbbcde1e1e232129f053442657f84cf390
Merge: 365ab9c 0bf071d
Author: Ward Fisher <wfisher@ucar.edu>
Date:   Fri Mar 4 16:00:56 2016 -0700

    Merge branch 'master' into doxygen

Type 'git log' for more commits, or 'git show' for full commit details.

[me@ankeli cxx4]$ ldd ./cxx4_test_type | grep hdf5
    libhdf5_hl.so.10 => /home/me/spack2/opt/spack/linux-x86_64/gcc-4.9.3/hdf5-1.8.16-3dh2iibcu4ahjzgujcruz6b7agezsba5/lib/libhdf5_hl.so.10 (0x00007f74a3a63000)
    libhdf5.so.10 => /home/me/spack2/opt/spack/linux-x86_64/gcc-4.9.3/hdf5-1.8.16-3dh2iibcu4ahjzgujcruz6b7agezsba5/lib/libhdf5.so.10 (0x00007f74a3555000)
citibeth commented 8 years ago

I've traced this to the NetCDF package. Consider the code::

groupId = 4;
myId = NC_INT64;   // 10
charName[0] = '\0';
  int err = nc_inq_type(groupId,myId,charName,sizep);
printf("err = %d charName=%s\n", err, charName);

This produces error = -45 (unknown NetCDF type). It works if myID=NC_INT.

Here's my current software stack, I'm going to try different versions of NetCDF or HDF5:

  netcdf-cxx4@local%gcc@4.9.3=linux-x86_64
      ^autoconf@2.69%gcc@4.9.3=linux-x86_64
      ^netcdf@4.4.0%gcc@4.9.3~hdf4+mpi=linux-x86_64
          ^curl@7.47.1%gcc@4.9.3=linux-x86_64
              ^openssl@system%gcc@4.9.3=linux-x86_64
              ^zlib@1.2.8%gcc@4.9.3=linux-x86_64
          ^hdf5@1.8.16%gcc@4.9.3~cxx~debug+fortran+mpi+shared~szip~threadsafe=linux-x86_64
              ^openmpi@1.10.2%gcc@4.9.3~psm~tm~verbs=linux-x86_64
                  ^hwloc@1.11.2%gcc@4.9.3=linux-x86_64
                      ^libpciaccess@0.13.4%gcc@4.9.3=linux-x86_64
                          ^libtool@2.4.6%gcc@4.9.3=linux-x86_64
WardF commented 8 years ago

This is a bit peculiar, the test on my end clearly fails but returns a success code anyways. I will investigate this and see if I can figure out what is going on.

WardF commented 8 years ago

I've opened an issue over at the C library and am investigating this now.

citibeth commented 8 years ago

Ward,

Does this mean that you were able to replicate the failure with your build? I want to rule out that this is due to something I might be doing wrong in the build process.

Thank you, -- Elizabeth

On Mon, Mar 28, 2016 at 12:41 PM, Ward Fisher notifications@github.com wrote:

This is a bit peculiar, the test on my end clearly fails but returns a success code anyways. I will investigate this and see if I can figure out what is going on.

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/Unidata/netcdf-cxx4/issues/30#issuecomment-202478613

WardF commented 8 years ago

Hi Elizabeth, yes, this error is in the library and not at all related to anything you are doing wrong, so don't worry about that. I appreciate very much your work figuring out that this is in the C library, it saved me time and gave me a good starting point. On the C library, I'm working now in a branch gh230, where I've added a test which duplicates this error. Through this I should be able to diagnose the root cause of the problem.

WardF commented 8 years ago

My mistake, the branch for netcdf-c is gh240.

citibeth commented 8 years ago

I also made a little test, but it hasn't turned up anything. The code:

#include <stdio.h>
#include <netcdf.h>

void test(int ncid)
{
    for (int myId=1; myId <= 12; ++myId) {
        char charName[NC_MAX_NAME+1];
        charName[0] = '\0';
        size_t *sizep=NULL;
        int groupId = 0;
        int err = nc_inq_type(ncid,myId,charName,sizep);

        printf("%s: typeId=%d, err=%d\n", charName, myId, err);
    }
}

int main(int argc, char **argv)
{
    int ncid;

    printf("--------------- With NC_NETCDF4\n");
    printf("nc_create %d\n", nc_create("sample1.nc", NC_NETCDF4 |
NC_CLOBBER, &ncid));
    printf("ncid = %d\n", ncid);
    test(ncid);

    printf("--------------- Without NC_NETCDF4\n");
    printf("nc_create %d\n", nc_create("sample2.nc", NC_CLOBBER, &ncid));
    printf("ncid = %d\n", ncid);
    test(ncid);

    return 0;
}

The output:

[rpfische@ankeli nctest]$ ./nctest
--------------- With NC_NETCDF4
nc_create 0
ncid = 65536
byte: typeId=1, err=0
char: typeId=2, err=0
short: typeId=3, err=0
int: typeId=4, err=0
float: typeId=5, err=0
double: typeId=6, err=0
ubyte: typeId=7, err=0
ushort: typeId=8, err=0
uint: typeId=9, err=0
int64: typeId=10, err=0
uint64: typeId=11, err=0
string: typeId=12, err=0
--------------- Without NC_NETCDF4
nc_create 0
ncid = 131072
byte: typeId=1, err=0
char: typeId=2, err=0
short: typeId=3, err=0
int: typeId=4, err=0
float: typeId=5, err=0
double: typeId=6, err=0
: typeId=7, err=-45
: typeId=8, err=-45
: typeId=9, err=-45
: typeId=10, err=-45
: typeId=11, err=-45
: typeId=12, err=-45

On Mon, Mar 28, 2016 at 2:54 PM, Ward Fisher notifications@github.com wrote:

My mistake, the branch for netcdf-c is gh240.

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/Unidata/netcdf-cxx4/issues/30#issuecomment-202527971

citibeth commented 8 years ago

I looked at the test on branch gh240. I would expect it to fail on NetCDF4-specific data types (eg NC_INT64) because the test does not include NC_NETCDF4 when calling nc_create().

My best guess at this point is the problem lies in netcdf-cxx4, somewhere it is not using NC_NETCDF4 when it should.

-- Elizabeth

On Mon, Mar 28, 2016 at 2:54 PM, Ward Fisher notifications@github.com wrote:

My mistake, the branch for netcdf-c is gh240.

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/Unidata/netcdf-cxx4/issues/30#issuecomment-202527971

WardF commented 8 years ago

I've made a little bit of progress, and it looks like one of two things is going on. Either we had an oversight when we implemented CDF5, so that data types that are now supported are throwing an error when they shouldn't, or we are querying a file which doesn't support 64-bit int, e.g. a netCDF3 file. I need to poke around more to see what the expected behavior is, but I have tracked down the issue into NC3_inq_type.

I've just run another quick test using NC_CDF5 when creating the file in my test, and the test passes. There is clearly some disconnect between the C++ tests, the documentation and the expected behavior that I still need to wrap my head around and fix. At the very least, I need to figure out if this test ever worked; we automate our tests but this test was never actually reporting the fact that it was failing.

The test you inserted above, where you pass in the ncid; do you see an issue if you pass in either an ncid generated by a call to nc_open() where you've specifed either a type NC_CDF5 or NC_NETCDF4?

The test I've created can be viewed at https://github.com/Unidata/netcdf-c/blob/gh240/nc_test/tst_inq_type.c, although I'm playing around with it locally.

WardF commented 8 years ago

@citibeth I agree; I'm looking more closely at the C++ test now to try to determine where things are going off the rail.

citibeth commented 8 years ago

I extended my test, now it tries stuff with the C++ API as well:

int main(int argc, char **argv)
{
    int ncid;

    printf("--------------- With NC_NETCDF4\n");
    printf("nc_create %d\n", nc_create("sample1.nc", NC_NETCDF4 | NC_CLOBBER, &ncid));
    printf("ncid = %d\n", ncid);
    test_c(ncid);

    printf("--------------- Without NC_NETCDF4\n");
    printf("nc_create %d\n", nc_create("sample2.nc", NC_CLOBBER, &ncid));
    printf("ncid = %d\n", ncid);
    test_c(ncid);

    printf("--------------- With file opened by netcdf-cxx4\n");
    NcFile nc3("sample3.nc", NcFile::replace, NcFile::nc4);
    test_c(nc3.getId());

    printf("--------------- Creating a variable with C++ API\n");
//    NcFile nc4("sample4.nc", NcFile::replace);
//    NcFile nc4("sample4.nc", NcFile::replace, NcFile::nc4);
    nc3.addVar("var", ncInt);
    printf("ncInt succeeded\n");

    nc3.addVar("var", ncInt64);
    printf("ncInt64 succeeded\n");

    return 0;
}

The output:

--------------- With file opened by netcdf-cxx4
byte: typeId=1, err=0
char: typeId=2, err=0
short: typeId=3, err=0
int: typeId=4, err=0
float: typeId=5, err=0
double: typeId=6, err=0
ubyte: typeId=7, err=0
ushort: typeId=8, err=0
uint: typeId=9, err=0
int64: typeId=10, err=0
uint64: typeId=11, err=0
string: typeId=12, err=0
--------------- Creating a variable with C++ API
ncInt succeeded
terminate called after throwing an instance of 'netCDF::exceptions::NcBadType'
  what():  NetCDF: Not a valid data type or _FillValue type mismatch
file: /home/rpfische/spack2/var/spack/stage/netcdf-cxx4-ecdf914-lg4tpkrsk6boo7wroiubsccslkkzvtnw/netcdf-cxx4/cxx4/ncType.cpp  line:97
Aborted (core dumped)

So... the file was created correctly with the C++ API, but then somehow nc_inq_type() got confused when called form C++. Now supposing I create in the C++ API using NcFile::classic instead of NcFile::nc4. The output is the same, except for the following section:

--------------- With file opened by netcdf-cxx4
byte: typeId=1, err=0
char: typeId=2, err=0
short: typeId=3, err=0
int: typeId=4, err=0
float: typeId=5, err=0
double: typeId=6, err=0
: typeId=7, err=-45
: typeId=8, err=-45
: typeId=9, err=-45
: typeId=10, err=-45
: typeId=11, err=-45
: typeId=12, err=-45

Conclusion: the problem is not in the creation of the file. Creating files works as it should in C and C++ APIs.

citibeth commented 8 years ago

I have more clues. Consider the code:

    printf("addVar int-str\n");
    nc3.addVar("var1", "int", "dim");
    printf("addVar int64-str\n");
    nc3.addVar("var2", "int64", "dim");
    printf("addVar int-symbol\n");
    nc3.addVar("var3", ncInt, dim);
    printf("addVar int64-symbol\n");
    nc3.addVar("var4", ncInt64, dim);
    printf("Done with all!\n");

Its output is:

addVar int-str
addVar int64-str
addVar int-symbol
addVar int64-symbol
terminate called after throwing an instance of 'netCDF::exceptions::NcBadType'
  what():  NetCDF: Not a valid data type or _FillValue type mismatch
file: /home/rpfische/spack2/var/spack/stage/netcdf-cxx4-ecdf914-lg4tpkrsk6boo7wroiubsccslkkzvtnw/netcdf-cxx4/cxx4/ncType.cpp  line:97
Aborted (core dumped)

Something is different when calling addVar(ncInt64) vs. addVar("int64"). When using symbols, it seems to kick into NetCDF-3 restricted mode.

WardF commented 8 years ago

Ok, I think the issue is this: when NcType:getName is called, it is passed an invalid ncid, e.g. one not associated with an open file. Why it needs an open file to operate is a question I can't answer, but for now I'll trust that it was designed that way for a reason. Without a valid ncid value, there is a fallback function (in libnetcdf) that will work for all types valid in netCDF3, but not valid for anything else. NC_UBYTE et. al. are falling into this logic block, which is explicitely returning NC_EBADTYPE.

I need to think about how to best address this; whether the fix is to change how the C++ test works, or whether the fall-back function in libnetcdf should be modified to return information for a broader set of data types.

citibeth commented 8 years ago

Thanks, it sounds like you got it.

Why it needs an open file to operate is a question I can't answer

Because the correct result of nc_inq_type() changes depending on whether the file you opened allows NetCDF-4 types. Changing the function in libnetcdf might fix the C++ problem at hand, but would seem to be a step in the wrong direction. And could break someone else.

I need to think about how to best address this; whether the fix is to change how the C++ test works, or whether the fall-back function in libnetcdf should be modified to return information for a broader set of data types.

This is not just a problem with the tests, the library needs to be fixed. Right now, I cannot create variables of type ncInt64. I guess things work with "int64" string because it does a fresh lookup of types from the open NetCDF file. I will see if I can change my code to use that API and move on.

On the larger scale... I'm not sure what having classes (and then singleton instances ) for each netCDF data type buys us. I would have just kept them as a set of enumerations, as in the C API. Maybe the fix would be to do away with NcType and all its subclasses.

WardF commented 8 years ago

@citibeth Thanks, that is good to know. I'll have to write some broader tests for creating files and data types, so that these sorts of issues 1) show up on our testing dashboard and 2) offer a toe-hold for debugging.

While I'm well versed with C++, the current C++ interface was broadly contributed to us, as we were fairly resourced constrained (and have grown moreso). I agree with you that this definitely needs to be fixed so that you can create things with the "int64" string. I'm juggling a few things but will make this a priority. I'm surprised we haven't had any other reports about it, however.

citibeth commented 8 years ago

I am working around this bug, so there is no need to fix it in a rushed manner. The main place I call addVar() is:

netCDF::NcVar get_or_add_var(
    NcIO &ncio,
    std::string const &vname,
    netCDF::NcType const &nc_type,
    std::vector<netCDF::NcDim> const &dims)
{
    netCDF::NcVar ncvar;
    if (ncio.define) {
        ncvar = ncio.nc->getVar(vname);
        if (ncvar.isNull()) {
            ncvar = ncio.nc->addVar(vname, nc_type, dims);
...

I will change the signature of my function to:

netCDF::NcVar get_or_add_var(
    NcIO &ncio,
    std::string const &vname,
    std::string nc_type,
    std::vector<netCDF::NcDim> const &dims)

I will then call the all-string version of NcFile::addVar(). To do so, I will call getName() on each dimension in dims(), which I expect to work just fine (because the dims were created with respect to a real NcFile).

Meanwhile... I will change all calls to get_or_add_var() to use string constants instead of NcType instances. Or, I might just define ncInt="int", ncInt64="int64", etc. and not have to really change my code (other than changing from netCDF::ncInt to mypackge::ncInt).

I think that will do the job for now. Please tell me if you see any pitfalls in the plan. I appreciate the time you've put into this issue today.

citibeth commented 8 years ago

Actually... if netcdf is going to be changed, I think the RIGHT thing to do is to require a valid ID be provided to nc_inq_type(). Because as we can see here... without a valid ID, it's impossible to know whether a particular type is legal.

Then re-design the C++ API around that reality.

-- Elizabeth

On Mon, Mar 28, 2016 at 3:43 PM, Ward Fisher notifications@github.com wrote:

Ok, I think the issue is this: when NcType:getName is called, it is passed an invalid ncid, e.g. one not associated with an open file. Why it needs an open file to operate is a question I can't answer, but for now I'll trust that it was designed that way for a reason. Without a valid ncid value, there is a fallback function (in libnetcdf) that will work for all types valid in netCDF3, but not valid for anything else. NC_UBYTE et. al. are falling into this logic block, which is explicitely returning NC_EBADTYPE.

I need to think about how to best address this; whether the fix is to change how the C++ test works, or whether the fall-back function in libnetcdf should be modified to return information for a broader set of data types.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/Unidata/netcdf-cxx4/issues/30#issuecomment-202552891

DennisHeimbigner commented 8 years ago

My 2c worth. There are two issues here:

  1. what is the name and size of an atomictype.
  2. client wants to know what types are legal for a specific file

For number one, IMO, we should do this for all known atomic types depending on if the library has netcdf4 enabled. This would be independent of any specific file. This an easy change and is unlilkey to cause back compatibility problems.

2 is harder: in effect, we need to know the type of the actual file (CLASSIC, CDF5, NETCDF4, etc)

to make this determination., If we want to be pedantic about it, we also need to deal with pseudo- files like opendap which has a potentially different set of allowed types. One solution for #2 is to make it be what is computed if a valid ncid is given; this requires adding to the dispatch table to have each kind of format provide info about the types it supports.

WardF commented 8 years ago

For now, I'm going to fix this as follows. At the root of the issue is the fact that the C++ interface did not adhere to the C library standard, e.g. a valid ncid is required for some operations. The C++ functions related to the C operations now also require that a valid ncid has been aquired. When NcFile::open() opens a file, a global ncid is set, which will then be used for those functions requiring a valid ncid. I initially thought about refactoring the code so that the ncid would be passed explicitly as an argument, but that cascaded into a large refactor very quickly, so I abandoned it for now.

Note that ncType.getName() and ncType.getSize() don't actually change anything with the ncid they are passed; they just perform a check to validate the ncid and gate off the type of file it correspond to.

citibeth commented 8 years ago

Ward,

The use of a global ncid worries me. What if I do (pardon my loose C++):

NcFile nc1('file1.nc', nc4); NcFile nc2('file2.nc', classic); nc1.addVar('var1', ncInt64, dim);

I expect that the problem I'm encountering will re-appear then.

Maybe an alternative: hardcode the values of NcInt64::getName(), etc to constants based on what we know they SHOULD return: string NcInt64::getName() { return "int64"; }

(and similarly for all other types in NetCDF-4 but not NetCDF-3)

This would be roughly equivalent to the original idea to change the C API. But it keeps the change localized to the C++ API, which I this is safer.

-- Elizabeth

On Tue, Mar 29, 2016 at 2:57 PM, Ward Fisher notifications@github.com wrote:

For now, I'm going to fix this as follows. At the root of the issue is the fact that the C++ interface did not adhere to the C library standard, e.g. a valid ncid is required for some operations. The C++ functions related to the C operations now also require that a valid ncid has been aquired. When NcFile::open() opens a file, a global ncid is set, which will then be used for those functions requiring a valid ncid. I initially thought about refactoring the code so that the ncid would be passed explicitly as an argument, but that cascaded into a large refactor very quickly, so I abandoned it for now.

Note that ncType.getName() and ncType.getSize() don't actually change anything with the ncid they are passed; they just perform a check to validate the ncid and gate off the type of file it correspond to.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/Unidata/netcdf-cxx4/issues/30#issuecomment-203050331

WardF commented 8 years ago

@citibeth You are correct that there will remain circumstances in which this fix is insufficient, and I'm not happy about that; it may just be the best of a bad situation, at the moment. I originally went down the hardcoded getName() route, but then was stymied by the corresponding hardcoded getSize() that would be required.

@DennisHeimbigner just came in to my office to discuss this issue, and I think we are in agreement for a broader fix. Ultimately, the fix is going to be to loosen up the nc_inq_type() function's behavior so that it will work for all atomic+string data types regardless of whether or not a valid ncid was provided. That will obviate this change I'm making, at which point I'll remove it.

For the short term, at least, this will provide a fix that will work for what seems to be the majority of our users, and will let me get the C++ release out the door without having to wait on another C library release cycle.

WardF commented 8 years ago

I've opened #32 to ensure that this does not fall off the radar.