test-go-cpxcmpd: output does not match with h5ls/h5dump/h5py

cavokz commented 3 years ago

What are you trying to do?

I'm examining the test-go-cpxcmpd code and the generated SDScompound.h5 file so to understand how things work.

What did you do?

I run the example:

$ go run cmd/test-go-cpxcmpd/main.go
:: data: [{0 0 1 [0 0 0]} {1 1 0.5 [1 2 3]} {2 4 0.3333333333333333 [2 4 6]} {3 9 0.25 [3 6 9]} {4 16 0.2 [4 8 12]} {5 25 0.16666666666666666 [5 10 15]} {6 36 0.14285714285714285 [6 12 18]} {7 49 0.125 [7 14 21]} {8 64 0.1111111111111111 [8 16 24]} {9 81 0.1 [9 18 27]}]
:: file [SDScompound.h5] created (id=72057594037927936)
:: dset (id=360287970189639680)
:: dset.Write...
:: dset.Write... [ok]
:: data: [{0 0 1 [0 0 0]} {1 1 0.5 [1 2 3]} {2 4 0.3333333333333333 [2 4 6]} {3 9 0.25 [3 6 9]} {4 16 0.2 [4 8 12]} {5 25 0.16666666666666666 [5 10 15]} {6 36 0.14285714285714285 [6 12 18]} {7 49 0.125 [7 14 21]} {8 64 0.1111111111111111 [8 16 24]} {9 81 0.1 [9 18 27]}]

What did you expect to happen?

$ h5ls -d SDScompound.h5
ArrayOfStructures        Dataset {10}
    Data:
         {0, 0, 1, [0,0,0]}, {1, 1, 0.5, [1,2,3]}, {2, 4, 0.333333333333333, [2,4,6]}, {3, 9, 0.25, [3,6,9]}, {4, 16, 0.2, [4,8,12]},
         {5, 25, 0.166666666666667, [5,10,15]}, {6, 36, 0.142857142857143, [6,12,18]}, {7, 49, 0.125, [7,14,21]}, {8, 64, 0.111111111111111, [8,16,24]},
         {9, 81, 0.1, [9,18,27]}

What actually happened?

$ h5ls -d SDScompound.h5
ArrayOfStructures        Dataset {10}
    Data:
         {0, 0, 1, [0,0,0]}, {1, 1, 0.5, [1,0,2]}, {2, 4, 0.333333333333333, [2,0,4]}, {3, 9, 0.25, [3,0,6]}, {4, 16, 0.2, [4,0,8]},
         {5, 25, 0.166666666666667, [5,0,10]}, {6, 36, 0.142857142857143, [6,0,12]}, {7, 49, 0.125, [7,0,14]}, {8, 64, 0.111111111111111, [8,0,16]},
         {9, 81, 0.1, [9,0,18]}

Note how the arrays of integers are completely wrong, ex. [1,0,2] instead of [1,2,3]. h5dump is consistent with h5ls but also h5py (the Python bindings) is. Hence my suspects on gonum/hdf5.

What version of Go, Gonum, Gonum/netlib and libhdf5 are you using?

Go: 1.16 gonum/hdf5.git: 496fefe91614e0bf15638ddc9da44edbae12b4dc

Does this issue reproduce with the current master?

Yes.

I locally removed the string from the dataset so to work around #12.

These are the local changes:

diff --git cmd/test-go-cpxcmpd/main.go cmd/test-go-cpxcmpd/main.go
index 7c80377..3c65cdf 100644
--- cmd/test-go-cpxcmpd/main.go
+++ cmd/test-go-cpxcmpd/main.go
@@ -25,7 +25,6 @@ type s1Type struct {
    b float32
    c float64
    d [3]int
-   e string
 }

 type s2Type struct {
@@ -48,7 +47,6 @@ func main() {
            b: float32(i * i),
            c: 1. / (float64(i) + 1),
            d: [...]int{i, i * 2, i * 3},
-           e: fmt.Sprintf("--%d--", i),
        }
        //s1[i].d = []float64{float64(i), float64(2*i), 3.*i}}
    }

kortschak commented 3 years ago

It looks like the inc between elements is wrong for integers.. Notice that in all cases we see the Go data [a, b, c] end up being written out as [a, 0, b]. My guess would be that int width is not properly being handled. What happens when you change field d to [3]int32 and [3]int64?

Yes, looking further, Go int is mapped to H5T_NATIVE_INT which is a C int. In 32 bit architectures this will be correct, but not in 64 bit architectures. I think that probably the way forward here is to not support Go int and uint.

Now experimentally confirmed, using explicit integer widths fixes the problem. This also lead me to suspect that field a is not safe and I have also confirmed that by assigning i + math.MaxInt32 to it in the loop. This gives the following

$ h5ls -d SDScompound.h5 
ArrayOfStructures        Dataset {10}
    Data:
        (0) {2147483647, 0, 1, [0,0,0], NULL}, {-2147483648, 1, 0.5, [1,0,2], NULL},
        (2) {-2147483647, 4, 0.333333333333333, [2,0,4], NULL}, {-2147483646, 9, 0.25, [3,0,6], NULL},
        (4) {-2147483645, 16, 0.2, [4,0,8], NULL}, {-2147483644, 25, 0.166666666666667, [5,0,10], 
        (5)  NULL}, {-2147483643, 36, 0.142857142857143, [6,0,12], NULL},
        (7) {-2147483642, 49, 0.125, [7,0,14], NULL},
        (8) {-2147483641, 64, 0.111111111111111, [8,0,16], NULL}, {-2147483640, 81, 0.1, [9,0,18], 
        (9)  NULL}

cavokz commented 3 years ago

After a quick read of The Datatype Interface (H5T), I think it might not be a good idea to drop such native types. It's clear that anybody caring of the .h5 portability should avoid such implicitly sized types but even then, why h5ls and gonum/hdf5 should disagree if running on the same machine? There seems to be some assumption misalignment which IMHO is worth fixing.

kortschak commented 3 years ago

C int is almost always 32 bits on machines with 32 and 64 bit architectures (certainly for all the architectures that gc targets). Go int is 32 bits wide for 32 bit architectures and 64 bits wide for 64 bit architectures. This means that unsafe.Sizeof(int(0)) != unsafe.Sizeof(C.int(0)) on 64 bit architectures.

The reason h5ls and gonum/hdf5 and don't agree is because h5ls is a C program and gonum/hdf5 is a Go program that interfaces to C with a mapping int <=> C.int.

The underlying assumption is that there is a direct memory modelling between C and Go which does not hold for Go's int and uint types in all cases. Because of this, those types really should not be supported. We could conditionally make Go int map to the correctly sized C type, but that would break people who move files from one arch to another. The simplest and best answer to this problem is to drop support for variable sized Go types, int and uint.

cavokz commented 3 years ago

I've my doubts that any .h5 file with int/uint implicit sizes can be moved across architectures in a reliable way, therefore I would not care too much of this case. Anyway, it's your call and I surely don't see anything wrong in staying on the safe side and not supporting native Go int/uint types.

Are you going to enforce this on both read and write cases? Any possibility of manually overriding such limitation?

I don't have any use case at the moment but I'd like to recognize .h5 data files that cannot be safely read in Go.

gonum / hdf5