Open cavokz opened 3 years ago
It looks like the inc between elements is wrong for integers.. Notice that in all cases we see the Go data [a, b, c] end up being written out as [a, 0, b]. My guess would be that int
width is not properly being handled. What happens when you change field d
to [3]int32
and [3]int64
?
Yes, looking further, Go int
is mapped to H5T_NATIVE_INT which is a C int
. In 32 bit architectures this will be correct, but not in 64 bit architectures. I think that probably the way forward here is to not support Go int
and uint
.
Now experimentally confirmed, using explicit integer widths fixes the problem. This also lead me to suspect that field a
is not safe and I have also confirmed that by assigning i + math.MaxInt32
to it in the loop. This gives the following
$ h5ls -d SDScompound.h5
ArrayOfStructures Dataset {10}
Data:
(0) {2147483647, 0, 1, [0,0,0], NULL}, {-2147483648, 1, 0.5, [1,0,2], NULL},
(2) {-2147483647, 4, 0.333333333333333, [2,0,4], NULL}, {-2147483646, 9, 0.25, [3,0,6], NULL},
(4) {-2147483645, 16, 0.2, [4,0,8], NULL}, {-2147483644, 25, 0.166666666666667, [5,0,10],
(5) NULL}, {-2147483643, 36, 0.142857142857143, [6,0,12], NULL},
(7) {-2147483642, 49, 0.125, [7,0,14], NULL},
(8) {-2147483641, 64, 0.111111111111111, [8,0,16], NULL}, {-2147483640, 81, 0.1, [9,0,18],
(9) NULL}
After a quick read of The Datatype Interface (H5T), I think it might not be a good idea to drop such native types. It's clear that anybody caring of the .h5 portability should avoid such implicitly sized types but even then, why h5ls and gonum/hdf5 should disagree if running on the same machine? There seems to be some assumption misalignment which IMHO is worth fixing.
C int
is almost always 32 bits on machines with 32 and 64 bit architectures (certainly for all the architectures that gc targets). Go int
is 32 bits wide for 32 bit architectures and 64 bits wide for 64 bit architectures. This means that unsafe.Sizeof(int(0)) != unsafe.Sizeof(C.int(0))
on 64 bit architectures.
The reason h5ls and gonum/hdf5 and don't agree is because h5ls is a C program and gonum/hdf5 is a Go program that interfaces to C with a mapping int
<=> C.int
.
The underlying assumption is that there is a direct memory modelling between C and Go which does not hold for Go's int
and uint
types in all cases. Because of this, those types really should not be supported. We could conditionally make Go int
map to the correctly sized C type, but that would break people who move files from one arch to another. The simplest and best answer to this problem is to drop support for variable sized Go types, int
and uint
.
I've my doubts that any .h5 file with int/uint implicit sizes can be moved across architectures in a reliable way, therefore I would not care too much of this case. Anyway, it's your call and I surely don't see anything wrong in staying on the safe side and not supporting native Go int/uint types.
Are you going to enforce this on both read and write cases? Any possibility of manually overriding such limitation?
I don't have any use case at the moment but I'd like to recognize .h5 data files that cannot be safely read in Go.
What are you trying to do?
I'm examining the test-go-cpxcmpd code and the generated SDScompound.h5 file so to understand how things work.
What did you do?
I run the example:
What did you expect to happen?
What actually happened?
Note how the arrays of integers are completely wrong, ex. [1,0,2] instead of [1,2,3]. h5dump is consistent with h5ls but also h5py (the Python bindings) is. Hence my suspects on gonum/hdf5.
What version of Go, Gonum, Gonum/netlib and libhdf5 are you using?
Go: 1.16 gonum/hdf5.git: 496fefe91614e0bf15638ddc9da44edbae12b4dc
Does this issue reproduce with the current master?
Yes.
I locally removed the string from the dataset so to work around #12.
These are the local changes: