geospace-code / h5fortran

Lightweight HDF5 polymorphic Fortran: h5write() h5read()
https://geospace-code.github.io/h5fortran
BSD 3-Clause "New" or "Revised" License

Problem reading string scalar value from hierarchy dataset #26

Closed gekowa closed 2 years ago

gekowa commented 2 years ago

I have an HDF5 file that is structured like below:

sample.h5
|-- 1
    |-- a
|-- 2
    |-- a
|-- 3
    |-- a
...

where "/1/a", "/2/a", "/3/a" and so on contain string scalar values, like "MA", "blk"... no more than 3 chars.

When reading these values with h5fortran, it crashes at the 3rd or 4th read. Here is the code:

program test_my
    use h5fortran, only: hdf5_file, h5write, h5read

    implicit none

    type(hdf5_file) :: rstFile
    character(len=5) :: buffer1

    call rstFile % open("/root/Desktop/h5fortran_example.h5", action='r')

    call rstFile % read('/1/a', buffer1)
    print *,buffer1
    call rstFile % read('/2/a', buffer1)
    call rstFile % read('/3/a', buffer1)
    call rstFile % read('/4/a', buffer1)    ! throws exception
    call rstFile % read('/5/a', buffer1)
    call rstFile % read('/6/a', buffer1)
end program

I'm using Intel oneAPI Fortran compiler version 2021.5, Ubuntu 20.04.

Attached is the file I created to reproduce the issue: h5fortran_example.zip

And, here is the call stack snapshot.

[image: call stack screenshot]

Thank you!

scivision commented 2 years ago

Thank you. I think this HDF5 file was created by another program? I get a segfault even on the first read with GCC on Windows. This should be something h5fortran can handle, I will check it out. Thanks!

scivision commented 2 years ago

Hello, could you try the latest code on the "main" branch? I reworked how character read/write is done, and this seems to have fixed the problem.

gekowa commented 2 years ago

Thank you, I'll try asap.

Yes, indeed, this HDF5 file was created with h5py.

scivision commented 2 years ago

I added a test case for reading H5T_VARIABLE length character data. This should now work up to 10,000 characters.

gekowa commented 2 years ago

The error message really helps. I realize that I defined the dataset as a string with a length of 32 bytes and stored a 2-char string inside. I naively thought using a character(len=5) variable would be OK; instead I should use character(len=32). Now the value can be read, but with a lot of trailing \0s.

[image: test_my program output]

This can be easily handled though.
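One way to handle the trailing \0 bytes, sketched in Python on plain bytes rather than through h5fortran or h5py. The helper name `strip_null_padding` is hypothetical, and the 32-byte buffer only simulates what a null-padded fixed-length dataset holding "MA" would contain:

```python
def strip_null_padding(raw: bytes) -> str:
    """Drop trailing NUL bytes from a fixed-length string buffer and decode as ASCII."""
    return raw.rstrip(b"\x00").decode("ascii")


# Simulated content of a 32-byte, null-padded dataset that stores "MA"
# (an assumption for illustration, not data read from the attached file).
raw = b"MA".ljust(32, b"\x00")
value = strip_null_padding(raw)
print(len(raw), repr(value))
```

The same idea applies on the Fortran side: find the padding and discard it before using the value.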

Btw, I can also change the code that writes the HDF5 file from:

f.create_dataset('/1/a', (), numpy.dtype('S32'), "MA")

to

f.create_dataset('/1/a', (), numpy.dtype('S2'), "MA")

to avoid the issue.

scivision commented 2 years ago

So it seems that better h5fortran-internal handling of null_fill would be useful too. That makes sense: for a general text string, I think the file user's intent would be to ignore everything after the first \0.
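"Ignore everything after the first \0" is slightly different from stripping trailing \0 bytes: it also discards any bytes that follow an embedded \0. A minimal Python sketch of that rule follows; the function name `truncate_at_first_null` is hypothetical and this is not h5fortran's actual implementation:

```python
def truncate_at_first_null(raw: bytes) -> bytes:
    """Return the bytes before the first NUL, or all of raw if it has no NUL."""
    head, _sep, _tail = raw.partition(b"\x00")
    return head


# Null-padded buffer: both approaches give the same result.
padded = b"MA" + b"\x00" * 30
# Buffer with leftover bytes after the terminator: only truncation drops them.
dirty = b"MA\x00junk\x00\x00"

print(truncate_at_first_null(padded))
print(truncate_at_first_null(dirty))
print(dirty.rstrip(b"\x00"))  # trailing strip keeps the embedded NUL and "junk"
```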

gekowa commented 2 years ago

Of course. The ideal solution is for h5fortran to recognize and automatically remove trailing \0 chars, and to allow a character variable sized for the actual value to store it.
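As a sketch of the behavior described here, the Python below fits a stored 32-byte value into a shorter fixed-length buffer such as character(len=5). It assumes Fortran character assignment rules (truncate on the right, pad with blanks); `to_fortran_buffer` is a hypothetical helper, not part of h5fortran:

```python
def to_fortran_buffer(raw: bytes, length: int) -> str:
    """Mimic storing a NUL-padded HDF5 string in a Fortran character(len=length)."""
    # Keep only the text before the first NUL byte.
    text = raw.split(b"\x00", 1)[0].decode("ascii")
    # Truncate if the text is too long, blank-pad if it is too short.
    return text[:length].ljust(length)


stored = b"MA".ljust(32, b"\x00")  # simulated 32-byte dataset value
print(repr(to_fortran_buffer(stored, 5)))
```

With this rule the 32-byte dataset no longer forces a character(len=32) variable; only values whose real text exceeds the variable length get truncated.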

For now, do you mind if I close the issue?

scivision commented 2 years ago

I added a test case as well as code for this. The Python code generates a file like the following, which h5fortran reads successfully.

GROUP "/" {
   DATASET "nullpad" {
      DATATYPE  H5T_STRING {
         STRSIZE 40;
         STRPAD H5T_STR_NULLPAD;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "Hello World!\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
      }
   }
   DATASET "variable" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "Hello World!"
      }
   }
}
}

gekowa commented 2 years ago

Awesome! 👍👍👍

scivision commented 2 years ago

Fixed by v4.6.0.