Could it be that the issue is with STRSIZE being variable rather than fixed?
I need to add tests for this and fix it. Until recent updates, h5fortran's character support was generally narrow in scope. h5py distinguishes UTF8 and ASCII in HDF5 files, and h5py defaults to UTF8. In Fortran, some compilers, including Intel oneAPI 2022, do not yet support a UTF8 character kind, so I can't simply make h5fortran default to UTF8.
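For reference, a standard-conforming way to probe compiler support (an illustrative sketch only; selected_char_kind('ISO_10646') returns -1 on compilers without a UCS-4 character kind):

program ucs4_probe
  ! Report whether this compiler provides the ISO 10646 (UCS-4) character kind.
  implicit none
  integer, parameter :: ucs4 = selected_char_kind('ISO_10646')
  if (ucs4 > 0) then
    print *, 'UCS-4 character kind supported: kind =', ucs4
  else
    print *, 'UCS-4 character kind not supported (selected_char_kind returned -1)'
  end if
end program ucs4_probe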
Several days ago I added the ability to read variable-length string datasets, but this might be missing for attributes; that should be an easy fix.
So in short there are two possible issues:

1. Recent updates to character handling in h5fortran (from several days ago through today), which added support for:
   - character(*), dimension(1, 1, ..., 1) as well as character(*) H5S_SCALAR datasets
   - character(*), dimension(N,M,J,I,...) datasets
   - h5%create(..., fill_value) for real, integer, and character
2. I didn't make any changes to attributes for these feature updates, and obviously character is even more popular for attributes than datasets, so this is worthwhile.
Thanks, Michael. In case it's helpful, I found this thread and example: https://forum.hdfgroup.org/t/how-to-read-a-utf-8-string/6125/6
I made a bit of headway by adapting the example above. I couldn't find a subroutine in the API to query the length of the string (let me know if you know of one), so for now it hardcodes a buffer length that is hopefully large enough.
program p
  implicit none
  print *, get_h5_attribute_string('mnist_dense.h5', '.', 'model_config')
contains
  function get_h5_attribute_string(filename, object_name, attribute_name) result(res)
    use hdf5, only: H5F_ACC_RDONLY_F, HID_T, &
      h5aclose_f, h5aget_type_f, h5aopen_by_name_f, h5aread_f, &
      h5close_f, h5fclose_f, h5fopen_f, h5open_f, h5tclose_f
    use iso_c_binding, only: c_char, c_f_pointer, c_loc, c_null_char, c_ptr
    character(*), intent(in) :: filename
    character(*), intent(in) :: object_name
    character(*), intent(in) :: attribute_name
    character(:), allocatable :: res
    ! Make sufficiently large to hold most attributes
    integer, parameter :: BUFLEN = 10000
    type(c_ptr) :: f_ptr
    type(c_ptr), target :: buffer
    character(len=BUFLEN, kind=c_char), pointer :: string => null()
    integer(HID_T) :: fid, aid, atype
    integer :: hdferr
    ! Initialize the HDF5 Fortran interface, open the file,
    ! and get the type of the attribute
    call h5open_f(hdferr)
    call h5fopen_f(filename, H5F_ACC_RDONLY_F, fid, hdferr)
    call h5aopen_by_name_f(fid, object_name, attribute_name, aid, hdferr)
    call h5aget_type_f(aid, atype, hdferr)
    ! Read the data: for a variable-length string, the library fills
    ! buffer with a C pointer to the string, which we then map onto
    ! a Fortran character pointer
    f_ptr = c_loc(buffer)
    call h5aread_f(aid, atype, f_ptr, hdferr)
    call c_f_pointer(buffer, string)
    ! Keep everything up to (but not including) the terminating null
    res = string(:index(string, c_null_char) - 1)
    ! Release handles and close the file
    call h5tclose_f(atype, hdferr)
    call h5aclose_f(aid, hdferr)
    call h5fclose_f(fid, hdferr)
    call h5close_f(hdferr)
  end function get_h5_attribute_string
end program p
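For reference, I build and run it with the HDF5 Fortran compiler wrapper (assuming an HDF5 installation with Fortran support; the wrapper name h5fc may vary by install):

h5fc p.f90 -o p && ./p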
Building and running the program on the h5 file attached above in this thread returns the expected output:
{"class_name": "Sequential", "config": {"name": "sequential", "layers": [{"class_name": "InputLayer", "config": {"batch_input_shape": [null, 784], "dtype": "float32", "sparse": false, "ragged": false, "name": "input_1"}}, {"class_name": "Dense", "config": {"name": "dense", "trainable": true, "dtype": "float32", "units": 30, "activation": "sigmoid", "use_bias": true, "kernel_initializer": {"class_name": "GlorotUniform", "config": {"seed": null}}, "bias_initializer": {"class_name": "Zeros", "config": {}}, "kernel_regularizer": null, "bias_regularizer": null, "activity_regularizer": null, "kernel_constraint": null, "bias_constraint": null}}, {"class_name": "Dense", "config": {"name": "dense_1", "trainable": true, "dtype": "float32", "units": 10, "activation": "softmax", "use_bias": true, "kernel_initializer": {"class_name": "GlorotUniform", "config": {"seed": null}}, "bias_initializer": {"class_name": "Zeros", "config": {}}, "kernel_regularizer": null, "bias_regularizer": null, "activity_regularizer": null, "kernel_constraint": null, "bias_constraint": null}}]}}
Note, this UTF8 read is done using the default character Fortran kind. I haven't examined the consequences of this vs. the UCS4 Fortran kind; I note in both cases, kind=4 character is required.
My test of this is trivial, so please reopen this if this doesn't work for you.
HDF5 n00b here. I'm able to write a simple HDF5 file with a global attribute and read the attribute back from it, e.g.:
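(A sketch of what I mean, assuming h5fortran's writeattr/readattr interface; the attribute name and value here are placeholders, and the open call may vary by h5fortran version:)

program attr_roundtrip
  ! Sketch: write a global (root-group) attribute, then read it back.
  use h5fortran, only: hdf5_file
  implicit none
  type(hdf5_file) :: h5
  character(6) :: attr

  call h5%open('test_file.h5', action='w')   ! create/overwrite the file
  call h5%writeattr('/', 'title', 'hello!')  ! attribute on the root group
  call h5%close()

  call h5%open('test_file.h5', action='r')
  call h5%readattr('/', 'title', attr)
  call h5%close()

  print *, attr
end program attr_roundtrip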
I get the output that I expect.
Then, I'm trying to read a global attribute from a file output by Keras (attached). I use the same approach:
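(Again a sketch assuming the same readattr interface; the buffer length is an arbitrary guess:)

program read_keras_attr
  ! Sketch: the same readattr approach, pointed at the Keras-generated file.
  use h5fortran, only: hdf5_file
  implicit none
  type(hdf5_file) :: h5
  character(10000) :: cfg  ! arbitrary fixed length; the true length is unknown

  call h5%open('mnist_dense.h5', action='r')
  call h5%readattr('.', 'model_config', cfg)
  call h5%close()

  print *, trim(cfg)
end program read_keras_attr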
However, the output is not what I expect; it's garbled, and it varies between runs.
In an attempt to understand why, I used ncdump and h5dump to inspect the files: the simple test_file.h5 I created vs. the Keras-generated HDF5 file.
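The relevant h5dump excerpts look roughly like this (the STRSIZE and CSET lines are the actual values; the elided lines are the rest of h5dump's standard DATATYPE output). For test_file.h5:

DATATYPE  H5T_STRING {
   STRSIZE 6;
   CSET H5T_CSET_ASCII;
   ...
}

And for mnist_dense.h5:

DATATYPE  H5T_STRING {
   STRSIZE H5T_VARIABLE;
   CSET H5T_CSET_UTF8;
   ...
}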
Comparing the two h5dump outputs, I can see that the attribute types differ in STRSIZE (6 vs. H5T_VARIABLE) and CSET (H5T_CSET_ASCII vs. H5T_CSET_UTF8). What do you think about this? It seems to me that the different encoding (ASCII vs. UTF8) could be the culprit for my failed reading of the Keras file. Does h5fortran support this, and if so, how should I do the reading?
Thanks!
Attachment (gzipped so GitHub lets me upload it): mnist_dense.tar.gz