Apollo3zehn / PureHDF

A pure .NET library that makes reading and writing of HDF5 files (groups, datasets, attributes, ...) very easy.
MIT License

Unable to read attributes in a generic way #12

Closed LiorBanai closed 2 years ago

LiorBanai commented 2 years ago

Hi, I've got a request to read all attributes of an h5 file while iterating over it in my own library (https://github.com/LiorBanai/HDF5-CSharp/issues/163).

I was thinking about leveraging your library since it is the better implementation, but when I try to read the attributes I get the following exception: "The fill value message is missing."

I also get others, such as 'Non-negative number required. (Parameter 'count')'.

The file is hdf5_test.zip.

Is there a better way to read all attributes of a file?

Apollo3zehn commented 2 years ago

Thank you for the bug report, I will investigate it on Monday. It should not be too hard to solve with the sample file available. Unfortunately the HDF5 specification is sometimes not perfect and I need to look into the C source code to understand how it has to be implemented. Luckily attributes are quite simple to read, so I should find the root cause quickly (I hope).

Apollo3zehn commented 2 years ago

What do you mean by "is there a better way"? I think with this lib it is quite easy to enumerate all attributes of a group or dataset, and recursing through the hierarchy should also be easy. But it is quite hard to read them all and present them to the user in a usable way. What I mean is that either the user knows the data type and specifies a generic parameter for a single attribute, and that attribute is returned nicely, or the user wants all attributes but their data types can differ, so the return value of such a function can only be object[].

HDF5.NET does not have a function that "just reads" the content of an attribute and returns an object. Instead you would need to check the data type of the returned attribute and then call the read function with a proper generic parameter (or ReadString()).

LiorBanai commented 2 years ago

Thanks for the fast response. I just meant that I planned to open each group and access the Attributes property.

Since I got exceptions I assumed I did something wrong. 🙂

LiorBanai commented 2 years ago

Even returning object is good enough for just getting them. Maybe name, type, value (object) per attribute.

Apollo3zehn commented 2 years ago

I have found a bug while reading an old version of the "fill value message" (see here).

With that bug fixed it should be possible to read that attribute now. However, you also had two small bugs in your code. Please try the version below to read the string attribute:

using var root = H5File.OpenRead(filePath);
var attribute = root.Group("/arrays").Dataset("2D float array").Attribute("test");
var actual = attribute.ReadString();

To return name, type and value, you could maybe do the following:

public record AttributeData(string Name, H5DataType Type, object Value);

return new AttributeData(attribute.Name, attribute.Type, actual);

Apollo3zehn commented 2 years ago

Sorry, closed too early. I need to know if it works for you :-)

Apollo3zehn commented 2 years ago

Ah, and I forgot to mention one thing:

When you read the string attribute, the return value is "test     " (with 5 spaces). I think the reason is that the attribute has been created with NULL padding, but instead of \0 it actually is padded with 0x20, i.e. space characters. That is why my code is not trimming these characters away. HDFView does it somehow, but I think this is not correct.
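
If the trailing padding is unwanted on the caller's side, it can be stripped after reading. The following is only a minimal sketch: `TrimPadding` is a hypothetical helper, not part of HDF5.NET, and the sample value is constructed by hand rather than read from the file. It assumes that trailing spaces are never meaningful in the attribute, which may not hold for every file.

```csharp
using System;

// Hypothetical helper (not part of HDF5.NET): strips trailing padding from a
// fixed-length string, treating both '\0' and ' ' (0x20) as padding characters.
static string TrimPadding(string raw) => raw.TrimEnd('\0', ' ');

// Sample value modelled on the attribute above: "test" followed by 5 spaces.
var raw = "test" + new string(' ', 5);
var trimmed = TrimPadding(raw);

Console.WriteLine($"[{raw}] has length {raw.Length}");
Console.WriteLine($"[{trimmed}] has length {trimmed.Length}");
```

Trimming in the calling code rather than in the library keeps the library's result faithful to what is stored in the file.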

LiorBanai commented 2 years ago

Thanks a lot. I'll test it and report back :)

LiorBanai commented 2 years ago

One note: can I access a dataset directly, like this?

var file = H5File.OpenRead(fileName);
file.Dataset(element.Name).Attributes

If so, I still get errors.

LiorBanai commented 2 years ago

string filename = Path.Combine(folder, "files", "testfile2.H5");
var root = H5File.OpenRead(filename);
var attribute = root.Group("/").Dataset("A note").Attributes;

I am trying to read it directly and getting "System.Exception: 'The fill value message is missing.'" :)

Apollo3zehn commented 2 years ago

Okay, then there are two errors and I caught only one of them. I will check again tomorrow.

LiorBanai commented 2 years ago

Thanks :)

Apollo3zehn commented 2 years ago

I have found the issue. The first one was a problem with an old version of the "fill value message". The second one was a problem with the "old fill value message". This "old" message is optional and the "new" message is required according to the spec. And in your sample file the required "new fill value message" was not there which caused the "The fill value message is missing" error.

I think the spec is unclear here, but as a solution the code now creates a default fill value message when there is none. Hopefully it works better now :-)

Apollo3zehn commented 2 years ago

Your code

var file = H5File.OpenRead(fileName);
file.Dataset(element.Name).Attributes

should work. The Dataset(...) method accepts a "path" argument, so you do not need to access the group first.

LiorBanai commented 2 years ago

I think we are getting there :) I am almost able to read the entire file, but ReadString has an issue with one specific entry.

LiorBanai commented 2 years ago

And there is another error related to images :)

Apollo3zehn commented 2 years ago

The first error occurs because you are trying to read a 16-bit unsigned integer as a string. If you want to read all kinds of attributes, you would need to do it as shown below. Unfortunately it is not possible to read every attribute without knowing its detailed data type.

var attributes = root.Dataset("/arrays/Vdata table: PerBlockMetadataCommon").Attributes.Select(attribute =>
{
    return (object)((attribute.Type.Class, attribute.Type.Size) switch
    {
        (H5DataTypeClass.FloatingPoint, 4) => attribute.Read<float>(),
        (H5DataTypeClass.FloatingPoint, 8) => attribute.Read<double>(),
        (H5DataTypeClass.FixedPoint, 1) when !attribute.Type.FixedPoint.IsSigned => attribute.Read<byte>(),
        (H5DataTypeClass.FixedPoint, 1) when attribute.Type.FixedPoint.IsSigned => attribute.Read<sbyte>(),
        (H5DataTypeClass.FixedPoint, 2) when !attribute.Type.FixedPoint.IsSigned => attribute.Read<ushort>(),
        (H5DataTypeClass.FixedPoint, 2) when attribute.Type.FixedPoint.IsSigned => attribute.Read<short>(),
        (H5DataTypeClass.FixedPoint, 4) when !attribute.Type.FixedPoint.IsSigned => attribute.Read<uint>(),
        (H5DataTypeClass.FixedPoint, 4) when attribute.Type.FixedPoint.IsSigned => attribute.Read<int>(),
        (H5DataTypeClass.FixedPoint, 8) when !attribute.Type.FixedPoint.IsSigned => attribute.Read<ulong>(),
        (H5DataTypeClass.FixedPoint, 8) when attribute.Type.FixedPoint.IsSigned => attribute.Read<long>(),
        (H5DataTypeClass.VariableLength, _) => attribute.ReadString(),
        (H5DataTypeClass.String, _) => attribute.ReadString(),
        // Other types might currently be a bit difficult to read automatically.
        // However, in future it will be possible to also read unknown structs.
        //
        // If you need to support more exotic HDF types, you could use reflection
        // to get the full data type information and not just what is currently
        // being exposed in the public API. E.g. some types like "Array" or "Enum"
        // have base type information (e.g. an Enum value could be based on a uint16
        // value) which would allow you to read these kind of attributes, too.
        _ => throw new NotSupportedException($"The type class {attribute.Type.Class} is currently not supported.")
    });
});
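
To illustrate why the data type has to be known up front, here is a small standalone sketch that interprets the same raw bytes in two ways. It does not use HDF5.NET and does not reproduce the library's exception; the byte values and the little-endian layout are assumptions chosen for illustration only.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

// Two raw bytes as they might be stored for a 16-bit unsigned integer
// attribute (little-endian). The value is illustrative, not from the test file.
byte[] raw = { 0x41, 0x00 };

// Interpreted with the matching type, the bytes yield the number 65.
ushort asNumber = BinaryPrimitives.ReadUInt16LittleEndian(raw);

// Interpreted as ASCII text, the same bytes yield "A" followed by a NUL character.
string asText = Encoding.ASCII.GetString(raw);

Console.WriteLine($"As ushort: {asNumber}");
Console.WriteLine($"As text: {asText.Length} characters, starting with '{asText[0]}'");
```

The bytes themselves carry no hint about which reading is intended, which is why the switch above dispatches on the type class, size and signedness before calling a read method.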

I'll investigate your other issue now.

LiorBanai commented 2 years ago

thanks for the support :)

LiorBanai commented 2 years ago

One more issue :)

LiorBanai commented 2 years ago

I hope I don't bombard you with too many issues :)

Apollo3zehn commented 2 years ago

Regarding your latest issue:

I have had a look at your code and I think that the method ReadFileStructure does not properly detect "Committed Data Types" (there may be datasets, groups, attributes and committed data types in an HDF5 file). I know this specific code comes from me, so in the end it's my fault :-/ I do not know how to find out whether the current object is a committed data type.

Maybe the code

else
{
    objectType = H5O.type_t.UNKNOWN;
    elementType = Hdf5ElementType.Group;
}

should be changed to

else
{
    objectType = H5O.type_t.NAMED_DATATYPE;
    elementType = Hdf5ElementType.CommitedDataType;
}

And then in your AddAttributes method you can add another case where you call file.CommitedDataType(element.Name).Attributes

Regarding the previous issue with the missing image:

That one was a bit tougher.

I am not 100% sure, but I think this is not a bug in my code; rather, the file is malformed.

Apollo3zehn commented 2 years ago

I hope I don't bombard you with too many issues :)

No problem, it is good to have a complex & old H5 file for testing because up to now I did not have any. It improves the quality of HDF5.NET, which is my desire :-)

LiorBanai commented 2 years ago

Thanks for going the extra mile and looking at my code :) I'll extend it as you suggested.

Regarding the previous issue with the missing image:

That one was a bit tougher.

* Every object (group, dataset, committed data type) in an HDF5 file has a header with header messages, and there are three ways a group can list its content:

  1. Symbol Table Message (old)
  2. Link Info Message (new)
  3. A special cache entry which points to the Symbol Table Message (this cache entry comes from elsewhere)

* In your file, there is a Link Info Message containing the image `landcover.umd.199906.jpg`

* And there is a cache entry containing all the other datasets in group `images`

* According to the spec, a group can only be of one type, either having a `Symbol Table Message` OR having a `Link Info Message`

* According to a certain test in the HDF5 C source code, when there is a cache entry, there must also be a Symbol Table Message

* So what we have here is a double violation: There is no Symbol Table Message although we have a cache entry. And if that was corrected, we would have both a Symbol Table Message AND a Link Info Message, which is prohibited.

* My guess is that the test file was created with an old HDF5 library and later, with a newer lib, the image `landcover.umd.199906.jpg` was added. The library had a bug and removed the Symbol Table Message but did not properly remove the cache entry.
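
The two rules in the list above can be written down as a small consistency check. This is a hypothetical sketch, not code from HDF5.NET or the HDF5 C library: `FindViolations` and its three boolean parameters are made up for illustration and simply encode the rules as stated in this thread.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical check (not HDF5.NET or HDF5 C library code) that encodes the
// two rules from the list above as plain booleans.
static List<string> FindViolations(bool hasSymbolTable, bool hasLinkInfo, bool hasCacheEntry)
{
    var violations = new List<string>();

    // Rule from the spec: a group is either old-style or new-style, not both.
    if (hasSymbolTable && hasLinkInfo)
        violations.Add("Symbol Table Message and Link Info Message are both present.");

    // Rule from the C source test: a cache entry implies a Symbol Table Message.
    if (hasCacheEntry && !hasSymbolTable)
        violations.Add("Cache entry is present without a Symbol Table Message.");

    return violations;
}

// The group as described for the sample file: Link Info Message plus cache entry.
var asFound = FindViolations(hasSymbolTable: false, hasLinkInfo: true, hasCacheEntry: true);

// The same group if the missing Symbol Table Message were restored.
var ifRestored = FindViolations(hasSymbolTable: true, hasLinkInfo: true, hasCacheEntry: true);

Console.WriteLine(string.Join(Environment.NewLine, asFound));
Console.WriteLine(string.Join(Environment.NewLine, ifRestored));
```

Both states report exactly one violation, which mirrors the point made above: restoring the Symbol Table Message fixes one rule only to break the other.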

I am not 100% sure, but I think this is not a bug in my code; rather, the file is malformed.

I think malformed files are edge cases that should not be expected to parse 100% correctly. Even having an "Unknown" value or similar would be enough to indicate to the end user that something needs to be looked at on their side. I think attributes are less frequently used as a store of data (only as metadata?).

LiorBanai commented 2 years ago

and then in your AddAttributes method you can add another case where you call file.CommitedDataType(element.Name).Attributes

I don't have the Attributes property on the CommitedDataType member.

Apollo3zehn commented 2 years ago

My bad, I will add that property on Monday. I don't know how I could miss that.

LiorBanai commented 2 years ago

Thanks :) BTW, with all your other changes, fixes and suggestions I'm able to read almost the entire file :)

Apollo3zehn commented 2 years ago

Sorry I won't make it today but tomorrow morning, I promise :-)

LiorBanai commented 2 years ago

no pressure (and thanks again for all your hard work) :)

Apollo3zehn commented 2 years ago

Done :-) I hope everything works for you now as expected.

LiorBanai commented 2 years ago

@Apollo3zehn yeah I can confirm that it works great :) thanks again

Apollo3zehn commented 2 years ago

Very good, I will then close the issue for now :-)