HDFGroup / hsds

Cloud-native, service based access to HDF data
https://www.hdfgroup.org/solutions/hdf-kita/
Apache License 2.0

Return `dtype` in GET datasets and GET attribute requests #84

Closed loichuder closed 3 years ago

loichuder commented 3 years ago

For our use, it would be more handy to parse the NumPy/h5py dtype rather than the HDF5 type.

Would it be possible to return it in the responses to the GET Dataset and GET Attribute requests?

jreadey commented 3 years ago

Since that's specific to Python and NumPy, I don't think it makes sense to add it to the REST API. The code here: https://github.com/HDFGroup/h5pyd/blob/master/h5pyd/_hl/h5type.py can convert the type JSON to a dtype. Alternatively, you can just use h5pyd to call into HSDS.

Do either of these two approaches work for you?

loichuder commented 3 years ago

In fact, we already have working code in h5web that parses the HDF5 type into a more convenient structure, and it is close to what you linked. And I don't think we can call h5pyd directly, as we are making requests from a browser.

Looking at https://github.com/HDFGroup/h5pyd/blob/master/h5pyd/_hl/h5type.py#L250, I have the impression that h5pyd creates the HDF5 type from the dtype, so the full chain is: dtype converted to HDF5 type by h5pyd, then converted back to something similar to a dtype in h5web. In that case, having direct access to the dtype would greatly simplify things and make the parsing more consistent.
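To make the first half of that chain concrete, here is a minimal sketch of turning a NumPy-style dtype string into an HDF5/JSON type object. It is not the h5pyd code linked above: the function name `dtype_str_to_type_json` is hypothetical, only fixed-size integer and float types are handled, and the `H5T_STD_*` / `H5T_IEEE_*` names are assumed to follow the predefined-type naming in the hdf5-json datatype spec.

```python
import re

# Matches strings such as "<i2", ">f4" or "|u1":
# byte order, kind (signed int, unsigned int, float), size in bytes.
_DTYPE_RE = re.compile(r"^([<>|])([iuf])(\d+)$")


def dtype_str_to_type_json(dtype_str):
    """Hypothetical helper: NumPy-style dtype string -> HDF5/JSON type dict."""
    match = _DTYPE_RE.match(dtype_str)
    if match is None:
        raise ValueError(f"unsupported dtype string: {dtype_str!r}")
    order, kind, size = match.groups()
    bits = int(size) * 8
    # "|" (byte order not applicable) is treated as little-endian here;
    # that is an assumption of this sketch.
    suffix = "BE" if order == ">" else "LE"
    if kind == "f":
        return {"class": "H5T_FLOAT", "base": f"H5T_IEEE_F{bits}{suffix}"}
    letter = "I" if kind == "i" else "U"
    return {"class": "H5T_INTEGER", "base": f"H5T_STD_{letter}{bits}{suffix}"}


int16_type = dtype_str_to_type_json("<i2")
float32_type = dtype_str_to_type_json(">f4")
```

Compound, string, and variable-length types need more than a regular expression and are left out on purpose.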

Do you store the dtype in HSDS? If so, adding it to the response would take minimal effort and would not break existing apps, as people can still use the HDF5 type.

I concur that it is Python/NumPy-specific, but it covers a lot of users and use cases...

jreadey commented 3 years ago

Hmm, I'm not sure what you mean by dtype since you can't use NumPy from h5web. Are you looking for a NumPy-like string representation of the type (e.g. "float32" vs "H5T_IEEE_F32LE")?

In HSDS the type JSON gets converted into a dtype for internal operations (e.g. selecting from a chunk), but it is stored using the same JSON schema that you see in the request. It's specified here: https://hdf5-json.readthedocs.io/en/latest/bnf/datatype.html.
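For simple numeric types, the conversion from the stored type JSON to a NumPy-like string can be sketched in a few lines. This is only an illustration, not the HSDS or h5pyd implementation: the function name `type_json_to_dtype_str` is hypothetical, and it assumes the type arrives as a dict with a `base` field holding a predefined name such as `H5T_IEEE_F32LE` (or as that bare name).

```python
import re

# Matches predefined names such as "H5T_STD_I16LE" or "H5T_IEEE_F32BE".
_BASE_RE = re.compile(r"^H5T_(STD|IEEE)_([IUF])(\d+)(LE|BE)$")


def type_json_to_dtype_str(type_json):
    """Hypothetical helper: HDF5/JSON numeric type -> NumPy-style string."""
    # Accept either the full type object or just the predefined type name.
    base = type_json["base"] if isinstance(type_json, dict) else type_json
    match = _BASE_RE.match(base)
    if match is None:
        raise ValueError(f"unsupported type: {base!r}")
    family, kind, bits, order = match.groups()
    # Floats belong to the IEEE family, integers to the STD family.
    if (family == "IEEE") != (kind == "F"):
        raise ValueError(f"unsupported type: {base!r}")
    nbytes = int(bits) // 8
    # NumPy reports "|" for single-byte types, where byte order is irrelevant.
    if nbytes == 1:
        byte_order = "|"
    else:
        byte_order = "<" if order == "LE" else ">"
    return f"{byte_order}{kind.lower()}{nbytes}"


float32_str = type_json_to_dtype_str({"class": "H5T_FLOAT", "base": "H5T_IEEE_F32BE"})
int16_str = type_json_to_dtype_str({"class": "H5T_INTEGER", "base": "H5T_STD_I16LE"})
```

A browser client could apply the same mapping on the response it already receives, without any change to the REST API.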

loichuder commented 3 years ago

> Hmm, I'm not sure what you mean by dtype since you can't use NumPy from h5web. Are you looking for a NumPy-like string representation of the type (e.g. "float32" vs "H5T_IEEE_F32LE")?

Yes, things like `>f4` or `<i2`.

> In HSDS the type JSON gets converted into a dtype for internal operations (e.g. selecting from a chunk), but it is stored using the same JSON schema that you see in the request. It's specified here: https://hdf5-json.readthedocs.io/en/latest/bnf/datatype.html.

All right. I thought that the dtype was stored in HSDS and that it could easily be included in the response. As this is not the case, I understand why it would not make sense to include it.