HDFGroup / hsds

Cloud-native, service based access to HDF data
https://www.hdfgroup.org/solutions/hdf-kita/
Apache License 2.0

Variable Length Strings #214

Closed: ron-kuhn closed this issue 1 year ago

ron-kuhn commented 1 year ago

Can someone explain why variable length strings are not supported? Is there a workaround? Are there plans to support them in the future? Or should I just avoid variable length strings altogether (e.g. for performance reasons)?

jreadey commented 1 year ago

They've been supported for ages. See: https://github.com/HDFGroup/hsds/blob/master/tests/integ/vlen_test.py for example. Do you have some code that you expected to work that doesn't?

ron-kuhn commented 1 year ago

Supported in HSDS; NOT supported in the REST VOL for HDF5. I opened an issue in the vol-rest repo (https://github.com/HDFGroup/vol-rest/issues/13).

ron-kuhn commented 1 year ago

you can close it here

jreadey commented 1 year ago

Thanks for opening the issue in vol-rest.
Regarding performance, I haven't seen many benchmarks for HDF5 with variable length types, but I expect variable length types to be slightly slower in HSDS than fixed length types. For variable length types there's an extra step on the client where the data has to be serialized, and it is then de-serialized server side.

If you know the maximum size of the datatype, one alternative would be to use a fixed size type with compression. Compressors do a good job of squishing the zero bytes used for padding, so there won't be a lot of storage overhead.
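
To get a rough feel for the padding overhead, here is a minimal sketch that uses only Python's standard `zlib` module (deflate, the same algorithm behind HDF5's gzip filter). It does not touch HSDS or HDF5 at all. The `pad_fixed` helper, the sample labels, and the 64-byte width are made up for illustration, and real numbers will depend on your data, chunk layout, and compressor settings.

```python
import zlib


def pad_fixed(strings, width):
    """Encode each string as UTF-8 and right-pad it with zero bytes to `width` bytes."""
    out = bytearray()
    for s in strings:
        raw = s.encode("utf-8")
        if len(raw) > width:
            raise ValueError(f"{s!r} needs {len(raw)} bytes, more than width {width}")
        out += raw.ljust(width, b"\x00")
    return bytes(out)


# Hypothetical sample data: short labels stored in 64-byte fixed-width slots.
labels = [f"sensor-{i}" for i in range(1000)]
WIDTH = 64

fixed = pad_fixed(labels, WIDTH)          # what a fixed size string dataset would hold
packed = zlib.compress(fixed, 6)          # deflate at a mid-range compression level
payload_bytes = sum(len(s.encode("utf-8")) for s in labels)

print("fixed-width bytes:  ", len(fixed))
print("actual string bytes:", payload_bytes)
print("compressed bytes:   ", len(packed))
```

Comparing the three printed sizes shows how much of the zero padding the compressor removes. With h5py or h5pyd the equivalent would be a fixed size string dtype plus a compression filter on the dataset, but check that against your own data before committing to it.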