MITHaystack / digital_rf

Read, write, and interact with data in the Digital RF and Digital Metadata formats

Writing lists of strings with DigitalMetadataWriter #16

Open: jswoboda opened this issue 4 years ago

jswoboda commented 4 years ago

I attempted to write a list of strings to Digital Metadata to keep track of the names of sub-channels. This led to the following error from h5py:

h5py error: TypeError: No conversion path for dtype: dtype('<U2')

Searching led me to an h5py issue suggesting that the list be converted first with the following NumPy command:

np.string_()

I don't know if there's a need to address this directly. I'm just putting this up to note it for now.
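The failure can be reproduced with h5py alone, without Digital Metadata in the loop, which suggests the problem sits in h5py's handling of NumPy's fixed-width unicode (`U`) dtype. A minimal sketch, assuming h5py and NumPy are installed; the file name `scratch.h5` is arbitrary because the file is kept in memory, and `try_write_unicode_array` is a hypothetical name used only for illustration:

```python
import h5py
import numpy as np


def try_write_unicode_array():
    """Try to store a NumPy 'U' array with h5py; return (dtype, error message or None)."""
    # NumPy infers a fixed-width unicode dtype for a plain list of str (e.g. '<U3').
    names = np.asarray(["ch1", "ch2", "ch3"])
    # driver="core" with backing_store=False keeps the HDF5 file entirely in memory.
    with h5py.File("scratch.h5", "w", driver="core", backing_store=False) as f:
        try:
            f.create_dataset("names", data=names)
        except TypeError as err:
            # h5py has no HDF5 type mapping for NumPy's 'U' dtype.
            return names.dtype, str(err)
    return names.dtype, None


dtype, error = try_write_unicode_array()
print(dtype, error)
```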

ryanvolz commented 4 years ago

I don't think np.string_() is what you want, since that will turn your whole list into one string. Instead, maybe try converting to an array explicitly using h5py's special string dtype:

np.asarray(['ch1', 'ch2', 'ch3'], dtype=h5py.string_dtype(encoding='utf-8'))
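Here is a sketch of that suggestion as a full write-and-read round trip, assuming h5py 3.x (where variable-length strings are read back as `bytes` by default and `Dataset.asstr()` decodes them to `str`); the in-memory file name is arbitrary:

```python
import h5py
import numpy as np

# Object array of Python str, tagged by h5py as variable-length UTF-8.
names = np.asarray(["ch1", "ch2", "ch3"], dtype=h5py.string_dtype(encoding="utf-8"))

with h5py.File("scratch.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("names", data=names)
    # Reports (encoding, length); length is None for variable-length strings.
    stored = h5py.check_string_dtype(dset.dtype)
    as_bytes = dset[()]        # h5py 3.x returns the elements as bytes by default
    as_str = dset.asstr()[()]  # .asstr() decodes them back to Python str

print(stored, list(as_str))
```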

Note also this part of the h5py docs, which says that h5py does not support NumPy's U dtype: http://docs.h5py.org/en/latest/strings.html#what-about-numpy-s-u-type

I found the h5py.string_dtype docstring illuminating:

Make a numpy dtype for HDF5 strings

encoding may be 'utf-8' or 'ascii'.

length may be an integer for a fixed length string dtype, or None for variable length strings. String lengths for HDF5 are counted in bytes, not unicode code points.

For variable length strings, the data should be passed as Python str objects (unicode in Python 2) if the encoding is 'utf-8', and bytes if it is 'ascii'. For fixed length strings, the data should be numpy fixed length bytes arrays, regardless of the encoding. Fixed length unicode data is not supported.
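The "counted in bytes, not unicode code points" detail matters for fixed-length strings with non-ASCII names. A sketch of sizing the dtype from the UTF-8 encoded values, assuming h5py and NumPy are installed; `fixed_length_roundtrip` and the sample label `"cañón"` are hypothetical, chosen only because it is 5 code points but 7 bytes in UTF-8:

```python
import h5py
import numpy as np


def fixed_length_roundtrip(labels):
    """Store labels as fixed-length UTF-8 bytes, sized in bytes, then read them back."""
    encoded = [s.encode("utf-8") for s in labels]
    # HDF5 string lengths count bytes, so size the dtype from the encoded values.
    n_bytes = max(len(b) for b in encoded)
    dt = h5py.string_dtype(encoding="utf-8", length=n_bytes)
    data = np.asarray(encoded, dtype=dt)  # a NumPy fixed-length bytes ('S') array
    with h5py.File("scratch.h5", "w", driver="core", backing_store=False) as f:
        f.create_dataset("labels", data=data)
        raw = f["labels"][()]  # fixed-length strings come back as an 'S' array
    return n_bytes, [b.decode("utf-8") for b in raw]


labels = ["ch1", "cañón"]
n_bytes, restored = fixed_length_roundtrip(labels)
print(n_bytes, restored)
```

Sizing the dtype from `len(label)` instead would give 5 and silently truncate the encoded value.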

So basically, you have 3 options:

  1. Array of variable-length strings using the h5py.string_dtype(encoding='utf-8') dtype
  2. Array of variable-length bytes using the h5py.string_dtype(encoding='ascii') dtype
  3. Array of fixed-length bytes using the np.string_ or h5py.string_dtype(length=N) dtype
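The three options side by side, as a sketch assuming h5py and NumPy are installed. It uses `dtype="S"` for option 3 rather than `np.string_`, since `np.string_` is only an alias of `np.bytes_` and was removed in NumPy 2.0. `h5py.check_string_dtype` reports how each dataset ended up stored:

```python
import h5py
import numpy as np

names = ["ch1", "ch2", "ch10"]
ascii_bytes = [s.encode("ascii") for s in names]

options = {
    # 1. Variable-length UTF-8: elements are Python str objects.
    "vlen_utf8": np.asarray(names, dtype=h5py.string_dtype(encoding="utf-8")),
    # 2. Variable-length ASCII: elements are bytes objects.
    "vlen_ascii": np.asarray(ascii_bytes, dtype=h5py.string_dtype(encoding="ascii")),
    # 3. Fixed-length bytes: dtype="S" lets NumPy size it to the longest item ('S4').
    "fixed_bytes": np.asarray(ascii_bytes, dtype="S"),
}

with h5py.File("scratch.h5", "w", driver="core", backing_store=False) as f:
    stored = {}
    for key, arr in options.items():
        dset = f.create_dataset(key, data=arr)
        # check_string_dtype gives (encoding, length); length is None for variable-length.
        stored[key] = h5py.check_string_dtype(dset.dtype)

for key, info in stored.items():
    print(key, info.encoding, info.length)
```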

I'm leaning toward this not being something that Digital Metadata handles explicitly, since it's kinda intended to be a thin format wrapper around h5py. It could definitely use some documentation as a likely trouble spot, though, whenever we have time to write some better documentation.
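If the conversion stays on the caller side, one option is to normalize the metadata dict before handing it to the writer. This is a sketch of a hypothetical helper, `normalize_strings`, that is not part of digital_rf; it assumes the writer passes values through to h5py unchanged, as a thin wrapper would, and it does not call `DigitalMetadataWriter` itself:

```python
import h5py
import numpy as np

UTF8_VLEN = h5py.string_dtype(encoding="utf-8")


def normalize_strings(value):
    """Hypothetical helper: recursively convert str sequences and NumPy 'U' arrays
    into h5py variable-length UTF-8 arrays; leave everything else unchanged."""
    if isinstance(value, dict):
        return {key: normalize_strings(v) for key, v in value.items()}
    if isinstance(value, np.ndarray) and value.dtype.kind == "U":
        return np.asarray(value.tolist(), dtype=UTF8_VLEN)
    if isinstance(value, (list, tuple)) and value and all(isinstance(v, str) for v in value):
        return np.asarray(value, dtype=UTF8_VLEN)
    return value


metadata = {"channel_names": ["ch1", "ch2"], "gain": 2.5, "extra": {"labels": np.array(["a", "bb"])}}
clean = normalize_strings(metadata)

# Check that the converted values are now writable with plain h5py.
with h5py.File("scratch.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("channel_names", data=clean["channel_names"])
    f.create_dataset("labels", data=clean["extra"]["labels"])
    roundtrip = list(f["labels"].asstr()[()])
```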