Closed: MRossol closed this issue 1 year ago
There is a similar issue with h5pyd and metadata attributes whose values are strings. HDF5 files with string-valued metadata attributes were created with h5py and then sent to HSDS with h5pyd. When those files were retrieved from HSDS with hsget, the string-valued attributes had been stripped from the HDF5 file entirely. hsget left the datasets and the non-string attributes intact.
To fix the issue with string metadata attribute values being stripped off of an HDF5 file using hsget:
In the utillib.py file under h5pyd/_apps, change lines 277-278 from this:
    srcarr = np.asarray(data, order='C', dtype=src_dt)
    tgtarr = copy_array(srcarr, ctx)
to this:
    if isinstance(data, str):
        tgtarr = np.string_(data)
    else:
        srcarr = np.asarray(data, order='C', dtype=src_dt)
        tgtarr = copy_array(srcarr, ctx)
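As a rough illustration of what that branch does, here is a minimal sketch that runs without h5pyd. The helper name `to_attr_value` and the sample values are hypothetical. `np.bytes_` stands in for `np.string_`, which is an alias of the same type in NumPy 1.x and was removed in NumPy 2.0; the explicit UTF-8 encode is also an assumption of this sketch, not part of the patch above.

```python
import numpy as np


def to_attr_value(data, dtype=None):
    """Hypothetical helper: turn a Python str into a NumPy byte string,
    and anything else into a C-ordered array."""
    if isinstance(data, str):
        # Encode explicitly so the result is a plain byte string.
        return np.bytes_(data.encode("utf-8"))
    return np.asarray(data, order="C", dtype=dtype)


# A string-valued attribute takes the first branch...
units = to_attr_value("W/m2")
# ...while numeric data goes through np.asarray as before.
scale = to_attr_value([1, 2, 3], dtype="i4")
```

The point of the special case is that a bare `str` never reaches the array-conversion path, which is where the string attributes were apparently being lost.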
Somewhere along the line, the issue @MRossol reported has been fixed (with h5pyd version 0.10.3 or higher).
@jbhatch - I'm not sure whether the issue you saw had the same root cause. If you are still seeing this, could you open a new issue with a repro case? I promise to respond with more alacrity this time. :)
    import h5pyd
    import numpy as np
    import pandas as pd

    # array of byte strings to be loaded into h5pyd
    time_index = pd.date_range('2016-01-01 00:30:00', '2016-12-31 23:30:00', freq='h')
    time_index = np.array(time_index.astype(str), dtype='S20')
Loading using the data param in create_dataset:
    with h5pyd.File('/home/mrossol/nsrdb_tmy.h5', 'w') as f:
        f.create_dataset('time_index', time_index.shape, dtype=time_index.dtype, data=time_index)
Produces the following error:
    /anaconda/lib/python3.6/json/encoder.py in default(self, o)
        178         """
        179         raise TypeError("Object of type '%s' is not JSON serializable" %
    --> 180                         o.__class__.__name__)
        181
        182     def encode(self, o):

    TypeError: Object of type 'bytes' is not JSON serializable
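The traceback ends in the standard library's JSON encoder, which suggests that the data= path hands raw bytes objects to json.dumps. How h5pyd builds its request is not shown in this thread, so treat that as an assumption. The sketch below reproduces only the encoder behaviour with the stdlib; the sample values and the helper name `try_dump` are hypothetical.

```python
import json

# Byte strings like the elements of the 'S20' time_index array.
stamps = [b"2016-01-01 00:30:00", b"2016-01-01 01:30:00"]


def try_dump(values):
    """Return the JSON text, or the TypeError message if encoding fails."""
    try:
        return json.dumps(values)
    except TypeError as exc:
        return "TypeError: {}".format(exc)


# bytes objects are rejected by the default encoder.
failed = try_dump(stamps)

# Decoding to str first gives the encoder something it can serialize.
decoded = [s.decode("ascii") for s in stamps]
ok = try_dump(decoded)

print(failed)
print(ok)
```

The exact wording of the TypeError message differs between Python versions, but the failure itself does not depend on h5pyd.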
If you create the dataset and then load the array, it works:
    with h5pyd.File('/home/mrossol/nsrdb_tmy.h5', 'w') as f:
        t_index = f.create_dataset('time_index', time_index.shape, dtype=time_index.dtype)
        t_index[...] = time_index