TRIQS / h5

A high-level C++ interface to the hdf5 library
https://triqs.github.io/h5
Other
3 stars 7 forks source link

HDF5Archive mangles (silently) all keys in python dictionaries to strings #17

Open HugoStrand opened 7 years ago

HugoStrand commented 7 years ago

I was surprised to find that pytriqs.archive.HDF5Archive silently converts all keys in python dictionaries to strings, which of course break things expecting other python objects. See below for a small example.

I would strongly recommend that conversion of dictionaries either are fully supported, or that one gets an error when trying to store a dictionary with non-string keys. Silent conversion is very bad practice.

Best, Hugo

Here is a small example:

from pytriqs.archive import HDFArchive

my_data = {(0, 1) : 1.2345 } # dict with tuple key

# dump to disk and read again to/from hdf5
filename = "data.h5"
with HDFArchive(filename, 'w') as data: data['my_data'] = my_data
with HDFArchive(filename, 'r') as data: my_data_ref = data['my_data']

for key, value in my_data.items(): print key, value, type(key), type(value)
for key, value in my_data_ref.items(): print key, value, type(key), type(value)

for key, value_ref in my_data_ref.items():
    assert( my_data.has_key(key) )
    value = my_data[key]
    assert( value == value_ref )

resulting in

(0, 1) 1.2345 <type 'tuple'> <type 'float'>
(0, 1) 1.2345 <type 'str'> <type 'float'>
Traceback (most recent call last):
  File "triqs_dict_h5_bug.py", line 15, in <module>
    assert( my_data.has_key(key) )
AssertionError
krivenko commented 7 years ago

Hi Hugo, This issue was discussed before, see https://github.com/TRIQS/triqs/issues/220.

HugoStrand commented 7 years ago

Dear Igor, Thanks for pointing this out.

Even so, I really think that HDFArchive should complain when the keys are not strings and throw an error.

In the current state one easily spends a couple of hours debugging, since the conversion is silent.

Best, H

krivenko commented 7 years ago

Even so, I really think that HDFArchive should complain when the keys are not strings and throw an error.

In the current state one easily spends a couple of hours debugging, since the conversion is silent.

I agree, there is an issue that should be addressed.

The actual problem is that either of the potential solutions has its downside. Attaching type information to the keys will make archive format more complex. On the other hand, throwing an exception will surely break some codes. For example, I can easily imagine people using integers as dictionary keys.

HugoStrand commented 7 years ago

Fair enough, so a deprecation warning printout would be the middle ground solution?

krivenko commented 7 years ago

Fair enough, so a deprecation warning printout would be the middle ground solution?

Yes, this is probably the most reasonable thing we can do at the moment.