asdf-format / asdf

ASDF (Advanced Scientific Data Format) is a next generation interchange format for scientific data
http://asdf.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
511 stars 56 forks source link

tuple keys not handled #642

Open bblais opened 5 years ago

bblais commented 5 years ago

Smallest program which reproduces the error (asdf from conda 2.3.1):

import asdf

mydata={'hello':'there',
        (6,5):'bad key',
        'that':6,
       }
# Make the tree structure, and create a AsdfFile from it.
tree = {'hello':'there',
        'data':mydata,
       }
ff = asdf.AsdfFile(tree)  # saves, but error on load below
ff.write_to("test.asdf")

ff = asdf.AsdfFile(mydata)  # gives TypeError: unhashable type: 'list'
ff.write_to("test2.asdf")

with asdf.open('test.asdf') as af: # gives ConstructorError : while constructing a mapping
    print(af.tree)

Is this a known issue? thanks!

drdavella commented 5 years ago

Hi @bblais, thanks for the bug report. I am able to reproduce this and will investigate further.

This looks like it may actually be a limitation of the YAML implementation we use, but I need to dig deeper.

drdavella commented 5 years ago

This does in fact appear to be a limitation of pyyaml. Consider the following example:

import yaml

mydata = { 'hello':'there', (6,5):'bad key', 'that':6 }
s = yaml.safe_dump(mydata)
yaml.safe_load(s)

The result is the same error as in your example:

~/miniconda3/envs/asdfdev/lib/python3.6/site-packages/yaml/constructor.py in construct_mapping(self, node, deep)
    126             if not isinstance(key, collections.Hashable):
    127                 raise ConstructorError("while constructing a mapping", node.start_mark,
--> 128                         "found unhashable key", key_node.start_mark)
    129             value = self.construct_object(value_node, deep=deep)
    130             mapping[key] = value

ConstructorError: while constructing a mapping
  in "<unicode string>", line 1, column 1:
    hello: there
    ^
found unhashable key
  in "<unicode string>", line 2, column 3:
    ? [6, 5]
      ^

There doesn't seem to be anything in the YAML spec that would prevent it from being used this way, so this may actually be a bug in the pyyaml implementation.

drdavella commented 5 years ago

It's worth noting that ruamel.yaml appears to handle this case correctly. We have an issue for exploring the use of that package vs pyyaml (#391), so this would be a motivating use case.

jdavies-st commented 5 years ago

Generally speaking, aren't keys supposed by hashable? Certainly python tuples are hashable because they are immutable, so it is expected that on the way out (via writing) this works. But on read, it seems there is no way to know whether an array in a YAML file will be used as a list (mutable) or tuple (immutable), so no way of knowning if it is hashable or not from the YAML.

drdavella commented 5 years ago

@jdavies-st the YAML standard has no concept of what is hashable and what is not. And the standard seems to say that anything that is representable in unicode can be used as a key in a mapping, including other YAML objects. So there's no intrinsic issue with doing this as far as YAML itself is concerned.

What ruamel.yaml appears to do is to store the tuple as a YAML sequence (i.e. list) in order to use it as a key. When loading the file, it converts the sequence key back into a tuple since a Python list would not have been hashable in this way. This seems pretty reasonable to me. The fact that pyyaml does not do this seems to be a bug.

drdavella commented 5 years ago

Just tested this using the new release of pyyaml (version 5.1) and this still appears to be broken.