asdf-format / asdf

ASDF (Advanced Scientific Data Format) is a next generation interchange format for scientific data
http://asdf.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

ASDF doesn't unpack dict tuple values as tuples #1588

Open nstarman opened 1 year ago

nstarman commented 1 year ago

If I save a dictionary with tuple values, the values are loaded as lists. Tuples are immutable and have other useful properties. It would be great if data structures could round-trip losslessly through the ASDF format.

>>> import asdf
>>> af = asdf.AsdfFile()
>>> af["example"] = {"a": (1, 2)}
>>> af.write_to("test.asdf")
>>> af.close()

>>> af = asdf.open("test.asdf")
>>> af["example"]
{'a': [1, 2]}

braingram commented 1 year ago

Thanks for opening the issue.

It's possible to add support for tuples via the extension interface. Here's a small example that could be fleshed out for your application.

import asdf

class CustomConverter(asdf.extension.Converter):
    # Python types handled by this converter and the YAML tag it writes
    types = [tuple]
    tags = ["http://example.com/tags/tuple-1.0.0"]

    def to_yaml_tree(self, obj, tag, ctx):
        # Serialize the tuple as a (tagged) YAML sequence
        return list(obj)

    def from_yaml_tree(self, node, tag, ctx):
        # Rebuild the tuple from the tagged sequence
        return tuple(node)

class CustomExtension:
    extension_uri = "http://example.com/extensions/test-1.0.0"
    converters = [CustomConverter()]
    tags = CustomConverter.tags

with asdf.config_context() as config:
    config.add_extension(CustomExtension())

    value = (1, 2, 3)
    fn = 'test.asdf'

    asdf.AsdfFile({'value': value}).write_to(fn)

    with asdf.open(fn) as af:
        rt_value = af['value']

    print("====== Comparison ======")
    print(f"Equality: {value == rt_value}")
    print(f"Type: {type(value) == type(rt_value)}")
    print("========================")

    with open(fn) as f:
        print(f.read())

Running this on my system (with the current asdf main, though it should also work with older versions) produces:

====== Comparison ======
Equality: True
Type: True
========================
#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 3.0.0.dev339+ga0778f30.d20230712}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0
    software: !core/software-1.0.0 {name: asdf, version: 3.0.0.dev339+ga0778f30.d20230712}
  - !core/extension_metadata-1.0.0 {extension_class: __main__.CustomExtension, extension_uri: 'http://example.com/extensions/test-1.0.0'}
value: !<http://example.com/tags/tuple-1.0.0> [1, 2, 3]
...

Note that the tuple is tagged with a custom tag. The need for this stems partly from the YAML standard and partly from the ASDF standard.

The closest structure to a tuple in YAML is a sequence. By default, pyyaml (the library asdf uses) loads sequences as lists and writes lists as sequences. To produce YAML that is closer to the standard, asdf builds off of SafeLoader and SafeDumper from pyyaml (see asdf.yamlutil for more details). The SafeDumper automatically converts tuples to sequences:

>>> import yaml
>>> yaml.dump((1, 2, 3), Dumper=yaml.SafeDumper)
'- 1\n- 2\n- 3\n'

This seems to be a convenience for the user (rather than raising a RepresenterError when a tuple is encountered). However, as you've pointed out, this means that tuples do not round-trip (even when using pyyaml directly):

>>> yaml.load(yaml.dump((1, 2, 3), Dumper=yaml.SafeDumper), yaml.SafeLoader)
[1, 2, 3]
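
For contrast, here is a minimal sketch (assuming PyYAML 5.1 or newer) of what the non-safe classes do. The default yaml.Dumper writes tuples with the Python-specific !!python/tuple tag, and yaml.FullLoader turns that tag back into a tuple. This round-trips, but the tag is specific to Python and is not part of the YAML or ASDF standards, which is presumably why it is not a good fit for asdf.

```python
import yaml

value = (1, 2, 3)

# The default (non-safe) Dumper keeps the type by emitting a Python-specific tag.
tagged = yaml.dump(value, Dumper=yaml.Dumper)
print(tagged)  # the document begins with "!!python/tuple"

# FullLoader understands that tag and rebuilds a tuple.
restored = yaml.load(tagged, Loader=yaml.FullLoader)
print(type(restored).__name__, restored == value)

# SafeLoader has no constructor for the Python-specific tag and raises.
try:
    yaml.load(tagged, Loader=yaml.SafeLoader)
except yaml.YAMLError as exc:
    print("SafeLoader refused:", exc)
```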

The asdf library could either raise a RepresenterError (or some other exception) when a tuple is encountered, or define a custom tag that maps tuples to a tagged YAML sequence (as is done in the example above). Because asdf focuses on supporting the standard (and leaves non-standard tags to extensions), adding such a tag would involve updating the ASDF standard to add a new tag for tuples/immutable sequences. This is already done for things like complex.
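
To make the two options concrete, here is a rough sketch of both using pyyaml alone. It is not how asdf implements anything: StrictDumper, TupleDumper, TupleLoader, and the "!tuple" tag are all made-up names for illustration, and the real asdf change would go through the ASDF standard as described above.

```python
import yaml

# Option 1: refuse tuples instead of silently writing them as plain sequences.
class StrictDumper(yaml.SafeDumper):
    """Hypothetical dumper that raises when it meets a tuple."""

def reject_tuple(dumper, data):
    raise yaml.representer.RepresenterError("cannot represent a tuple", data)

StrictDumper.add_representer(tuple, reject_tuple)

# Option 2: write tuples as a tagged sequence and read the tag back.
TUPLE_TAG = "!tuple"  # made-up local tag, not part of any standard

class TupleDumper(yaml.SafeDumper):
    """Hypothetical dumper that tags tuples."""

class TupleLoader(yaml.SafeLoader):
    """Hypothetical loader that understands TUPLE_TAG."""

def represent_tuple(dumper, data):
    return dumper.represent_sequence(TUPLE_TAG, data)

def construct_tuple(loader, node):
    return tuple(loader.construct_sequence(node, deep=True))

TupleDumper.add_representer(tuple, represent_tuple)
TupleLoader.add_constructor(TUPLE_TAG, construct_tuple)

data = {"a": (1, 2)}

try:
    yaml.dump(data, Dumper=StrictDumper)
except yaml.representer.RepresenterError as exc:
    print("StrictDumper refused:", exc)

text = yaml.dump(data, Dumper=TupleDumper)
restored = yaml.load(text, Loader=TupleLoader)
print(text)
print(restored == data, type(restored["a"]).__name__)
```

The representers and constructors are registered on subclasses, so the stock SafeDumper and SafeLoader keep their default behavior.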

I'm not seeing any documentation for this behavior; perhaps it would fit where the documentation describes the Data Model. I'll take a stab at adding a note about tuple handling.