KenKundert / nestedtext

Human readable and writable data interchange format
https://nestedtext.org
MIT License
362 stars 13 forks source link

Expectation on encoding -> decoding of `None` #31

Closed erikw closed 2 years ago

erikw commented 2 years ago

Hello,

I'm on a good way to make a Ruby implementation of NestedText (erikw/nestedtext-ruby, all official decode tests are passing!) and I'm currently writing unit tests for various edge-case inputs. Using the Python implementation, I'm unsure what the expectation is on encoding python's None (and ruby's nil in my case). I see that None is treated in different ways.

Using this base-program and just changing the obj = ... line in the different examples:

   import nestedtext as nt

    obj = ...

    dumped = nt.dumps(obj)
    print("dumped:")
    print(repr(dumped))

    loaded = nt.loads(dumped)
    print("loaded:")
    print(repr(loaded))

Just None

    obj =  None

gives the output:

dumped:
''
loaded:
{}

▶️ None encodes to empty string, but is decoded back as empty inline dict

None in list

    obj =  [None]

gives the output:

dumped:
'-'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "__main__.py", line 82, in main
    test_dump()
  File "__main__.py", line 73, in test_dump
    loaded = nt.loads(dumped)
  File "[...]/nestedtext.py", line 1088, in loads
    loader = NestedTextLoader(lines, top, source, on_dup, keymap)
  File "[...]/nestedtext.py", line 765, in __init__
    report('content must start with key or brace ({{).', lines.get_next())
  File "[...]/nestedtext.py", line 258, in report
    raise NestedTextError(template=message, *args, **kwargs)
  File "<string>", line 0
nestedtext.NestedTextError

▶️ None encodes to empty string, but can't be decoded back to Python

None as dict value

    obj = {"key": None}

gives the output:

dumped:
'key:'
loaded:
{'key': ''}

▶️ None encodes to empty string, but is decoded back to empty string and not None

None as dict key

    obj = {None: "value"}

gives the output:

dumped:
'None: value'
loaded:
{'None': 'value'}

▶️ None encodes to the string "None", and decodes back to the string "None"


Thus, None is treated different in all cases above. The encoding from Python is of course not a part of the specification for the NT data format, but it would still be nice to know the rules for None, or that None is always treated the same in all cases :).

Is the output above the expected? I suspect that in the case with None

I'm happy to hear your thoughts!

KenKundert commented 2 years ago

An important thing to know is that loads takes an argument top that determines both the expected and the allowed form of the top-level object. This was done because people often just assume that the top level is a dictionary when they write their code, and their code would crash if they got an empty file or something other than a dictionary. To mitigate that, we could have restricted the format to only allow dictionaries at the top level, but we did not want to give up that flexibility. So we opted for a somewhat more confusing approach where the developer explicitly specifies the type of the top-level object.

Also, None is not officially supported as a data type in NestedText documents. My implementation of dumps converts it to an empty string by default, but if you pass default="strict" as an argument it raises an error. However, there is one exception. None is a valid type for the top-level object, and it signifies an empty NT document. Thus dumps(None) produces "". When reading an empty document None is returned if top is any. If top is not any the None is converted to an empty object of the expected type.

When dumping, I overlooked None passed as a key. Currently it is converted to the string "None", but as you say, that seems inconsistent and so I will have to rethink that behavior. I will probably convert it to a empty string to be consistent, as you suggest.

Just None:

You are seeing the expected behavior. None when dumped creates an empty NT document, which loads as an empty dictionary because by default top is dict.

None in list

You are seeing the expected behavior. In this case the error from loads results not from the None, but because the top-level object in the NT document is a list. By default, loads expects a dictionary.

None as dict value

You are seeing the expected behavior. By default None is converted to an empty string when passed as a value.

None as dict key

As I mentioned above, this one surprised me. I think the None should probably be represented by an empty string when passed as a key.

Test Cases

Here is some Python code that tests these various case. Perhaps it will be helpful to you:

import nestedtext as nt
from inform import error

cases = [
    # given,         top,   dumped,       loaded
    ( None,          dict,  '',           {}             ),
    ( None,          list,  '',           []             ),
    ( None,          str,   '',           ""             ),
    ( None,          any,   '',           None           ),
    ( [None],        list,  '-',          ['']           ),
    ( [None],        any,   '-',          ['']           ),
    ( {'key':None},  dict,  'key:',       {'key':''}     ),
    ( {None:'val'},  dict,  'None: val',  {'None':'val'} ),  # ← wrong?
]

for given, top, dump_expected, load_expected in cases:
    dumped = nt.dumps(given)
    if dumped != dump_expected:
        error(f"dumps({given}) gives {dumped}, expected {dump_expected}")
    loaded = nt.loads(dumped, top)
    if loaded != load_expected:
        error(f"loads({given}, {top}) gives {loaded}, expected {load_expected}")

# some special cases
try:
    nt.dumps([None], default='strict')
    error("expected exception on dumps([None]).")
except nt.NestedTextError:
    pass

try:
    nt.loads('-')
    error("expected exception on loads('-').")
except nt.NestedTextError:
    pass

A Ruby implementation, that is great news!

erikw commented 2 years ago

Thank you for the explanation!

I'll leave the issue open regarding None as a key.

KenKundert commented 2 years ago

I have decided to convert None in keys to the empty string to make keys consistent with values.. I have implemented this change in my local version and it my local tests. I am not yet committing those changes to github until I get comfortable with some new features I have implemented, but I wanted to let you know the direction I was going so you can do the same in your version.

erikw commented 2 years ago

Thank you for the heads-up. I will do the same in nestedtext-ruby :).

KenKundert commented 2 years ago

Okay, I finally updated the GitHub version with this change.