Test cases feedback after usage in implementation

After have working with the test cases to verify nestedtext-ruby I have collected a few points of feedback that I would like to share, for future improvements.

First of I really like this idea of having a shared test suite. I was able to cover most cases with my own unit testing, but the test suite helped me think of more cases that I had missed, or corner-cases! Furthermore I really found the show_test_cases.py very handy when going back and forth between different cases during development.

There were a few points where the reference Python implementation leaked in to the test cases themselves, which felt a bit awkward when using these test cases with a different language and implementation.

1. Non-printable character handling

Problem

It's a very good idea to use the standardized Unicode names, as this is something implementations in different languages can agree on. Yet there is room for improvement in the test cases:

https://github.com/KenKundert/nestedtext_tests/blob/57f736717864478b473371029b56ef53ce18ce3c/test_cases/string_multiline_08/load_err.json#L4
- The problem is here that the error message does not have the (explanation) for \t. The reference implementation uses the unicodedata.name and this library does not seem to do the job properly. Because \t does have a Unicode name and it's CHARACTER TABULATION. For my ruby implementation I use the library unicode_utils and it does give the correct name for \t when queried.
- Thus, the error message for this case should for consistency with the intention really be
```
"message": "invalid character in indentation: '\\t' (CHARACTER TABULATION)."
```
- To be able to pass the official tests, I needed to add an ugly kludge that skips the printable name for ASCII characters. See here
https://github.com/KenKundert/nestedtext_tests/blob/57f736717864478b473371029b56ef53ce18ce3c/test_cases/string_multiline_10/load_err.json#L4
- Here the proper name NO-BREAK SPACE is included. This is good.
- The issue that I've learned of is the part before, how the escaped version is printed (\\xa0 in this case). It turns out that for simple ASCII characters, languages seems to agree that a tab character is escaped as \t. However for the more unusual characters like non-printable space, there is a difference. For example Ruby will use the format \u00A0 instead.
- To be able to pass the official tests, I needed to add an ugly kludge to translate to the Python format. See here.

Solution Proposal

I suggest the following, to make it easier for more implementations to thrive in different languages

Skip the printable version all together. It works fine for ASCII like '\t`, but becomes a mess for random Unicode characters, to get the same representation.
Instead only, but always, use the Unicode unique name for the character. This is standardized and there are Unicode libraries for all relevant languages out there.

With this proposal, the two error messages above should instead look like:

"message": "invalid character in indentation: CHARACTER TABULATION."
"message": "invalid character in indentation: NO-BREAK SPACE."

Much cleaner and easier to implement correctly across programming languages.

2. Python class names leaks into test cases

Problem

See for example this test case: dict_21/dump_err.json On several places in the reference implementation (example) type(obj).__name__} is used in the error message.

To be able to pass the official tests, I needed to add a minimal translation from Ruby's Integer to Python's int. See here.

Solution Proposal

Here there's no luxury like in the Unicode case above that there is a standard that everyone can use. Maybe the case is that we just have to say that the Python names for the data types are the one we should use.

An alternative solution would be actually remove the type from the message itself and have this as a separate field in the exception. Then users of the official tests can check for the message unsupported type only, and handle the actual type e.g. as another attribute in the Exception instance or the like. This is actually similar to how I decided to implement errors in my Ruby implementation. Roughly my top-level Error class would look like

class Error < StandardError
  def initialize(message)
    @raw_message = message
    super(pretty_message(message))
end

This way, I will still have my error message pretty-printed like I want to stderr, while having access to the raw message that I use to verify against this official test suite.

3. Formatting in error message

Problem

The longer error-message for some indentation errors is wrapped in the official implementation at indentation_error().

To be able to pass the official tests, I needed to pull in a third party text wrapper. Luckily I could make this wrapper wrap the same way as Python module Inform that the reference implementation uses. See here.

Update I did a git-pull in my clone and saw that the text wrapping is updated in the reference implementation, since I wrote down this feedback some weeks ago. There is currently no test case in this official test suite that have wrapped message. But if I understand this line correctly, an error message could still end up being wrapped. As long as this does not make it in to any official tests it's OK!

Solution Proposal

I propose that formatting like text wrapping should not be done in the error messages. The error message should be a string of one line of text ideally. If there's a need for multiple lines, it should be a clear and easy format, and not automatic wrapping that is very likely to be done differently by different languages/libraries.

In my ruby implementation, I showed above how I separate the raw error message from the pretty formatted message that is actually showed to users. I think this is a reasonable solution to provide both 1) a testable error message 2) a readable error message for the end-users.

A lot of text here. Don't get me wrong - I think this test suited worked great for all other cases! Thanks for the effort to make it easy for us :)

It's just these three types of cases above that can be improved. If we can address these ones, we will make it a lot easier for further NesedText implementations to arise and thrive!

KenKundert / nestedtext_tests