KenKundert / nestedtext_tests

5 stars 2 forks source link

Test cases feedback after usage in implementation #8

Open erikw opened 2 years ago

erikw commented 2 years ago

After have working with the test cases to verify nestedtext-ruby I have collected a few points of feedback that I would like to share, for future improvements.

First of I really like this idea of having a shared test suite. I was able to cover most cases with my own unit testing, but the test suite helped me think of more cases that I had missed, or corner-cases! Furthermore I really found the show_test_cases.py very handy when going back and forth between different cases during development.

There were a few points where the reference Python implementation leaked in to the test cases themselves, which felt a bit awkward when using these test cases with a different language and implementation.

1. Non-printable character handling

Problem

It's a very good idea to use the standardized Unicode names, as this is something implementations in different languages can agree on. Yet there is room for improvement in the test cases:

Solution Proposal

I suggest the following, to make it easier for more implementations to thrive in different languages

With this proposal, the two error messages above should instead look like:

"message": "invalid character in indentation: CHARACTER TABULATION."
"message": "invalid character in indentation: NO-BREAK SPACE." 

Much cleaner and easier to implement correctly across programming languages.

2. Python class names leaks into test cases

Problem

See for example this test case: dict_21/dump_err.json On several places in the reference implementation (example) type(obj).__name__} is used in the error message.

To be able to pass the official tests, I needed to add a minimal translation from Ruby's Integer to Python's int. See here.

Solution Proposal

Here there's no luxury like in the Unicode case above that there is a standard that everyone can use. Maybe the case is that we just have to say that the Python names for the data types are the one we should use.

An alternative solution would be actually remove the type from the message itself and have this as a separate field in the exception. Then users of the official tests can check for the message unsupported type only, and handle the actual type e.g. as another attribute in the Exception instance or the like. This is actually similar to how I decided to implement errors in my Ruby implementation. Roughly my top-level Error class would look like

class Error < StandardError
  def initialize(message)
    @raw_message = message
    super(pretty_message(message))
end

This way, I will still have my error message pretty-printed like I want to stderr, while having access to the raw message that I use to verify against this official test suite.

3. Formatting in error message

Problem

The longer error-message for some indentation errors is wrapped in the official implementation at indentation_error().

To be able to pass the official tests, I needed to pull in a third party text wrapper. Luckily I could make this wrapper wrap the same way as Python module Inform that the reference implementation uses. See here.

Update I did a git-pull in my clone and saw that the text wrapping is updated in the reference implementation, since I wrote down this feedback some weeks ago. There is currently no test case in this official test suite that have wrapped message. But if I understand this line correctly, an error message could still end up being wrapped. As long as this does not make it in to any official tests it's OK!

Solution Proposal

I propose that formatting like text wrapping should not be done in the error messages. The error message should be a string of one line of text ideally. If there's a need for multiple lines, it should be a clear and easy format, and not automatic wrapping that is very likely to be done differently by different languages/libraries.

In my ruby implementation, I showed above how I separate the raw error message from the pretty formatted message that is actually showed to users. I think this is a reasonable solution to provide both 1) a testable error message 2) a readable error message for the end-users.


A lot of text here. Don't get me wrong - I think this test suited worked great for all other cases! Thanks for the effort to make it easy for us :)

It's just these three types of cases above that can be improved. If we can address these ones, we will make it a lot easier for further NesedText implementations to arise and thrive!

fidian commented 3 months ago

I'd like to piggyback two concerns for the error messages and add them to this issue.

  1. The messages are all in English. I'm an English speaking native, so this isn't inconvenient for me, but what if someone offered localized error messages? I think the error messages should not be strictly checked for exact wording but perhaps specific phrases, such as "CHARACTER TABULATION" as from above.

  2. The messages use smart quotes, which isn't commonly seen in output by my language. This might be related to using Python as the primary coding language that's generating the errors, and Python does things a bit differently than most others. This one could also be solved at the same time as my earlier issue if the tests for the message were based on keywords, phrases, and implementation-independent output.