Open erikw opened 2 years ago
I'd like to piggyback two concerns for the error messages and add them to this issue.
The messages are all in English. I'm an English speaking native, so this isn't inconvenient for me, but what if someone offered localized error messages? I think the error messages should not be strictly checked for exact wording but perhaps specific phrases, such as "CHARACTER TABULATION" as from above.
The messages use smart quotes, which isn't commonly seen in output by my language. This might be related to using Python as the primary coding language that's generating the errors, and Python does things a bit differently than most others. This one could also be solved at the same time as my earlier issue if the tests for the message were based on keywords, phrases, and implementation-independent output.
After have working with the test cases to verify nestedtext-ruby I have collected a few points of feedback that I would like to share, for future improvements.
First of I really like this idea of having a shared test suite. I was able to cover most cases with my own unit testing, but the test suite helped me think of more cases that I had missed, or corner-cases! Furthermore I really found the
show_test_cases.py
very handy when going back and forth between different cases during development.There were a few points where the reference Python implementation leaked in to the test cases themselves, which felt a bit awkward when using these test cases with a different language and implementation.
1. Non-printable character handling
Problem
It's a very good idea to use the standardized Unicode names, as this is something implementations in different languages can agree on. Yet there is room for improvement in the test cases:
(explanation)
for\t
. The reference implementation uses the unicodedata.name and this library does not seem to do the job properly. Because\t
does have a Unicode name and it'sCHARACTER TABULATION
. For my ruby implementation I use the library unicode_utils and it does give the correct name for\t
when queried.NO-BREAK SPACE
is included. This is good.\\xa0
in this case). It turns out that for simple ASCII characters, languages seems to agree that a tab character is escaped as\t
. However for the more unusual characters like non-printable space, there is a difference. For example Ruby will use the format\u00A0
instead.Solution Proposal
I suggest the following, to make it easier for more implementations to thrive in different languages
With this proposal, the two error messages above should instead look like:
Much cleaner and easier to implement correctly across programming languages.
2. Python class names leaks into test cases
Problem
See for example this test case: dict_21/dump_err.json On several places in the reference implementation (example)
type(obj).__name__}
is used in the error message.To be able to pass the official tests, I needed to add a minimal translation from Ruby's
Integer
to Python'sint
. See here.Solution Proposal
Here there's no luxury like in the Unicode case above that there is a standard that everyone can use. Maybe the case is that we just have to say that the Python names for the data types are the one we should use.
An alternative solution would be actually remove the type from the message itself and have this as a separate field in the exception. Then users of the official tests can check for the message
unsupported type
only, and handle the actual type e.g. as another attribute in the Exception instance or the like. This is actually similar to how I decided to implement errors in my Ruby implementation. Roughly my top-level Error class would look likeThis way, I will still have my error message pretty-printed like I want to stderr, while having access to the raw message that I use to verify against this official test suite.
3. Formatting in error message
Problem
The longer error-message for some indentation errors is wrapped in the official implementation at indentation_error().
To be able to pass the official tests, I needed to pull in a third party text wrapper. Luckily I could make this wrapper wrap the same way as Python module Inform that the reference implementation uses. See here.Update I did a git-pull in my clone and saw that the text wrapping is updated in the reference implementation, since I wrote down this feedback some weeks ago. There is currently no test case in this official test suite that have wrapped message. But if I understand this line correctly, an error message could still end up being wrapped. As long as this does not make it in to any official tests it's OK!
Solution Proposal
I propose that formatting like text wrapping should not be done in the error messages. The error message should be a string of one line of text ideally. If there's a need for multiple lines, it should be a clear and easy format, and not automatic wrapping that is very likely to be done differently by different languages/libraries.
In my ruby implementation, I showed above how I separate the raw error message from the pretty formatted message that is actually showed to users. I think this is a reasonable solution to provide both 1) a testable error message 2) a readable error message for the end-users.
A lot of text here. Don't get me wrong - I think this test suited worked great for all other cases! Thanks for the effort to make it easy for us :)
It's just these three types of cases above that can be improved. If we can address these ones, we will make it a lot easier for further NesedText implementations to arise and thrive!