gang5504 / snakeyaml

Automatically exported from code.google.com/p/snakeyaml
Apache License 2.0
0 stars 0 forks source link

ScannerException should not contain the invalid character it's forbidding - makes message hard to read #209

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Have a yaml file with tabs or other forbidden chars
2. See than an actual _tab_ is in the error message, e.g. "found character 
<tab> \t(TAB) that cannot start any token"
3. Same (worse) for backspace, vertical tab, etc

What is the expected output? What do you see instead?
I'd just rather have the escaped character ("chRepresentation") in the error 
message

What version of SnakeYAML are you using? On what Java version?
1.14

Here's a simple patch (to prod code and test)

Original issue reported on code.google.com by joseph.g...@gmail.com on 10 Apr 2015 at 7:52

Attachments:

GoogleCodeExporter commented 8 years ago
Can you please add tests to show how it affects 1) and 3) ?

Original comment by py4fun@gmail.com on 10 Apr 2015 at 9:37

GoogleCodeExporter commented 8 years ago
Hmmn yeah I'm not sure I should have written "steps" for this. Whether it's a 
tab or another unwanted control character the issue is the same. As far as I 
can tell \t is only special because we print "(TAB)" next to it. So sure, I 
could add a test for another such char. While I'm at it I could also argue that 
other chars could benefit from a textual description rather than just their 
escaped form. 

Original comment by joseph.g...@gmail.com on 10 Apr 2015 at 10:34

GoogleCodeExporter commented 8 years ago
>I could also argue that other chars could benefit from a textual description 
rather than just their escaped form. 
Well, we print the char AND its textual description
As far as I can see the proposal is:
Improve human readability: in case of a leading TAB, do not print it in the 
error message, its  textual description is enough.
Basically, you say leading TAB should not be printed but other use cases or 
characters are not affected.
It is a minor issue for me. Let us see if somebody else can say something to 
come to a conclusion.

Original comment by py4fun@gmail.com on 11 Apr 2015 at 5:12

GoogleCodeExporter commented 8 years ago
Ha, you're right. By reading the code quickly, I'd assumed that other special 
chars covered by org.yaml.snakeyaml.scanner.ScannerImpl#ESCAPE_REPLACEMENTS 
would get the same treatment (i.e print the char itself THEN its 
representation). After adding a quick test for \b or \f, it seems those are 
caught earlier in StreamReader.

Original comment by joseph.g...@gmail.com on 12 Apr 2015 at 3:35

GoogleCodeExporter commented 8 years ago
I looked at the code (it was written more than 5 years ago) and I see the 
better picture now.
Your patch is very incomplete. Let us do the same for all the escape characters.
We can:
1) Create yet another Map<String, String> from escape char to its human 
description.
2) fill this map together with ESCAPE_REPLACEMENTS
3) use this map when we need the representation in the error message. We can 
remove these lines and use the new map instead:
        String chRepresentation = String.valueOf(ch);
        for (Character s : ESCAPE_REPLACEMENTS.keySet()) {
            String v = ESCAPE_REPLACEMENTS.get(s);
            if (v.equals(chRepresentation)) {
                chRepresentation = "\\" + s;// ' ' -> '\t'
                break;
            }
        }
        if (ch == '\t')
            chRepresentation += "(TAB)";

4) add/change tests and we are done.

Feel free to contribute a patch.

Original comment by py4fun@gmail.com on 13 Apr 2015 at 12:43

GoogleCodeExporter commented 8 years ago
It's debatably incomplete ;) What more could it do ?
Do we want to keep the special case for TAB ? (i.e represent it with "\\t 
(TAB)", while all other characters are just represented with their 
backslash-escape? - if not then indeed a second map could easily be populated, 
or even a BidiMap)

Original comment by joseph.g...@gmail.com on 17 Apr 2015 at 9:59

GoogleCodeExporter commented 8 years ago
Well, what I mean is that we either do it completely and properly for all 
chars, or we do not do it at all (to prevent the situation when the issues are 
created for each and every special character)
Since the initial patch is only about a single character it will not be 
accepted (to avoid misunderstanding in the future)
BidiMap is not an option because it requires an external dependency (SnakeYAML 
has none at the moment)
If you wish to finish the issue - feel free to contribute.

Original comment by py4fun@gmail.com on 17 Apr 2015 at 10:53

GoogleCodeExporter commented 8 years ago
Sorry, but I don't see how the patch is about a single character (i.e if you 
mean it's only for tab). All it does is remove %s (ch) from the error message, 
so _any_ unexpected character won't be print in the error message.
That said, I did try writing a test for, e.g  yaml.load("  data:\r 1"); and 
indeed, a different exception is thrown from elsewhere. Is that what you mean ? 
(i.e fix those other places that throw exceptions that also have the special 
chars in their messages ?

Original comment by joseph.g...@gmail.com on 17 Apr 2015 at 12:57

GoogleCodeExporter commented 8 years ago
Your patch is taken:
https://code.google.com/p/snakeyaml/wiki/changes

Thank you.
P.S. The map with descriptions can be implemented later

Original comment by py4fun@gmail.com on 17 Apr 2015 at 1:41