Closed opeongo closed 5 years ago
Merging #494 into merge-fork-beanshell2 will decrease coverage by
0.24%
. The diff coverage isn/a
.
@@ Coverage Diff @@
## merge-fork-beanshell2 #494 +/- ##
===========================================================
- Coverage 70.22% 69.97% -0.25%
- Complexity 2624 2633 +9
===========================================================
Files 103 105 +2
Lines 8632 8917 +285
Branches 1688 1757 +69
===========================================================
+ Hits 6062 6240 +178
- Misses 2192 2254 +62
- Partials 378 423 +45
Impacted Files | Coverage Δ | Complexity Δ | |
---|---|---|---|
src/main/java/bsh/ReflectError.java | 33.33% <0%> (-33.34%) |
1% <0%> (-1%) |
|
src/main/java/bsh/DelayedEvalBshMethod.java | 62.68% <0%> (-25.32%) |
17% <0%> (-1%) |
|
src/main/java/bsh/Reflect.java | 70.27% <0%> (-9.44%) |
144% <0%> (-43%) |
|
src/main/java/bsh/BSHPrimarySuffix.java | 77.02% <0%> (-4.8%) |
56% <0%> (-1%) |
|
src/main/java/bsh/BlockNameSpace.java | 68.18% <0%> (-4.55%) |
8% <0%> (-2%) |
|
src/main/java/bsh/BSHArrayInitializer.java | 94.05% <0%> (-3.97%) |
45% <0%> (-2%) |
|
src/main/java/bsh/BshArray.java | 94.91% <0%> (-3.39%) |
74% <0%> (-3%) |
|
src/main/java/bsh/commands/dir.java | 58.49% <0%> (-1.89%) |
4% <0%> (ø) |
|
src/main/java/bsh/LHS.java | 65.4% <0%> (-1.47%) |
35% <0%> (-1%) |
|
src/main/java/bsh/Types.java | 80.68% <0%> (-1.46%) |
135% <0%> (+6%) |
|
... and 19 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 812bd9b...d0328b2. Read the comment docs.
Awesome! TYVM @opeongo
I need to get a go to windows environment configured. There are a few open windows issues that also need some TLC.
I also saw some unicode issues when last I tried to build on windows, hoping this patch beanshell/beanshell@5a9a685b61081b6a125bf24e7598ea0388b24be5 resolves that too.
I had a look at the windows issues. Most of them seem to be non-bugs, con't reproduce, or issues that have already bee fixed but not closed out. Would you like me to go through and categorize them?
Would you like me to go through and categorize them? @opeongo Yes please!! Even if you just report your findings, would be awesome thanx.
As for the line endings, we already carry System.getProperty("line.separator")
as bsh.Interpreter.Console.systemLineSeparator
I'm wondering if we shouldn't fix this somewhere else, in other words not in the tests. For example if I had a script using "\n"
as these tests do, and I run it under windows should we not let the script succeed with the correct line endings for windows?
What do you think?
Well, this is a bit curious. I went to have a look at where the \
After poking around it appears that git performed automatic line ending conversion when I cloned your repository. In other words my test script automatically got changed to be subtly different from your test script, and this broke the test.
Knowing this, I don't think that my suggested patch is correct, because it depends on 'git' changing the test script. Git (and other tools like it) are the culprit here. What if the repository was checked out without using autocrlf? I suspect that this will be an issue that other people will stumble over.
I guess one option would be to change the Parser so that it converts line endings to system independent form when it reads the string in (e.g. for Windows change \
input.replace(System.getProperty("line.separator"),"\n");
inside of BSHLiteral#stringSetup. But I don't think this is correct either.
I don't think that there is a perfect solution to this problem, because this kind of automagic would thwart people from deliberately forming the line endings that they wanted, and even so there would likely be odd situations where scripts would break mysteriously.
My preference would be to not modify the input at all, but caveat emptor that the line endings in your script file will be the ones that are copied verbatim in to the long string. This is not a bug, and fixing it would cause more confusion.
As far as the test failing, the test could be changed to test alternate line endings and succeed if either one passed.
Hmmm I don't think I quite follow with the git autocrlf
and how that relates to this patch.
// Long strings
- out="\n";
+ out= System.getProperty("line.separator");
for (x : 5)
- out += x + "\n";
+ out += x + System.getProperty("line.separator");
Surely git had nothing to do with this... well I hope at least.
Just stumbled on [System.lineSeparator()](https://docs.oracle.com/javase/10/docs/api/java/lang/System.html#lineSeparator()) since 1.7... now I know that too tyvm.
Here I will explain the reasoning why git caused the test to fail on my system, and what could be done about it.
This is a long standing and well known issue for cross development between unix-like and windows systems.
I assume that you prepared the strings.bsh test script on a unix-like system. The line endings in the file suggest so. If on your unix-like system you open the file in a binary/hex editor (not a text editor) you will see that the byte codes for this section are
aout = """
0
1
2
3
4
5
""";
are composed of the byte sequence
61 6F 75 74 20 3D 20 22 22 22 0A 30 0A 31 0A 32 0A 33 0A 34 0A 35 0A 22 22 22 3B 0A
I am working on a Windows system. When I clone the repository I get the same packed deltas as you, but when I check out a branch to a working copy git by default will translate line endings in text files from 'canonical' format (\n) to Windows format (\r\n). Now on my Windows/git/autocrlf system that same section of code will have the bytes:
61 6F 75 74 20 3D 20 22 22 22 0D 0A 30 0D 0A 31 0D 0A 32 0D 0A 33 0D 0A 34 0D 0A 35 0D 0A 22 22 22 3B 0D 0A
Note that this sequence is 8 bytes longer, because there are 8 lines and an \r code (0d) has been added to each line ending.
The reason for doing this is because some (most?) text editors on Windows will insert \r\n line endings for new lines, and when save the file you will either get all lines endings with \r\n, or perhaps just the lines that you inserted with \r\n and the rest unchanged (the behaviour depends on the editor, there is no standard). This is the hellish legacy of Bill Gates. Git by default does the automatic conversion from canonical to system dependent on checkout and system dependent to canonical on checkin because usually it is the right thing to do. If it didn't then there might be a lot of thrashing between line ending styles each time the file is modified on a different system type, and this would make the change log unusable.
Getting back to the test failure, when checked out from git/autocrlf on a Windows system the test script will actually be different than the test script on a unix-like system: they will have different byte sequences and have different lengths.
When the parser tokenizes a long string it will read all bytes in between the opening and closing """. For the test in question a unix-like system would see 13 bytes, for a Windows/git/autocrlf system 20 bytes. When the test constructs the comparison string it is manually using a single \n for a line ending, and making a sequence that is 13 bytes long. This compares exactly to the sequence read from the unix-like system, but not to the Windows/git/autocrlf sequence.
There are a two options I can see
Is it a bug? I am not sure. The long string is capturing everything in between the delimiters. Perhaps the system dependent line endings are important for a particular use case: who can say for sure. What if the long string had explicit line endings:
aout="""
this\\r\\ncould\\r\\n\\happen\\r\\n
""";
If we wanted to allow people full control to make whatever long strings they wanted then we should just leave it as is. If this is the case then change the test so that the error goes away.
The other option is to sanitize the input, just like git-autocrlf: when the parser tokenizes a long string convert all of the \r\n sequences to \n. Sounds easy. This auto-magic would make the test failure go away.
But should this be done on Windows only where these mixups can occur, or should it be done on automatically on all platforms? And what about cases where different line ending sequences are required parts of the application data? Automatically sanitize the inputs may end up doing the wrong thing in some cases.
Actually I can think of a third option:
Nah... that is getting too complicated for too little benefit.
After thinking about it my vote would go for the second option: automatically sanitize the inputs by changing all \r\n to \n. What is the common case for long strings anyway? Surely not to do exotic things with control characters. If there was a use case to embed different line endings this could be done using a more traditional means (StringBuilder) or with String.replace().
I vote for option 1 "Ignore this feature, it is not a bug" the problem comes from changing the lite endings which is not something that should ever occur under normal circumstances. If anyone managed to change the line endings they should know how to change it back too.
The other issue which should be considered is what if you intended to have \r\n
and we now go and remove it. Whomever caused the problem is the one responsible to fix it.
I dug around a bit more and I found JEP 327 "Raw String Literals".
It looks like this JEP moving forward and will soon be in Java. This clarifies how the official Java will handle raw string literals. Here are a couple of examples of how this will work
aout = `
0
1
2
3
5
`;
a = `this is a "raw string
literal". note that \ characters
get no special treatment`;
b = ``here is how to embed a ` single backtick``;
c = ```here is how to embed a `` double backtick```;
tl;dr
Note that they have taken a stand to normalize line endings. This effectively solves the issue that we have been discussing here.
This JEP also touches on the issues that we have been discussing in #495.
I think a review of BeanShell handles string and character literals is required in light of this JEP.
Agreed... investigate JEP 327
This patch fixes a couple of tests that were failing in a Window environment. It replaces the hardcoded path and line separator characters with system properties.