LuaUnit is a popular unit-testing framework for Lua, with an interface typical of xUnit libraries (Python unittest, Junit, NUnit, ...). It supports several output formats (Text, TAP, JUnit, ...) to be used directly or work with Continuous Integration platforms (Jenkins, Maven, ...).
Strings are not properly escaped in JUnit XML reports #163
Reproduction code (test_reproducer.lua):

    local lu = require('luaunit')

    function test_str_compare_null_byte()
        local actual = "q\000\000\002w\000"
        local expected = "q\000\000\002w\000\000"

        lu.assertEquals(actual, expected)
    end

    os.exit( lu.LuaUnit.run() )

$ lua test_reproducer.lua --output junit --name report | cat --show-nonprinting
# XML output to report.xml
# Started on 07/25/24 16:34:54
# Starting test: test_str_compare_null_byte
# Failure: test_reproducer.lua:7: expected: "q^@^@^Bw^@^@"
# actual: "q^@^@^Bw^@"
# Ran 1 tests in 0.002 seconds, 0 successes, 1 failure
The problem is that the JUnit XML reports will also (like the console output) contain these characters unescaped, resulting in invalid XML that the XML parsers I've tried refuse to read:
Ruby (parse_xml.rb), console output:
```console
$ ruby parse_xml.rb
C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:96:in `rescue in parse': # (REXML::ParseException)
C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/text.rb:140:in `block in check'
C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/text.rb:136:in `each'
C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/text.rb:136:in `check'
C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/attribute.rb:175:in `element='
C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/element.rb:2384:in `[]='
C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:36:in `block in parse'
C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:35:in `each'
C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:35:in `parse'
C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/document.rb:448:in `build'
C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/document.rb:101:in `initialize'
parse_xml.rb:3:in `new'
parse_xml.rb:3:in `<main>'
...
Illegal character "\u0000" in raw string "test_reproducer.lua:7: expected: "q\u0000\u0000\u0002w\u0000\u0000"\nactual: "q\u0000\u0000\u0002w\u0000""
Line: 10
Position: 581
Last 80 unconsumed characters:
from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:21:in `parse'
from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/document.rb:448:in `build'
from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/document.rb:101:in `initialize'
from parse_xml.rb:3:in `new'
from parse_xml.rb:3:in `<main>'
C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/text.rb:140:in `block in check': Illegal character "\u0000" in raw string "test_reproducer.lua:7: expected: "q\u0000\u0000\u0002w\u0000\u0000"\nactual: "q\u0000\u0000\u0002w\u0000"" (RuntimeError)
from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/text.rb:136:in `each'
from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/text.rb:136:in `check'
from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/attribute.rb:175:in `element='
from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/element.rb:2384:in `[]='
from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:36:in `block in parse'
from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:35:in `each'
from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:35:in `parse'
from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/document.rb:448:in `build'
from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/document.rb:101:in `initialize'
from parse_xml.rb:3:in `new'
from parse_xml.rb:3:in `<main>'
```
Python (parse_xml.py, after pip install defusedxml):

    from defusedxml.ElementTree import parse
    et = parse('report.xml')

Console output
```
$ python parse_xml.py
Traceback (most recent call last):
File "C:\Users\pp\.pyenv\pyenv-win\versions\3.12.4\Lib\xml\etree\ElementTree.py", line 1706, in feed
self.parser.Parse(data, False)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 9, column 67
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\temp\ks-experiments\luaunit-bug\parse_xml.py", line 2, in <module>
et = parse('report.xml')
^^^^^^^^^^^^^^^^^^^
File "C:\Users\pp\.pyenv\pyenv-win\versions\3.12.4\Lib\site-packages\defusedxml\common.py", line 100, in parse
return _parse(source, parser)
^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\pp\.pyenv\pyenv-win\versions\3.12.4\Lib\xml\etree\ElementTree.py", line 1204, in parse
tree.parse(source, parser)
File "C:\Users\pp\.pyenv\pyenv-win\versions\3.12.4\Lib\xml\etree\ElementTree.py", line 572, in parse
parser.feed(data)
File "C:\Users\pp\.pyenv\pyenv-win\versions\3.12.4\Lib\xml\etree\ElementTree.py", line 1708, in feed
self._raiseerror(v)
File "C:\Users\pp\.pyenv\pyenv-win\versions\3.12.4\Lib\xml\etree\ElementTree.py", line 1615, in _raiseerror
raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 9, column 67
```
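
The rejections above come down to the XML 1.0 character rules: U+0000 and U+0002 are not legal characters in an XML document, whether written raw or as numeric character references. One way a report writer could avoid the problem is sketched below. This is only an illustration in Python, not LuaUnit's code and not necessarily how any existing reporter does it; `escape_for_xml` and `_visible` are hypothetical names, and the `\xNN` notation is an assumed convention for making the illegal characters visible.

```python
import re
from xml.sax.saxutils import escape

# Everything outside the XML 1.0 "Char" production. The only legal characters
# are tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF.
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000a\u000d\u0020-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)


def _visible(match: re.Match) -> str:
    """Turn one illegal character into a printable backslash escape."""
    code = ord(match.group())
    return f"\\x{code:02x}" if code <= 0xFF else f"\\u{code:04x}"


def escape_for_xml(text: str) -> str:
    """Hypothetical helper: make text safe for XML character data or attributes.

    Illegal characters are replaced first, then &, <, > and double quotes
    are turned into entities.
    """
    visible = _ILLEGAL_XML_CHARS.sub(_visible, text)
    return escape(visible, {'"': "&quot;"})


print(escape_for_xml("q\x00\x00\x02w\x00"))  # prints q\x00\x00\x02w\x00
```

A limitation of this assumed convention is that a real NUL byte and the four literal characters `\x00` look identical in the report; a reporter that needs to distinguish them would also have to escape backslashes.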
For reference, the Ruby trace above comes from REXML (parse_xml.rb) and the Python one from defusedxml (parse_xml.py, after pip install defusedxml). xmllint (available on Ubuntu in the libxml2-utils package) rejects the report as well:

Console output
```
$ xmllint report.xml
report.xml:9: parser error : Char 0x0 out of allowed range
```

You might want to take a look at, for example, https://github.com/xmlrunner/unittest-xml-reporting from the Python ecosystem to see how it handles this situation. Make sure to pip install unittest-xml-reporting first (I'm using the latest version, 3.2.0). The report it generates is TEST-test_repro.TestRepro.xml.

Note that the assertion failure is escaped ('q\x00\x00\x02w\x00' != 'q\x00\x00\x02w\x00\x00'), which means that:
1. the representation uses only basic ASCII characters, which doesn't cause any problems in the XML report or elsewhere;
2. the full contents of each string are captured, so the message is always meaningful for debugging.

It also tries to display a kind of vertical diff, with - qw and + qw, but in this case the diff is useless because all the non-printable characters were filtered out of it. That is still better than writing them to the XML (which would make the XML invalid), and the real contents of both strings are already clear from the escaped form, so it doesn't matter.

I checked the byte values in the generated TEST-test_repro.TestRepro.xml and it contains only basic ASCII characters (the (...) mark in the listing indicates a part I omitted, otherwise the listing would be unnecessarily long). 09 in hex is horizontal tab (often denoted \t), 0a is line feed (often denoted \n), and everything between 20 and 7e (inclusive) is a printable character (see https://en.wikipedia.org/wiki/ASCII#Printable_characters), so there's nothing problematic.
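
The byte check described above can be sketched in a few lines of Python. This is not the command used for the original listing; `unexpected_bytes` and `ALLOWED` are hypothetical names, and the allowed set is simply the one named in the text (tab, line feed, and 0x20 through 0x7e).

```python
from collections import Counter

# Byte values considered harmless: horizontal tab, line feed, printable ASCII.
ALLOWED = {0x09, 0x0A} | set(range(0x20, 0x7F))


def unexpected_bytes(data: bytes) -> dict:
    """Return {byte value: count} for every byte outside ALLOWED."""
    return dict(Counter(b for b in data if b not in ALLOWED))


print(unexpected_bytes(b"q\x00\x00\x02w"))  # prints {0: 2, 2: 1}
```

Applied to a report file, for example `unexpected_bytes(open("TEST-test_repro.TestRepro.xml", "rb").read())`, an empty dictionary means the file contains nothing but tab, line feed, and printable ASCII.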