tokomine commented 1 year ago

Godot version

v4.1.1.stable.official

System information

Godot v4.1.1.stable - macOS 13.4.1 - Vulkan (Mobile) - dedicated AMD Radeon Pro 5300M - Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz (12 Threads)

Issue description

When the xml ends with two newline character, XMLParser.read() will get a wrong type NODE_ELEMENT_END.

Steps to reproduce

Run reproduction script.

Observe output:


4
1
1
stable
----------------------------------------
buffer content: "<p>abc</p>
"
node type: 1
       XMLParser.NODE_ELEMENT
       'p'

node type: 3 XMLParser.NODE_TEXT a b c

node type: 2 XMLParser.NODE_ELEMENT_END 'p'

buffer content: "

abc

" node type: 1 XMLParser.NODE_ELEMENT 'p'

node type: 3 XMLParser.NODE_TEXT a b c

node type: 2 XMLParser.NODE_ELEMENT_END 'p'

node type: 2 XMLParser.NODE_ELEMENT_END <------------ the correct result should be NODE_TEXT 'p'

buffer content: "

abc

" node type: 1 XMLParser.NODE_ELEMENT 'p'

node type: 3 XMLParser.NODE_TEXT a b c

node type: 2 XMLParser.NODE_ELEMENT_END 'p'

node type: 3 XMLParser.NODE_TEXT


### Minimal reproduction project

extends Node2D

var s1 = "

abc

\n" var s2 = "

abc

\n\n" var s3 = "

abc

\n\n\n"

func parse_xml(test_xml_buffer) -> void:

print("-".repeat(40))
print("buffer content: \"%s\"" % test_xml_buffer)

var xml_parser: XMLParser = XMLParser.new()

var err = xml_parser.open_buffer(test_xml_buffer.to_utf8_buffer())
if err != OK:
    push_error("error: %s" % err)
    return

while true:

    err = xml_parser.read()
    if err != OK:
        push_error("error: %s" % err)
        return

    var node_type = xml_parser.get_node_type()
    var node_type_str = ""
    var node_name = ""
    var node_data = ""

    prints("node type:", node_type)

    node_type_str = [
        "XMLParser.NODE_NONE",
        "XMLParser.NODE_ELEMENT",
        "XMLParser.NODE_ELEMENT_END",
        "XMLParser.NODE_TEXT",
        # ...
        ][node_type]

    prints("          ", node_type_str)

    match node_type:

        XMLParser.NODE_ELEMENT, XMLParser.NODE_ELEMENT_END:
            node_name = xml_parser.get_node_name()
            prints("          ", "'%s'" % node_name)

        XMLParser.NODE_TEXT:
            node_data = xml_parser.get_node_data()

            for i in range(node_data.length()):
                prints("          ", "%s" % (node_data.substr(i,1)))

        _:
            push_error("Unhandled node type.")

    print()

Called when the node enters the scene tree for the first time.

func _ready(): var version = Engine.get_version_info() print(version.major) print(version.minor) print(version.patch) print(version.status)

parse_xml(s1)
parse_xml(s2)
parse_xml(s3)

elenakrittik commented 1 year ago

This one is extremely hard to understand, but i finally found the exact circumstances under which this hell happens.

If right after (right after the > in </close>) last closing tag in the file there is exactly two blank-ish escape sequences (as far as i tested, any combination of \n, \r and/or \t, i.e. \t\t, \n\r, \r\t, etc.; maybe other escapes too, but e.g. \f and \v do not work; called "magic sequence" below), that closing tag will be emitted again (duplicated), but exactly once.
If instead of double newline there is \n, \r, \n\n\n, \r\n\r or whatever else except the above-mentioned combinations, nothing unusual will happen.
If you put anything between > and the magic sequence, "magic" will not happen.

MRE project: xmlbug.zip Includes a simple python script to quickly generate xml input variants with different magic sequences.

MewPurPur commented 10 months ago

These explanations are overcomplicated, here's a simple one:

XMLParser reports the last XML node twice if the document ends with two whitespaces (" ", "\n", "\t", "\r"). Easy reproduction script:

# "Comment node" will be printed twice.
func _ready():
    var parser = XMLParser.new()
    parser.open_buffer("<!--A comment-->  ".to_ascii_buffer())
    while parser.read() != ERR_FILE_EOF:
        if parser.get_node_type() == XMLParser.NODE_COMMENT:
            print("Comment node")

elenakrittik commented 10 months ago

These explanations are overcomplicated, here's a simple one:

XMLParser reports the last XML node twice if the document ends with two spaces, tabs, or newlines. Easy reproduction script:
# "Comment node" will be printed twice.
func _ready():
  var parser = XMLParser.new()
  parser.open_buffer("  ".to_ascii_buffer())
  while parser.read() != ERR_FILE_EOF:
      if parser.get_node_type() == XMLParser.NODE_COMMENT:
          print("Comment node")

Thank you, this is indeed simpler than what i have written 😅

godotengine / godot

XMLParser incorrectly treats two '\n' as NODE_ELEMENT_END #81896

Godot version

System information

Issue description

Steps to reproduce

Called when the node enters the scene tree for the first time.