godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License
90.53k stars 21.08k forks source link

XMLParser incorrectly treats two '\n' as NODE_ELEMENT_END #81896

Open tokomine opened 1 year ago

tokomine commented 1 year ago

Godot version

v4.1.1.stable.official

System information

Godot v4.1.1.stable - macOS 13.4.1 - Vulkan (Mobile) - dedicated AMD Radeon Pro 5300M - Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz (12 Threads)

Issue description

When the xml ends with two newline character, XMLParser.read() will get a wrong type NODE_ELEMENT_END.

Steps to reproduce

  1. Run reproduction script.
  2. Observe output:
    
    4
    1
    1
    stable
    ----------------------------------------
    buffer content: "<p>abc</p>
    "
    node type: 1
           XMLParser.NODE_ELEMENT
           'p'

node type: 3 XMLParser.NODE_TEXT a b c

node type: 2 XMLParser.NODE_ELEMENT_END 'p'


buffer content: "

abc

" node type: 1 XMLParser.NODE_ELEMENT 'p'

node type: 3 XMLParser.NODE_TEXT a b c

node type: 2 XMLParser.NODE_ELEMENT_END 'p'

node type: 2 XMLParser.NODE_ELEMENT_END <------------ the correct result should be NODE_TEXT 'p'


buffer content: "

abc

" node type: 1 XMLParser.NODE_ELEMENT 'p'

node type: 3 XMLParser.NODE_TEXT a b c

node type: 2 XMLParser.NODE_ELEMENT_END 'p'

node type: 3 XMLParser.NODE_TEXT


### Minimal reproduction project

extends Node2D

var s1 = "

abc

\n" var s2 = "

abc

\n\n" var s3 = "

abc

\n\n\n"

func parse_xml(test_xml_buffer) -> void:

print("-".repeat(40))
print("buffer content: \"%s\"" % test_xml_buffer)

var xml_parser: XMLParser = XMLParser.new()

var err = xml_parser.open_buffer(test_xml_buffer.to_utf8_buffer())
if err != OK:
    push_error("error: %s" % err)
    return

while true:

    err = xml_parser.read()
    if err != OK:
        push_error("error: %s" % err)
        return

    var node_type = xml_parser.get_node_type()
    var node_type_str = ""
    var node_name = ""
    var node_data = ""

    prints("node type:", node_type)

    node_type_str = [
        "XMLParser.NODE_NONE",
        "XMLParser.NODE_ELEMENT",
        "XMLParser.NODE_ELEMENT_END",
        "XMLParser.NODE_TEXT",
        # ...
        ][node_type]

    prints("          ", node_type_str)

    match node_type:

        XMLParser.NODE_ELEMENT, XMLParser.NODE_ELEMENT_END:
            node_name = xml_parser.get_node_name()
            prints("          ", "'%s'" % node_name)

        XMLParser.NODE_TEXT:
            node_data = xml_parser.get_node_data()

            for i in range(node_data.length()):
                prints("          ", "%s" % (node_data.substr(i,1)))

        _:
            push_error("Unhandled node type.")

    print()

Called when the node enters the scene tree for the first time.

func _ready(): var version = Engine.get_version_info() print(version.major) print(version.minor) print(version.patch) print(version.status)

parse_xml(s1)
parse_xml(s2)
parse_xml(s3)
elenakrittik commented 1 year ago

This one is extremely hard to understand, but i finally found the exact circumstances under which this hell happens.

MRE project: xmlbug.zip Includes a simple python script to quickly generate xml input variants with different magic sequences.

MewPurPur commented 10 months ago

These explanations are overcomplicated, here's a simple one:

XMLParser reports the last XML node twice if the document ends with two whitespaces (" ", "\n", "\t", "\r"). Easy reproduction script:

# "Comment node" will be printed twice.
func _ready():
    var parser = XMLParser.new()
    parser.open_buffer("<!--A comment-->  ".to_ascii_buffer())
    while parser.read() != ERR_FILE_EOF:
        if parser.get_node_type() == XMLParser.NODE_COMMENT:
            print("Comment node")
elenakrittik commented 10 months ago

These explanations are overcomplicated, here's a simple one:

XMLParser reports the last XML node twice if the document ends with two spaces, tabs, or newlines. Easy reproduction script:

# "Comment node" will be printed twice.
func _ready():
  var parser = XMLParser.new()
  parser.open_buffer("<!--A comment-->  ".to_ascii_buffer())
  while parser.read() != ERR_FILE_EOF:
      if parser.get_node_type() == XMLParser.NODE_COMMENT:
          print("Comment node")

Thank you, this is indeed simpler than what i have written 😅