LadybirdBrowser / ladybird

Truly independent web browser
https://ladybird.org
BSD 2-Clause "Simplified" License

XML parsing issue with Entity Reference #2133

Open paaspaas00 opened 1 week ago

paaspaas00 commented 1 week ago

Summary

There is an issue in this function: https://github.com/LadybirdBrowser/ladybird/blob/c04297129323904f15c743eb83a3ff934769b14d/Userland/Libraries/LibXML/Parser/Parser.cpp#L754 I traced it to the parsing of entity references, so the bug may be in there. Additionally, one reference-related grammar rule does not seem to be implemented, namely production [69], found near https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-Reference:

[69] PEReference ::= '%' Name ';'

It's similar to the EntityRef rule, so it may be easy to implement. The missing rule might be the cause of the issue.
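To illustrate how close the two productions are, here is a minimal sketch of a recognizer for both. It is written in Python for brevity and is not Ladybird's C++ parser code. The helper names `parse_reference`, `parse_entity_ref` and `parse_pe_reference` are hypothetical, and the `Name` pattern is simplified to ASCII, whereas the real NameStartChar and NameChar productions also allow many non-ASCII ranges. Keep in mind that XML recognizes parameter-entity references only inside the DTD, so a `%` in element content (such as the `%3A` sequences in the URLs below) is ordinary character data.

```python
import re

# Simplified Name production (ASCII only). This is an assumption made for
# the sketch; the spec's NameStartChar/NameChar cover far more characters.
_NAME = r"[A-Za-z_:][A-Za-z0-9_:.\-]*"


def parse_reference(text, pos, introducer):
    """Match introducer + Name + ';' at pos.

    Returns (name, next_pos) on success, or None if no reference starts here.
    """
    pattern = re.compile(re.escape(introducer) + "(" + _NAME + ");")
    match = pattern.match(text, pos)
    if match is None:
        return None
    return match.group(1), match.end()


def parse_entity_ref(text, pos=0):
    # EntityRef ::= '&' Name ';'
    return parse_reference(text, pos, "&")


def parse_pe_reference(text, pos=0):
    # PEReference ::= '%' Name ';'
    return parse_reference(text, pos, "%")
```

The only difference between the two rules is the introducing character, which is why a shared helper is enough in this sketch.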

Side note, just wondering: is it worth developing our own XML parsing library? Maybe it's better to use an existing, well-maintained FOSS one?

Operating system

Linux

Steps to reproduce

  1. Go to https://yewtu.be/watch?v=5OYawYN9ZCs (or any Invidious video page really)
  2. There seems to be an issue when parsing the XML related to DASH. See here:
636.040 WebContent(5257): Unhandled JavaScript exception: [TypeError] Cannot access property "setVolume" on undefined object
636.050 WebContent(5257):     at set
    at https://yewtu.be/videojs/video.js/video.js?v=d8b893e:26371:21
    at <unknown>
    at https://yewtu.be/videojs/video.js/video.js?v=d8b893e:4412:20
    at <unknown>
    at https://yewtu.be/videojs/video.js/video.js?v=d8b893e:4411:29
    at https://yewtu.be/videojs/video.js/video.js?v=d8b893e:5078:11
    at <unknown>

:1: parser error : EntityRef: expecting ';'
ndwidth="130429" codecs="mp4a.40.2"><BaseURL>/videoplayback?expire=1730611585&ei
                                                                               ^
:1: parser error : EntityRef: expecting ';'
4a.40.2"><BaseURL>/videoplayback?expire=1730611585&ei=IbUmZ-HvFaaG6dsPuabvmAs&ip
                                                                               ^
:1: parser error : EntityRef: expecting ';'
mZ-HvFaaG6dsPuabvmAs&ip=2603%3Ac020%3A8002%3Aaa00%3A2711%3A7e46%3Aba4f%3Ac4a7&id
                                                                               ^
:1: parser error : EntityRef: expecting ';'
2711%3A7e46%3Aba4f%3Ac4a7&id=o-ACsSloeXB0SKqSw1-sxjdPdph5NzLAvhjxSxHfRjsjc0&itag
                                                                               ^
:1: parser error : EntityRef: expecting ';'
%3Aba4f%3Ac4a7&id=o-ACsSloeXB0SKqSw1-sxjdPdph5NzLAvhjxSxHfRjsjc0&itag=140&source
                                                                               ^
:1: parser error : EntityRef: expecting ';'

...

[dash @ 0x563d6ef169c0] Unable to parse '' - missing root node
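Every caret in the log above points at a `&` inside the query string of a `<BaseURL>` element. In XML, a literal `&` in character data has to be written as `&amp;`, so a conforming parser is expected to reject such input. The sketch below shows that behavior with Python's standard `xml.etree.ElementTree`, not with LibXML. The URL is a shortened, made-up stand-in for the one in the log. Whether the manifest is served with raw `&` characters or loses its escaping somewhere along the way is not established by this report.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical, shortened stand-in for the BaseURL value seen in the log.
url = "/videoplayback?expire=1730611585&ei=abc&itag=140"


def parses(document):
    """Return True if the document is well-formed XML, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Raw '&' in character data: not well-formed, so parsing should fail.
raw = "<BaseURL>" + url + "</BaseURL>"
# escape() rewrites '&' as '&amp;' (and '<', '>' as '&lt;', '&gt;').
escaped = "<BaseURL>" + escape(url) + "</BaseURL>"

raw_ok = parses(raw)
escaped_ok = parses(escaped)
# After parsing, the escaped form yields the original URL text again.
round_trip = ET.fromstring(escaped).text
```

A pair of documents like `raw` and `escaped` could also serve as a starting point for the reduced test case requested further down.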

Expected behavior

The XML should parse correctly and without errors.

Actual behavior

Parsing error

URL for a reduced test case

Not available

HTML/SVG/etc. source for a reduced test case

I couldn't extract a minimal case, but Invidious video pages provide plenty of examples.

Log output and (if possible) backtrace

See above

Screenshots or screen recordings

No response

Build flags or config settings

No response

Contribute a patch?

sideshowbarker commented 1 week ago

Side note, just wondering: is it worth developing our own XML parsing library? Maybe it's better to use an existing, well-maintained FOSS one?

Personally speaking: Yeah, I think it’d probably be better to switch to either Expat or libxml2. But also personally, it’s not enough of a priority for me that I’d be likely to make time any time soon to write up a patch for it.

@paaspaas00 Might it be enough of a priority for you that you’d be willing to write up a patch for it?

awesomekling commented 1 week ago

+1 to adopting a mature OSS library for XML parsing