Closed arasfeld-seamless closed 5 days ago
Confirmed - it seems that nested pre tags are replaced with p tags. Top level pre tags are eliminated correctly if disallowed.
Apparently this is not a bug, it's behavior coming from htmlparser2 which is the HTML parser upstream of us. And that module is behaving reasonably given the rules of HTML:
"Paragraphs are block-level elements, and notably will automatically close if another block-level element is parsed before the closing
tag. See "Tag omission" below."See:
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/p
So what happened is this:
htmlparser2 saw "pre" inside "p". htmlparser2 said "that's not a thing (see above), but I'll do the most reasonable thing I can: I'll close this p element, output the pre element, and then open a new p element."
sanitize-html stripped the "pre" element, per your configuration.
Probably we can't change this at the sanitize-html level, and shouldn't because htmlparser2 isn't doing anything wrong really.
On Fri, Jun 28, 2024 at 1:13 PM Robert Means @.***> wrote:
Confirmed - it seems that nested pre tags are replaced with p tags. Top level pre tags are eliminated correctly if disallowed.
— Reply to this email directly, view it on GitHub https://github.com/apostrophecms/sanitize-html/issues/667#issuecomment-2197350014, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAH27PUACWY2R4YF3MIKH3ZJWKS3AVCNFSM6AAAAABKCHL5EOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCOJXGM2TAMBRGQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>
--
THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER APOSTROPHECMS | apostrophecms.com | he/him/his
PLEASE NOTE: make sure the bug exists in the latest patch level of the project. For instance, if you are running a 2.x version of Apostrophe, you should use the latest in that major version to confirm the bug.
To Reproduce
sanitizedString =
<p>Lorem ipsum </p>dolorem sic amet<p></p>
;Expected behavior
Sanitized string should remove all non-
p
tagsShould return
<p>Lorem ipsum dolorem sic amet</p>
Describe the bug
A
</p>
gets inserted before "dolorem" and a<p>
gets inserted after "amet"Details
Version of Node.js: 18
Server Operating System: MacOSX
Additional context:
Screenshots If applicable, add screenshots to help explain your problem.