Pomax closed 1 year ago
See #12. I'm willing to reevaluate the current (non) solution, but time is an issue. Do you want to take a stab at it?
Sure, if you want to explain where they're getting removed and what the parsing approach is, I probably can. I've written way too many parsers for way too many datatypes to not be able to at least take a stab at it.
I see you're already skipping over HTML comments, for example. Without diving into the code, it feels like doing the exact same thing for <script> and <style> should be minimal work. Or even preprocess the source to extract regions we know are not going to get indented, replace them with "templating tags" so we have a hook to put that code back in, and then once indentation etc. is done, prior to return, replace the "templating tags" with the original content again.
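A rough sketch of that placeholder idea, in case it helps the discussion. Everything here is hypothetical: `protect`, `restore`, and the `raw-block-N` comment format are names made up for illustration, not part of clean-html, and it assumes the formatter passes HTML comments through untouched.

```javascript
// Sketch only: none of these names exist in clean-html.
// Matches a whole <script>...</script> or <style>...</style> region, including tags.
const RAW_BLOCK = /<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi;

// Swap every script/style region for a numbered comment placeholder
// and remember the original text so it can be put back later.
function protect(html) {
  const saved = [];
  const masked = html.replace(RAW_BLOCK, (match) => {
    saved.push(match);
    return `<!--raw-block-${saved.length - 1}-->`;
  });
  return { masked, saved };
}

// Put the original regions back. A replacer function is used so that
// "$1", "$&" and similar sequences inside scripts are not reinterpreted.
function restore(html, saved) {
  return html.replace(/<!--raw-block-(\d+)-->/g, (_, index) => saved[Number(index)]);
}
```

The intended flow would be `protect`, then the normal indentation pass over `masked`, then `restore` just before returning. A regex like this is a shortcut and would trip on a literal `</script>` inside a string, so a real implementation would probably want to lean on the parser instead.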
Looks like the very first thing renderTag does is check if the node is unsupported and, if so, drops it: https://github.com/dave-kennedy/clean-html/blob/master/index.js#L227-L229
As I mentioned here, I wouldn't mind completely ignoring everything between script and style tags. I looked into refactoring it a long time ago and don't remember specifically what problems I ran into.
Good to know. I'm reaching the end of a full project rewrite (~100,000 LOC across ~1,000 files), so I'll probably poke around the index.js code this week to see if I can (cleanly) make it skip script/style. And, I suspect, add some code to make it include tags it doesn't know verbatim, because my HTML heavily relies on CustomElement, which should definitely not get stripped out ;)
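For the "include unknown tags verbatim" part, a minimal sketch of the behaviour change might look like the following. This is not the actual index.js code: the node shape (`type`, `name`, `attribs`, `children`, `data`), the `SUPPORTED` set, and the `keepUnknown` flag are all assumptions made for illustration.

```javascript
// Hypothetical sketch: node shape and names are assumed, not taken from clean-html.
const SUPPORTED = new Set(['div', 'p', 'span']);

// Build the attribute string for a tag (values are not escaped in this sketch).
function attrString(node) {
  return Object.entries(node.attribs || {})
    .map(([key, value]) => ` ${key}="${value}"`)
    .join('');
}

// Serialize a node and its whole subtree exactly as-is, with no filtering.
function serialize(node) {
  if (node.type === 'text') return node.data;
  const inner = (node.children || []).map(serialize).join('');
  return `<${node.name}${attrString(node)}>${inner}</${node.name}>`;
}

// Unknown tags are either emitted verbatim (keepUnknown) or dropped,
// which mirrors the "unsupported node" check described above.
function renderTag(node, keepUnknown = true) {
  if (node.type === 'text') return node.data;
  if (!SUPPORTED.has(node.name)) {
    return keepUnknown ? serialize(node) : '';
  }
  // Stand-in for the real indentation-aware rendering of supported tags.
  const inner = (node.children || [])
    .map((child) => renderTag(child, keepUnknown))
    .join('');
  return `<${node.name}${attrString(node)}>${inner}</${node.name}>`;
}
```

With something like this, a custom element such as `<my-widget>` survives the pass unchanged instead of disappearing, while the old dropping behaviour stays available behind the flag.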
Fixed in 6a1cc0d.
I have no idea why that made sense, but I don't see any way to change that, which is a shame because as much as I love prettier, it takes 3 seconds to run on content that this library needs <0.1s for, so I'd much rather use this.