dave-kennedy / clean-html

HTML cleaner and beautifier for Node
The Unlicense
47 stars, 10 forks

<script> get stripped from <head> #19

Pomax closed 1 year ago

Pomax commented 4 years ago

I have no idea why stripping <script> made sense, and I don't see any way to change it. That's a shame: as much as I love prettier, it takes 3 seconds to run on content that this library handles in <0.1s, so I'd much rather use this.

dave-kennedy commented 4 years ago

See #12. I'm willing to reevaluate the current (non) solution, but time is an issue. Do you want to take a stab at it?

Pomax commented 4 years ago

Sure, if you want to explain where they're getting removed and what the parsing approach is, I probably can. I've written way too many parsers for way too many datatypes to not be able to at least take a stab at it.

I see you're already skipping over HTML comments, for example. Without diving into the code, it feels like doing the exact same thing for <script> and <style> should be minimal work. Another option is to preprocess the source: extract the regions we know are not going to get indented, replace them with "templating tags" so we have a hook to put that code back in, and then, once indentation etc. is done and prior to return, replace the "templating tags" with the original content again.
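To make the preprocessing idea concrete, here is a minimal sketch of the extract-and-restore round trip. Everything in it is hypothetical: `extractRawBlocks`, `restoreRawBlocks`, `formatPreservingRawBlocks` and the `raw-block` placeholder are illustrative names, not part of clean-html. It also assumes the formatter passes HTML comments through unchanged, since the placeholders are written as comments.

```javascript
// Hypothetical helpers for illustration; none of these exist in clean-html.

// Matches a whole <script>...</script> or <style>...</style> region,
// including its opening tag attributes. Like an HTML parser, it ends the
// region at the first matching closing tag.
const RAW_BLOCK = /<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi;

// Matches the placeholders written by extractRawBlocks.
const PLACEHOLDER = /<!--raw-block:(\d+)-->/g;

// Replace each raw region with a numbered comment placeholder and
// remember the original text so it can be put back later.
function extractRawBlocks(html) {
  const blocks = [];
  const stripped = html.replace(RAW_BLOCK, (match) => {
    blocks.push(match);
    return `<!--raw-block:${blocks.length - 1}-->`;
  });
  return { stripped, blocks };
}

// Swap every placeholder for the original region. A replacer function is
// used so that "$" sequences inside scripts are inserted literally.
function restoreRawBlocks(html, blocks) {
  return html.replace(PLACEHOLDER, (_, index) => blocks[Number(index)]);
}

// Wrap any synchronous string-to-string formatter (an assumption; a
// callback-based formatter would need the restore step in its callback).
function formatPreservingRawBlocks(html, format) {
  const { stripped, blocks } = extractRawBlocks(html);
  return restoreRawBlocks(format(stripped), blocks);
}
```

The restored regions keep their original internal whitespace, so the formatter only decides where the placeholder line sits, not how the script or style body is indented.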

dave-kennedy commented 4 years ago

Looks like the very first thing renderTag does is check if the node is unsupported and if so drops it: https://github.com/dave-kennedy/clean-html/blob/master/index.js#L227-L229

As I mentioned here, I wouldn't mind completely ignoring everything between script and style tags. I looked into refactoring it a long time ago and don't remember specifically what problems I ran into.
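For illustration, the difference between dropping an unsupported node and ignoring what is inside it can be sketched on a toy tree. This is not clean-html's code: the node shape (`type`, `name`, `raw`, `children`, `data`), the `VERBATIM_TAGS` set and `renderNode` are all made up for the example, and attributes are left out to keep it short.

```javascript
// Toy renderer for illustration only; clean-html's real nodes and its
// renderTag function are not reproduced here.
const VERBATIM_TAGS = new Set(['script', 'style']);

function renderNode(node, depth = 0) {
  const indent = '  '.repeat(depth);

  if (node.type === 'text') {
    // Whitespace-only text produces no output line.
    const text = node.data.trim();
    return text ? `${indent}${text}\n` : '';
  }

  if (VERBATIM_TAGS.has(node.name)) {
    // Instead of returning '' (which would drop the node), emit the tag
    // with its raw content untouched.
    return `${indent}<${node.name}>${node.raw}</${node.name}>\n`;
  }

  // Every other tag gets its children rendered one level deeper.
  const children = node.children
    .map((child) => renderNode(child, depth + 1))
    .join('');
  return `${indent}<${node.name}>\n${children}${indent}</${node.name}>\n`;
}
```

In this toy version only the opening tag of a script or style element is indented; the body is copied as-is, which matches the "completely ignore everything between the tags" idea.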

Pomax commented 4 years ago

Good to know. I'm reaching the end of my ~100,000 LOC / ~1,000 file full project rewrite, so I'll probably poke around the index.js code this week to see if I can (cleanly) make it skip script/style. And, I suspect, add some code to make it include tags it doesn't know verbatim, because my HTML relies heavily on CustomElement, which should definitely not get stripped out ;)

dave-kennedy commented 1 year ago

Fixed in 6a1cc0d.