Closed goose3228 closed 2 years ago
I've had a look at the source, which is quite simple. Is there any reason not to use a library?
If nobody has picked this up yet, I'd like to contribute! Though would first clear up the discussion about whether or not to use a library for this, as @goose3228 mentioned.
Working on this. There is a package https://github.com/grokify/html-strip-tags-go that has an extracted version of the unexported stripTags function in html/template/html.go. I've tested it and it works correctly for the html example provided in this issue. But I guess @bep does not agree with adding more dependencies, specially for an extracted code. I'm trying to add code to support the provided example html code in StripHTML(), But if you think using the extracted code is fine, please let me know. (There had been discussion for exporting the stripTags function here: https://github.com/golang/go/issues/5884 with no success)
There had been discussion for exporting the stripTags function ... with no success
There have been several discussions, including:
Frustrating...
bluemonday is another possibility, previously considered for other tasks:
t := `<div data-action='click->my-controller#doThing'>qwe</div>`
fmt.Println(bluemonday.StrictPolicy().Sanitize(t)) --> qwe
But I guess @bep does not agree with adding more dependencies, specially for an extracted code.
Indeed you don't need a library to save 50 lines of code. The point is that current implementation is too simple to be 100% correct.
A few examples:
{{ "if(x < 1) console.log(x)" | plainify }}
would result in if(x
).{{ "if(x < 1) console.log(x)" | plainify }}
turns &
into &
On the other hand, if you fix my use case it will lead to more "bug triggers". Currently <>
symbols are the only trigger i can imagine, but if you fix it, quotes can become a new one. I suggest the library, because html escaping is not that straightforward.
Go's tpl has 3000 lines, and there is reason for such complexity.
I've tried to use bluemonday, but some tests from different packages failed, because of inconsistencies in how space and escaped characters and and some tags are replaced to produce output.
would it be safe if I change tests, so they meet the output of bluemonday? (for example adjusting expected outputs by removing or adding \n )
I've tried to use bluemonday
I would wait until the project lead approves the inclusion of an additional dependency.
would it be safe if I change tests
That depends on whether the differences are material. Blindly changing the expected results to match the new output is a bit like scoring your own exam.
@jmooring for a matter of discussion, I thought actually trying to use this library and finding any issues in making it work in place of the current stripHtml() function, can help @bep decide easier. And as you've already mentioned above, bluemonday was already being considered as a candidate. I'm trying to move this forward. Though, if bep doesn't agree, it's all good, and respected.
Added a PR including bluemonday library as a proposed solution. I also had to include the "space" characters manipulation logics, and also stripHTMLReplacer. I preferred not to, and have StripHTML() just strip HTML without other manipulations, but that'd break a lot of tests. Maybe this could be refactored another time. Though, I had to tweak one of the old tests a little bit so that it doesn't fail. Added the HTML code from the original post of this issue, and some others as tests.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Also affects auto-generated Page.Summary, Page.Plain and so on.
What version of Hugo are you using (
hugo version
)?Does this issue reproduce with the latest release?
Yes, with git/master too
Test command (in template):
Expected output:
Current output:
P.S. arrow
->
is commonly used in html attributes by stimulus.js framework. The problem is sensible when page starts with interactive shortcode, and summary gets garbage from data attributes.