Closed danielthepope closed 9 years ago
/<\/?[^a\/][^>]*>/ig
is close, but retains tags beginning with 'a', i.e. <abbr>
If for whatever reason, the NRCC message returns a <abbr>
tag, I don't want it included in trntxt. I have a failing test that I'd like to pass.
To do: modify the regular expression in removeHtmlTagsExceptA
, src/nationalrail.js:172.
In short, <a href="#">pass</a>
should return <a href="#">pass</a>
(i.e. remain the same), <p>pass</p>
should return pass
, and <abbr>text</abbr>
should return text
.
Test with npm test
. You don't need a National Rail API key for this one.
Would sanitize-html be an option? It seems a bit of an overhead, though I guess it's safer and might be easier to extend down the line.
(obligatory stack overflow link)
I guess you would lose line breaks at the end of <p>
, if they're important you could map the tags to spans and achieve that with some styling:
clean = sanitizeHtml(dirty, {
transformTags: {
'p': sanitizeHtml.simpleTransform('span', {class: 'fake-p'})
}
});
…would lose some of the semantic niceness, but might let you present it better to a user
Hey @benfoxall, thanks for taking a look :smile: And thanks for pointing me in the direction of that library. I'm not going to use it now, but that might well come in handy some point soon.
I've given the problem some thought this evening and I've written a regex that passes my test! Learned quite a bit about regexes in doing so! :smiley: It might well be able to go shorter, but meh.
/<\/?((([^\/a>]|a[^> ])[^>]*)|)>/ig
Generated HTML is currently
Full HTML response on the wiki, for testing
<P>
inside<p>
:point_right: :fire:Regex to find
<p>
tags:/<\/{0,}P>/ig
. However we should really be removing all tags except<a>
because they're actually useful. I need to up my regex game.