Closed ghost closed 8 years ago
hmm-- that's really the HTML parser's issue.
here's a simple test case.
var jsdom = require('jsdom-nogyp');
var doc = jsdom.jsdom(null, null, {
  features: {
    FetchExternalResources: false
  },
  url: "file://" + (process.cwd())
});
var win = doc.parentWindow;
var container = win.document.createElement('div');
var html = 'A mathematical range breaks document <3, 5] This text will be lost';
container.innerHTML = html;
console.log(container.innerHTML);
// will output
// A mathematical range breaks document
maybe we should ask here: https://github.com/dexteryy/jsdom-nogyp
jsdom-nogyp is using https://github.com/fb55/htmlparser2
go to http://demos.forbeslindesay.co.uk/htmlparser2/
paste: A mathematical range breaks document <3, 5] This text will be lost
[
  {
    data: 'A mathematical range breaks document '
    type: 'text'
    next: null
    prev: null
    parent: null
  }
]
that's probably where we should ask
so yea, a real browser seems to be able to figure that out correctly https://jsfiddle.net/fyx6fy0c/
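For what it's worth, the browser copes because the HTML tokenizer only treats `<` as the start of markup when the next character is an ASCII letter, `/`, `!` or `?`; any other `<` is emitted as literal text. Below is a minimal sketch of that rule as a pre-processing step. `escapeStrayLt` is a hypothetical helper name, not part of jsdom or htmlparser2, and the sketch assumes the input has no `<script>`, `<style>` or comment sections where a bare `<` is legitimate.

```javascript
// Hypothetical helper (not a jsdom or htmlparser2 API).
// Escapes every '<' that cannot start a tag, comment or doctype,
// i.e. every '<' NOT followed by a letter, '/', '!' or '?'.
function escapeStrayLt(html) {
  return html.replace(/<(?![A-Za-z\/!?])/g, '&lt;');
}

var input = '<p>A range <3, 5] stays visible</p>';
console.log(escapeStrayLt(input));
// <p>A range &lt;3, 5] stays visible</p>
```

This only decides whether a single `<` can open markup; it does not try to parse or balance tags, so real tags pass through untouched and are still left to the parser.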
It looks like htmlparser2 accepts any kind of tag, so as a simple workaround we can just replace all occurrences of '<' with &lt; before parsing the input.
no - that will break any real html tag <p> <3, 5] </p>
You need a way to differentiate and parse html tags correctly, which is what the htmlparser2 is supposed to do - regex won't work, not with infinite nesting.
instead of &lt; you can still also put a space between the < and the 3 - i.e. < 3, 5]
but the parser problem remains - you can't do that programmatically without an HTML parser, UNLESS you know for a fact that there is NO HTML in your text.
A BBCode post typically doesn't contain HTML, but if it has any, it should be escaped anyway imho. However, I agree with you, htmlparser2 should do its job - parsing HTML, not XML-like tags.
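For the case where the text is known to contain no HTML at all, escaping everything before it reaches the parser could look roughly like this. `escapeHtmlText` is a hypothetical helper name, and the sketch assumes the result is used as element content (not inside an attribute value, where quotes would need escaping too).

```javascript
// Hypothetical helper: escape text that is known to contain no real HTML.
// '&' must be handled first so the entities added afterwards are not re-escaped.
function escapeHtmlText(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

console.log(escapeHtmlText('range <3, 5] & more'));
// range &lt;3, 5] &amp; more
```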
fixed when i switched to using to-markdown
Minimal example: