Open eGavr opened 10 years ago
They are -- the HTML5 parser only concerns itself with parsing to construct a DOM.
Are you going to fix this situation?
Does it need to be fixed? What's the use-case?
Yes! For example, when I want to transform my DOM tree back to html!
In the cases I've shown above, SAXParser parses them in the same way! It is a little bit unfair on the part of your SAXParser.
SAXParser notifies about element start and element end, not about start tag and end tag. That's all.
Yes, they are identical for a browser, but from the point of view of parsing they are not identical
If you really need low-level parsing info, you can use Tokenizer.
For example, when I want to transform my DOM tree back to html!
There is a limited set of VOID elements, so it is easy to serialize. http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html#serialising-html-fragments
Example of producing HTML from SAX events: https://gist.github.com/danyaPostfactum/ee94c3bf88b99fb94c4b
Example:
var SAXParser = require('html5').SAXParser;
var HtmlSerializer = require('./HtmlSerializer').HtmlSerializer;
var outStream = require('fs').createWriteStream("out.html");
var parser = new SAXParser();
var serializer = new HtmlSerializer(outStream);
parser.contentHandler = parser.lexicalHandler = serializer;
parser.parse('...');
But how can I tell whether the tag is self-closing?
Just check whether its name matches one of the void elements: area, base, basefont, bgsound, br, col, embed, frame, hr, img, input, keygen, link, menuitem, meta, param, source, track or wbr.
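The name check described above could be sketched as follows. This is only an illustration: `VOID_ELEMENTS` and `isVoidElement` are hypothetical names, not part of the html5 package, and the list is copied from the comment above.

```javascript
// Void element names, as listed in the comment above.
const VOID_ELEMENTS = new Set([
  'area', 'base', 'basefont', 'bgsound', 'br', 'col', 'embed', 'frame',
  'hr', 'img', 'input', 'keygen', 'link', 'menuitem', 'meta', 'param',
  'source', 'track', 'wbr'
]);

// Hypothetical helper: returns true when an element with this name
// never gets an end tag in serialized HTML.
function isVoidElement(name) {
  return VOID_ELEMENTS.has(String(name).toLowerCase());
}
```

With this helper, `isVoidElement('br')` is true and `isVoidElement('bra')` is false, so an unknown name such as bra is serialized like any ordinary element.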
But what if someone is a bad person and wants to parse invalid input, <bra/> for example?
Thank you for the list of self-closing tags!
<bra/>, for example?
According to the spec, it will be interpreted as <bra>. You can check this in your browser.
Thank you for the list of self-closing tags!
See http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html#serialising-html-fragments
But I can try to parse this situation: <br></br>
For a browser it will be <br>, but what will I receive after serialization?
In two cases I will receive the same, but two inputs were not the same.
Maybe it is necessary to add a parameter to one of your contentHandler's methods that will be true if the tag is self-closing?
The parser will ignore the </br> tag.
In two cases I will receive the same, but two inputs were not the same.
Yes, invalid markup will be repaired. I already mentioned that. Even valid input markup may not match the serialized output. Could you explain how you want to use the parser? You probably need another tool.
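To illustrate why the serialized output cannot reflect the original tag syntax, here is a minimal sketch of a serializer driven by element events. The event shape (`type`, `name`, `attrs`, `text`) is an assumption made for this example and is not the html5 package's contentHandler interface. The escaping is also simplified compared with the serialization algorithm in the spec.

```javascript
// Void elements are written without an end tag.
const VOID = new Set([
  'area', 'base', 'basefont', 'bgsound', 'br', 'col', 'embed', 'frame',
  'hr', 'img', 'input', 'keygen', 'link', 'menuitem', 'meta', 'param',
  'source', 'track', 'wbr'
]);

function escapeText(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function escapeAttr(s) {
  return s.replace(/&/g, '&amp;').replace(/"/g, '&quot;');
}

// Hypothetical event list in, HTML string out.
function serialize(events) {
  let out = '';
  for (const ev of events) {
    if (ev.type === 'start') {
      out += '<' + ev.name;
      for (const [key, value] of Object.entries(ev.attrs || {})) {
        out += ' ' + key + '="' + escapeAttr(value) + '"';
      }
      out += '>';
    } else if (ev.type === 'end') {
      // An element end is reported for every element, but void
      // elements must not produce an end tag.
      if (!VOID.has(ev.name)) out += '</' + ev.name + '>';
    } else if (ev.type === 'text') {
      out += escapeText(ev.text);
    }
  }
  return out;
}

// The events carry no trace of whether the source said <br> or <br/>.
const events = [
  { type: 'start', name: 'p', attrs: {} },
  { type: 'text', text: 'a' },
  { type: 'start', name: 'br', attrs: {} },
  { type: 'end', name: 'br' },
  { type: 'text', text: 'b' },
  { type: 'end', name: 'p' }
];
const html = serialize(events);
```

Because the serializer only sees element starts and ends, any two inputs that produce the same events necessarily produce the same output.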
For example, I want to check the validity of input or, as in my case, I want to compare two HTMLs!
For me it is necessary to check the HTMLs as they are!
For example, I want to check the validity of input
This parser is used in http://ace.c9.io/build/kitchen-sink.html (select HTML mode) for syntax checking.
parser.errorHandler = {
    error: function(message, location, code) {
        // Parse error
    }
};
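For a validity check, the handler above could simply collect every reported error. The sketch below assumes only the `error(message, location, code)` signature shown in the snippet. `createErrorCollector` is a hypothetical helper, and the `location` value is stored as received because its shape is not described here.

```javascript
// Hypothetical helper: builds an error handler that records every
// parse error it is notified about.
function createErrorCollector() {
  const errors = [];
  return {
    errors: errors,
    error: function (message, location, code) {
      errors.push({ message: message, location: location, code: code });
    }
  };
}

// Intended usage, assuming `parser` is a SAXParser instance as in the
// earlier example:
//   const collector = createErrorCollector();
//   parser.errorHandler = collector;
//   parser.parse(input);
//   const isValid = collector.errors.length === 0;
```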
For me it is necessary to check the HTMLs as they are!
Not sure what you mean. I guess you will have to write your own parser.
Besides, what about this situation:
and
?
It seems the contentHandler parses them in just the same way! Yes, they are identical for a browser, but from the point of view of parsing they are not identical, are they?