aredridel / html5

Event-driven HTML5 Parser in Javascript
http://dinhe.net/~aredridel/projects/js/html5/
MIT License

Replaced `IllegalArgumentException` with `Error` #108

fb55 closed 10 years ago

fb55 commented 10 years ago

There is no IllegalArgumentException in JS, and I doubt a ReferenceError is expected here.
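
To illustrate the point, here is a minimal sketch of what happens when code throws a Java-style exception name in JavaScript. The helper `describeThrow` is a hypothetical name used only for this example; it is not part of html5.

```javascript
// Returns the name of whatever error the callback throws.
function describeThrow(fn) {
    try {
        fn();
    } catch (e) {
        return e.name;
    }
    return 'no error';
}

// IllegalArgumentException is never declared, so looking up the identifier
// fails before any exception object can be constructed.
var before = describeThrow(function () {
    throw new IllegalArgumentException('bad argument');
});

// The built-in Error constructor raises the error that was intended.
var after = describeThrow(function () {
    throw new Error('bad argument');
});

console.log(before); // ReferenceError
console.log(after);  // Error
```

The caller still gets an exception in both cases, but only the second one carries the intended message.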

I encountered this while trying to get andreasmadsen/htmlparser-benchmark running with the latest html5 version. Do you have any suggestions for running this module with a minimal configuration?

aredridel commented 10 years ago

What kind of minimal configuration did you have in mind?

fb55 commented 10 years ago

Something that demonstrates html5's parsing speed, without the overhead of e.g. JSDOM. Probably just a minimal SAXParser handler.

aredridel commented 10 years ago

Yeah. I don't have any offhand -- I've been meaning to make one for a while. @danyaPostfactum is the one who added the SAX-style support.

danyaPostfactum commented 10 years ago

This is a "wrapper" for the benchmark.

```js
var Parser = require('html5/lib/sax/SAXParser').SAXParser;

module.exports = function (html, callback) {
    var parser = new Parser();
    var noop = function() {};
    parser.contentHandler = {
        startDocument: noop,
        endDocument: noop,
        startElement: noop,
        endElement: noop,
        characters: noop
    };
    parser.parse(html);
    callback();
};
```
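
A harness could time a wrapper with this `(html, callback)` shape roughly as follows. This is only a sketch: `fakeWrapper` is a hypothetical stand-in so the snippet runs without html5 installed, and `timeParse` is an illustrative helper, not part of htmlparser-benchmark.

```javascript
// Hypothetical stand-in with the same (html, callback) signature as the
// wrapper above. A real run would require the wrapper module instead.
var fakeWrapper = function (html, callback) {
    callback();
};

// Illustrative helper: times a single parse and reports nanoseconds.
function timeParse(wrapper, html, done) {
    var start = process.hrtime.bigint();
    wrapper(html, function (err) {
        var elapsed = process.hrtime.bigint() - start;
        done(err, elapsed);
    });
}

var result = null;
timeParse(fakeWrapper, '<p>Hello</p>', function (err, elapsedNs) {
    result = { err: err, elapsedNs: elapsedNs };
    console.log('parse took ' + elapsedNs + ' ns');
});
```

Because the handler callbacks are all no-ops, a measurement taken this way would mostly reflect tokenizing and tree building rather than any work done by the consumer.
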
fb55 commented 10 years ago

@danyaPostfactum Thanks!

Apparently, you have a bug in this library:

```
htmlparser-benchmark/node_modules/html5/lib/sax/SAXTreeBuilder.js:297
    var child = parent.firstChild;
                      ^
TypeError: Cannot read property 'firstChild' of null
    at Node.ParentNode.appendChildren (htmlparser-benchmark/node_modules/html5/lib/sax/SAXTreeBuilder.js:297:20)
    at SAXTreeBuilder.reparentChildren (htmlparser-benchmark/node_modules/html5/lib/sax/SAXTreeBuilder.js:79:12)
    at SAXTreeBuilder.TreeBuilder.adoptionAgencyEndTag (htmlparser-benchmark/node_modules/html5/lib/TreeBuilder.js:2574:8)
    at Object.TreeBuilder.modes.inBody.endTagFormatting (htmlparser-benchmark/node_modules/html5/lib/TreeBuilder.js:1392:13)
    at Object.TreeBuilder.modes.base.processEndTag (htmlparser-benchmark/node_modules/html5/lib/TreeBuilder.js:155:38)
    at SAXTreeBuilder.TreeBuilder.processToken (htmlparser-benchmark/node_modules/html5/lib/TreeBuilder.js:2667:17)
    at Tokenizer._emitToken
```

The file in question is this one.
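
The trace shows `parent.firstChild` being read while `parent` is `null`. The sketch below reproduces that class of failure in isolation. Both functions are hypothetical, simplified stand-ins and not the code from SAXTreeBuilder.js, and the guard only illustrates the failure mode; it is not a proposed fix for the library.

```javascript
// Walks a sibling chain the way the failing line starts to, without any guard.
function countChildrenUnsafe(parent) {
    var count = 0;
    var child = parent.firstChild; // throws a TypeError when parent is null
    while (child) {
        count++;
        child = child.nextSibling;
    }
    return count;
}

// Same walk, but a missing parent is treated as having no children.
function countChildrenGuarded(parent) {
    if (!parent) {
        return 0;
    }
    return countChildrenUnsafe(parent);
}

var errorName = null;
try {
    countChildrenUnsafe(null);
} catch (e) {
    errorName = e.name;
}
console.log(errorName); // TypeError

var second = { nextSibling: null };
var node = { firstChild: { nextSibling: second } };
console.log(countChildrenGuarded(null)); // 0
console.log(countChildrenGuarded(node)); // 2
```

Whether a guard like this is appropriate depends on why the parent is `null` when the formatting end tag is processed, which the trace alone does not show.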