aredridel / html5

Event-driven HTML5 Parser in Javascript
http://dinhe.net/~aredridel/projects/js/html5/
MIT License

Parse issue when using Zombie #7

Closed: its-danny closed this issue 13 years ago

its-danny commented 13 years ago

I mentioned this issue here, but was told to post it here as well.

When trying to run any tests, it errors out with this:

/usr/local/lib/node/.npm/html5/0.2.5/package/lib/html5/tokenizer.js:62
            throw(e);
^
Error
at Object.appendChild (/usr/local/lib/node/.npm/jsdom/0.1.22/package/lib/jsdom/level1/core.js:1312:13)
at Object.insert_html_element (/usr/local/lib/node/.npm/html5/0.2.5/package/lib/html5/parser/before_html_phase.js:41:21)
at Object.processStartTag (/usr/local/lib/node/.npm/html5/0.2.5/package/lib/html5/parser/before_html_phase.js:29:7)
at EventEmitter.do_token (/usr/local/lib/node/.npm/html5/0.2.5/package/lib/html5/parser.js:94:20)
at EventEmitter.<anonymous> (/usr/local/lib/node/.npm/html5/0.2.5/package/lib/html5/parser.js:112:30)
at EventEmitter.emit (events:31:17)
at EventEmitter.emitToken (/usr/local/lib/node/.npm/html5/0.2.5/package/lib/html5/tokenizer.js:84:7)
at EventEmitter.emit_current_token (/usr/local/lib/node/.npm/html5/0.2.5/package/lib/html5/tokenizer.js:813:7)
at EventEmitter.tag_name_state (/usr/local/lib/node/.npm/html5/0.2.5/package/lib/html5/tokenizer.js:358:8)
at EventEmitter.<anonymous> (/usr/local/lib/node/.npm/html5/0.2.5/package

aredridel commented 13 years ago

Indeed, looks like a bug.

(Also, the key bit is the data being parsed: http://beaconpush.com/)

aredridel commented 13 years ago

Which version of node are you using?

its-danny commented 13 years ago

v0.2.6

aredridel commented 13 years ago

Hrm. Okay. I'm just not sure what the error is. The parser rethrows it because it's a real exception and not just the input stream draining, but the output doesn't say what the error is. That's unusual, since the error is rethrown exactly as it was caught.

Is there code that triggers it that I can look at?

its-danny commented 13 years ago

It seems like trying to run any Zombie tests triggers it, but here's a quick example:

var zombie = require('zombie')
  , assert = require('assert')

 zombie.visit('http://localhost:9292', { debug : true }, function(error, bro
  if (error) throw error

  assert.ok(browser.querySelector('#login'))
})

aredridel commented 13 years ago

That's cut off on the right columns there.

its-danny commented 13 years ago

D'oh, it's function(error, browser) {

aredridel commented 13 years ago

Filling it in with "ser) {", it runs for me if I disable the call to fixQueue in zombie. It looks to me like it's just not patching jsdom right -- but I don't get your original error at all.

funkatron commented 13 years ago

I'm pretty much stuck on this atm; can't run any Zombie tests. If someone has specific workaround instructions (I'm a meganewb), that would be very appreciated. thanks!

clintecker commented 13 years ago

We're also seeing parsing errors on http://arstechnica.com --- I think maybe because of relative script src URIs (our ad calls)?

assaf commented 13 years ago

@aredridel you must have two instances of JSDOM there, one loaded by HTML5 and one loaded by Zombie. That's why Zombie can't find the fixQueue method that it adds to HTMLDocument. But running without those patches means running an incomplete test (scripts will load in the wrong order, some events won't be handled, etc.)
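
To illustrate the "two instances" point: a method patched onto one loaded copy of a library is invisible to objects created by another copy. The sketch below does not use jsdom or Zombie. `loadDomLibrary` and the variable names are hypothetical stand-ins, and the `fixQueue` body is a placeholder rather than Zombie's actual patch.

```javascript
// Hypothetical stand-in for a module loader: each call simulates one
// separately loaded copy of a DOM library (not jsdom's real API).
function loadDomLibrary() {
  function HTMLDocument() {}
  return { HTMLDocument: HTMLDocument };
}

var copyUsedByZombie = loadDomLibrary();
var copyUsedByHtml5 = loadDomLibrary();

// Patch only the first copy, the way a library extending the DOM might.
// The method body is a placeholder for illustration.
copyUsedByZombie.HTMLDocument.prototype.fixQueue = function () {
  return 'patched';
};

var docFromPatchedCopy = new copyUsedByZombie.HTMLDocument();
var docFromOtherCopy = new copyUsedByHtml5.HTMLDocument();

// Each copy has its own constructor and prototype, so the patch
// reaches only documents created by the patched copy.
var patchedHasMethod = typeof docFromPatchedCopy.fixQueue === 'function';
var otherHasMethod = typeof docFromOtherCopy.fixQueue === 'function';

console.log(patchedHasMethod); // true
console.log(otherHasMethod);   // false
```

If this is what is happening, making both packages resolve to the same installed copy of the DOM library would be the thing to check first.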

@clintecker if the problem is in resolving the src URL, it lies either in Zombie or JSDOM.

rubys commented 13 years ago

I'm seeing this problem too... without using Zombie. On a fresh install of node, npm, and html5 on Ubuntu 10.04-1.

/home/rubys/lib/node/.npm/html5/0.2.7/package/lib/html5/tokenizer.js:62
                throw(e);
    ^
Error
    at Object.appendChild (/home/rubys/lib/node/.npm/jsdom/0.1.23/package/lib/jsdom/level1/core.js:1312:13)
    at Object.insert_html_element (/home/rubys/lib/node/.npm/html5/0.2.7/package/lib/html5/parser/before_html_phase.js:41:21)
    at Object.processStartTag (/home/rubys/lib/node/.npm/html5/0.2.7/package/lib/html5/parser/before_html_phase.js:29:7)
    at EventEmitter.do_token (/home/rubys/lib/node/.npm/html5/0.2.7/package/lib/html5/parser.js:94:20)
    at EventEmitter.<anonymous> (/home/rubys/lib/node/.npm/html5/0.2.7/package/lib/html5/parser.js:112:30)
    at EventEmitter.emit (events:31:17)
    at EventEmitter.emitToken (/home/rubys/lib/node/.npm/html5/0.2.7/package/lib/html5/tokenizer.js:84:7)
    at EventEmitter.emit_current_token (/home/rubys/lib/node/.npm/html5/0.2.7/package/lib/html5/tokenizer.js:813:7)
    at EventEmitter.after_attribute_value_state (/home/rubys/lib/node/.npm/html5/0.2.7/package/lib/html5/tokenizer.js:557:8)
    at EventEmitter.<anonymous> (/home/rubys/lib/node/.npm/html5/0.2.7/package/lib/html5/tokenizer.js:59:25)

Here's my script:

var http = require('http'),
    html5 = require('html5'),
    jsdom = require('jsdom').jsdom(),
   window = jsdom.createWindow(null, null, {parser: html5});

var rubix = http.createClient(80, 'intertwingly.net');
var request = rubix.request('GET', '/blog/',
  {'host': 'intertwingly.net'});
request.end();
request.on('response', function (response) {
  var length = 0;
  var data = new Buffer(parseInt(response.headers['content-length']));
  response.on('data', function (chunk) {
    chunk.copy(data, length, 0, chunk.length);
    length = length + chunk.length;
  });
  response.on('end', function () {
    // console.log(data.toString('utf-8'));
    var parser = new html5.Parser({document: window.document});
    parser.parse(data);
    console.log(window.document.innerHTML);
  });
});

aredridel commented 13 years ago

Looks like jsdom's non-textual error messages bite again.

It's trying to double-add a document element. I'll see about fixing that.
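
For anyone hitting a bare "Error" like the one above, here is a rough sketch of turning a numeric DOM exception code into a readable name. It assumes the thrown object carries a `code` property as DOM Level 1 `DOMException` does; whether this jsdom version sets it is an assumption, and `describeDomError` is a hypothetical helper, not part of jsdom or html5.

```javascript
// Exception codes as defined by DOM Level 1 Core.
var DOM_EXCEPTION_NAMES = {
  1: 'INDEX_SIZE_ERR',
  2: 'DOMSTRING_SIZE_ERR',
  3: 'HIERARCHY_REQUEST_ERR',
  4: 'WRONG_DOCUMENT_ERR',
  5: 'INVALID_CHARACTER_ERR',
  6: 'NO_DATA_ALLOWED_ERR',
  7: 'NO_MODIFICATION_ALLOWED_ERR',
  8: 'NOT_FOUND_ERR',
  9: 'NOT_SUPPORTED_ERR',
  10: 'INUSE_ATTRIBUTE_ERR'
};

// Hypothetical helper: prefer the DOM code name, fall back to the message.
function describeDomError(err) {
  var name = DOM_EXCEPTION_NAMES[err && err.code];
  if (name) {
    return name + ' (code ' + err.code + ')';
  }
  return (err && err.message) || String(err);
}

// Stand-in for an error thrown by appendChild; assumes a `code` property.
var description = describeDomError({ code: 3 });
console.log(description); // HIERARCHY_REQUEST_ERR (code 3)
```

Under the DOM spec, adding a second document element is a hierarchy error, which is why code 3 is used in the stand-in.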

aredridel commented 13 years ago

Got it! The document passed in wasn't empty (thanks, jsdom!). I now detect that and handle it by reusing the existing document element and clearing its children.

Fixed in v0.2.9
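
A minimal sketch of the approach described above, assuming only standard DOM members (`documentElement`, `firstChild`, `removeChild`, `createElement`, `appendChild`). `resetDocumentElement` is a hypothetical name and this is not the actual html5 code. `FakeNode` and `makeFakeDocument` are stand-ins so the sketch runs without jsdom.

```javascript
// Hypothetical helper: reuse an existing root element instead of adding a second one.
function resetDocumentElement(document) {
  var root = document.documentElement;
  if (root) {
    // The document was not empty: clear the existing root's children.
    while (root.firstChild) {
      root.removeChild(root.firstChild);
    }
    return root;
  }
  // Empty document: create and attach the root as usual.
  root = document.createElement('html');
  document.appendChild(root);
  return root;
}

// Minimal fake DOM node, for illustration only.
function FakeNode(name) {
  this.nodeName = name;
  this.childNodes = [];
}
Object.defineProperty(FakeNode.prototype, 'firstChild', {
  get: function () { return this.childNodes[0] || null; }
});
FakeNode.prototype.appendChild = function (child) {
  this.childNodes.push(child);
  return child;
};
FakeNode.prototype.removeChild = function (child) {
  var index = this.childNodes.indexOf(child);
  if (index === -1) throw new Error('NOT_FOUND_ERR');
  this.childNodes.splice(index, 1);
  return child;
};

// Builds a fake document, optionally pre-populated with html/head/body.
function makeFakeDocument(prefilled) {
  var doc = new FakeNode('#document');
  doc.createElement = function (name) { return new FakeNode(name); };
  Object.defineProperty(doc, 'documentElement', {
    get: function () { return this.childNodes[0] || null; }
  });
  if (prefilled) {
    var html = doc.appendChild(doc.createElement('html'));
    html.appendChild(doc.createElement('head'));
    html.appendChild(doc.createElement('body'));
  }
  return doc;
}

var prefilledDoc = makeFakeDocument(true);
var originalRoot = prefilledDoc.documentElement;
var reusedRoot = resetDocumentElement(prefilledDoc);

console.log(reusedRoot === originalRoot);   // true
console.log(reusedRoot.childNodes.length);  // 0
```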

assaf commented 13 years ago

With 0.2.9 I get "Cannot call method 'toLowerCase' of undefined" from lib/html5/treebuilder.js:176:36

Also, before_html_phase calls console.debug, which is not defined in node 0.2.6. I assume this should be a call to HTML5.debug.
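
As a stopgap, a guard like the following would avoid calling a missing `console.debug`. This is only a sketch: `pickLogger` and `oldStyleConsole` are hypothetical names, and it does not reflect how `HTML5.debug` is implemented.

```javascript
// Hypothetical helper: use console.debug when it exists, else console.log.
function pickLogger(consoleLike) {
  if (typeof consoleLike.debug === 'function') {
    return function () {
      return consoleLike.debug.apply(consoleLike, arguments);
    };
  }
  return function () {
    return consoleLike.log.apply(consoleLike, arguments);
  };
}

// Stand-in for a console that lacks debug, as reported for node 0.2.6.
var lines = [];
var oldStyleConsole = {
  log: function (message) { lines.push(message); }
};

var debug = pickLogger(oldStyleConsole);
debug('inserting html element');

console.log(lines.length); // 1
```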

aredridel commented 13 years ago

I've fixed the debug statement.

Looking at the rest.