aredridel / html5

Event-driven HTML5 Parser in Javascript
http://dinhe.net/~aredridel/projects/js/html5/
MIT License

Parser error #18

Closed demian85 closed 13 years ago

demian85 commented 13 years ago

Using the Zombie module, I get this error when trying to visit a Facebook group page. It happens with only a few of them.

Zombie: GET http://www.facebook.com/group.php?gid=104369172929110&_fb_noscript=1 => 200

/usr/local/lib/node/.npm/html5/0.2.12/package/lib/html5/tokenizer.js:62
                throw(e);
    ^
TypeError: Cannot call method 'toLowerCase' of undefined
    at Object.endTagFormatting (/usr/local/lib/node/.npm/html5/0.2.12/package/lib/html5/parser/in_body_phase.js:646:85)
    at Object.processEndTag (/usr/local/lib/node/.npm/html5/0.2.12/package/lib/html5/parser/phase.js:50:36)
    at EventEmitter.do_token (/usr/local/lib/node/.npm/html5/0.2.12/package/lib/html5/parser.js:97:20)
    at EventEmitter.<anonymous> (/usr/local/lib/node/.npm/html5/0.2.12/package/lib/html5/parser.js:112:30)
    at EventEmitter.emit (events:27:15)
    at EventEmitter.emitToken (/usr/local/lib/node/.npm/html5/0.2.12/package/lib/html5/tokenizer.js:84:7)
    at EventEmitter.emit_current_token (/usr/local/lib/node/.npm/html5/0.2.12/package/lib/html5/tokenizer.js:813:7)
    at EventEmitter.tag_name_state (/usr/local/lib/node/.npm/html5/0.2.12/package/lib/html5/tokenizer.js:358:8)
    at EventEmitter.<anonymous> (/usr/local/lib/node/.npm/html5/0.2.12/package/lib/html5/tokenizer.js:59:25)
    at EventEmitter.emit (events:27:15)
aredridel commented 13 years ago

Will fix. That's my fault!

aredridel commented 13 years ago

Any chance you can send me the page or code that does it? My download of that page doesn't give me a page that gives a parse error.

demian85 commented 13 years ago

I don't get it: I cannot reproduce this error using a test script. The original script visits several Facebook group pages and fetches the group member count. I made a test script that visits each of those groups separately, and none of them gives me an error! WTF! Also, there is no way to log which URL throws the error, since everything is async and I cannot try-catch... I'm really lost here.

This is the test script:

var sys = require('sys'), zombie = require('zombie');

['169894876151','40773425967','2234206716','138317449357',
    '8800820699','2217555364','22931164567'].forEach(function(group) {
    var browser = new zombie.Browser({
        debug: true,
        runScripts : false,
        userAgent : 'Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.237 Safari/534.10'
    });
    browser.visit('http://www.facebook.com/group.php?gid=' + group + '&_fb_noscript=1', function(err, browser, status) {
        if (!err && browser.document) {
            var target = browser.querySelector('#box_app_2356318349 h4.box_header span');
            if (target) {
                var members = target.innerHTML.match(/([\d.,]+)\s+(members|miembros)$/);
                if (members) console.log(parseFloat(members[1].replace(',', '.')));
            }                   

        }
    });
});

EDIT: Found the problem, I'm using a special mootools build for nodejs in the main script...

var Moo = require('./lib/mootools-core-1.3.js');

But it's strange, since everything is in the namespace "Moo" except for the native js object extensions. :s

demian85 commented 13 years ago

Please help with this bizarre error. Try extending Array.prototype and then using the html5 parser; it breaks with the same error as above!

var sys = require('sys'), 
    zombie = require('zombie'); 

Array.prototype.wtf = function() {
    return [];
};

['169894876151','40773425967','2234206716'].forEach(function(group) {
    var browser = new zombie.Browser({
        debug: true,
        runScripts : false
    });
    browser.visit('http://www.facebook.com/group.php?gid=' + encodeURIComponent(group) + '&_fb_noscript=1', function(err, browser, status) {
        // stuff...
    });
});
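
One plausible explanation, which is an assumption and not something confirmed in this thread, is that some code path iterates an array with for...in. That loop visits every enumerable key, including methods added to Array.prototype, so the loop body receives a function where it expected an element. The sketch below reproduces the same kind of TypeError without the html5 module; openElements and tagName are hypothetical names chosen for illustration.

```javascript
// Hypothetical stand-in for a parser's list of open elements.
var openElements = [{ tagName: 'B' }, { tagName: 'I' }];

// An enumerable method added to Array.prototype, as in the report above.
Array.prototype.wtf = function () {
    return [];
};

// for...in walks every enumerable key, inherited ones included,
// so 'wtf' shows up alongside the indices.
var keysSeen = [];
for (var key in openElements) {
    keysSeen.push(key);
}

// openElements['wtf'] is a function, its tagName is undefined,
// and calling toLowerCase on undefined throws a TypeError.
var unsafeFailed = false;
try {
    for (var k in openElements) {
        openElements[k].tagName.toLowerCase();
    }
} catch (e) {
    unsafeFailed = e instanceof TypeError;
}

// An index-based loop only touches the real elements.
var lowered = [];
for (var i = 0; i < openElements.length; i++) {
    lowered.push(openElements[i].tagName.toLowerCase());
}

// Defining the extension as non-enumerable hides it from for...in.
delete Array.prototype.wtf;
Object.defineProperty(Array.prototype, 'wtf', {
    value: function () { return []; },
    enumerable: false,
    configurable: true,
    writable: true
});
var keysAfter = [];
for (var key2 in openElements) {
    keysAfter.push(key2);
}
```

If this is indeed the cause, either side can work around it: the library by using index-based loops, or the calling script by adding prototype methods with Object.defineProperty and enumerable set to false.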
demian85 commented 13 years ago

Another issue, but this one has nothing to do with prototypes and shit...

var sys = require('sys'),
    zombie = require('zombie');

var browser = new zombie.Browser({
    debug: true,
    runScripts : false
});
browser.visit('http://www.orkut.com/Community?cmm=3502263', function(err, browser, status) {

});

/usr/local/lib/node/.npm/html5/0.2.12/package/lib/html5/tokenizer.js:62
            throw(e);
    ^
Error: undefined: attribute name: ?
    at Object.createAttribute (/usr/local/lib/node/.npm/jsdom/0.1.23/package/lib/jsdom/level1/core.js:1239:13)
    at Object.setAttribute (/usr/local/lib/node/.npm/jsdom/0.1.23/package/lib/jsdom/level1/core.js:937:37)
    at TreeBuilder.copyAttributeToElement (/usr/local/lib/node/.npm/html5/0.2.12/package/lib/html5/treebuilder.js:20:11)
    at TreeBuilder.createElement (/usr/local/lib/node/.npm/html5/0.2.12/package/lib/html5/treebuilder.js:40:10)
    at TreeBuilder.insert_element_normal (/usr/local/lib/node/.npm/html5/0.2.12/package/lib/html5/treebuilder.js:61:21)
    at TreeBuilder.insert_element (/usr/local/lib/node/.npm/html5/0.2.12/package/lib/html5/treebuilder.js:52:15)
    at Object.addFormattingElement (/usr/local/lib/node/.npm/html5/0.2.12/package/lib/html5/parser/in_body_phase.js:718:12)
    at Object.startTagA (/usr/local/lib/node/.npm/html5/0.2.12/package/lib/html5/parser/in_body_phase.js:325:7)
    at Object.processStartTag (/usr/local/lib/node/.npm/html5/0.2.12/package/lib/html5/parser/phase.js:41:38)
    at Object.startTagOther (/usr/local/lib/node/.npm/html5/0.2.12/package/lib/html5/parser/in_cell_phase.js:59:37)
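
The trace shows the DOM's setAttribute rejecting an attribute literally named "?" while the tree builder copies attributes onto a new element. A minimal sketch of one possible guard follows. It is not the fix that was actually shipped. copyAttributesSafely, FakeElement, the { name, value } attribute shape, and the simplified ASCII-only name pattern are all hypothetical, and dropping the attribute is only one of several options.

```javascript
// Simplified, ASCII-only approximation of an XML Name. The real
// production also permits many non-ASCII characters.
var SIMPLE_NAME = /^[A-Za-z_:][A-Za-z0-9_:.\-]*$/;

// Hypothetical helper: copy attributes onto an element, skipping any
// name the DOM would reject, and report what was skipped.
function copyAttributesSafely(element, attributes) {
    var skipped = [];
    for (var i = 0; i < attributes.length; i++) {
        var attr = attributes[i];
        if (SIMPLE_NAME.test(attr.name)) {
            element.setAttribute(attr.name, attr.value);
        } else {
            skipped.push(attr.name);
        }
    }
    return skipped;
}

// Stand-in for a strict DOM element, used only to make the sketch runnable.
function FakeElement() {
    this.attrs = {};
}
FakeElement.prototype.setAttribute = function (name, value) {
    if (!SIMPLE_NAME.test(name)) {
        throw new Error('attribute name: ' + name);
    }
    this.attrs[name] = value;
};

var el = new FakeElement();
var skippedNames = copyAttributesSafely(el, [
    { name: 'href', value: '/Community?cmm=3502263' },
    { name: '?', value: '' }
]);
```

Returning the skipped names lets the caller log them as parse errors instead of aborting the whole document.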
aredridel commented 13 years ago

Can I get you to start separate issues so I can track these more easily?

aredridel commented 13 years ago

Regarding TypeError: Object [ null ] has no method 'fixQueue', looks like that one throws an exception entirely within Zombie -- probably a bug there. I'm happy to help fix that, just don't have much experience there. Thoughts?

demian85 commented 13 years ago

thanks for the update. when can i update the module using npm?

aredridel commented 13 years ago

published v0.2.13 just now.

boblail commented 13 years ago

Thanks aredridel! I think this is resolved; cf. assaf/zombie#63.

demian85 commented 13 years ago

i ran "npm update html5" but nothing updates, i still have version 0.2.12, how's that?

aredridel commented 13 years ago

you might need to "npm install npm" -- see if there's an update there (the update function was broken in one release of npm, as far as I know)

aredridel commented 13 years ago

And close