ariya / phantomjs

Scriptable Headless Browser
http://phantomjs.org
BSD 3-Clause "New" or "Revised" License

bug with response from getElementsByTagName when using page.evaluate #11394

Closed ijx33 closed 11 years ago

ijx33 commented 11 years ago

When using evaluate to return the result of document.getElementsByTagName, only the first element found comes back. The reported length is correct, but the later elements are all null. The same behaviour occurs for document.getElementsByClassName.

Simple example:


var page = require("webpage").create();
page.open("http://phantomjs.org/", ps);

function ps(s) {
    'use strict';
    var i,
        h3list;

    if (s === "success") {
        h3list = page.evaluate(function () {
            return document.getElementsByTagName("h3");
        });

        console.log("number of h3 elements: " + h3list.length);

        for (i in h3list) {
            console.log(":", i); //demonstrate that the array has been made
        }

        for (i = 0; i < h3list.length; i = i + 1) {
            console.log(i, h3list[i]);
            if (h3list[i] !== null) {
                console.log("  text value:", h3list[i].innerHTML);
            }
        }
    } else {
        console.log("loading error");
    }
    phantom.exit();
}

(using 32bit windows)

ariya commented 11 years ago

Please read the documentation. You can only return primitive, serializable objects from evaluate (no closures, DOM objects, etc.).
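
To illustrate the restriction described above, here is a minimal sketch that converts the matched elements into plain strings inside the evaluated function, so that only serializable data is returned. The function name collectH3Text and its optional doc parameter are assumptions made for this example (the parameter only exists so the function can be tried outside a browser); neither is part of PhantomJS.

```javascript
// Hypothetical helper: returns the innerHTML of every <h3> as an array of
// strings. Strings are serializable, so the array can leave the page context.
// `doc` falls back to the page's global `document` when no argument is given.
function collectH3Text(doc) {
    doc = doc || document;
    var nodes = doc.getElementsByTagName("h3");
    var out = [];
    for (var i = 0; i < nodes.length; i += 1) {
        out.push(nodes[i].innerHTML);
    }
    return out;
}

// Intended use in a PhantomJS script (not executed here):
// var h3list = page.evaluate(collectH3Text);
```

With this approach, the loop in the original report would print strings rather than null entries, because no DOM object has to be carried out of the page.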

wilriker commented 11 years ago

If you only need a specific subset of the elements' values, you can work around this restriction by using the following function for evaluation:

function() {
    var imgTags = document.getElementsByTagName('img');

    var ret = {};
    for (var i = 0; i < imgTags.length; i++) {
        try {
            var imgProps = {};
            imgProps['width'] = imgTags[i].width;
            imgProps['height'] = imgTags[i].height;
            imgProps['naturalWidth'] = imgTags[i].naturalWidth;
            imgProps['naturalHeight'] = imgTags[i].naturalHeight;
            imgProps['src'] = imgTags[i].src;
            ret['tag' + i] = imgProps;
        } catch (err) { }
    }

    return ret;
};

CunuuKum commented 11 years ago

Hi Manuel! It is better to use getAttribute('name_of_attribute') inside your loop. I found that direct property access does not work for the 'sizes' attribute, at least on 'link' tags. I also tried it with an additional custom attribute on a 'link' tag, and that did not work either: only the first attribute (rel) could be read. Example code using the getAttribute() method:

function () {
    var linkTags = document.head.getElementsByTagName('link');
    var ret = {};
    for (var i = 0; i < linkTags.length; i++) {
        try {
            var linkProps = {};
            linkProps['rel'] = linkTags[i].getAttribute('rel');
            linkProps['sizes'] = linkTags[i].getAttribute('sizes');
            ret['tag' + i] = linkProps;
        } catch (err) { }
    }
    return ret;
}

With getAttribute() it works.
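
The getAttribute() approach above can be generalized into a small helper that takes the tag name and the attribute names as arguments. This is only a sketch: collectAttributes and its doc parameter are hypothetical names introduced here, and it assumes the arguments passed to page.evaluate are themselves serializable (a string and an array of strings).

```javascript
// Hypothetical helper: for every element with the given tag, read the listed
// attributes via getAttribute() and return an array of plain objects.
// `doc` falls back to the page's global `document` when no argument is given.
function collectAttributes(tagName, attrNames, doc) {
    doc = doc || document;
    var elements = doc.getElementsByTagName(tagName);
    var result = [];
    for (var i = 0; i < elements.length; i += 1) {
        var props = {};
        for (var j = 0; j < attrNames.length; j += 1) {
            // getAttribute returns null when the attribute is absent
            props[attrNames[j]] = elements[i].getAttribute(attrNames[j]);
        }
        result.push(props);
    }
    return result;
}

// Intended use in a PhantomJS script (not executed here):
// var links = page.evaluate(collectAttributes, "link", ["rel", "sizes"]);
```

Returning an array instead of an object keyed by 'tag' + i keeps the document order and makes the length directly available to the caller.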

Aetherpoint commented 9 years ago

@CunuuKum / @wilriker So is there any way to just grab all/any H1 tags, even if there's no assured ID, class or attribute?

allan-bonadio commented 9 years ago

Use document.getElementsByTagName('h1'); see https://developer.mozilla.org/en-US/docs/Web/API/Document/getElementsByTagName. The method also works on other elements, where it returns only the matching elements enclosed inside them.

That's built into the DOM. Most people use jQuery, since it is pretty powerful for selecting elements.
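
For the question of grabbing every H1 without relying on an ID, class or attribute, a sketch along these lines may help. It passes the tag name into the evaluated function as an argument and returns only text, in keeping with the serialization restriction mentioned earlier in the thread. The name textByTag and the doc parameter are assumptions made for this example.

```javascript
// Hypothetical helper: returns the trimmed text of every element with the
// given tag name. No ID, class or attribute is needed to find the elements.
// `doc` falls back to the page's global `document` when no argument is given.
function textByTag(tagName, doc) {
    doc = doc || document;
    var nodes = doc.getElementsByTagName(tagName);
    var texts = [];
    for (var i = 0; i < nodes.length; i += 1) {
        texts.push(String(nodes[i].textContent).trim());
    }
    return texts;
}

// Intended use in a PhantomJS script (not executed here):
// var headings = page.evaluate(textByTag, "h1");
```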

henryruhs commented 7 years ago

Similar issue on Stack Overflow, with a solution: https://stackoverflow.com/questions/44952239/process-dom-elements-with-phantomjs