rkumar-c closed this issue 7 years ago.
Please paste your code here. Perhaps somebody will be able to help you.
@rkumar-c, what do you mean by overflow elements? For example, a menu that slides horizontally and shows only a few items at first? In that case, the screenshot will not show all of the menu items, because the rest are hidden (you need to slide to see them), and you would get the same result if you opened the page on an ordinary mobile device.
I need to capture screenshots of desktop webpages, not mobile ones. I understand that the page scrolls, but at least the images that are visible should be captured. If you look at the image, no images are captured in the second and third frames. I got the images to show up by writing custom JS inside page.evaluate; here is the code:
page.evaluate(function() {
    var box = document.getElementsByClassName("items-box-photo");
    for (var i = 0; i < box.length; i++) {
        // The real image URL is stored in the lazy-load attribute data-src
        var someimage = box[i].children[0].getAttribute('data-src');
        console.log("image src " + someimage);
        // Replace the placeholder with an <img> pointing at the real URL
        box[i].innerHTML = '<img src="' + someimage + '">';
    }
});
But this is page-specific, so we cannot use it to capture arbitrary webpages. Now I am having trouble capturing screenshots of webpages that use AJAX and lazy loading, such as amazon.in or http://www.shopclues.com/fashion.html.
So is there a way to write generic code that captures screenshots of Amazon or any other lazy-loading webpage?
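A less page-specific variant of the evaluate snippet above is to copy a lazy-load attribute into src for every image on the page, whatever container it sits in. This is a minimal sketch, assuming the site keeps the real URL in an attribute such as data-src (the attribute the snippet above reads); other sites may use different attribute names, and pages that only request images from scroll handlers will still need scrolling. The name promoteLazySources is hypothetical, and the optional root parameter exists only so the function can be exercised outside a browser; inside the page it falls back to document.

```javascript
// Hypothetical helper: copy a lazy-load attribute (e.g. "data-src") into "src"
// on every <img> that has it. attrName is an assumption that varies per site.
// In PhantomJS: page.evaluate(promoteLazySources, 'data-src');
// page.evaluate serializes the function, so it must not use outer variables.
function promoteLazySources(attrName, root) {
    root = root || document; // default to the page's document inside the browser
    var images = root.querySelectorAll('img[' + attrName + ']');
    var changed = 0;
    for (var i = 0; i < images.length; i++) {
        var lazyUrl = images[i].getAttribute(attrName);
        // Only touch images whose real URL is not already in src
        if (lazyUrl && images[i].getAttribute('src') !== lazyUrl) {
            images[i].setAttribute('src', lazyUrl);
            changed++;
        }
    }
    return changed; // number of images switched to their real URL
}
```

After the call, the browser still has to download the images, so some wait is needed before page.render.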
I've been a CasperJS/PhantomJS user for a couple of years. Normally, images don't show up because they haven't finished loading. I'm not sure whether explicitly waiting before capturing will help.
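Instead of guessing a fixed delay, a common pattern (similar to the waitfor.js example distributed with PhantomJS) is to poll a condition until it holds or a timeout expires. This is a minimal sketch; the name waitFor and its parameters are illustrative assumptions, and in a PhantomJS script the condition would typically be a page.evaluate call that inspects the DOM.

```javascript
// Hypothetical helper: check testFx immediately, then every intervalMs.
// Calls onReady as soon as testFx returns true, or onTimeout once
// timeoutMs has elapsed without the condition holding.
function waitFor(testFx, onReady, onTimeout, timeoutMs, intervalMs) {
    var start = Date.now();
    (function poll() {
        if (testFx()) {
            onReady();
            return;
        }
        if (Date.now() - start >= timeoutMs) {
            onTimeout();
            return;
        }
        setTimeout(poll, intervalMs);
    })();
}

// Example use inside a PhantomJS script (not executed here; the selector is hypothetical):
// waitFor(function () {
//     return page.evaluate(function () {
//         return document.querySelectorAll('.product img').length > 0;
//     });
// }, function () { page.render('page.png'); phantom.exit(); },
//    function () { console.log('Gave up waiting'); phantom.exit(1); },
//    30000, 250);
```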
@rkumar-c, make sure you don't have the following setting in your script:
page.settings.loadImages = false
I once had this setting enabled in a script and could not understand why images were not loading.
Ref: http://phantomjs.org/api/webpage/property/settings.html
I think I have run into a similar problem. I put only a few words in an HTML file and did not set any width or height (the page at https://item.mercari.com/gl/m56999829099/ doesn't set them either; it only has 'min-height'), and the rendered picture came out at 400 × 300. That seems to be PhantomJS's default width and height. @bologer, is that really PhantomJS's default setting? If so, @rkumar-c, you can first set the page width you need and then open the website.
I am not getting full-page screenshots for the following two URLs: https://shop.adidas.co.in/#c/men-basketball-shoes/Pag-60/No-0/0 http://www.lifestylestores.com/c/men-tops-tshirts
Can anybody provide sample code to capture screenshots of the above URLs?
I'm not an expert at PhantomJS, but here is the script for a general-purpose web automation tool that I make (based on CasperJS/PhantomJS). With it, I get only the Adidas logo from the second row of images onwards. For the second website, I get a blank image both with the script below and when I open the page in Safari. I'm not sure what is wrong with that website, since I can't even see it in a real browser.
https://shop.adidas.co.in/#c/men-basketball-shoes/Pag-60/No-0/0
wait 10 seconds
snap page to adidas.png
http://www.lifestylestores.com/c/men-tops-tshirts
wait 10 seconds
snap page to lifestore.png
@rkumar-c, your problem with the first website is the loading time. For some reason, it takes around 20 seconds to load all the images and everything else on the site.
This is what I got after 15 seconds, so I assume you will have all the images loaded after about 20 seconds.
var p = p || {};
p.adidas = {
    webpage: null,
    fs: null,
    page: null,
    url: null,
    userAgent: null,
    timeout: 0,
    log: '',
    init: function() {
        this.webpage = require('webpage');
        this.fs = require('fs');
        this.page = this.webpage.create();
        this.url = 'https://shop.adidas.co.in/#c/men-basketball-shoes/Pag-60/No-0/0';
        this.userAgent = 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0';
        // The site needs roughly 20 seconds to load all of its images
        this.timeout = 20000;
        this.log = '[ =========> ] ';
    },
    solve: function() {
        var self = this;
        // Identify as desktop Firefox and use a desktop-sized viewport
        this.page.settings.userAgent = this.userAgent;
        this.page.viewportSize = { width: 1024, height: 900 };
        this.page.open(this.url, function(status) {
            if (status !== 'success') {
                console.log(self.log + 'Failed to load ' + self.url);
                phantom.exit(1);
                return;
            }
            console.log(self.log + 'Page loaded');
            // Give the lazy-loaded images time to arrive before rendering
            setTimeout(function() {
                // The try/catch must be inside the callback to catch rendering errors
                try {
                    console.log(self.log + 'Timeout finished.');
                    self.page.render('1.png');
                    console.log(self.log + 'Picture taken');
                    console.log(self.log + 'Job should be finished now');
                    phantom.exit();
                } catch (e) {
                    console.log('PhantomJS unexpectedly stopped working. Date: ' + new Date().toUTCString());
                    phantom.exit(1);
                }
            }, self.timeout);
        });
    }
};
p.adidas.init();
p.adidas.solve();
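A fixed 20-second timeout works for this page, but the delay a page needs differs from site to site. One alternative is to poll until every image the page has requested is complete, and render only then. This is a minimal sketch, assuming the relevant pictures are <img> elements; allImagesLoaded is a hypothetical name. Per the HTML spec, complete is also true for broken images, so a failed download cannot stall the wait, but images that have not been requested yet (for example, ones only loaded on scroll) are not counted as pending, so this does not replace scrolling.

```javascript
// Hypothetical helper: true once every <img> with a src has finished loading
// (or failed). root defaults to the page's document so the function can be
// passed directly to page.evaluate, which runs it inside the page.
function allImagesLoaded(root) {
    root = root || document;
    var images = root.querySelectorAll('img[src]');
    for (var i = 0; i < images.length; i++) {
        if (!images[i].complete) {
            return false; // at least one image is still downloading
        }
    }
    return true;
}
```

In a PhantomJS script, page.evaluate(allImagesLoaded) could be polled every few hundred milliseconds, with page.render called once it returns true or a maximum wait has passed.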
@bologer, I tried the above code, but it fails to capture all the images; the same happens with the eBay and Alibaba websites. What could be the reason? I increased the timeout to 90000 ms, but that didn't help either. URLs to capture: https://www.ebay.com/b/Toy-Kites/2569/bn_1924212 https://www.alibaba.com/Doors-Windows_pid100006533?spm=a2700.8293689.0.0.QF5otp
@rkumar-c, alright, I will test my code a bit more and try to come up with a working solution :+1:
To add to this, headless (invisible) browsers such as PhantomJS or Electron (through NightmareJS) behave differently from real browsers such as Chrome or Safari. Website owners also often add logic to prevent headless or automated browsers from working correctly.
Here is a nice post by a friend on this topic, although I normally don't automate websites that don't want to serve automated browsers: https://franciskim.co/dont-need-no-stinking-api-web-scraping-2016-beyond
Other possible setups are Selenium + Chrome + Xvfb, or SikuliX + Xvfb, to replicate the exact behavior of a real browser. Headless Chrome is also available now (headless Firefox is coming soon). Tools such as CasperJS (which intends to support headless Chrome) or Chromy can also be considered.
@rkumar-c, if the purpose of this thread is to get the product images from, let's say, Adidas, then the following code can be used. The images themselves don't need to finish loading: once the whole DOM has loaded, you can scrape the image URLs and save the images one by one.
By the way, I figured out why you are not seeing all of the images: they are loaded on the scroll event. Try emulating scroll events once the page has loaded, and you should see all of the images :+1:
If you would like just the product images, then the code below should help:
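Emulating scrolling can be sketched as stepping through the page one viewport at a time, pausing between steps so that scroll-triggered loaders get a chance to fire. This is a minimal sketch: scrollOffsets and scrollThrough are hypothetical names, the pause length is an assumption to tune per site, and the scrollTo callback is where a PhantomJS script would actually move the page, e.g. page.evaluate(function (y) { window.scrollTo(0, y); }, y). Whether a given site's loader reacts depends on how it detects visibility.

```javascript
// Y offsets that step through a page of pageHeight pixels, one viewport at a time.
function scrollOffsets(pageHeight, viewportHeight) {
    if (viewportHeight <= 0) {
        throw new Error('viewportHeight must be positive');
    }
    var offsets = [];
    for (var y = 0; y < pageHeight; y += viewportHeight) {
        offsets.push(y);
    }
    return offsets;
}

// Hypothetical helper: call scrollTo(y) for each offset, waiting delayMs
// between steps, then call done() once every offset has been visited.
function scrollThrough(offsets, scrollTo, delayMs, done) {
    var i = 0;
    (function step() {
        if (i >= offsets.length) {
            done();
            return;
        }
        scrollTo(offsets[i]);
        i++;
        setTimeout(step, delayMs);
    })();
}
```

With the 1024 × 900 viewport used in this thread, pageHeight could come from page.evaluate(function () { return document.body.scrollHeight; }), and done would call page.render. Because lazily loaded content can make the page taller, re-reading the height after one pass may be necessary.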
var p = p || {};
p.adidas = {
    webpage: null,
    fs: null,
    page: null,
    url: null,
    userAgent: null,
    timeout: 0,
    log: '',
    init: function() {
        this.webpage = require('webpage');
        this.fs = require('fs');
        this.page = this.webpage.create();
        this.url = 'https://shop.adidas.co.in/#c/men-basketball-shoes/Pag-60/No-0/0';
        this.userAgent = 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0';
        // Only the DOM (with its data-src URLs) is needed, not the loaded images, so a shorter wait is used
        this.timeout = 6000;
        this.log = '[ =========> ] ';
    },
    solve: function() {
        var self = this;
        this.page.settings.userAgent = this.userAgent;
        this.page.viewportSize = { width: 1024, height: 900 };
        this.page.open(this.url, function(status) {
            if (status !== 'success') {
                console.log(self.log + 'Failed to load ' + self.url);
                phantom.exit(1);
                return;
            }
            console.log(self.log + 'Page loaded');
            setTimeout(function() {
                try {
                    console.log(self.log + 'Timeout finished.');
                    // Runs inside the page: collect each product's name and image URL
                    var products = self.page.evaluate(function() {
                        var cards = document.querySelectorAll('.productListing li.card');
                        var result = [];
                        for (var i = 0; i < cards.length; i++) {
                            var nameEl = cards[i].querySelector('.adidasOriginals.productIdentifier');
                            var imgEl = cards[i].querySelector('.productImageWrap > img');
                            // Skip cards without a name or a lazy-load image URL
                            if (!nameEl || !imgEl || !imgEl.getAttribute('data-src')) {
                                continue;
                            }
                            result.push({
                                name: nameEl.innerText.trim(),
                                // data-src holds the image URL; strip its trailing ".plp" suffix
                                src: imgEl.getAttribute('data-src').trim().replace(/\.plp$/i, '')
                            });
                        }
                        return result;
                    });
                    self.page.render('1.png');
                    console.log(self.log + 'Picture taken');
                    console.log(self.log + 'Images:');
                    console.log(JSON.stringify(products));
                    // Save the URLs so the images can be downloaded separately
                    self.fs.write('urls.txt', JSON.stringify(products), 'w');
                    phantom.exit();
                } catch (e) {
                    console.log('PhantomJS unexpectedly stopped working. Date: ' + new Date().toUTCString());
                    phantom.exit(1);
                }
            }, self.timeout);
        });
    }
};
p.adidas.init();
p.adidas.solve();
I'm not sure whether this is what you are looking for, though.
Cool stuff! Thanks @bologer for sharing! :smile:
Thanks, everyone, for replying so promptly with solutions. I really appreciate the solution given by @bologer; it was very helpful to me. I am closing this thread now; thanks again to @bologer.
Hi, I am using PhantomJS 2.1.1 to capture a screenshot of a webpage. The page has three divs styled with overflow: hidden and overflow: auto. Unfortunately, PhantomJS captures only the images in the top div and ignores the images in the other divs. This is the URL where I see the issue: https://item.mercari.com/gl/m56999829099/
The captured image is below:
Please help me fix this as soon as possible. Rakesh.
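A common workaround for content clipped inside overflow: hidden or overflow: auto containers is to enlarge each scrollable element to its full scrollHeight and make its overflow visible before calling page.render, so the clipped content is laid out in full. This is a minimal sketch, assuming the clipped content sits in elements whose scrollHeight exceeds their clientHeight; expandScrollableBoxes is a hypothetical name, and because it changes the layout, the screenshot will not look exactly like the live page.

```javascript
// Hypothetical helper: expand every element whose content is taller than its
// visible box. Run it with page.evaluate(expandScrollableBoxes) before page.render().
function expandScrollableBoxes(root) {
    root = root || document; // default to the page's document inside the browser
    var elements = root.querySelectorAll('*');
    var expanded = 0;
    // Walk in reverse document order so inner boxes grow before their ancestors are measured
    for (var i = elements.length - 1; i >= 0; i--) {
        var el = elements[i];
        if (el.scrollHeight > el.clientHeight) {
            el.style.height = el.scrollHeight + 'px';
            el.style.maxHeight = 'none';
            el.style.overflow = 'visible';
            expanded++;
        }
    }
    return expanded; // number of elements that were enlarged
}
```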