gabceb / node-metainspector

Node npm for web scraping purposes. It scrapes a given URL, and returns you its title, meta description, meta keywords, an array with all the links, all the images in it, etc. Inspired by the metainspector Ruby gem
MIT License

Mobile page URLs are not getting scraped #46

Open suyogdilipkale opened 6 years ago

suyogdilipkale commented 6 years ago

I am using this package for my mobile app client.

When a user browses a URL on a mobile device, we pass that URL to the back-end server and use this package to scrape its metadata. If the URL is a mobile-only page that is not served to desktop clients, the metadata is not extracted, because the package runs on the server.

Sample URL: https://www.m.webmd.com/vitamins/ai/ingredientmono-483/peanut-oil This is a mobile URL; if you open it on a desktop machine, it gives an error.

I tried changing the user agent in the headers, but it does not work:

var client = new MetaInspector(param.url, {timeout: 10000, headers:'Mozilla/5.0 (Linux; Android 5.1.1; Nexus 5 Build/LMY48B; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/43.0.2357.65 Mobile Safari/537.36'});

suyogdilipkale commented 6 years ago

I figured out a solution: pass the user agent as a 'User-Agent' key inside a headers object, as below:

var MetaInspector = require('node-metainspector');

var client = new MetaInspector("https://www.m.webmd.com/vitamins/ai/ingredientmono-483/peanut-oil", {
  timeout: 5000,
  headers: {
    'User-Agent': 'Mozilla/5.0 (Linux; Android 5.1.1; Nexus 5 Build/LMY48B; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/43.0.2357.65 Mobile Safari/537.36'
  }
});

client.on("fetch", function(){
  console.log("title: " + client.title);
  console.log("image: " + client.image);
});

client.on("error", function(err){
  console.log(err);
});

client.fetch();
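
For anyone hitting the same problem, the difference between the two attempts seems to be the shape of `headers`: the first attempt passed the user-agent string directly, while the working one nests it under a `'User-Agent'` key. Below is a minimal sketch of one way to keep that shape consistent. The names `MOBILE_UA`, `buildMobileOptions`, and `scrapeAsMobile` are hypothetical helpers made up for illustration and are not part of node-metainspector; only the `timeout` and `headers` options and the `fetch`/`error` events shown earlier in this thread are assumed.

```javascript
// Hypothetical helpers for illustration; not part of node-metainspector.

// Mobile user-agent string taken from the comments above.
const MOBILE_UA =
  'Mozilla/5.0 (Linux; Android 5.1.1; Nexus 5 Build/LMY48B; wv) ' +
  'AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 ' +
  'Chrome/43.0.2357.65 Mobile Safari/537.36';

// Build the options object with the user agent nested under `headers`,
// so a bare string is never passed as `headers` by mistake.
function buildMobileOptions(userAgent = MOBILE_UA, timeout = 5000) {
  if (typeof userAgent !== 'string' || userAgent.length === 0) {
    throw new TypeError('userAgent must be a non-empty string');
  }
  return { timeout: timeout, headers: { 'User-Agent': userAgent } };
}

// Fetch a page as a mobile client and report title and image via callback.
// The package is required lazily so the helper above can be used without it.
function scrapeAsMobile(url, onDone) {
  const MetaInspector = require('node-metainspector');
  const client = new MetaInspector(url, buildMobileOptions());

  client.on('fetch', function () {
    onDone(null, { title: client.title, image: client.image });
  });
  client.on('error', function (err) {
    onDone(err);
  });

  client.fetch();
}
```

With the package installed, this could be called as `scrapeAsMobile(param.url, function (err, meta) { ... })`. Whether a given site serves its mobile page still depends on how that site checks the user agent, so the string above may need adjusting.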