dharmafly / noodle

A node server and module which allows for cross-domain page scraping on web documents with JSONP or POST.
https://noodle.dharmafly.com/

Allow for multiple user agents to be specified #91

Open dabeeeenster opened 11 years ago

dabeeeenster commented 11 years ago

I've found that sites are less likely to block scraping requests when the user agent varies between them. Being able to set one agent is useful, but being able to set ten or twenty and have the application choose one at random for each target URL would help reduce the chance of being blocked.

ghost commented 8 years ago

You could throw down something like:

```javascript
var userAgentList = [
  "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36",
  "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; AS; rv:11.0) like Gecko",
  "Mozilla/5.0 (compatible, MSIE 11, Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko"
];

// Pick one of the agents uniformly at random.
function getRandomUserAgent() {
  return userAgentList[Math.floor(Math.random() * userAgentList.length)];
}
```

Then, wherever the request options are built:

```javascript
userAgent: getRandomUserAgent(),
```