drawrowfly / amazon-product-api

Amazon Scraper. Scrape products from the amazon search result or reviews from the specific product

[Feature-request] Support --ua as a cli argument #11

Closed kkristof200 closed 3 years ago

kkristof200 commented 4 years ago

I've seen that a custom UA can be passed to the constructor, but I can't pass it via the CLI.

drawrowfly commented 4 years ago

By default, when you scrape data with the CLI tool, the user-agent is randomized to avoid blocking. Setting a custom user-agent doesn't make any sense.

kkristof200 commented 4 years ago

I've seen the randomUa variable, but it only randomizes part of the Chrome version, which is not 'random enough' for my use case. I'm calling the CLI from Python and already have a random UA in use in my Python environment, so I've been looking for a way to inject that value into the lib.

kkristof200 commented 4 years ago

I've seen that ua is a parameter in the constructor too, so it should only need to be added to the exported args list for the CLI.

drawrowfly commented 4 years ago

What do you mean by "not 'random enough'"?

Randomizing the version is enough to avoid blocking.

kkristof200 commented 4 years ago

The thing with the lib's random user-agent is that it only changes the Chrome version. More specifically, it randomizes the Chrome major version between 65 and 79 and appends a fixed minor version after it.

Problem nr. 1: Only the Chrome major version changes, so the 'randomness' amounts to 15 possible cases.
Problem nr. 2: The minor version is the same in every case, which can look suspicious.

If I understand the code correctly these are all the possible outcomes of a random ua:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.4044.113 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.4044.113 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.4044.113 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.4044.113 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.4044.113 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.4044.113 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.4044.113 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.4044.113 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.4044.113 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.4044.113 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.4044.113 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.4044.113 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.4044.113 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.4044.113 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.4044.113 Safari/537.36
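
The enumeration above can be reproduced with a minimal sketch. It assumes, as described in this comment, that only the Chrome major version varies from 65 to 79 while the rest of the string stays fixed. The names `buildUa`, `allPossibleUas`, and `pickRandomUa` are hypothetical and are not taken from the library's source.

```javascript
// Hypothetical reconstruction of the behaviour described above,
// not the library's actual code.
const MIN_MAJOR = 65;
const MAX_MAJOR = 79;
const FIXED_SUFFIX = '0.4044.113'; // identical in every generated UA

// Build the full UA string for a given Chrome major version.
function buildUa(major) {
    return `Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/${major}.${FIXED_SUFFIX} Safari/537.36`;
}

// Enumerate every UA this scheme can ever produce.
function allPossibleUas() {
    const uas = [];
    for (let major = MIN_MAJOR; major <= MAX_MAJOR; major += 1) {
        uas.push(buildUa(major));
    }
    return uas;
}

// Pick one of the possible UAs uniformly at random.
function pickRandomUa() {
    const span = MAX_MAJOR - MIN_MAJOR + 1;
    return buildUa(MIN_MAJOR + Math.floor(Math.random() * span));
}

console.log(allPossibleUas().length); // 15
console.log(pickRandomUa());
```

Under these assumptions the pool is exactly 15 strings, and every one of them shares the same platform token and the same trailing version numbers.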

drawrowfly commented 4 years ago

I've never had a problem with these settings, and the scraper is used heavily every day by a lot of people. If you make a lot of requests and still get blocked, then use a proxy.

OK, I will add this to the to-do list.

kkristof200 commented 4 years ago

Thanks. As for the --ua param, it would only require adding this to the bin/cli.js file, right?

ua: {
    default: null,
    type: 'string',
    describe: 'Pass a custom user-agent to use. This helps to prevent request blocking from the amazon side',
},

If that is the case, I can make a fork/PR so you only have to approve and publish it.
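
To illustrate the intent of the request, here is a minimal sketch of how a parsed `ua` value might take precedence over a randomized default. The names `resolveUserAgent` and `defaultRandomUa`, and the shape of the `argv` object, are assumptions made for illustration; the library's real constructor and CLI wiring may differ.

```javascript
// Hypothetical fallback generator, standing in for the library's own randomization.
function defaultRandomUa() {
    const major = 65 + Math.floor(Math.random() * 15); // 65..79
    return `Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/${major}.0.4044.113 Safari/537.36`;
}

// Hypothetical helper: prefer a user-supplied UA, otherwise fall back.
// `argv.ua` is assumed to be a string, or null when the flag is omitted.
function resolveUserAgent(argv, fallback = defaultRandomUa) {
    const custom = typeof argv.ua === 'string' ? argv.ua.trim() : '';
    return custom.length > 0 ? custom : fallback();
}

console.log(resolveUserAgent({ ua: 'MyCustomAgent/1.0' })); // custom value wins
console.log(resolveUserAgent({ ua: null }));                // randomized fallback
```

Treating an empty or whitespace-only value the same as a missing flag keeps the randomized behaviour as the default, so existing CLI usage would be unaffected.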

drawrowfly commented 3 years ago

--user-agent is available in the latest version