drawrowfly / amazon-product-api

Amazon Scraper. Scrape products from Amazon search results, or reviews from a specific product.
647 stars, 187 forks

Proxies, timeout, amount of results returned, rp #38

Open: benjaminvanrenterghem opened this issue 3 years ago

benjaminvanrenterghem commented 3 years ago

Hi there! First off, very nice tool. While integrating your repo, I noticed a few things that may be useful to you. I haven't looked at the code to find the cause; I've only observed the behavior.

I am calling the module directly, not using the argument parser, to execute queries.

1- Proxies and timeout

For some reason, the timeout has no effect: the program stalls well past the configured timeout. I am using free proxies scavenged from the internet, so it is important that this works. Even with private proxies, you would still expect this setting to be honored.

Perhaps the approach described here could be used: https://medium.com/javascript-in-plain-english/use-promise-race-to-timeout-promises-6710cb0a3164
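The Promise.race idea from that article could look roughly like this. withTimeout is a hypothetical helper, not part of this library's API: it rejects if the wrapped promise has not settled within ms milliseconds. Note that Promise.race only stops waiting; the underlying request keeps running unless it is also aborted.

```javascript
// Hypothetical helper (not part of amazon-product-api): reject if `promise`
// does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Whichever promise settles first wins; always clear the timer so it
  // does not keep the process alive after the race is decided.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage with a stand-in for a request through a dead proxy that never answers.
const stalled = new Promise(() => {}); // never settles
withTimeout(stalled, 1000).catch((err) => console.error(err.message));
```

Because the stand-in promise never settles, the error "Timed out after 1000 ms" is logged after about one second.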

2- Number of results returned by .products()

I am using the following options:

```javascript
argv['keyword'] = keyword;
argv['category'] = 'aps';
argv['country'] = 'US';
argv['number'] = conf.scrape_products_per_keyword; // set to 500
argv['bulk'] = true;
argv['proxy'] = proxies_formatted; // a list of proxies
argv['rating'] = [conf.product_min_rating, conf.product_max_rating]; // [3, 5]
argv['sort'] = true;
argv['randomUa'] = true;
argv['timeout'] = conf.request_timeout_ms; // 1000
```

As you can see, bulk is set to true; however, it consistently returns results from only one page (fewer than 50). I worked around this by turning bulk off and iterating over the pages manually. Perhaps this is due to the poor quality of the proxies, but it shouldn't happen nonetheless.
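The manual workaround could be sketched like this. fetchPage is a hypothetical async function standing in for a single non-bulk scrape of one search-results page (it is assumed to resolve to an array of products); collectPages is likewise an illustrative name, not part of the module. A failed page is skipped rather than discarding the pages already collected, which matters with unreliable proxies.

```javascript
// Collect up to `target` items by requesting one page at a time.
// `fetchPage(page)` is hypothetical: it should resolve to an array of items
// for that page, or an empty array when there are no more results.
async function collectPages(fetchPage, target, maxPages = 20) {
  const results = [];
  for (let page = 1; page <= maxPages && results.length < target; page++) {
    let items;
    try {
      items = await fetchPage(page);
    } catch (err) {
      // A dead proxy or timeout on one page should not lose earlier pages.
      console.warn(`page ${page} failed: ${err.message}`);
      continue;
    }
    if (items.length === 0) break; // no more results
    results.push(...items);
  }
  return results.slice(0, target);
}

// Usage with a fake fetcher: 3 pages of 48 items, then nothing.
const fakeFetch = async (page) =>
  page <= 3 ? Array.from({ length: 48 }, (_, i) => `p${page}-item${i}`) : [];

collectPages(fakeFetch, 500).then((items) => console.log(items.length)); // 144
```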

3- request-promise

request-promise has been deprecated, along with the request library it wraps.
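For comparison, a GET with a hard timeout might look roughly like this on Node's built-in fetch (Node 18 or later), without request-promise. getWithTimeout is a hypothetical helper; unlike the Promise.race approach, the AbortController cancels the request itself after ms milliseconds. Proxy support is deliberately left out of this sketch.

```javascript
// Hypothetical helper: GET `url` as text, aborting the request after `ms` ms.
// Requires Node 18+ (global fetch and AbortController). No proxy handling.
async function getWithTimeout(url, ms) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    const res = await fetch(url, { signal: controller.signal });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    // Reading the body is also covered by the same abort signal.
    return await res.text();
  } finally {
    clearTimeout(timer);
  }
}
```

On timeout, the returned promise rejects with an error whose name is "AbortError".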