calgo-lab / green-db

The monorepo that powers the GreenDB.
https://calgo-lab.github.io/green-db/

randomize the order of categories/products to scrape #92

Open BigDatalex opened 2 years ago

BigDatalex commented 2 years ago

The Amazon scraper sometimes gets blocked and retrieves only a few products. I suggest randomizing the order in which a merchant's categories are accessed, so that each scraping run starts with different categories. Over several runs this would increase the number of unique products we collect, even if a run gets blocked at some point.

In addition, we could randomize the order in which we access the products on a SERP. Both of these changes might also decrease the chance of being detected as a bot.
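
A minimal sketch of the idea, not tied to the actual scraper code: the helper name `shuffled`, the category names, and the product URLs below are hypothetical placeholders. The same helper could be applied to a merchant's category list before scheduling requests and to the product links extracted from a single SERP.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def shuffled(items: Iterable[T], seed: Optional[int] = None) -> List[T]:
    """Return a new list with the items in random order.

    The input is left untouched. Passing a seed makes the order
    reproducible, which is mainly useful for tests and debugging.
    """
    rng = random.Random(seed)
    result = list(items)
    rng.shuffle(result)
    return result


# Hypothetical category names for one merchant.
categories = ["SHIRT", "JEANS", "LAPTOP", "SHOES"]

# Hypothetical product links extracted from one SERP.
product_urls = [
    "https://example.com/product/1",
    "https://example.com/product/2",
    "https://example.com/product/3",
]

# Each run visits the categories, and the products within a SERP,
# in a different order.
category_order = shuffled(categories)
product_order = shuffled(product_urls)
```

Since a blocked run then cuts off a different set of categories each time, repeated runs should cover more of the catalog than always starting from the same category.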