calgo-lab / green-db

The monorepo that powers the GreenDB.
https://calgo-lab.github.io/green-db/

Amazon abort scraping of alternate products #109

Closed. BigDatalex closed this 1 year ago

BigDatalex commented 1 year ago

Amazon sometimes does not return Climate Pledge Friendly (CPF) products, even though we query only for those. In that situation it returns alternate (non-CPF) products instead, which we are not interested in.

For example, for Amazon Women's Bathrobe the original URL https://www.amazon.de/s?bbn=1981827031&rh=n%3A1981827031%2Cp_n_cpf_eligible%3A22579885031 leads to a page without the CPF filter: https://www.amazon.de/b?node=1983823031

In other cases Amazon does redirect to a CPF page but returns non-CPF products anyway. For example, for Amazon Dryer the original URL https://www.amazon.co.uk/s?bbn=1391019031&rh=n%3A1391019031%2Cp_n_cpf_eligible%3A22579929031 is forwarded to: https://www.amazon.co.uk/s?keywords=Dryers&bbn=1391019031&rh=n%3A1391019031%2Cp_n_cpf_eligible%3A22579929031&c=ts&ts_id=1391019031

But the forwarded page says: "Showing results from All Departments. No results for Dryers in Large Appliances."

This PR aborts the scraping of those pages and makes some other adjustments.
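As a rough illustration of the two cases described above, a check along the following lines could decide whether a results page should be skipped. This is only a sketch, not the code in this PR: the function names `has_cpf_filter` and `should_abort` are hypothetical, and it assumes that the `p_n_cpf_eligible` filter in the `rh` query parameter and the English "No results for" message are reliable signals. Localized sites such as amazon.de would presumably need their own marker text.

```python
from urllib.parse import parse_qs, urlsplit

# Assumption: the CPF filter always shows up inside the "rh" query parameter.
CPF_FILTER_KEY = "p_n_cpf_eligible"
# Assumption: English marker text only; other locales use different wording.
NO_RESULTS_MARKER = "No results for"


def has_cpf_filter(url: str) -> bool:
    """Return True if the URL's "rh" parameter still carries the CPF filter."""
    query = parse_qs(urlsplit(url).query)  # parse_qs also percent-decodes values
    return any(CPF_FILTER_KEY in value for value in query.get("rh", []))


def should_abort(final_url: str, page_text: str) -> bool:
    """Hypothetical helper: True if the page likely lists alternate products."""
    # Case 1: the redirect dropped the CPF filter entirely.
    if not has_cpf_filter(final_url):
        return True
    # Case 2: the filter is kept, but Amazon reports no results for it.
    return NO_RESULTS_MARKER in page_text
```

With the URLs from the examples, `should_abort("https://www.amazon.de/b?node=1983823031", "")` covers the bathrobe case, while the dryer case is caught by the "No results for" text on the forwarded page.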