calgo-lab / green-db

The monorepo that powers the GreenDB.
https://calgo-lab.github.io/green-db/

Avoid scraping products in Ähnliche Artikel in OTTO #127

Closed itrajanovska closed 1 year ago

itrajanovska commented 1 year ago

After manually inspecting OTTO's website, we concluded the following: when there is no pagination, i.e. the category has only a few products, OTTO shows a section at the bottom of the page called Ähnliche Artikel ("similar articles"). An example can be seen here: https://www.otto.de/schuhe/hausschuhe/?nachhaltigkeit=alle-nachhaltigen-artikel

In this Ähnliche Artikel section, OTTO also lists products from other categories, and most of the time they aren't even sustainable. We should avoid scraping these products, because otherwise the products in Ähnliche Artikel are assigned false categories.
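
One possible way to do this, as a minimal sketch and not the actual GreenDB scraper code: cut the page off at the Ähnliche Artikel heading before collecting product links. The function name `product_links_before_similar`, the `/p/` link pattern, and the sample markup are all hypothetical and only for illustration; OTTO's real HTML will look different and would more likely be handled with proper selectors.

```python
import re
from typing import List

# Section heading text named in this issue.
SIMILAR_MARKER = "Ähnliche Artikel"
# Hypothetical product link pattern; OTTO's real product URLs may differ.
PRODUCT_LINK = re.compile(r'href="(/p/[^"]+)"')


def product_links_before_similar(html: str) -> List[str]:
    """Return product links that appear before the 'Ähnliche Artikel' section."""
    cut = html.find(SIMILAR_MARKER)
    # If the marker is absent (e.g. a paginated category), keep the whole page.
    relevant = html if cut == -1 else html[:cut]
    return PRODUCT_LINK.findall(relevant)


# Made-up markup: two category products, then one "similar" product.
sample = (
    '<section><a href="/p/hausschuh-1">A</a><a href="/p/hausschuh-2">B</a></section>'
    "<h2>Ähnliche Artikel</h2>"
    '<section><a href="/p/sneaker-9">C</a></section>'
)
print(product_links_before_similar(sample))
# prints ['/p/hausschuh-1', '/p/hausschuh-2']
```

This relies on the assumption that the Ähnliche Artikel section always comes after the category's own products, as described above.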