calgo-lab / green-db

The monorepo that powers the GreenDB.
https://calgo-lab.github.io/green-db/

Task/optimize otto scrapes disable unavailable for sustainable products #121

Closed itrajanovska closed 1 year ago

itrajanovska commented 1 year ago

Remove ProductCategory.HEADPHONES and ProductCategory.TV from otto (unnecessary for now); add original_URL to core.domain::ScrapedPage; update the tests in extract accordingly.
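For readers unfamiliar with the domain model, here is a minimal sketch of what carrying an original_URL on a scraped page could look like. Only the names ScrapedPage and original_URL come from this PR; the other fields, the use of a dataclass, and the helper `was_redirected` are hypothetical and are not taken from the green-db code base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScrapedPage:
    # Hypothetical subset of fields; the real core.domain model may differ.
    url: str                            # URL the response was finally served from
    html: str                           # raw page body
    original_URL: Optional[str] = None  # URL originally requested (name from this PR)


def was_redirected(page: ScrapedPage) -> bool:
    """Hypothetical helper: True when the served URL differs from the requested one."""
    return page.original_URL is not None and page.original_URL != page.url


if __name__ == "__main__":
    page = ScrapedPage(
        url="https://example.com/final",
        html="<html></html>",
        original_URL="https://example.com/requested",
    )
    print(was_redirected(page))
```

Keeping the requested URL next to the served one lets later steps tell the two apart; how the PR actually uses the field is not shown in this thread.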

To address https://github.com/calgo-lab/green-db/issues/119 and https://github.com/calgo-lab/green-db/issues/126, that is, to speed up the Otto scraping and avoid assigning the UNAVAILABLE label to products that are sustainable, we propose the following solution.

itrajanovska commented 1 year ago

As an example I tried today, here is a comparison of how many products get stored for a single fashion subcategory with the new changes to the UNAVAILABLE label and with the old version.

with changes = 79
old version = 114
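To put the two counts in proportion, the difference can be worked out directly. The numbers are the ones reported above; the variable names are only illustrative.

```python
# Counts reported in the comment above for one fashion subcategory.
old_version = 114
with_changes = 79

# Absolute and relative difference in stored products.
dropped = old_version - with_changes
share = dropped / old_version

print(f"{dropped} fewer products stored ({share:.1%} of the old count)")
# prints "35 fewer products stored (30.7% of the old count)"
```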