Closed by BigDatalex 10 months ago
It looks like this endpoint lists all available filters: https://www.otto.de/leafcutter/filters?rule=(und.(ist.nachhaltigkeit._).(~.(v.1)))&fc=
So should it be SUSTAINABILITY_FILTER = "?nachhaltigkeit=beruecksichtigt-tierwohl,energieeffiziente-nutzung,foerderung-sozialer-initiativen,kreislauffaehiges-design,materialien-aus-biologischem-anbau,naturkosmetik,recycelte-materialien,verbesserte-herstellung,verbesserte-rohstoffbeschaffung"?
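A minimal sketch of how the constant could be assembled from a list instead of one long string literal, which may make it easier to add options found in other categories later. The names `SUSTAINABILITY_OPTIONS` and `build_sustainability_filter` are hypothetical and not part of the repository; the slugs are only the ones quoted in this comment and may be incomplete.

```python
# Option slugs copied from the comment above; this list may be incomplete.
SUSTAINABILITY_OPTIONS = [
    "beruecksichtigt-tierwohl",
    "energieeffiziente-nutzung",
    "foerderung-sozialer-initiativen",
    "kreislauffaehiges-design",
    "materialien-aus-biologischem-anbau",
    "naturkosmetik",
    "recycelte-materialien",
    "verbesserte-herstellung",
    "verbesserte-rohstoffbeschaffung",
]


def build_sustainability_filter(options):
    """Join option slugs into the '?nachhaltigkeit=a,b,c' form (hypothetical helper)."""
    # dict.fromkeys drops duplicate slugs while preserving their order.
    unique_options = list(dict.fromkeys(options))
    return "?nachhaltigkeit=" + ",".join(unique_options)


SUSTAINABILITY_FILTER = build_sustainability_filter(SUSTAINABILITY_OPTIONS)
```

Merging the per-category lists into `SUSTAINABILITY_OPTIONS` would then only require appending the new slugs; duplicates are dropped automatically.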
Otto no longer supports the option
?nachhaltigkeit=alle-nachhaltigen-artikel
in URLs to filter for all sustainable products. As a result, our requests get redirected to the standard category page with all products, so products that are not sustainable are scraped and scraping takes longer. We need to change the filter variable here: https://github.com/calgo-lab/green-db/blob/9534d767f3edcfc78a0390949072e01b20be86e2/scraping/scraping/start_scripts/otto_de.py#L6
to all options that are available on Otto. For example, these are the ones for the blouse category:
nachhaltigkeit=foerderung-sozialer-initiativen,kreislauffaehiges-design,materialien-aus-biologischem-anbau,naturkosmetik,recycelte-materialien,verbesserte-herstellung,verbesserte-rohstoffbeschaffung
but there are probably additional ones in other categories. This needs to be investigated. Maybe @en-GB or @AdriaSG can have a look at this?
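To catch this kind of silent redirect in the future, one option might be to check whether the final URL of a response still carries the `nachhaltigkeit` parameter. The following is only a sketch using the standard library; `sustainability_filter_kept` is a hypothetical helper, the example category path is made up, and how the final URL is obtained depends on the scraper.

```python
from urllib.parse import parse_qs, urlsplit


def sustainability_filter_kept(final_url):
    """Return True if the URL still contains a non-empty 'nachhaltigkeit' parameter."""
    query = parse_qs(urlsplit(final_url).query)
    # parse_qs omits parameters with blank values by default,
    # so a bare '?nachhaltigkeit=' counts as "filter dropped".
    return "nachhaltigkeit" in query


# Hypothetical category path, used only for illustration.
kept = sustainability_filter_kept(
    "https://www.otto.de/some-category/?nachhaltigkeit=naturkosmetik"
)
dropped = sustainability_filter_kept("https://www.otto.de/some-category/")
```

A response for which this returns `False` could be skipped or logged instead of being scraped as if the filter had applied.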