Closed by en-GB 1 year ago.
With the latest changes from #79, the extractor can't find any sustainability labels, so no products get created.
I just updated the Zalando extractor. It was only a minor change, needed because a class name in the HTML changed. See: https://github.com/calgo-lab/green-db/commit/53601e8d5c9908e63c04559bdd5d5bc806753471
There are two commits from @en-GB that might make the extraction of the Zalando sustainability labels more robust, see:
We (@en-GB) should check whether these behave the same as the original approach (i.e. extract the same sustainability labels) or whether they have other implications. So far, they produce the same results on our Zalando tests.
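One way to do that check is to run both extractors over the same test pages and diff the label sets per page. A minimal sketch, assuming each extractor can be wrapped as a function from a page to a set of label names (`compare_extractors` and the toy extractors below are hypothetical, not part of green-db):

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical wrapper type: an extractor maps a page (e.g. HTML/JSON text) to the labels it finds.
Extractor = Callable[[str], Set[str]]


def compare_extractors(
    pages: Dict[str, str], original: Extractor, candidate: Extractor
) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Per page id, return (labels only the original found, labels only the candidate found).

    Pages where both extractors agree are left out, so an empty dict means
    the two approaches behave the same on this test set.
    """
    diffs = {}
    for page_id, page in pages.items():
        old, new = original(page), candidate(page)
        if old != new:
            diffs[page_id] = (old - new, new - old)
    return diffs


if __name__ == "__main__":
    # Toy pages and extractors, for illustration only.
    pages = {"p1": "label_a; label_b", "p2": "label_c"}
    split_all = lambda page: {label.strip() for label in page.split(";")}
    first_only = lambda page: {page.split(";")[0].strip()}
    print(compare_extractors(pages, split_all, first_only))
    # {'p1': ({'label_b'}, set())}
```

Reporting both directions of the difference makes it visible whether a new approach drops labels or picks up extra ones.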
@en-GB, what's the status of this one? Is this still an issue, or can we close it, especially after the latest changes in #83?
We used to extract labels directly from the rendered HTML. Since Splash can no longer render Zalando product pages, we now extract them from this JSON file: https://github.com/calgo-lab/green-db/blob/bf77115617dc68cc91ba6c2cfdb3a79588ec0e26/extract/extract/extractors/zalando.py#L152. Occasionally, though, some labels are missing. This affects only ~10 products per run, and I've only seen it happen on zalando.co.uk.
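To keep an eye on those ~10 products, the JSON-based extraction could return an empty list (rather than raising) when labels are absent, and the affected products could be collected for inspection. A rough sketch, assuming a hypothetical payload shape `{"sustainability": {"labels": [{"name": ...}]}}`; the real Zalando JSON read in `zalando.py` is likely nested differently, and `labels_from_json` / `find_missing` are illustrative names only:

```python
import json
from typing import Dict, List


def labels_from_json(raw: str) -> List[str]:
    """Pull label names out of a product JSON payload.

    ASSUMPTION: the {"sustainability": {"labels": [{"name": ...}]}} shape is
    hypothetical. Missing or malformed parts yield an empty list instead of an error.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        return []
    labels = (data.get("sustainability") or {}).get("labels") or []
    return [
        label["name"]
        for label in labels
        if isinstance(label, dict) and label.get("name")
    ]


def find_missing(products: Dict[str, str]) -> List[str]:
    """Return the ids of products whose JSON yielded no labels, sorted for stable output."""
    return sorted(pid for pid, raw in products.items() if not labels_from_json(raw))


if __name__ == "__main__":
    products = {
        "product-1": '{"sustainability": {"labels": [{"name": "label_a"}]}}',
        "product-2": '{"sustainability": {}}',
    }
    print(find_missing(products))  # ['product-2']
```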
Switching the Zalando scraper to Playwright would probably fix it, though.