The scraping run on '2022-08-12' yielded about 18k `UNKNOWN` certificates for Zalando and about 700 for Otto. For Amazon numbers, see #88. This PR focuses on reducing these numbers and adds a few more improvements. In detail, the following things are covered:
Update the Amazon, ASOS, Zalando and Otto labels. The shops mostly renamed already existing labels, but also added a few new ones.
Add a logging function as part of `sustainability_labels_to_certificates` that logs unknown certificates. So far, only those for Zalando were logged; now Amazon and Otto are covered as well.
Products without any label are no longer assigned to `UNKNOWN`; instead, they are excluded.
Add extraction of the (new) EU Energy label for Amazon, in preparation for the new household categories of #83.
The Amazon description extraction had issues when an empty string was retrieved for `productDescription` although `feature-bullets` were actually present. This is fixed now: the feature-bullets are used instead, and if both (`feature-bullets` and `productDescription`) are available, they are concatenated as suggested in #81.
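To illustrate the label handling and the description fix listed above, here is a minimal Python sketch. It is not the code from this PR: the function names `labels_to_certificates_sketch` and `merge_description`, the mapping `LABEL_TO_CERTIFICATE`, the certificate strings, and the concatenation order (description first, then feature-bullets) are all assumptions made for the example.

```python
import logging
from typing import Iterable, List, Optional

logger = logging.getLogger(__name__)

# Hypothetical label mapping; the real project keeps its own per-shop mappings.
LABEL_TO_CERTIFICATE = {
    "gots organic": "certificate:GOTS_ORGANIC",
    "eu energy label": "certificate:EU_ENERGY_LABEL",
}
UNKNOWN = "certificate:UNKNOWN"  # assumed placeholder value


def labels_to_certificates_sketch(
    labels: Optional[Iterable[str]], source: str
) -> Optional[List[str]]:
    """Map shop labels to certificates and log every label that is not known.

    Returns None for products without any label, so the caller can exclude
    them instead of assigning UNKNOWN.
    """
    cleaned = [label.strip() for label in (labels or []) if label and label.strip()]
    if not cleaned:
        return None

    certificates = set()
    for label in cleaned:
        certificate = LABEL_TO_CERTIFICATE.get(label.lower())
        if certificate is None:
            # Logged for every shop, not only for one of them.
            logger.warning("unknown sustainability label %r from %s", label, source)
            certificate = UNKNOWN
        certificates.add(certificate)
    return sorted(certificates)


def merge_description(product_description: Optional[str], feature_bullets: Optional[str]) -> str:
    """Use whichever text is available; concatenate when both are present."""
    parts = [
        part.strip()
        for part in (product_description, feature_bullets)
        if part and part.strip()
    ]
    return " ".join(parts)
```

With this sketch, a product without labels returns `None` and can be skipped by the caller, while an empty `productDescription` falls back to the feature-bullets alone.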