calgo-lab / green-db

The monorepo that powers the GreenDB.
https://calgo-lab.github.io/green-db/

Change `zalando` extractor to use HTML tags instead of `headline` string #21

Closed: se-jaeger closed this issue 2 years ago

se-jaeger commented 2 years ago

Because we plan to extend the GreenDB to non-German markets, we should not rely on German strings found on the webpage. See: https://github.com/calgo-lab/green-db/blob/8c1bc10801915e6d02f78592a9d6c201bb95632d/extract/extract/extractors/zalando.py#L112

It's better to use HTML tags, as the `otto` extractor does, for example: https://github.com/calgo-lab/green-db/blob/8c1bc10801915e6d02f78592a9d6c201bb95632d/extract/extract/extractors/otto.py#L165
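To illustrate the idea, here is a minimal sketch of selecting content by an HTML attribute instead of by a localized headline. It is not the actual `zalando` or `otto` extractor code: the sample markup, the `data-testid="certificates"` attribute, and the names `AttributeTextCollector` and `extract_by_attribute` are all assumptions made up for this example, and it uses only the standard library `html.parser` to stay self-contained.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect depth tracking.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class AttributeTextCollector(HTMLParser):
    """Collect text inside the first element whose attribute matches a value.

    Hypothetical helper for illustration only.
    """

    def __init__(self, attr_name, attr_value):
        super().__init__()
        self.attr_name = attr_name
        self.attr_value = attr_value
        self.depth = 0      # > 0 while we are inside the matched element
        self.done = False   # True once the matched element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get(self.attr_name) == self.attr_value:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_by_attribute(html, attr_name, attr_value):
    """Return the text chunks found inside the first matching element."""
    parser = AttributeTextCollector(attr_name, attr_value)
    parser.feed(html)
    parser.close()
    return parser.chunks


# Made-up pages: same structure, different language in the headline.
PAGE_DE = (
    '<div><h2>Nachhaltigkeit</h2>'
    '<ul data-testid="certificates"><li>GOTS</li><li>Fairtrade</li></ul></div>'
)
PAGE_EN = (
    '<div><h2>Sustainability</h2>'
    '<ul data-testid="certificates"><li>GOTS</li><li>Fairtrade</li></ul></div>'
)

labels_de = extract_by_attribute(PAGE_DE, "data-testid", "certificates")
labels_en = extract_by_attribute(PAGE_EN, "data-testid", "certificates")
```

Because the lookup keys on the attribute rather than on the headline text, the same call works for both sample pages, whereas a search for the German headline string would find nothing on the English one.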

BigDatalex commented 2 years ago

The fix for this issue is in branch: green-db/af-issue21