calgo-lab / green-db

The monorepo that powers the GreenDB.
https://calgo-lab.github.io/green-db/

Remove sustainability filter for otto; #112

Closed itrajanovska closed 1 year ago

itrajanovska commented 1 year ago

Add LAPTOP, TABLET, TV and HEADPHONES for otto; Add new 'UNAVAILABLE' label; Add test for unsustainable products.

BigDatalex commented 1 year ago

Actually, can we check the next-page extraction, please? I just encountered an error:

2023-01-12 19:02:56 [scrapy_splash.middleware] WARNING: Bad request to Splash: {'error': 400, 'type': 'ScriptError', 'description': 'Error happened while executing Lua script', 'info': {'source': '[string "..."]', 'line_number': 20, 'error': 'http404', 'type': 'LUA_ERROR', 'message': 'Lua error: [string "..."]:20: http404'}}
2023-01-12 19:02:56 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.otto.de/technik/smartphone&l=gq&o=116 via http://splash:8050/execute> (referer: None)
2023-01-12 19:02:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.otto.de/technik/smartphone&l=gq&o=116>: HTTP status code is not handled or not allowed
2023-01-12 19:02:56 [scrapy.core.engine] INFO: Closing spider (finished)

itrajanovska commented 1 year ago

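Thanks for noticing this. It was caused by the removed filter: without it the URL has no query string, so appending the paging parameters with & produced an invalid URL. I have handled that now and tested it locally, so next-page extraction works in both cases, each with a different separator: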

  1. when the filter is applied:
    • append with & => &l=gq&o=116... https://www.otto.de/heimtextilien/bettwaesche/?nachhaltigkeit=alle-nachhaltigen-artikel&l=gq&o=117
  2. when the filter is removed:
    • append with ? => ?l=gq&o=116... https://www.otto.de/technik/smartphone/?l=gq&o=116
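
For illustration, the separator choice described above could be sketched roughly as follows. This is not the code from the PR: the helper name add_paging_params and its arguments are hypothetical, and the meaning of the l and o parameters is not stated in this thread, so they are passed through as opaque values.

```python
from urllib.parse import urlsplit


def add_paging_params(url: str, o_value: int, l_value: str = "gq") -> str:
    """Append otto-style paging parameters to a category URL.

    Hypothetical helper for illustration only; the name and signature
    are assumptions, not part of the green-db code base.
    """
    # If the URL already has a query string (e.g. the sustainability
    # filter), further parameters must be joined with "&". Otherwise the
    # query string has to be started with "?".
    separator = "&" if urlsplit(url).query else "?"
    return f"{url}{separator}l={l_value}&o={o_value}"


# URL without the filter: the query string is started with "?".
print(add_paging_params("https://www.otto.de/technik/smartphone/", 116))

# URL with the filter: the parameters are appended with "&".
print(
    add_paging_params(
        "https://www.otto.de/heimtextilien/bettwaesche/"
        "?nachhaltigkeit=alle-nachhaltigen-artikel",
        117,
    )
)
```

Checking for an existing query string, rather than checking for the filter by name, would also cover other query parameters that might be added later.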