calgo-lab / green-db

The monorepo that powers the GreenDB.
https://calgo-lab.github.io/green-db/

fix zalando pagination #45

Closed en-GB closed 2 years ago

en-GB commented 2 years ago

as discussed in the sync. since then alex confirmed that this is a problem

en-GB commented 2 years ago

no. scrapy already filters duplicate requests automatically. but i do agree that this is ugly. .getall()[-1] would let me keep the if/else structure. it would also cut the number of filtered requests down to just 1, on the final page. would that be acceptable?

se-jaeger commented 2 years ago

To be honest, I'm not 100% sure how scrapy handles duplicated requests. We do request the same URLs multiple times (currently once a week), which obviously works and is important to us. I would prefer to avoid duplicate requests whenever possible and not rely on a feature I don't fully understand.
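For context, here is a rough model of what a per-run duplicate filter does. It is a simplified sketch, not Scrapy's actual code: the class name `SeenRequests` and the method `should_schedule` are hypothetical, and the fingerprint here covers only the method and URL. As far as I know, Scrapy's default filter keeps its set of seen requests only for the duration of one crawl (unless a job directory is configured to persist it), which would explain why the weekly re-crawls of the same URLs work.

```python
import hashlib


class SeenRequests:
    """Toy model of a per-run duplicate filter (hypothetical, not Scrapy's code)."""

    def __init__(self):
        # Fingerprints live only as long as this object, i.e. one crawl run.
        self._fingerprints = set()

    def should_schedule(self, method, url, dont_filter=False):
        """Return True if the request should be scheduled, False if dropped."""
        if dont_filter:
            # Mirrors the idea of Request(dont_filter=True): skip the check entirely.
            return True
        fingerprint = hashlib.sha1(f"{method.upper()} {url}".encode()).hexdigest()
        if fingerprint in self._fingerprints:
            return False
        self._fingerprints.add(fingerprint)
        return True


# Within one run, the second request to the same URL is dropped.
run_one = SeenRequests()
first = run_one.should_schedule("GET", "https://example.com/list?p=1")
second = run_one.should_schedule("GET", "https://example.com/list?p=1")

# A later run starts with an empty set, so the same URL is scheduled again.
run_two = SeenRequests()
next_week = run_two.should_schedule("GET", "https://example.com/list?p=1")
```

So a duplicate pagination request inside a single run would be dropped silently, while requests repeated across separate runs are unaffected.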

In this case it is easy to omit the duplicated request at the end:

en-GB commented 2 years ago

that's fair. parsing the url isn't necessary though. i think an explicit parameter is_first_page is better. alternatively i could parse the pagination bar properly.

se-jaeger commented 2 years ago

Absolutely agree! That's much better than parsing the URL.
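A minimal sketch of how the is_first_page idea could look, kept independent of Scrapy so it is easy to test. The function name `pick_next_page` and the page layout are assumptions, not taken from the actual spider: it assumes the pagination bar yields a "previous" link on every page except the first and a "next" link on every page except the last, in that order.

```python
from typing import Optional, Sequence


def pick_next_page(pagination_links: Sequence[str], is_first_page: bool) -> Optional[str]:
    """Return the URL of the next page, or None on the final page.

    Hypothetical helper; assumes links are ordered [previous, next] and that
    the first page has only "next" while the last page has only "previous".
    """
    if not pagination_links:
        # No pagination bar at all: a single-page listing.
        return None
    if len(pagination_links) == 1:
        # One link means "next" on the first page and "previous" on the last,
        # so only the explicit flag can tell the two cases apart.
        return pagination_links[0] if is_first_page else None
    # Two or more links: the last one is "next".
    return pagination_links[-1]


first_page = pick_next_page(["/list?p=2"], is_first_page=True)
middle_page = pick_next_page(["/list?p=1", "/list?p=3"], is_first_page=False)
last_page = pick_next_page(["/list?p=2"], is_first_page=False)
```

In a spider, the flag could be handed to the callback through the request's `cb_kwargs` (for example `cb_kwargs={"is_first_page": False}` on every follow-up request), so no URL parsing is needed and no duplicate request is issued on the final page.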