elacuesta / scrapy-pyppeteer

Pyppeteer integration for Scrapy
BSD 3-Clause "New" or "Revised" License

Retrying request download causes new pages to be opened #7

Open nichoi opened 4 years ago

nichoi commented 4 years ago

Hi, thanks for the useful project.

I have a middleware that retries a request up to X times, each time using a new proxy. This means _download_request is called X times, and as a result, X pages are created.
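For context, the retry logic looks roughly like this. It's a simplified stand-in for the middleware linked at the end of this comment, not the actual implementation; the `PROXY_LIST`/`MAX_PROXY_RETRIES` settings and the retried status codes are placeholders:

```python
import random


class ProxyRetryMiddleware:
    """Retry bad responses up to MAX_PROXY_RETRIES times, rotating proxies."""

    def __init__(self, proxies, max_retries):
        self.proxies = proxies
        self.max_retries = max_retries

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            proxies=crawler.settings.getlist("PROXY_LIST"),
            max_retries=crawler.settings.getint("MAX_PROXY_RETRIES", 3),
        )

    def process_response(self, request, response, spider):
        # treat these statuses as "bad proxy" and retry with a different one
        if response.status in (403, 429, 503):
            return self._retry(request) or response
        return response

    def _retry(self, request):
        retries = request.meta.get("proxy_retry_times", 0) + 1
        if retries > self.max_retries:
            return None
        # each retry is a brand-new Request, so the download handler's
        # _download_request runs again and opens yet another page
        new_request = request.replace(dont_filter=True)
        new_request.meta["proxy_retry_times"] = retries
        new_request.meta["proxy"] = random.choice(self.proxies)
        return new_request
```

The point is that every retry is a fresh `Request` object, so the handler has no way of knowing it belongs to the same logical download.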

I am using a forked version of your project that closes all open pages before opening a new one. What do you think of this approach, and would you be interested in a contribution?
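Roughly, the fork replaces the page creation step with something like the following. This is a standalone sketch against pyppeteer's public API; in the fork, the close-all loop lives inside the handler's `_download_request`:

```python
import asyncio

import pyppeteer
from pyppeteer.browser import Browser
from pyppeteer.page import Page


async def fresh_page(browser: Browser) -> Page:
    """Close every page that is still open, then return a new one."""
    for page in await browser.pages():
        await page.close()
    return await browser.newPage()


async def main() -> None:
    browser = await pyppeteer.launch()
    page = await fresh_page(browser)
    await page.goto("https://example.org")
    print(await page.title())
    await browser.close()


if __name__ == "__main__":
    asyncio.run(main())
```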

Also, would it be possible to reuse the same page? I'm new to pyppeteer.

For reference, the retry/proxy middleware I'm referring to: https://github.com/TeamHG-Memex/scrapy-rotating-proxies

elacuesta commented 4 years ago

Hi, thanks for the interest in this project. Is this an issue of performance, memory usage, or both? Or something else? To be honest, I didn't expect this to be a problem: pages are relatively short-lived, since they're closed right after the response content is read.

Closing all pages before opening a new one doesn't sound right from a concurrency standpoint; it would tear down pages that other in-flight requests are still using. I'd be more inclined to add a way to request that the handler reuse a certain page, something like the sketch below. I'd be interested in seeing how you modified the handler, though. Thanks again!
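Something along these lines, from the spider's side. The `pyppeteer_page` meta key is hypothetical, just to illustrate how page reuse could be requested:

```python
import scrapy


class ReusePageSpider(scrapy.Spider):
    """Illustration only: assumes the handler honors a "pyppeteer_page" meta key."""

    name = "reuse_page"
    start_urls = ["https://example.org"]

    def parse(self, response):
        # hypothetical: the handler exposes the page it used for this response
        page = response.meta.get("pyppeteer_page")
        # hypothetical: passing the page back asks the handler to reuse it
        # instead of calling browser.newPage() for the next download
        yield scrapy.Request(
            "https://example.org/next",
            meta={"pyppeteer_page": page},
            callback=self.parse_next,
        )

    def parse_next(self, response):
        yield {"url": response.url}
```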