-
Do you have a ready-to-go method to initialize a captcha service's Chrome extension and configure it before visiting the page and obtaining the page context?
-
Hi,
I'm using scrapy-playwright for data scraping, where URLs are provided through a txt file. I've noticed that every time a URL is scraped, the browser restarts, which significantly reduces scrap…
-
I have been using Playwright with the Scrapy web scraping framework; this is the plugin: https://github.com/scrapy-plugins/scrapy-playwright
Scrapy is designed to shut down cleanly on SIGINT, saving…
-
It would be great if a plugin like https://github.com/scrapy-plugins/scrapy-playwright did not have to force you to drive all requests through its download handlers, and instead you could drive certain…
-
I am facing an issue when trying to download a PDF file with Chromium: the response.body is the viewer plugin HTML, not the PDF bytes.
There's already a related fix here: https://github.com/s…
-
Currently, the requests coming from `scrapy_zyte_api.providers.ZyteApiProvider` don't create the **Parent Request #** field in Scrapy Cloud.
In the example above, Request 1 should have a **Pa…
-
I was going to ask this question on Stack Overflow, but I could not because of the Chinese internet, so I have to ask it here. If this is not the appropriate place, I apologize.
I'm learning…
-
As part of https://github.com/scrapy-plugins/scrapy-splash/pull/269, the `url` parameter to `SplashRequest` is no longer optional.
@elacuesta noticed that this is a backward-incompatible change. Mo…
-
In this case:
```python
class A:
    def __init__(self):
        pass
class B:
    def __init__(self):
        super(B, self)
class C(B, A):
    pass
```
LGTM reports that `A.__init…
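For context, here is a minimal sketch of how Python's method resolution order (MRO) treats this hierarchy. It assumes a variant of `B` that actually calls `super().__init__()`, which is not what the snippet above does: there, `super(B, self)` only creates a proxy object and calls nothing. The `calls` list and `mro_names` variable are hypothetical helpers added purely for illustration.

```python
calls = []  # hypothetical helper: records which initializers run


class A:
    def __init__(self):
        calls.append("A")


class B:
    def __init__(self):
        calls.append("B")
        # Calling __init__ on the super() proxy continues along the MRO of
        # the instance's class, so for an instance of C this reaches A.
        super().__init__()


class C(B, A):
    pass


C()
mro_names = [cls.__name__ for cls in C.__mro__]
print(mro_names)  # ['C', 'B', 'A', 'object']
print(calls)      # ['B', 'A']
```

In this variant, `A.__init__` runs when `C` is instantiated even though `B` does not inherit from `A`, because `A` follows `B` in `C.__mro__`.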
-
I was setting up AutoExtract in Scrapy Cloud on a project with the Crawlera addon. AutoExtract queries were routed through Crawlera. The idea is to blacklist the AutoExtract domain by default. It may make sense for…