-
I have been using Playwright with the Scrapy web scraping framework through this plugin: https://github.com/scrapy-plugins/scrapy-playwright
Scrapy is designed to shut down cleanly on SIGINT, saving…
-
It would be great if a plugin like https://github.com/scrapy-plugins/scrapy-playwright did not have to force you to drive all requests through its download handlers, and instead you could drive certain…
-
I am facing an issue when downloading a PDF file with Chromium: `response.body` is the viewer plugin HTML, not the PDF bytes.
There is already a related fix here: https://github.com/s…
-
Currently, requests coming from `scrapy_zyte_api.providers.ZyteApiProvider` don't create the **Parent Request #** field in Scrapy Cloud.
In the example above, Request 1 should have a **Pa…
-
I was going to ask this question on Stack Overflow, but I could not because of the Chinese internet, so I have to ask it here. If this is not the right place, I apologize.
I'm learning…
-
As part of https://github.com/scrapy-plugins/scrapy-splash/pull/269, the `url` parameter to `SplashRequest` is no longer optional.
@elacuesta noticed that this is a backward-incompatible change. Mo…
-
In this case:
```python
class A:
    def __init__(self):
        pass
class B:
    def __init__(self):
        super(B, self)
class C(B, A):
    pass
```
LGTM reports that `A.__init…
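
For context, here is a minimal sketch of how cooperative initialization resolves in this class layout. It assumes the intent is for `B.__init__` to actually call `super(B, self).__init__()` (the snippet above only builds the `super` object), and the `a_initialized` / `b_initialized` attributes are hypothetical markers added for illustration:

```python
class A:
    def __init__(self):
        self.a_initialized = True  # hypothetical marker attribute


class B:
    def __init__(self):
        # super(B, self) resolves against the MRO of type(self), not B's own bases.
        # For an instance of C this dispatches to A.__init__.
        super(B, self).__init__()
        self.b_initialized = True  # hypothetical marker attribute


class C(B, A):
    pass


# Method resolution order of C: C, B, A, object
mro_names = [cls.__name__ for cls in C.__mro__]

c = C()  # runs B.__init__, which in turn runs A.__init__
b = B()  # runs B.__init__, which in turn runs object.__init__
```

Under that assumption, `A.__init__` runs for instances of `C` even though `B` never names `A` directly, while a plain `B()` instance skips it.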
-
I was setting up AutoExtract in Scrapy Cloud on a project with the Crawlera addon. AutoExtract queries were routed through Crawlera. The idea is to blacklist the AutoExtract domain by default. It may make sense for…
-
![image](https://user-images.githubusercontent.com/43572770/100686233-d7a75980-33b8-11eb-8c62-8484a15881eb.png)
Clone issue: https://github.com/scrapy-plugins/scrapy-splash/issues/272
-
* Tor (onion network)
* Randomly select a proxy: [scrapy-proxies](https://github.com/aivarsk/scrapy-proxies)
* Some free proxies: [xiaoer](http://www.xiaoerdaili.com/) [西刺](https://www.xicidaili.com/)
* Paid integrated proxy: [scrapy-crawlera](https://…
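
The random-proxy option in the list above could be sketched roughly as follows. This is not the scrapy-proxies implementation: `RandomProxyMiddleware`, `FakeRequest`, and the proxy addresses are hypothetical names and placeholders. The only Scrapy behavior relied on is that the built-in `HttpProxyMiddleware` honors `request.meta["proxy"]`.

```python
import random


class RandomProxyMiddleware:
    """Hypothetical downloader middleware that assigns a random proxy per request."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy URL is required")
        self.proxies = list(proxies)

    def process_request(self, request, spider=None):
        # Scrapy's built-in HttpProxyMiddleware reads request.meta["proxy"].
        request.meta["proxy"] = random.choice(self.proxies)
        return None  # let the request continue through the middleware chain


class FakeRequest:
    """Stand-in for scrapy.Request so the sketch runs without Scrapy installed."""

    def __init__(self, url):
        self.url = url
        self.meta = {}


# Placeholder proxy addresses, not real endpoints.
PROXIES = ["http://127.0.0.1:8001", "http://127.0.0.1:8002"]

middleware = RandomProxyMiddleware(PROXIES)
request = FakeRequest("https://example.com")
middleware.process_request(request)
```

In a real project, a class like this would be enabled through the `DOWNLOADER_MIDDLEWARES` setting and would receive actual `scrapy.Request` objects instead of the stand-in.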