-
Some suggestions:
1. Please complete the docs on how to use this, and give an example in Scrapy;
2. This code has some bugs, e.g. [https://github.com/movingheart/django_example/blob/master/QQ%E5%9B%BE%E7%89%…
-
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Retry-After
To reproduce:
```$ curl -L -I http://reddit.com```
It should yield a 429 at some point, when trying to hit `https://www.r…
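
For reference, a `Retry-After` value is either a number of seconds or an HTTP-date. A minimal sketch of parsing it with the standard library follows; the function name `retry_after_seconds` and its `now` parameter are assumptions made for illustration, not part of Scrapy.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Return the delay in seconds encoded by a Retry-After header value.

    Hypothetical helper: returns None when the value cannot be parsed.
    """
    value = value.strip()
    # Form 1: a non-negative integer number of seconds.
    if value.isascii() and value.isdigit():
        return float(value)
    # Form 2: an HTTP-date such as "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means no further waiting is needed.
    return max(0.0, (when - now).total_seconds())
```

A retry middleware could sleep for, or reschedule after, the returned delay when a 429 response carries this header.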
-
I'd like to be able to include the status of whether the response to the URL being scraped used SSL or not. The challenge is that inside the `parse` method of the `SplashResponse` the `response.certif…
-
I would like a way to detect typos in spider settings.
Some approaches that I can think of:
- Making this a core feature of Scrapy, having extensions report the settings they use someho…
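
One possible shape for such a check, as a rough sketch only: compare each configured name against a list of known setting names and flag near misses. The helper `find_suspect_settings`, the `cutoff` value, and the hand-written `known` list are assumptions for illustration; in a real project the known names would have to come from Scrapy's defaults plus whatever the installed extensions report.

```python
import difflib


def find_suspect_settings(custom, known, cutoff=0.8):
    """Map each unknown setting name to the known names it closely resembles.

    Hypothetical helper: names that match nothing closely are assumed to be
    project-specific settings and are not reported.
    """
    suspects = {}
    for name in custom:
        if name in known:
            continue
        matches = difflib.get_close_matches(name, known, n=3, cutoff=cutoff)
        if matches:
            suspects[name] = matches
    return suspects


# Illustrative data only: a tiny subset of known names and a settings dict
# containing one typo and one unrelated custom name.
known = ["DOWNLOAD_DELAY", "CONCURRENT_REQUESTS", "HTTPCACHE_ENABLED"]
custom = {"DOWNLOAD_DELAY": 1, "CONCURENT_REQUESTS": 8, "XYZ_FLAG": True}
suspects = find_suspect_settings(custom, known)
```

Here `suspects` flags `CONCURENT_REQUESTS` as a likely misspelling of `CONCURRENT_REQUESTS`, while `XYZ_FLAG` is left alone because it resembles no known name.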
-
The page https://focus.pcsb.org/focus/ is supposed to look like this:
![image](https://user-images.githubusercontent.com/57899773/123558105-73b20b80-d7b6-11eb-9da6-fda3e39651b7.png)
But splash is …
-
I got a 405 when I ran it. It says: HTTP status code is not handled or not allowed.
Would you mind taking a look? Thanks.
-
Is scrapy-splash not compatible with obeying robots.txt? Every time I make a query it attempts to download the robots.txt from the docker instance of scrapy-splash. Below is my settings file. I'm t…
-
In _settings.py_ there is _HTTPCACHE_EXPIRATION_SECS = 300 (seconds)_ .
However, it seems to me that _EXPIRATION_ only determines the point in time at which Scrapy ignores that cached data; with seemingly nothi…
-
It'd be great if the plugin could be configured to use/re-use the sessions mechanism.
Because managing it in spiders like that:
```
if 'X-Crawlera-Session' in response.headers and resp…
-
There are examples of using cookies in the docs, but no examples of setting method and body. I think it would be useful to add it, or perhaps even add the following class (with a better name): with it…
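
As a rough sketch of what such a docs example might cover: Splash's render endpoints accept `http_method`, `body` and `headers` arguments, and scrapy-splash forwards the `args` dict of a request to Splash. The helper name `splash_json_args` below is hypothetical, and the JSON body is just one possible payload format.

```python
import json


def splash_json_args(payload, method="POST"):
    """Build a Splash `args` dict that sets the HTTP method and request body.

    Hypothetical helper: the keys are Splash render arguments, and the
    payload is serialized as JSON purely for illustration.
    """
    return {
        "http_method": method,
        "body": json.dumps(payload),
        "headers": {"Content-Type": "application/json"},
    }


args = splash_json_args({"q": "scrapy"})

# Assumed usage inside a spider (requires scrapy-splash, not run here):
#   yield SplashRequest(url, self.parse_result, args=args)
```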