-
## Background
Following the acceptance of https://github.com/scrapinghub/web-poet/pull/27, developers can now use URL patterns to declare which Page Objects work on specific URLs ([…
-
This is more of a question than a feature request, but I guess I can translate it to a request for an enhancement of the documentation.
This is a question I posted on [StackOverflow](https://stacko…
-
I am trying to run multiple spiders with an RDBMS backend. The spiders may find a URL that was already visited by another spider, and Frontera raises an exception in this case. Is the expected b…
-
Using Scrapy 1.5.0
I took a look at the FAQ section and nothing was relevant about it.
Same for issues with the keyword `KeyError` on GitHub, Reddit, or Google Groups.
As you can see below, it seems t…
-
Hi!
I couldn't find a way to get a `timedelta` from a string like `3 hours ago` rather than a `datetime`.
The use case is: I have a column `when` with values like `3 hours ago` and a `timestamp` wit…
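
A minimal sketch of one way to get a `timedelta` from such a string, assuming the values always follow the `<number> <unit> ago` shape. `parse_ago` is a hypothetical helper written with the standard library only; it is not a dateparser API and it covers just the units listed in the code.

```python
import re
from datetime import timedelta

# Map the singular unit names accepted here to timedelta keyword arguments.
# Months and years are left out on purpose: timedelta has no fixed length for them.
_UNITS = {
    "second": "seconds",
    "minute": "minutes",
    "hour": "hours",
    "day": "days",
    "week": "weeks",
}

_PATTERN = re.compile(
    r"^\s*(\d+)\s+(second|minute|hour|day|week)s?\s+ago\s*$",
    re.IGNORECASE,
)


def parse_ago(text):
    """Return a timedelta for strings like '3 hours ago', or None if unparsable.

    Hypothetical helper for illustration only.
    """
    match = _PATTERN.match(text)
    if match is None:
        return None
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit.lower()]: int(amount)})
```

The returned value can then be subtracted from whichever reference `datetime` applies, for example `reference - parse_ago("3 hours ago")`.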
-
Hi,
Thanks for the wonderful work on Splash.
I just wanted to know if there is any way to disable browser caching of files?
Or maybe return all HTTP requests made in har/log/entries, not just the ones…
-
Hi team,
When I fetch the page source of this [url](http://www.suedfargesa.com) 'http://www.suedfargesa.com', I don't get the expected result.
Here, the script tag contains "window.location.href='ht…
-
Alain Quenneville saw the following exception (running Python 2.7.3 on Linux Ubuntu 12.04.3 LTS):
```
Exception in thread Thread-1 (most likely raised during interpreter shutdown):
Traceback (most r…
-
Hey, I'm currently using Splash via Docker and my container "randomly" dies with an exit code of 137. The only relevant message I can see in the log output is the last line:
```
The X11 co…
-
The docs at https://splash.readthedocs.io/en/stable/api.html#request-filters say
> Only related resources are filtered out by request filters; ‘main’ page loading request can’t be blocked this way…