-
- Jetty tends to add `;jsessionid=md5likestring` to the path part of the URL
- OS Commerce adds `osCsid=md5likestring` to the query string of the URL
I'm sure there are other popular ones.
There …
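
A minimal sketch of how such session identifiers could be stripped before URLs are compared or deduplicated. The function name `strip_session_ids` and the set `SESSION_QUERY_PARAMS` are hypothetical, and only the two patterns mentioned above are handled.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of query parameters that carry a session ID (lower-cased).
SESSION_QUERY_PARAMS = {"oscsid"}

# Matches a ";jsessionid=..." path parameter of the kind Jetty appends.
JSESSIONID_RE = re.compile(r";jsessionid=[^/?#;]*", re.IGNORECASE)


def strip_session_ids(url):
    """Return url without known session identifiers in its path or query."""
    parts = urlsplit(url)
    # urlsplit keeps ";params" inside the path, so a regex can remove them.
    path = JSESSIONID_RE.sub("", parts.path)
    # Rebuild the query string without the session parameters.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_QUERY_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(kept), parts.fragment)
    )
```

Other frameworks use other parameter names, so the set and the regular expression would need extending for real crawls.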
-
Sitemap spider fails with URLs starting with a double slash:
raise ValueError('Missing scheme in request url: %s' % self._url)
exceptions.ValueError: Missing scheme in request url: //www.example.co…
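
For illustration, a scheme-relative URL can be resolved against the URL of the document it came from. This is a sketch using only the standard library, not the fix Scrapy applies; the helper name `resolve_scheme_relative` and the example URLs are placeholders.

```python
from urllib.parse import urljoin


def resolve_scheme_relative(base_url, loc):
    """Resolve a sitemap location against the sitemap's own URL.

    A value such as "//host/path" inherits the scheme of base_url;
    absolute URLs are returned unchanged.
    """
    return urljoin(base_url, loc.strip())


print(resolve_scheme_relative(
    "https://www.example.com/sitemap.xml", "//www.example.com/page"
))
```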
-
I have a strange error. As the documentation says, I've done something like this to add a Link in an entity:
``` php
public function onBootstrap(MvcEvent $e)
{
    $app = $e->getTarget();
    …
```
-
### Description
Scrapy fails to crawl [emoji domains](https://en.wikipedia.org/wiki/Emoji_domain). Specifically, [i❤.ws](https://xn--i-7iq.ws/)
Raises the following:
`idna.core.InvalidCod…
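
As a point of comparison, Python's built-in `idna` codec follows the older IDNA 2003 rules, which accept many symbols that the IDNA 2008 rules implemented by the third-party `idna` package reject. The sketch below only shows that difference in encoding behavior; the helper name `to_ascii_hostname` is hypothetical, and this is not presented as how Scrapy handles the hostname.

```python
def to_ascii_hostname(hostname):
    """Encode a Unicode hostname to its ASCII (Punycode) form.

    Uses the standard library "idna" codec (IDNA 2003), which is more
    permissive about symbols than IDNA 2008.
    """
    return hostname.encode("idna").decode("ascii")


print(to_ascii_hostname("i❤.ws"))
```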
-
An issue arose during development of a generic crawler that was supposed to follow particular rules for extracting and visiting links, as well as collect some statistics about visited pages.
As websit…
-
Currently Scrapy can't extract links from the http://scrapy.org/ page correctly because URLs in the page header are relative to a nonexistent parent: `../download/`, `../doc/`, etc. Browsers resolve these li…
kmike updated
4 years ago
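
For reference, RFC 3986 says that `..` segments pointing above the root are simply dropped during resolution, and `urllib.parse.urljoin` in Python 3.5 and later follows that rule. A small sketch, with the hypothetical helper name `resolve_href`, using the links quoted above:

```python
from urllib.parse import urljoin


def resolve_href(base_url, href):
    """Resolve href against base_url following RFC 3986 rules."""
    return urljoin(base_url, href)


# Links taken from the description above; the base is the site root.
for href in ("../download/", "../doc/"):
    print(resolve_href("http://scrapy.org/", href))
```

On Python 3.5 or later, the excess `..` is discarded, so both links resolve to paths directly under the root.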
-
```python
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy_splash import SplashRequest


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    allowed_domains =…
```
ghost updated
5 years ago
-
The most recent run of the petsathome_gb spider from 2023-05-15 has returned 50 fewer stores than the previous run from 2023-04-15. I've checked a few of the missing stores, and they all appear to sti…
-
http://www.zuzhirenshi.com/dianzibao/2022-08-26/1/index.htm