-
I am deploying my Portia spider with scrapyd. I have given a pattern to be followed in the Crawling section in Portia.
When the spider is deployed, the links do not follow the link pattern which I have give…
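Portia's follow patterns are regular expressions matched against candidate URLs, so only links whose URLs match the pattern should be enqueued. A minimal sketch of that filtering (the domain and pattern below are illustrative, not taken from the original report):

```python
import re

# Hypothetical follow pattern of the kind entered in Portia's Crawling section;
# only URLs matching it should be followed by the deployed spider.
follow_pattern = re.compile(r'^https?://www\.example\.com/products/')

candidates = [
    'http://www.example.com/products/widget-1',
    'http://www.example.com/about',
]
# Keep only the URLs the pattern matches.
followed = [url for url in candidates if follow_pattern.match(url)]
```

If deployed links ignore the pattern, comparing the pattern against a sample URL this way is a quick sanity check that the regex itself is correct.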
-
For my use case I need to store the title of a parent with each child document. The references and titles of multiple items are in one XML file, so it needs to be split.
```
Test
http://www.test.com/tes…
```
-
I get this error when trying to compile the samples.
You probably forgot to check something into GitHub.
It happens with both Maven and Gradle.
[ERROR] /home/tsweets/projects/spring-restdocs/rest-notes-sprin…
-
How do I get around the site until it has links?
Can you show a simple example?
-
I have not investigated this yet:
```
(diffeo)stav@platu:~/Workspace/sh/Diffeo/diffeo-netsec$ scrapy crawl blackhat
/home/stav/.virtualenvs/diffeo/src/scrapy/scrapy/contrib/linkextractors/sgml.py:107…
```
-
I'm going to try to figure this out and maybe fork and merge, but what do you think about having an argument to use a forge server? Or should that be a totally different image?
-
It seems like this is a feature: give it -u http://a.example.com, and if there is a link to http://b.example.com then xsscrapy follows and tests it. But IMO that is a big mistake (as a default setting)…
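A safer default would restrict testing to the seed's host. A minimal sketch of such a same-host check (illustrative only, not xsscrapy's actual code):

```python
from urllib.parse import urlparse

def same_host(seed_url, candidate_url):
    # Naive offsite filter: follow a link only when its host
    # matches the host of the seed URL passed via -u.
    return urlparse(seed_url).hostname == urlparse(candidate_url).hostname

# With -u http://a.example.com, a link to b.example.com would be skipped.
allowed = same_host('http://a.example.com', 'http://a.example.com/login')
blocked = same_host('http://a.example.com', 'http://b.example.com/')
```

A real filter would also need to handle subdomains and redirects, but even this naive check would keep the scanner from probing unrelated third-party sites by default.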
-
I have a suggestion to improve Scrapy's Selector. I've seen this construction in many projects:
```
result = sel.xpath('//div/text()').extract()[0]
```
And what about `if result:` and `else:`, or…
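The problem is that `.extract()` returns a list of strings, so indexing with `[0]` raises IndexError whenever the XPath matched nothing. A defensive helper along these lines avoids the crash (the name `extract_first` is illustrative here, sketched over a plain list rather than a live Selector):

```python
def extract_first(results, default=None):
    # .extract() yields a list of strings; an empty match yields [],
    # so fall back to a default instead of raising IndexError.
    return results[0] if results else default

first = extract_first(['first div text'])   # normal case: one or more matches
missing = extract_first([])                 # empty match: default instead of a crash
empty_str = extract_first([], default='')   # caller-chosen fallback
```

Baking something like this into Selector would replace the repetitive `if result: ... else: ...` boilerplate the snippet above forces on every caller.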
-
There is either a bug or some stray code in https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/linkextractors/lxmlhtml.py#L37: the local variable `tag = _nons(el.tag)` is not used, and so `_nons`…
— kmike, updated 10 years ago
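For context, `_nons` follows the lxml.html convention of stripping the XHTML namespace prefix from element tags. A minimal sketch of that behavior (not the exact Scrapy code):

```python
XHTML_NAMESPACE = 'http://www.w3.org/1999/xhtml'

def _nons(tag):
    # Strip the XHTML namespace that lxml attaches to tags parsed from
    # XHTML documents, so '{http://www.w3.org/1999/xhtml}a' is treated as 'a'.
    if isinstance(tag, str) and tag.startswith('{%s}' % XHTML_NAMESPACE):
        return tag.split('}', 1)[-1]
    return tag

stripped = _nons('{http://www.w3.org/1999/xhtml}a')
plain = _nons('a')
```

If the result of `_nons` is computed but discarded, namespaced tags would presumably never match the extractor's tag filter, which is what makes the dead assignment look like a bug rather than leftover code.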
-
We want to add pluggable link extractor backends, maybe via a `LINKEXTRACTOR_CLASS` setting.
Some backends that come to mind: pure-regex, scrapely, libxml2, lxml, sgml
The sgml backend is not wor…
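A `LINKEXTRACTOR_CLASS` setting would presumably hold a dotted path that gets resolved to a backend class at startup. Scrapy already has a helper for this (`scrapy.utils.misc.load_object`); a self-contained sketch of the same resolution using only the stdlib:

```python
from importlib import import_module

def load_object(path):
    """Resolve a dotted path such as 'package.module.ClassName' to the named object."""
    module_path, _, name = path.rpartition('.')
    return getattr(import_module(module_path), name)

# Hypothetical setting value; any importable dotted path resolves the same way,
# so a stdlib class stands in here for a real link extractor backend.
LINKEXTRACTOR_CLASS = 'collections.OrderedDict'
extractor_cls = load_object(LINKEXTRACTOR_CLASS)
```

With this in place, swapping between regex, lxml, or sgml backends would be a one-line settings change rather than a code change.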