-
SgmlLinkExtractor returns no links for http://metaoptimize.com/qa/
-
`SgmlLinkExtractor` can choke on some pages that lxml is fine with.
- https://groups.google.com/forum/#!topic/scrapy-users/iA1VzcJYpJE
Currently, `LxmlParserLinkExtractor` doesn't have some of `SgmlLi…
-
I'm getting an exception when extracting links from a site. It can be reproduced with:
```
$ scrapy shell 'http://www.cnea.gov.ar/'
>>> from scrapy.contrib.linkextractors import sgml
>>> e = sgml.SgmlLin…
-
I often have problems with the `SgmlLinkExtractor`. Let's try:
```
scrapy shell "http://www.dachser.com/de/de/"
# in the shell
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
link_ext…
bijzz updated
10 years ago
-
Here is my spider
``` python
from scrapy.contrib.linkextractors import LinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.selector import Selector
from vitrinbot.items im…
-
With this pure HTML page:
http://www.freetocharities.org.uk/zimconserve/
-
With the current configuration, throughput is very low, yet CPU usage spikes to 100% within a very short time.
Sometimes the crawler hangs, but CPU usage stays at 100%.
torce updated
10 years ago
-
It is now a string, and `attrs_func` doesn't make sense if `attrs` is a string. See https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/linkextractors/sgml.py#L98
kmike updated
10 years ago
-
This issue was raised before in #285, but at the time none of the proposed alternative solutions made it into Scrapy.
Even though the solution proposed in #285 was a good workaround, it wa…
rmax updated
10 years ago
-
```
File "/lib/python2.6/site-packages/Scrapy-0.16.2-py2.6.egg/scrapy/contrib/linkextractors/sgml.py"
line 84, in handle_data
self.current_link.text = self.current_link.text + data.strip()
exceptions…
dzyao updated
11 years ago