-
**Description/Objective:**
Implement a web scraper using the Scrapy library to extract the PDF files of the Diário Oficial do Distrito Federal (DODF) and convert them to text (TXT) files…
-
# -*- coding: utf-8 -*-
import scrapy
class FarmersSpider(scrapy.Spider):
    name = 'farmers'
    allowed_domains = ['www.farmerscompress.com']
    login_url = 'https://www.farmerscompress.com/Proces…
-
When I follow the tutorial in the documentation and run 'scrapy crawl quotes', an AttributeError appears.
2024-03-18 23:01:30 [scrapy.core.scraper] ERROR: Error downloading
Traceback (most recent call las…
-
Inspired by #1054
Similar to the Sample pipeline, we can maybe force the spider to stop once a threshold is reached of, let's say, 5 duplicates of the same item. The Kingfisher extension should ch…
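The threshold idea can be sketched without any Scrapy dependency. The snippet below is only an illustration: `DuplicateTracker`, `record`, and the default of 5 are hypothetical names and values, not part of Kingfisher or Scrapy. In a real item pipeline, the `True` result would presumably be the point at which the spider is asked to close.

```python
from collections import Counter


class DuplicateTracker:
    """Hypothetical helper: counts repeat sightings of an item key."""

    def __init__(self, threshold=5):
        # Assumption: 5 duplicates, following the "let's say" in the proposal.
        self.threshold = threshold
        self.counts = Counter()

    def record(self, key):
        """Record one sighting of `key`.

        Returns True once the key has been seen `threshold` times
        beyond its first (non-duplicate) occurrence.
        """
        self.counts[key] += 1
        duplicates = self.counts[key] - 1  # the first sighting is not a duplicate
        return duplicates >= self.threshold


tracker = DuplicateTracker(threshold=5)
results = [tracker.record("release-1") for _ in range(6)]
print(results)  # the last entry is the one that reaches the threshold
```

Keeping the counter separate from the pipeline makes the stopping rule easy to test on its own; how the spider is actually stopped is left open here.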
-
In the current version of Scrapy, the code below breaks because the spider start time is now timezone-aware:
https://github.com/scrapinghub/spidermon/blob/master/spidermon/contrib/scrapy/monitors/monitors.p…
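A common way this kind of breakage shows up (assumed here, since the linked code is not quoted) is subtracting a timezone-aware datetime from a naive one, which raises `TypeError` in Python. The sketch below reproduces that failure with the standard library only and shows one possible normalisation; `elapsed_seconds` is a hypothetical helper, not Spidermon's or Scrapy's API.

```python
from datetime import datetime, timezone


def elapsed_seconds(start_time, now=None):
    """Seconds between start_time and now, accepting naive or aware start times."""
    if start_time.tzinfo is None:
        # Assumption: a naive start time is expressed in UTC.
        start_time = start_time.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - start_time).total_seconds()


aware_start = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
naive_now = datetime(2024, 1, 1, 12, 0, 30)

try:
    naive_now - aware_start  # mixing naive and aware datetimes is not allowed
except TypeError as exc:
    print(f"naive - aware fails: {exc}")

aware_now = datetime(2024, 1, 1, 12, 0, 30, tzinfo=timezone.utc)
print(elapsed_seconds(aware_start, now=aware_now))
```

Normalising both operands to aware UTC values before subtracting works whether the start time is stored naive or aware.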
-
### Description
Initialising some asyncio-based library resources (clients/connections) wrapped in `asyncio.ensure_future` works fine in the `spider_opened` method. But execution of async functions (closin…
-
### Description
I am getting an error when running the demo example: `AttributeError: 'Decompressor' object has no attribute 'process'`. It looks like it comes from a library internal to `scrapy`.
##…
-
Finish it before 3/20/2020
-
When:
- `http_proxy` is set for `HttpProxyMiddleware`,
- and an `http://` request is redirected to an `https://` location,
scrapy will use the `http_proxy` settings for the `https` scheme.
This als…
-
### Description
When I issue the following command
`scrapy startproject webcam`
I get the error message:
```
vagrant@vagrant:/vagrant$ scrapy startproject webcam
Traceback (most rece…