This tool scrapes the official COVID-19 Puerto Rico dashboard every hour and keeps track of the changing metrics to help visualize and measure progress.
The new `FEED_EXPORT_BATCH_ITEM_COUNT` setting allows output items to be delivered in batches of up to the specified number of items.
It also serves as a workaround for delayed file delivery: with certain storage backends (S3, FTP, and now GCS), Scrapy would otherwise start delivering items only after the crawl has finished.
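A minimal sketch of how the setting combines with `FEEDS` (the output path and batch size here are illustrative, not taken from this release):

```python
# settings.py - hedged sketch; paths and values are examples only.
FEED_EXPORT_BATCH_ITEM_COUNT = 100  # start a new output file every 100 items

# With batching enabled, the feed URI needs a placeholder such as
# %(batch_id)d or %(batch_time)s so each batch gets a distinct file name.
FEEDS = {
    "output/items-%(batch_id)d.json": {"format": "json"},
}
```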
The base implementation of item loaders has been moved into a separate library, `itemloaders`, allowing usage from outside Scrapy and a separate release schedule.
The startproject command no longer makes unintended changes to the permissions of files in the destination folder, such as removing execution permissions.
Fixed feed exports overwrite support (#4845, #4857, #4859)
Fixed the AsyncIO event loop handling, which could make code hang (#4855, #4872)
Fixed the IPv6-capable DNS resolver `scrapy.resolver.CachingHostnameResolver` for download handlers that call `reactor.resolve` (#4802, #4803)
Fixed the output of the genspider command showing placeholders instead of the import path of the generated spider module (#4874)
Migrated Windows CI from Azure Pipelines to GitHub Actions (#4869, #4876)
Scrapy 2.4.0 (2020-10-11)
Highlights:
Python 3.5 support has been dropped.
The `file_path` method of media pipelines can now access the source item.
This allows you to set a download file path based on item data.
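For instance, a pipeline subclass might group stored files by a field on the item; a hedged sketch (the subclass name and the `category` field are hypothetical):

```python
from itemadapter import ItemAdapter
from scrapy.pipelines.files import FilesPipeline

class PerCategoryFilesPipeline(FilesPipeline):
    def file_path(self, request, response=None, info=None, *, item=None):
        # Keep the default hash-based path, but prefix it with item data.
        default_path = super().file_path(request, response=response, info=info, item=item)
        category = "misc"
        if item is not None:
            category = ItemAdapter(item).get("category") or "misc"
        return f"{category}/{default_path}"
```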
The new `item_export_kwargs` key of the FEEDS setting lets you define keyword parameters to pass to item exporter classes.
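For example (a sketch; `include_headers_line` is one keyword parameter accepted by the built-in CsvItemExporter):

```python
# settings.py
FEEDS = {
    "items.csv": {
        "format": "csv",
        "item_export_kwargs": {
            "include_headers_line": False,  # forwarded to CsvItemExporter
        },
    },
}
```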
You can now choose whether feed exports overwrite or append to the output file.
For example, when using the crawl or runspider commands, you can use the `-O` option instead of `-o` to overwrite the output file.
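The same choice is available per feed through the `overwrite` key; a sketch:

```python
# settings.py
FEEDS = {
    "items.json": {
        "format": "json",
        "overwrite": True,  # replace the file instead of appending to it
    },
}
```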
Zstd-compressed responses are now supported if zstandard is installed.
In settings, where the import path of a class is required, it is now possible to pass a class object instead.
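A sketch, assuming a hypothetical project module `myproject`:

```python
# settings.py
from myproject.pipelines import MyPipeline

ITEM_PIPELINES = {
    # The class object itself, instead of "myproject.pipelines.MyPipeline".
    MyPipeline: 300,
}
```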
Modified requirements
Python 3.6 or greater is now required; support for Python 3.5 has been dropped. As a result:
- When using PyPy, PyPy 7.2.0 or greater is now required
- For Amazon S3 storage support in feed exports or media pipelines, botocore 1.4.87 or greater is now required
- To use the images pipeline, Pillow 4.0.0 or greater is now required

(#4718, #4732, #4733, #4742, #4743, #4764)
Backward-incompatible changes
`scrapy.downloadermiddlewares.cookies.CookiesMiddleware` once again discards cookies defined in `Request.headers`.
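Cookies should instead be passed through the dedicated `cookies` argument, which CookiesMiddleware does process; a minimal sketch:

```python
import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"

    def start_requests(self):
        # Supported: the cookies argument is handled by CookiesMiddleware.
        # A raw Cookie entry in headers would now be discarded instead.
        yield scrapy.Request("https://example.com", cookies={"currency": "USD"})
```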
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
- `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
- `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
- `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language
- `@dependabot badge me` will comment on this PR with code to add a "Dependabot enabled" badge to your readme
Additionally, you can set the following in your Dependabot [dashboard](https://app.dependabot.com):
- Update frequency (including time of day and day of week)
- Pull request limits (per update run and/or open at any time)
- Out-of-range updates (receive only lockfile updates, if desired)
- Security updates (receive only security updates, if desired)
Bumps scrapy from 2.2.0 to 2.4.1.
Release notes
Sourced from scrapy's releases.
Changelog
Sourced from scrapy's changelog.
Commits
- `26836c4` Bump version: 2.4.0 → 2.4.1
- `15d301e` Cover Scrapy 2.4.1 in the release notes (#4884)
- `c20b342` Remove unnecessary pytest-azurepipelines package (#4876)
- `91a8451` Merge pull request #4874 from stummjr/fix-missing-fstring-prefix-genspider
- `27b07c6` Merge pull request #4872 from elacuesta/asyncio_get_event_loop
- `b20cfef` Remove unnecessary line from test
- `7e98a76` Use deferred_from_coro in asyncio test
- `a2c4a7f` Add missing f-string prefix to genspider output
- `114229e` Docs: add a note about asyncio.set_event_loop
- `3095d39` Test: disable asyncio reactor on Windows for Py>=3.8