edgi-govdata-archiving / web-monitoring-processing

Tools for access, "diff"-ing, and analyzing archived web pages
https://edgi-govdata-archiving.github.io/web-monitoring-processing
GNU General Public License v3.0

Update wayback requirement from ~=0.3.3 to ~=0.4.0 #826

Closed dependabot[bot] closed 1 year ago

dependabot[bot] commented 1 year ago

Updates the requirements on wayback to permit the latest version.

Release notes

Sourced from wayback's releases.

Version 0.4.0

Breaking Changes

This release includes a significant overhaul of parameters for WaybackClient.search.

  • Removed parameters that did nothing, could break search, or that were for internal use only: gzip, showResumeKey, resumeKey, page, pageSize, previous_result.

  • Removed support for extra, arbitrary keyword parameters that could be added to each request to the search API.

  • All parameters now use snake_case. (Previously, parameters that were passed unchanged to the HTTP API used camelCase, while others used snake_case.) The old, non-snake-case names are deprecated, but still work. They’ll be completely removed in v0.5.0.

    • matchType → match_type
    • fastLatest → fast_latest
    • resolveRevisits → resolve_revisits
  • The limit parameter now has a default value. There are very few cases where you should not set a limit (not doing so will typically break pagination), and there is now a default value to help prevent mistakes. We’ve also added documentation to explain how and when to adjust this value, since it is pretty complex. (#65)

  • Expanded the method documentation to explain things in more depth and link to more external references.

While we were at it, we also renamed the datetime parameter of WaybackClient.get_memento to timestamp for consistency with the CdxRecord and Memento classes. The old name still works for now, but it will be fully removed in v0.5.0.
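For callers that still build keyword arguments with the old names, the renames and removals listed above can be sketched as a small translation step. This is only an illustration: `migrate_kwargs` and the three constant names are hypothetical and are not part of wayback; the parameter names themselves are taken from the release notes.

```python
import warnings

# Deprecated camelCase names and their snake_case replacements,
# as listed in the release notes for WaybackClient.search.
RENAMED_SEARCH_PARAMS = {
    "matchType": "match_type",
    "fastLatest": "fast_latest",
    "resolveRevisits": "resolve_revisits",
}

# Parameters the release notes say were removed from WaybackClient.search.
REMOVED_SEARCH_PARAMS = frozenset(
    {"gzip", "showResumeKey", "resumeKey", "page", "pageSize", "previous_result"}
)

# WaybackClient.get_memento: `datetime` was renamed to `timestamp`.
RENAMED_GET_MEMENTO_PARAMS = {"datetime": "timestamp"}


def migrate_kwargs(kwargs, renamed, removed=frozenset()):
    """Hypothetical helper: return a copy of `kwargs` using 0.4.0-style names.

    Removed parameters are dropped with a warning; renamed parameters are
    translated; everything else is passed through unchanged.
    """
    migrated = {}
    for name, value in kwargs.items():
        if name in removed:
            warnings.warn(f"{name!r} was removed in wayback 0.4.0; dropping it")
            continue
        new_name = renamed.get(name, name)
        if new_name in migrated:
            # Both the old and the new spelling were supplied.
            raise TypeError(f"{new_name!r} was given more than once")
        migrated[new_name] = value
    return migrated


old_style = {"matchType": "prefix", "pageSize": 5, "limit": 10}
new_style = migrate_kwargs(old_style, RENAMED_SEARCH_PARAMS, REMOVED_SEARCH_PARAMS)
```

The resulting dictionary could then be unpacked into a call such as `client.search(url, **new_style)`. Since the old names still work until v0.5.0, a step like this is optional for now.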

Features

  • Memento.headers is now case-insensitive. The keys of the headers dict are returned with their original case when iterating, but lookups are performed case-insensitively. For example:

    list(memento.headers) == ['Content-Type', 'Date']
    memento.headers['Content-Type'] == memento.headers['content-type']
    

    (#98)

  • There are now built-in, adjustable rate limits for calls to both search() and get_memento(). The default values should keep you from getting temporarily blocked by the Wayback Machine servers, but you can also adjust them when instantiating WaybackSession:

    # Limit get_memento() calls to 2 per second (or one every 0.5 seconds):
    client = WaybackClient(WaybackSession(memento_calls_per_second=2))

    # These now take a minimum of 0.5 seconds, even if the Wayback Machine
    # responds instantly (there's no delay on the first call):
    client.get_memento('http://www.noaa.gov/', timestamp='20180816111911')
    client.get_memento('http://www.noaa.gov/', timestamp='20180829092926')

    A huge thanks to @LionSzl for implementing this. (#12)
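The behavior described for the rate limits (a minimum spacing between calls, with no delay on the first one) can be illustrated with a minimal sketch. This is not wayback's actual implementation: `MinIntervalLimiter` and its `clock` and `sleep` parameters are hypothetical names used only to show the idea.

```python
import time


class MinIntervalLimiter:
    """Illustrative limiter that enforces a minimum interval between calls."""

    def __init__(self, calls_per_second, clock=time.monotonic, sleep=time.sleep):
        # 2 calls per second means at least 0.5 seconds between calls.
        self.min_interval = 1.0 / calls_per_second
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # None means no call has been made yet.

    def wait(self):
        """Block until at least `min_interval` has passed since the last call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


limiter = MinIntervalLimiter(calls_per_second=2)
limiter.wait()  # First call: returns immediately.
```

A caller would invoke `limiter.wait()` before each request. The clock and sleep functions are injectable so the timing logic can be exercised without real delays.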

Fixes & Maintenance

... (truncated)


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)