edgi-govdata-archiving / wayback

A Python API to the Internet Archive Wayback Machine
https://wayback.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Support multiple filters in CDX search #127

Closed Mr0grog closed 11 months ago

Mr0grog commented 11 months ago

The filter_field parameter for WaybackClient.search() can now be a list or tuple of strings, letting you add multiple filters. For example, to search for all captures at nasa.gov with a 404 status and “feature” somewhere in the URL:

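from datetime import date
from wayback import WaybackClient

client = WaybackClient()
# Each filter uses the CDX form "field:regex"; every filter in the list is sent.
client.search('nasa.gov/',
              match_type='prefix',
              from_date=date(2022, 1, 1),
              to_date=date(2022, 2, 1),
              filter_field=['statuscode:404',
                            'urlkey:.*feature.*'])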
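
For readers curious how a list of filters might reach the CDX server, here is a rough sketch. It assumes each filter is sent as its own repeated `filter` query parameter, which the CDX API accepts. The `build_cdx_query` helper and the `CDX_ENDPOINT` constant are hypothetical names for illustration only; they are not part of wayback, and the library's actual internals may differ.

```python
from datetime import date
from urllib.parse import urlencode

# Assumed endpoint for the Wayback Machine CDX search API.
CDX_ENDPOINT = 'https://web.archive.org/cdx/search/cdx'


def build_cdx_query(url, match_type=None, from_date=None, to_date=None,
                    filter_field=None):
    """Hypothetical helper: build a CDX search URL with zero or more filters."""
    # Normalize filter_field so a single string and a list/tuple of strings
    # are handled the same way.
    if filter_field is None:
        filters = []
    elif isinstance(filter_field, str):
        filters = [filter_field]
    else:
        filters = list(filter_field)

    # A list of pairs (rather than a dict) lets the 'filter' key repeat.
    params = [('url', url)]
    if match_type:
        params.append(('matchType', match_type))
    if from_date:
        params.append(('from', from_date.strftime('%Y%m%d')))
    if to_date:
        params.append(('to', to_date.strftime('%Y%m%d')))
    params.extend(('filter', f) for f in filters)

    return f'{CDX_ENDPOINT}?{urlencode(params)}'


query = build_cdx_query('nasa.gov/',
                        match_type='prefix',
                        from_date=date(2022, 1, 1),
                        to_date=date(2022, 2, 1),
                        filter_field=['statuscode:404',
                                      'urlkey:.*feature.*'])
print(query)
```

The printed URL contains two `filter=` parameters, one per list entry. Passing a plain string instead of a list yields a single `filter=` parameter, which matches the earlier single-filter behavior.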

Thanks to @BilibalaX for starting this in #120.