bellingcat / wayback-google-analytics

A lightweight tool for scraping current and historic Google Analytics data
https://pypi.org/project/wayback-google-analytics/
MIT License
187 stars, 22 forks

Add better rate limit protection (solves #19) #20

Closed: jclark1913 closed this pull request 10 months ago

jclark1913 commented 10 months ago

Overview

This PR addresses issue #19 and attempts to mitigate some of the 443 errors that result from large queries. It adds a semaphore with a limit of 10 in main.py, which is passed down to and shared by the functions called from there, and adds a 5-second delay between CDX API calls when retrieving snapshots.
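A minimal sketch of the throttling pattern described above, assuming an asyncio-based design. The function names, the `ConcurrencyTracker` helper, and the fake snapshot data are hypothetical illustrations, not the code in main.py. The point it shows is that the semaphore is created once and passed down, so every CDX call shares a single limit of 10, and the pause is taken while still holding a slot so that successive calls are spaced out.

```python
import asyncio

# Values taken from the PR description: at most 10 requests in flight,
# and a 5-second pause after each CDX API call.
MAX_CONCURRENT = 10
CDX_DELAY_SECONDS = 5.0


class ConcurrencyTracker:
    """Hypothetical helper that records how many calls run at the same time."""

    def __init__(self):
        self.current = 0
        self.peak = 0


async def get_snapshots(url, semaphore, tracker, delay):
    """Hypothetical stand-in for one CDX API call; performs no network I/O."""
    async with semaphore:
        tracker.current += 1
        tracker.peak = max(tracker.peak, tracker.current)
        try:
            # A real implementation would send the HTTP request to the CDX API here.
            await asyncio.sleep(0.01)
            snapshots = [f"{url}#snapshot{i}" for i in range(3)]
            # Pause while still holding the semaphore, so a freed slot is not
            # reused immediately and successive CDX calls are spaced out.
            await asyncio.sleep(delay)
        finally:
            tracker.current -= 1
    return snapshots


async def main(urls, delay=CDX_DELAY_SECONDS):
    # Create the semaphore once and pass it down so all calls share one limit.
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    tracker = ConcurrencyTracker()
    results = await asyncio.gather(
        *(get_snapshots(u, semaphore, tracker, delay) for u in urls)
    )
    return results, tracker.peak


if __name__ == "__main__":
    urls = [f"https://example{i}.com" for i in range(25)]
    results, peak = asyncio.run(main(urls, delay=0.1))
    print(len(results), peak)
```

With 25 URLs, the first 10 calls start immediately and the remaining 15 wait for a slot, so the observed peak concurrency stays at 10. Holding the semaphore during the sleep is one design choice among several; a global rate limiter would be an alternative.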

The program can now handle much larger requests, but very large requests still run into persistent errors. As a result, the tool now displays a warning if a query includes more than 10 URLs or requests more than 500 archived snapshots per URL.
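A sketch of how the size warnings could be checked, using the thresholds quoted above. The function name `check_query_size` and its `limit` parameter (snapshots requested per URL) are hypothetical; the actual wording and placement of the warnings in the tool may differ.

```python
# Thresholds quoted in the PR description.
MAX_URLS_BEFORE_WARNING = 10
MAX_SNAPSHOTS_BEFORE_WARNING = 500


def check_query_size(urls, limit):
    """Return warning messages for queries that are likely to hit rate limits.

    `urls` is the list of URLs to query; `limit` is the number of archived
    snapshots requested per URL, or None for no limit (hypothetical names).
    """
    messages = []
    if len(urls) > MAX_URLS_BEFORE_WARNING:
        messages.append(
            f"Querying {len(urls)} URLs; more than {MAX_URLS_BEFORE_WARNING} "
            "may trigger rate limiting."
        )
    if limit is not None and limit > MAX_SNAPSHOTS_BEFORE_WARNING:
        messages.append(
            f"Requesting {limit} snapshots per URL; more than "
            f"{MAX_SNAPSHOTS_BEFORE_WARNING} may trigger rate limiting."
        )
    return messages


if __name__ == "__main__":
    for message in check_query_size([f"https://example{i}.com" for i in range(12)], 600):
        print("WARNING:", message)
```

Returning the messages rather than printing them inside the function keeps the check easy to test and leaves the choice of output (console, logging) to the caller.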

Changelog

API errors

Readme

New Features

Known issues