Open duckduckgrayduck opened 3 months ago
When using `page_text = document.get_page_text(page_number)` in the Regex Extractor Add-On, I'm receiving the following:

documentcloud.exceptions.APIError: 503 - <?xml version="1.0" encoding="UTF-8"?>
SlowDown

Example failing run: https://github.com/MuckRock/documentcloud-regex-addon/actions/runs/9066429901/job/24909314442

That is coming from S3 directly - I believe S3's rate limits are not concrete, and they throttle you as they see fit. We could add some exponential backoff to the Python library.

It's also fetching the text one page at a time instead of getting all the text at once: https://github.com/MuckRock/documentcloud-regex-addon/blob/main/main.py#L34
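A retry helper along the lines suggested above could look like this. This is only a sketch: the `with_backoff` helper and its parameters are hypothetical (not part of the documentcloud library), and in the Add-On you would likely want to catch the library's `APIError` specifically and retry only on 503 responses rather than on every exception, as done generically here.

```python
import random
import time


def with_backoff(fetch, retries=5, base_delay=1.0, retryable=(Exception,)):
    """Call fetch(), retrying with exponential backoff plus jitter on failure.

    Sleeps base_delay * 2**attempt (plus random jitter) between attempts,
    and re-raises the last exception once all retries are exhausted.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except retryable:
            if attempt == retries - 1:
                raise  # out of retries: surface the error to the caller
            # Back off 1s, 2s, 4s, ... with jitter to avoid synchronized retries.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))


# Hypothetical usage inside the Add-On's page loop (names from the issue):
# page_text = with_backoff(lambda: document.get_page_text(page_number))
```

Separately, if the client exposes a way to fetch the document's full text in one request (rather than per page), switching to that would reduce the number of S3 hits and make the SlowDown responses less likely in the first place.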