chaoss / augur

Python library and web service for Open Source Software Health and Sustainability metrics & data collection. You can find our documentation and new contributor information easily here: https://oss-augur.readthedocs.io/en/main/ and learn more about Augur at our website https://augurlabs.io
https://oss-augur.readthedocs.io/en/main/
MIT License
582 stars 843 forks source link

Fully collect events on large repositories #2887

Open sgoggins opened 1 month ago

sgoggins commented 1 month ago

This employs the strategy we are currently using for messages on larger repositories.

sgoggins commented 1 month ago

There is currently an issue with this one in the dev testing branch. Our code for checking limits reveals them and needs to be tuned.

Traceback (most recent call last):
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/celery/app/trace.py", line 451, in trace_task
    R = retval = fun(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/celery/app/trace.py", line 734, in __protected_call__
    return self.run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/augur/augur/tasks/github/events.py", line 33, in collect_events
    if bulk_events_collection_endpoint_contains_all_data(key_auth, logger, owner, repo):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/augur/augur/tasks/github/events.py", line 60, in bulk_events_collection_endpoint_contains_all_data
    raise Exception(f"Either github raised the paginator page limit for things like events and messages, or is_pagination_limited_by_max_github_pages is being used on a resource that does not have a page limit. Url: {url}")
Exception: Either github raised the paginator page limit for things like events and messages, or is_pagination_limited_by_max_github_pages is being used on a resource that does not have a page limit. Url: https://api.github.com/repos/apache/arrow/issues/events