chaoss / augur

Python library and web service for Open Source Software Health and Sustainability metrics & data collection. You can find our documentation and new contributor information easily here: https://oss-augur.readthedocs.io/en/main/ and learn more about Augur at our website https://augurlabs.io
MIT License

Ignore 404 when no messages exist #2871

Closed sgoggins closed 1 week ago

sgoggins commented 2 months ago

When an issue or pull request has no associated messages, which is quite rare, collection should log a warning and continue rather than fail. Here is an example error response:

Traceback (most recent call last):
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/celery/app/trace.py", line 451, in trace_task
    R = retval = fun(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/celery/app/trace.py", line 734, in __protected_call__
    return self.run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/augur/augur/tasks/github/messages.py", line 43, in collect_github_messages
    process_large_issue_and_pr_message_collection(repo_id, repo_git, logger, manifest.key_auth, task_name, augur_db)
  File "/home/ubuntu/github/augur/augur/tasks/github/messages.py", line 100, in process_large_issue_and_pr_message_collection
    messages = list(github_data_access.paginate_resource(comment_url))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/augur/augur/tasks/github/util/github_data_access.py", line 44, in paginate_resource
    response = self.make_request_with_retries(url)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/augur/augur/tasks/github/util/github_data_access.py", line 119, in make_request_with_retries
    return self.__make_request_with_retries(url, method, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/tenacity/__init__.py", line 330, in wrapped_f
    return self(f, *args, **kw)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/tenacity/__init__.py", line 467, in __call__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/tenacity/__init__.py", line 368, in iter
    result = action(retry_state)
             ^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/tenacity/__init__.py", line 390, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
                                     ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/ubuntu/github/virtualenvs/hosted/lib/python3.11/site-packages/tenacity/__init__.py", line 470, in __call__
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/augur/augur/tasks/github/util/github_data_access.py", line 133, in __make_request_with_retries
    return self.make_request(url, method, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/github/augur/augur/tasks/github/util/github_data_access.py", line 107, in make_request
    raise UrlNotFoundException(f"Could not find {url}")
augur.tasks.github.util.github_data_access.UrlNotFoundException: Could not find https://api.github.com/repos/canonical/ubuntu-frame/issues/182/comments

I have verified that in these cases there are, in fact, no messages, and the API returns:

    {
      "message": "Not Found",
      "documentation_url": "https://docs.github.com/rest/issues/comments#list-issue-comments",
      "status": "404"
    }

sgoggins commented 2 months ago

The exception is raised in augur/tasks/github/util/github_data_access.py, here:

    def make_request(self, url, method="GET", timeout=100):

        with httpx.Client() as client:

            response = client.request(method=method, url=url, auth=self.key_manager, timeout=timeout, follow_redirects=True)

            if response.status_code in [403, 429]:
                raise RatelimitException(response)

            if response.status_code == 404:
                raise UrlNotFoundException(f"Could not find {url}")

            response.raise_for_status()

            return response

I know we want this to be a fairly generic function, but a 404 has a specific meaning for messages (the issue or pull request simply has none), so I think we should consider handling the result less generically in that case.
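
One possible shape for this, as a minimal sketch rather than the actual fix: leave make_request generic and catch the 404 at the message-collection call site instead. UrlNotFoundException and paginate_resource are names taken from the traceback above; the helper collect_messages_tolerating_404 and the FakeDataAccess stand-in are hypothetical and exist only so the example runs on its own.

```python
import logging


class UrlNotFoundException(Exception):
    """Stand-in for the exception that make_request raises on a 404."""


def collect_messages_tolerating_404(data_access, comment_urls, logger):
    """Hypothetical helper: gather messages, skipping URLs that return 404."""
    all_messages = []
    skipped_urls = []
    for comment_url in comment_urls:
        try:
            # list() drives the generator, so a 404 raised mid-pagination
            # is still caught here.
            messages = list(data_access.paginate_resource(comment_url))
        except UrlNotFoundException:
            # For messages, a 404 means "nothing to collect": warn and move on.
            logger.warning("No messages found at %s (404); skipping", comment_url)
            skipped_urls.append(comment_url)
            continue
        all_messages.extend(messages)
    return all_messages, skipped_urls


class FakeDataAccess:
    """Hypothetical test double; the real object calls the GitHub API."""

    def __init__(self, pages):
        self.pages = pages  # maps URL -> list of message dicts

    def paginate_resource(self, url):
        if url not in self.pages:
            raise UrlNotFoundException(f"Could not find {url}")
        yield from self.pages[url]
```

Catching the exception at the call site keeps the generic request function unchanged for other callers, where a 404 may still indicate a real problem. Returning the skipped URLs alongside the messages is an assumption made here so the caller can report them; logging alone may be enough.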