chaoss / augur

Python library and web service for Open Source Software Health and Sustainability metrics & data collection. You can find our documentation and new contributor information easily here: https://oss-augur.readthedocs.io/en/main/ and learn more about Augur at our website https://augurlabs.io
MIT License

Github repos get handed to Gitlab worker after restart #1417

Closed: jberkus closed this issue 3 years ago

jberkus commented 3 years ago

Please help us help you by filling out the following sections as thoroughly as you can.

Description:

After an augur restart, errors with Github repo data collection get reported as gitlab worker errors. This appears to be because those Github repos have been handed to the gitlab_issue_worker for unknown reasons.

How to reproduce:

  1. Create an augur instance
  2. Add only github repositories
  3. Wait a bit
  4. Shut down and restart Augur
  5. Observe that Github data collection is failing (reason unknown)
  6. Check /augur/logs/augur.err
  7. Check for new data in the tables -- there will be none (see the query sketch after this list)
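
For step 7, one way to check whether any collection has happened since the restart is to look at the newest data_collection_date in a few of the augur_data tables. This is only a sketch: it assumes the standard augur_data schema and metadata columns, and the connection string is a placeholder.

import sqlalchemy as s

engine = s.create_engine("postgresql+psycopg2://augur:augur@localhost:5432/augur")  # placeholder credentials

# Newest collection timestamp per table; these should keep advancing while
# the workers are healthy, and stall when collection has stopped.
checks = """
SELECT 'issues' AS tbl, MAX(data_collection_date) AS newest FROM augur_data.issues
UNION ALL
SELECT 'pull_requests', MAX(data_collection_date) FROM augur_data.pull_requests
UNION ALL
SELECT 'commits', MAX(data_collection_date) FROM augur_data.commits
"""

with engine.connect() as conn:
    for tbl, newest in conn.execute(s.text(checks)):
        print(tbl, newest)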

There are almost certainly additional conditions required to reproduce this issue, but we don't yet know what they are.

Expected behavior:

After restart, collecting data from GitHub for GitHub repositories should resume.

Log files

This, repeated every 15 seconds for lots of different repos:

2021-08-11 13:15:30,965 [PID: 1280656] augur.routes.broker [task_error() in broker.py:L218] [ERROR]: workers.gitlab_issues_worker.47921 ran into error while completing task: {'job_type': 'MAINTAIN', 'models': ['gitlab_issues'], 'display_name': 'gitlab_issues model for url: https://github.com/konveyor/tackle-ui-tests.git', 'given': {'git_url': 'https://github.com/konveyor/tackle-ui-tests.git'}, 'focused_task': 1, 'worker_id': 'workers.gitlab_issues_worker.47921'}

2021-08-11 13:15:45,982 [PID: 1280667] augur.routes.broker [task_error() in broker.py:L218] [ERROR]: workers.gitlab_issues_worker.47921 ran into error while completing task: {'job_type': 'MAINTAIN', 'models': ['gitlab_issues'], 'display_name': 'gitlab_issues model for url: https://github.com/konveyor/tackle-controls.git', 'given': {'git_url': 'https://github.com/konveyor/tackle-controls.git'}, 'focused_task': 1, 'worker_id': 'workers.gitlab_issues_worker.47921'}

2021-08-11 13:16:01,068 [PID: 1280646] augur.routes.broker [task_error() in broker.py:L218] [ERROR]: workers.gitlab_issues_worker.47921 ran into error while completing task: {'job_type': 'MAINTAIN', 'models': ['gitlab_issues'], 'display_name': 'gitlab_issues model for url: https://github.com/konveyor/forklift-documentation.git', 'given': {'git_url': 'https://github.com/konveyor/forklift-documentation.git'}, 'focused_task': 1, 'worker_id': 'workers.gitlab_issues_worker.47921'}

Note that every one of those is a GitHub repository.
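
The broker errors are regular enough to confirm this mechanically. A rough sketch, using the log path from the reproduction steps and the log format shown above, that lists every github.com URL the gitlab worker was handed:

import re

# Pull the git_url out of broker task_error lines that name the gitlab worker.
pattern = re.compile(r"gitlab_issues_worker.*'git_url': '(https://github\.com/[^']+)'")

seen = set()
with open("/augur/logs/augur.err") as log:
    for line in log:
        match = pattern.search(line)
        if match:
            seen.add(match.group(1))

for url in sorted(seen):
    print(url)

If every URL it prints is on github.com, the hand-off is systematic rather than specific to a couple of repos.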

Checking the gitlab_issue_worker logs shows this:

sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedColumn) column "gl_username" does not exist
LINE 2:             SELECT gl_username, cntrb_id FROM contributors 
                           ^

[SQL: 
            SELECT gl_username, cntrb_id FROM contributors 
        ]
(Background on this error at: http://sqlalche.me/e/13/f405)

2021-08-11 14:19:10,996 [PID: 2108257] workers.gitlab_issues_worker.47921 [register_task_failure() in worker_base.py:L490] [ERROR]: Recorded job error in the history table for: {'job_type': 'MAINTAIN', 'models': ['gitlab_issues'], 'display_name': 'gitlab_issues model for url: https://github.com/konveyor/move2kube-ui.git', 'given': {'git_url': 'https://github.com/konveyor/move2kube-ui.git'}, 'focused_task': 1, 'worker_id': 'workers.gitlab_issues_worker.47921'}

... which shows a data model error that I probably need to run a migration for, but it doesn't explain why the gitlab worker is handling this repo in the first place. Note that I previously encountered this behavior under 0.17.0 in issue #1349.
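
If it really is just the missing column, a stopgap along these lines might quiet the worker. This is only a sketch: it assumes gl_username is a plain text column on augur_data.contributors, and the proper fix is still to run the project's own schema migrations.

import sqlalchemy as s

engine = s.create_engine("postgresql+psycopg2://augur:augur@localhost:5432/augur")  # placeholder credentials

# Stopgap only: add the column the gitlab worker is selecting so it stops erroring.
with engine.begin() as conn:
    conn.execute(s.text(
        "ALTER TABLE augur_data.contributors ADD COLUMN IF NOT EXISTS gl_username varchar"
    ))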

Software versions:

sgoggins commented 3 years ago

It's sooooo simple. We just ignore github.com in the gitlab workers. I'll drop a patch in for this right away.
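
Presumably the guard amounts to something like the check below, run before the gitlab worker accepts a task. The function name is illustrative; the task shape is copied from the broker log entries above.

from urllib.parse import urlparse

def is_gitlab_task(task):
    """Illustrative check: only accept tasks whose git_url is hosted on gitlab.com."""
    git_url = task.get('given', {}).get('git_url', '')
    return urlparse(git_url).netloc.lower().endswith('gitlab.com')

# A task shaped like the broker log entries above would be skipped:
task = {'job_type': 'MAINTAIN', 'models': ['gitlab_issues'],
        'given': {'git_url': 'https://github.com/konveyor/tackle-ui-tests.git'}}
assert not is_gitlab_task(task)

Filtering on the hostname of git_url keeps the check independent of how the repo was added.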

Sidebar: I almost made this very simple fix for a Pandas UNICODE patch we released today. The only thing more exciting than finding a weird bug (with NO ERROR) was discovering the LONG series of rants from other developers going back and forth with the pandas team re: UNICODE, this error, and the appearance of everything being ok ... until you looked at your data. :D

jberkus commented 3 years ago

OK, but there's the companion fact that the Github workers aren't collecting any data. Is that an unrelated issue?

sgoggins commented 3 years ago

@jberkus : This issue is resolved in our latest release, I believe. This is the very strange Unicode issue with pandas' to_sql() method.
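
A quick way to check whether the to_sql() Unicode problem is affecting an instance is a round-trip test: write a few non-ASCII strings through to_sql() and read them back. A minimal sketch; the table name, column name, and connection string are placeholders.

import pandas as pd
import sqlalchemy as s

engine = s.create_engine("postgresql+psycopg2://augur:augur@localhost:5432/augur")  # placeholder credentials

# Non-ASCII strings are what exposed the problem in the first place.
original = pd.DataFrame({'cntrb_login': ['Łukasz', '李雷', 'Åsa']})
original.to_sql('unicode_roundtrip_check', engine, if_exists='replace', index=False)

roundtrip = pd.read_sql("SELECT cntrb_login FROM unicode_roundtrip_check", engine)
assert list(roundtrip['cntrb_login']) == list(original['cntrb_login'])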

jberkus commented 3 years ago

I'm not sure this issue is fixed entirely. While the GH workers have restarted, and are updating pull requests and contributors, we are still not getting any commit information. The commits may be a different issue, though, so I'm opening that separately.

Regardless, I'm still seeing Gitlab errors in the log for Github repos:

2021-08-16 17:25:07,924 [PID: 2234862] augur.routes.broker [task_error() in broker.py:L218] [ERROR]: workers.gitlab_issues_worker.47921 ran into error while completing task: {'job_type': 'MAINTAIN', 'models': ['gitlab_issues'], 'display_name': 'gitlab_issues model for url: https://github.com/konveyor/move2kube-operator.git', 'given': {'git_url': 'https://github.com/konveyor/move2kube-operator.git'}, 'focused_task': 1, 'worker_id': 'workers.gitlab_issues_worker.47921'}

2021-08-16 17:25:23,019 [PID: 2234862] augur.routes.broker [task_error() in broker.py:L218] [ERROR]: workers.gitlab_issues_worker.47921 ran into error while completing task: {'job_type': 'MAINTAIN', 'models': ['gitlab_issues'], 'display_name': 'gitlab_issues model for url: https://github.com/konveyor/move2kube-demos.git', 'given': {'git_url': 'https://github.com/konveyor/move2kube-demos.git'}, 'focused_task': 1, 'worker_id': 'workers.gitlab_issues_worker.47921'}

jberkus commented 3 years ago

This appears to be resolved with a rebuild of 20.2.