ebmdatalab / metrics

Ingest repos in the opensafely org #187

Open iaindillingham opened 3 months ago

iaindillingham commented 3 months ago

To help us understand how researchers are using Codespaces, it would be useful to derive the distribution of time deltas between when a study repo was created and when an associated Codespace was last used (opensafely-core/codespaces-initiative#68). This distribution would tell us how many researchers are using Codespaces to update code in older study repos; these researchers may need development environments with older versions of Python and R packages (e.g. from python:v1).
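A minimal sketch of the proposed distribution, assuming we already have, for each Codespace, the study repo's creation timestamp and the Codespace's last-used timestamp (the GitHub REST API exposes these as `created_at` on repos and `last_used_at` on Codespaces). The records and the helper names `time_deltas` and `bucket_by_year` are hypothetical, and "years" are approximated as 365 days, which is coarse but adequate for bucketing.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records: (repo created_at, Codespace last_used_at).
# Real values would come from the GitHub API via metrics.
records = [
    (datetime(2021, 3, 1), datetime(2023, 9, 15)),
    (datetime(2023, 6, 1), datetime(2023, 9, 20)),
    (datetime(2023, 9, 1), datetime(2023, 9, 10)),
]


def time_deltas(records):
    """Return the gap between repo creation and Codespace last use for each record."""
    return [last_used - created for created, last_used in records]


def bucket_by_year(deltas):
    """Count deltas by whole (365-day) years elapsed; 0 means under a year old."""
    return Counter(delta.days // 365 for delta in deltas)


print(bucket_by_year(time_deltas(records)))
```

With these made-up records, two Codespaces fall in the "under a year" bucket and one in the "two years" bucket; a heavy tail of higher buckets would point to researchers returning to older study repos.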

Study repos are associated with the opensafely org, but at present, metrics ingests repos in the ebmdatalab and opensafely-core orgs. Consequently, metrics should also ingest repos in the opensafely org.

Making this change is more complicated than adding "opensafely" to _ORGS:

https://github.com/ebmdatalab/metrics/blob/5db873c6b57287c6a0db9fc6fc475bd63581e70e/metrics/github/github.py#L10

This is because metrics assumes that each org has three tech teams:

https://github.com/ebmdatalab/metrics/blob/5db873c6b57287c6a0db9fc6fc475bd63581e70e/metrics/github/github.py#L9

https://github.com/ebmdatalab/metrics/blob/5db873c6b57287c6a0db9fc6fc475bd63581e70e/metrics/github/github.py#L172-L173

The opensafely org doesn't have these three tech teams, so the team query linked above fails. If it did have them, the query should succeed, and _repo_owners should return a dict containing all study repos, because members of these teams would be owners of the opensafely org.

Jongmassey commented 3 months ago

I'd be inclined to add

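```python
# Teams whose repos are ingested for each org. The opensafely org has no
# tech teams, so its study repos are attributed to the "research" team.
_TEAMS = {
    "ebmdatalab": _TECH_TEAMS,
    "opensafely-core": _TECH_TEAMS,
    "opensafely": ["research"],
}
```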

then modify _repo_owners() thusly:

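```python
def _repo_owners(org):
    # Map each repo to the team that owns it, using the teams configured for this org.
    # If a repo belongs to several teams, the last team listed wins.
    teams = _TEAMS[org]
    return {repo: team for team in teams for repo in query.team_repos(org, team)}
```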

and leave the other references to _TECH_TEAMS intact, as they're in tech-team-specific contexts.

iaindillingham commented 3 months ago

Thanks, @Jongmassey. That's a sound alternative. Before choosing between modifying GitHub and modifying metrics, we should decide whether we want our GitHub orgs to have a similar team structure.

We should also check whether ingesting opensafely repos affects our current dashboards.

Jongmassey commented 3 months ago

> We should also check whether ingesting opensafely repos affects our current dashboards.

From memory, changing _repo_owners() but leaving tech_repos(), tech_issues(), and tech_prs() untouched should mean the current dashboards aren't affected, but it's certainly worth a proper check when this comes to be implemented.

lucyb commented 3 months ago

That sounds like a good technical solution. However, we've decided not to do this work at the moment, as we think we have enough metrics data on current Codespaces usage. If and when more people are using Codespaces, we should definitely look at implementing this.