dask / community

For general discussion and community planning. Discussion issues welcome.

Add `committers.yaml` to all repositories #338

Open hendrikmakait opened 10 months ago

hendrikmakait commented 10 months ago

Adding a committers file would make it easier to understand who has commit rights on a given repository (e.g., https://github.com/dask/distributed/pull/7743#issuecomment-1681028790), generally add transparency, and enable further automation akin to CPython's bedevere (https://github.com/python/bedevere/blob/main/README.md).

In its simplest form, this would be a list of GitHub aliases, though I personally like Arrow's model, which adds a bit more information per committer (https://github.com/apache/arrow-site/blob/main/_data/committers.yml).

While affiliations might get out of date, they indicate the project's health. When it comes to roles, I don't know if they would add any benefit, but they might come in handy and should be low maintenance. I'm specifically thinking about "technical" roles like repo owner, not so much about Steering Council membership or anything like that.

What are people's thoughts?

jakirkham commented 10 months ago

Naively would think this might put off some folks that contribute in their free time or as a hobby. They may not want to involve their work in something that occurs in their free time. Though maybe there is a way to adapt this to handle that need.

hendrikmakait commented 10 months ago

Affiliation and name could be optional; the question is whether we want to include them at all.
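To make the "optional fields" idea concrete, here is a minimal sketch of how a script could validate such a file, assuming an Arrow-style layout in which only the GitHub `alias` is required and `name`, `affiliation`, and `role` may be omitted. The field names and the `Committer`/`parse_committers` helpers are hypothetical, and the entries are plain dicts (what a YAML loader would hand back) so the sketch needs no third-party packages.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical schema for one committers.yaml entry; only `alias` is required.
ALLOWED_KEYS = {"alias", "name", "affiliation", "role"}


@dataclass(frozen=True)
class Committer:
    alias: str                         # GitHub handle
    name: Optional[str] = None         # optional, for people who prefer not to share it
    affiliation: Optional[str] = None  # optional, e.g. left out by hobbyist contributors
    role: Optional[str] = None         # optional "technical" role such as "repo owner"


def parse_committers(entries: list[dict]) -> list[Committer]:
    """Validate raw entries (as loaded from YAML) and turn them into records."""
    committers: list[Committer] = []
    seen: set[str] = set()
    for i, entry in enumerate(entries):
        unknown = set(entry) - ALLOWED_KEYS
        if unknown:
            raise ValueError(f"entry {i}: unknown keys {sorted(unknown)}")
        alias = entry.get("alias")
        if not alias:
            raise ValueError(f"entry {i}: 'alias' is required")
        if alias.lower() in seen:  # GitHub logins are case-insensitive
            raise ValueError(f"entry {i}: duplicate alias {alias!r}")
        seen.add(alias.lower())
        committers.append(Committer(**entry))
    return committers


# Example: one fully described entry and one alias-only entry.
committers = parse_committers([
    {"alias": "alice", "name": "Alice Example", "role": "repo owner"},
    {"alias": "bob"},
])
```

An alias-only entry stays valid, so nobody has to disclose a name or employer to be listed.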

jakirkham commented 10 months ago

For sure

Though I think the included fields are only part of the point here

IOW, we are adding a step that corporate participants may not mind. It is a bit bureaucratic perhaps, but there are already measures like this in corporate settings (signing commits, signing CLAs, license auditing, etc.). They do add (minor) friction, but that is tangential

The bigger point is that this is a change in the culture of Dask toward more of a corporate focus. Idk if that is intended (or recognized) in this change. If it is, that's ok. Just wanted to prompt a bit of thought around the cultural effect

hendrikmakait commented 10 months ago

Since this seems to amount to a more extensive debate than I had initially anticipated, let me clarify my goals:

Goals

For each repository, there is currently an opaque group of people holding write privileges. I want to make that group of people...

  1. publicly available to members of the Dask organization, contributors, and interested users.
  2. programmatically accessible for automation on GitHub. The details are beyond the scope of this issue, but consider bedevere's PR state machine the role model.
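As a rough illustration of the second goal, here is a heavily simplified sketch of a PR stage function, loosely inspired by bedevere's "awaiting ..." labels. It assumes the committer aliases have already been loaded (e.g. from a committers file) and that reviews arrive as `(reviewer, state)` pairs using GitHub's review states `APPROVED` and `CHANGES_REQUESTED`. The `pr_stage` name and the exact label set are assumptions for illustration, not bedevere's actual logic.

```python
from collections.abc import Iterable

# GitHub review states that express a decision; "COMMENTED" reviews are ignored here.
DECISIONS = {"APPROVED", "CHANGES_REQUESTED"}


def pr_stage(author: str, reviews: Iterable[tuple[str, str]], committers: set[str]) -> str:
    """Return a simplified "awaiting ..." label for a pull request.

    `reviews` holds (reviewer, state) pairs in chronological order; only each
    reviewer's latest decision counts, and the author's own reviews are skipped.
    """
    committer_keys = {c.lower() for c in committers}
    latest: dict[str, str] = {}
    for reviewer, state in reviews:
        if reviewer.lower() == author.lower() or state not in DECISIONS:
            continue
        latest[reviewer.lower()] = state
    committer_decisions = {s for r, s in latest.items() if r in committer_keys}
    if "CHANGES_REQUESTED" in committer_decisions:
        return "awaiting changes"
    if "APPROVED" in committer_decisions:
        return "awaiting merge"
    if latest:
        # Only people without write access have weighed in so far.
        return "awaiting core review"
    return "awaiting review"
```

The key point is the `committers` argument: without a reliable list, the function cannot tell an approval that unblocks a merge from one that does not. Treating any outstanding change request from a committer as blocking is a simplification chosen for this sketch.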

Why make this information publicly available?

It is an indicator of project health.

Since openness and transparency are foundations of our governance, it simply feels wrong that this information is 100% opaque.

Not having this information publicly available has been a nuisance for me several times when working on PRs. PRs were delayed because no committer had the time to approve them or because it was wrongly assumed that people held write privileges to a repository.

How to achieve these goals?

There are several ways to make this information accessible with varying amounts of transparency:

1. Use visible GitHub teams

👍 By using GitHub teams to administer write privileges per repository, this information should be programmatically available to anyone within the Dask organization.
👍 Zero friction; we need to administer write privileges somehow. (Honestly, I'm surprised this is not already being done given that it seems so simple and teams are even mentioned in the governance docs, but that's a different story.)
👎 Even visible teams are only accessible to members of the Dask organization, so this would still not be transparent to external contributors or users.
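For option 1, team membership can be read through GitHub's REST API (`GET /orgs/{org}/teams/{team_slug}/members`). The sketch below uses only the standard library and assumes a token belonging to a member of the organization in a `GITHUB_TOKEN` environment variable; the helper names and any team slug you pass in are hypothetical. Only the pure helpers work offline; `fetch_team_logins` needs network access.

```python
import json
import os
import urllib.request

API = "https://api.github.com"
PAGE_SIZE = 100  # maximum page size GitHub allows for this endpoint


def team_members_url(org: str, team_slug: str, page: int = 1) -> str:
    """URL for one page of a team's member list."""
    return f"{API}/orgs/{org}/teams/{team_slug}/members?per_page={PAGE_SIZE}&page={page}"


def parse_logins(body: str) -> list[str]:
    """Extract login names from the JSON array of user objects GitHub returns."""
    return [user["login"] for user in json.loads(body)]


def fetch_team_logins(org: str, team_slug: str) -> list[str]:
    """Page through a team's members; a short page means we reached the end."""
    token = os.environ["GITHUB_TOKEN"]  # assumption: token of an org member
    logins: list[str] = []
    page = 1
    while True:
        req = urllib.request.Request(
            team_members_url(org, team_slug, page),
            headers={
                "Accept": "application/vnd.github+json",
                "Authorization": f"Bearer {token}",
            },
        )
        with urllib.request.urlopen(req) as resp:
            batch = parse_logins(resp.read().decode("utf-8"))
        logins.extend(batch)
        if len(batch) < PAGE_SIZE:
            return logins
        page += 1
```

Usage would look like `fetch_team_logins("dask", "some-team-slug")`. Requesting the largest page size keeps the number of calls, and thus the rate-limit cost, low.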

2. Add a committers file to each repository

👍 The information is publicly available.
👍 Easily accessible for automation.
👍 Little to zero friction for committers. They can add information like their name or affiliation, but they can also choose not to.
👍/👎 A little friction for owners: to keep the file in sync, owners must open a PR that updates it whenever new write privileges are awarded. I doubt this will be noticeable on top of the existing workflow of awarding write privileges, but I'm not involved in that.
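The owner-side friction could be reduced by a CI check that flags drift between the committers file and the people who actually hold write access. A minimal sketch, assuming both sides have already been reduced to sets of GitHub logins (how the write-access side is obtained is left open); `check_drift` and its messages are hypothetical.

```python
def check_drift(file_aliases: set[str], write_access: set[str]) -> list[str]:
    """Compare committers listed in the file with those holding write access.

    Logins are compared case-insensitively, since GitHub treats them that way.
    Returns human-readable problems; an empty list means the two are in sync.
    """
    listed = {a.lower() for a in file_aliases}
    actual = {a.lower() for a in write_access}
    problems = []
    for login in sorted(actual - listed):
        problems.append(f"{login} has write access but is missing from committers.yaml")
    for login in sorted(listed - actual):
        problems.append(f"{login} is listed in committers.yaml but lacks write access")
    return problems
```

In CI, a non-empty result could simply fail the job (e.g. via `sys.exit(1)` after printing the problems), which turns "forgot to update the file" into a visible reminder instead of silent drift.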

3. Add a page to the documentation of each project

👍 This will be the most visually appealing presentation.
👎 It feels over-engineered for an initial step.
👎 This likely adds significant friction.
👎 Likely more effort to access programmatically.
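One way to keep option 3 cheap would be to generate the page from structured data instead of maintaining it by hand. A minimal sketch, assuming entries are dicts with a required `alias` and optional `name`/`affiliation` (hypothetical fields) and that the docs are written in reStructuredText, as Sphinx projects typically are; `render_committers_rst` is a hypothetical helper.

```python
def render_committers_rst(entries: list[dict], title: str = "Committers") -> str:
    """Render committer entries as a reStructuredText page, one bullet per person."""
    lines = [title, "=" * len(title), ""]
    for entry in sorted(entries, key=lambda e: e["alias"].lower()):
        alias = entry["alias"]
        # Anonymous hyperlink (double underscore) avoids duplicate-target warnings.
        link = f"`@{alias} <https://github.com/{alias}>`__"
        details = [entry[k] for k in ("name", "affiliation") if entry.get(k)]
        suffix = f" ({', '.join(details)})" if details else ""
        lines.append(f"- {link}{suffix}")
    return "\n".join(lines) + "\n"
```

Because the page is derived rather than hand-edited, the friction and programmatic-access concerns mostly shift back to whatever file or team feeds it.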

hendrikmakait commented 10 months ago

The bigger point is that this is a change in the culture of Dask toward more of a corporate focus. Idk if that is intended (or recognized) in this change. If it is, that's ok. Just wanted to prompt a bit of thought around the cultural effect

Any shift toward a corporate focus was not intended. I still fail to see where or how this change would create a cultural shift if implemented correctly. Could you please elaborate? I am also happy to discuss this during the next bi-weekly maintainers meeting.

hendrikmakait commented 10 months ago

FWIW, pandas combines approaches 2 and 3 and lists only aliases in its YAML: https://github.com/pandas-dev/pandas/blob/7c9ba89c8ca8bb0f71a3fd1467b61d515611b361/web/pandas/config.yml#L71C1-L108

I guess one could also combine 1 and 3, but I would generally prefer avoiding 3 in the first step because of the implementation effort.

jakirkham commented 10 months ago

Thanks Hendrik! 🙏

Appreciate the additional clarifications

Think this was a misunderstanding on my part. Sorry about that 😞

Originally had read this as an activity that any contributor to Dask or Distributed would do

Now with a clearer understanding I agree with you that this is reasonable 👍

Also like the use of GitHub teams

Recall a past discussion like this where some folks had reservations with a written list as it might fall out of date, but I can't find it atm. If I do, I'll add it here

Wonder if there is a way to scrape the GitHub team during doc builds and write that out. Or alternatively users added to a doc then get privileges via some automation
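Generating the list during the doc build (or in a scheduled job) mostly needs a step that rewrites the output only when membership actually changed, so unchanged runs leave the file and the git history alone. A minimal sketch, assuming the team's logins have already been fetched; the one-`- alias`-per-line output format and the helper names are hypothetical.

```python
from pathlib import Path


def render_alias_list(logins: list[str]) -> str:
    """Serialize logins as a sorted YAML-style list, one '- alias' per line."""
    return "".join(f"- {login}\n" for login in sorted(logins, key=str.lower))


def sync_alias_file(path: Path, logins: list[str]) -> bool:
    """Write the alias list to `path` only if its content would change.

    Returns True when the file was (re)written, False when it was already current.
    """
    new_content = render_alias_list(logins)
    if path.exists() and path.read_text(encoding="utf-8") == new_content:
        return False
    path.write_text(new_content, encoding="utf-8")
    return True
```

Sorting case-insensitively makes the output independent of the order in which the API returns members, so only real membership changes produce a diff.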

hendrikmakait commented 10 months ago

Think this was a misunderstanding on my part. Sorry about that 😞

No worries, I'm glad we reached common ground. :)

Recall a past discussion like this where some folks had reservations with a written list as it might fall out of date, but I can't find it atm. If I do, I'll add it here

I would be interested in that!

Wonder if there is a way to scrape the GitHub team during doc builds and write that out. Or alternatively users added to a doc then get privileges via some automation

There should be a way to do this, but we could hit quotas; maybe a daily CI job would work as an alternative. Anyway, I'd suggest going the manual route first and figuring out a high-tech solution once we see that manual doesn't work for us.

jakirkham commented 10 months ago

Honestly, the automation stuff has gotten a lot easier since GitHub Actions (GHA)

We can also do things like check if there was a change before updating (and only make a handful of updates when needed)

There's some logic in conda-smithy that could be borrowed if we use a doc as a source of truth for updating GitHub teams
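If a doc were the source of truth for GitHub team membership, the update step boils down to a set difference that yields only the handful of calls actually needed. A minimal sketch, assuming desired and current memberships are sets of logins; it plans calls against GitHub's team-membership endpoints (`PUT` and `DELETE` on `/orgs/{org}/teams/{team_slug}/memberships/{username}`) without executing them. `plan_team_updates` is a hypothetical name, and this is not conda-smithy's implementation.

```python
def plan_team_updates(
    org: str, team_slug: str, desired: set[str], current: set[str]
) -> list[tuple[str, str]]:
    """Return the (HTTP method, path) calls that would bring the team in line with `desired`.

    An empty list means nothing changed, so no API calls are needed at all.
    """

    def membership_path(login: str) -> str:
        return f"/orgs/{org}/teams/{team_slug}/memberships/{login}"

    wanted = {d.lower() for d in desired}  # logins are case-insensitive
    present = {c.lower() for c in current}
    plan = [("PUT", membership_path(login)) for login in sorted(wanted - present)]
    plan += [("DELETE", membership_path(login)) for login in sorted(present - wanted)]
    return plan
```

Separating planning from execution also makes it easy to print the plan in a PR for review before anything touches the real team.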

jacobtomlinson commented 10 months ago

I'm curious whether this would be solved by keeping the CODEOWNERS files more up to date. If the goal is to have a clear list of people who are accountable for reviewing/merging, then that's exactly what CODEOWNERS is for, no?

https://github.com/dask/distributed/issues/7641
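For comparison, the owners in a CODEOWNERS file are machine-readable too. A minimal sketch that collects every owner mentioned, assuming the common layout of one path pattern per line followed by whitespace-separated owners (`@user`, `@org/team`, or an email address), with `#` starting a full-line comment; it ignores precedence between patterns, and both helper names are hypothetical. Team owners would still need expanding into individual logins.

```python
def codeowners_owners(text: str) -> set[str]:
    """Collect all owners mentioned in CODEOWNERS-style text.

    Blank lines and lines starting with '#' are skipped; the first token of
    every other line is the path pattern and the rest are owners.
    """
    owners: set[str] = set()
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        _pattern, *line_owners = line.split()
        owners.update(line_owners)
    return owners


def split_users_and_teams(owners: set[str]) -> tuple[set[str], set[str]]:
    """Separate @user owners from @org/team owners; email owners are dropped."""
    users = {o[1:] for o in owners if o.startswith("@") and "/" not in o}
    teams = {o[1:] for o in owners if o.startswith("@") and "/" in o}
    return users, teams
```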