Open hendrikmakait opened 10 months ago
Naively would think this might put off some folks that contribute in their free time or as a hobby. They may not want to involve their work in something that occurs in their free time. Though maybe there is a way to adapt this to handle that need.
`affiliation` and `name` could be optional; the question would be if we wanted to include them at all.
For sure
Though think the fields included is only part of the point here
IOW we are adding a step that corporate participants may not mind. It is a bit bureaucratic perhaps, but there are already measures like this in corporate settings (signing commits, signing CLAs, license auditing, etc.). They do add (minor) frictions, but that is tangential
The point more is this is a change in the culture of Dask to have more of a corporate focus. Idk if that is intended (or recognized) in this change. If it is, that's ok. Just wanted to prompt a bit of thought around the cultural effect
Since this seems to amount to a more extensive debate than I had initially anticipated, let me clarify my goals:
For each repository, there is currently an opaque group of people holding write privileges. I want to make that group of people...
`bedevere`'s PR state machine as the role model. It is an indicator of project health. It tells me:
Since openness and transparency are fundamental foundations of our governance, it simply feels wrong that this information is 100% opaque.
Not having this information publicly available has been a nuisance for me several times when working on PRs. PRs were delayed because no committer had the time to approve them or because it was wrongly assumed that people held write privileges to a repository.
There are several ways to make this information accessible with varying amounts of transparency:
Approach 1:
- 👍 By using GitHub teams to administer write privileges per repository, this information should be programmatically available to anyone within the Dask organization.
- 👍 Zero friction; we need to administer write privileges somehow. (Honestly, I'm surprised this is not already being done given that it seems so simple and teams are even mentioned in the governance docs, but that's a different story.)
- 👎 Even visible teams are only accessible to members of the Dask organization, so this would still not be transparent to external contributors or users.

Approach 2:
- 👍 The information is publicly available.
- 👍 Easily accessible for automation.
- 👍 Little to zero friction for committers. They can add information like their name or affiliations, but they can also choose not to.
- 👍/👎 A little friction for owners: Owners must add a PR to the repo that updates the file when new write privileges are awarded to keep this in sync. I doubt this will be noticeable on top of the existing workflow of awarding write privileges, but I'm not involved in that.

Approach 3:
- 👍 This will be the most visually appealing presentation.
- 👎 It feels over-engineered for an initial step.
- 👎 This likely adds significant friction.
- 👎 Likely more effort to access programmatically.
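To make the "accessible for automation" point concrete, here is a minimal sketch of how a bot could tell whether a PR already has an approval from someone with write privileges. The function name, the status strings, and the aliases are hypothetical; the `committers` set would be loaded from whichever source of truth is chosen.

```python
def committer_approvals(approvers, committers):
    """Return the approvers who hold write privileges (case-insensitive match)."""
    known = {alias.lower() for alias in committers}
    return sorted(a for a in approvers if a.lower() in known)


# Hypothetical aliases, for illustration only.
committers = {"alice", "Bob"}
approvers = ["carol", "bob"]

approved_by = committer_approvals(approvers, committers)
# Hypothetical status names, loosely in the spirit of a PR state machine.
status = "awaiting merge" if approved_by else "awaiting committer review"
print(status, approved_by)
```

With a check like this, a PR approved only by non-committers would be flagged as still waiting on someone who can actually merge it.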
The point more is this is a change in the culture of Dask to have more of a corporate focus. Idk if that is intended (or recognized) in this change. If it is, that's ok. Just wanted to prompt a bit of thought around the cultural effect
Any shift toward a corporate focus was not intended. I still fail to recognize how this change would create a cultural shift if implemented correctly. Could you please elaborate? I am also happy to discuss this during the next maintainers bi-weekly.
FWIW, pandas combines approach 2 and 3 and uses aliases only in their yaml: https://github.com/pandas-dev/pandas/blob/7c9ba89c8ca8bb0f71a3fd1467b61d515611b361/web/pandas/config.yml#L71C1-L108
I guess one could also combine 1 and 3, but I would generally prefer avoiding 3 in the first step because of the implementation effort.
Thanks Hendrik! 🙏
Appreciate the additional clarifications
Think this was a misunderstanding on my part. Sorry about that 😞
Originally had read this as an activity that any contributor to Dask or Distributed would do
Now with a clearer understanding I agree with you that this is reasonable 👍
Also like the use of GitHub teams
Recall a past discussion like this where some folks had reservations with a written list as it might fall out of date, but I can't find it atm. If I do, I'll add it here
Wonder if there is a way to scrape the GitHub team during doc builds and write that out. Or alternatively users added to a doc then get privileges via some automation
Think this was a misunderstanding on my part. Sorry about that 😞
No worries, I'm glad we reached common ground. :)
Recall a past discussion like this where some folks had reservations with a written list as it might fall out of date, but I can't find it atm. If I do, I'll add it here
I would be interested in that!
Wonder if there is a way to scrape the GitHub team during doc builds and write that out. Or alternatively users added to a doc then get privileges via some automation
There should be a way to do this, but we could hit quotas; maybe a daily CI job would work as an alternative. Anyway, I'd suggest going the manual route first and figuring out a high-tech solution once we see that manual doesn't work for us.
Honestly the automation stuff has gotten a lot easier since GHA
We can also do things like check if there was a change before updating (and only make a handful of updates when needed)
There's some logic in conda-smithy that could be borrowed if we use a doc as a source of truth for updating GitHub teams
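As a rough illustration of the "check if there was a change before updating" idea, the sketch below compares the aliases listed in a document with the current team membership and only reports work when the two differ. It is not conda-smithy's logic; the function name and the data are hypothetical, and fetching the team members (for example through the GitHub API) and applying the changes are deliberately left out.

```python
def plan_team_sync(listed, team_members):
    """Compare aliases in the committers document with the current team.

    Returns (to_add, to_remove) as sorted lists; two empty lists mean
    that no update is needed.
    """
    wanted = {alias.lower() for alias in listed}
    current = {alias.lower() for alias in team_members}
    return sorted(wanted - current), sorted(current - wanted)


# Hypothetical data, for illustration only.
listed = ["alice", "bob", "dave"]
team_members = ["Alice", "bob", "carol"]

to_add, to_remove = plan_team_sync(listed, team_members)
if to_add or to_remove:
    print("add:", to_add, "remove:", to_remove)
else:
    print("team already in sync; nothing to do")
```

A scheduled CI job could run this comparison and skip any write operations when both lists are empty, which keeps the number of API calls small.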
I'm curious if this is solved by keeping the `CODEOWNERS` files more up to date? If the goal is to have a clear list of people who are accountable for review/merging, then that's exactly what this is for, no?
Adding a `committers` file would make it easier to understand who has commit rights on a given repository (e.g., https://github.com/dask/distributed/pull/7743#issuecomment-1681028790), generally add transparency, and enable further automation akin to CPython's `bedevere` (https://github.com/python/bedevere/blob/main/README.md).

In its simplest form, this would be a list of GitHub aliases, though I personally like Arrow's model, which adds a bit more information (https://github.com/apache/arrow-site/blob/main/_data/committers.yml):
- `name`
- `alias`
- `role`
- `affiliation`
While affiliations might get out of date, they indicate the project's health. When it comes to roles, I don't know if they would add any benefit, but they might come in handy and should be low maintenance. I'm specifically thinking about "technical" roles like repo owner, not so much about Steering Council membership or anything like that.
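A minimal sketch of what one entry could look like once loaded, assuming the four fields above with only the alias required (the optionality of `name` and `affiliation` follows the discussion in this thread). The `Committer` class, `parse_committers`, and the sample entries are hypothetical; the dictionaries stand in for whatever a YAML loader would return.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Committer:
    alias: str                          # GitHub alias; the only required field
    name: Optional[str] = None
    role: Optional[str] = None          # e.g. a "technical" role like repo owner
    affiliation: Optional[str] = None


def parse_committers(entries):
    """Turn a list of dicts (e.g. loaded from a YAML file) into Committer records."""
    committers = []
    for entry in entries:
        if not entry.get("alias"):
            raise ValueError(f"entry without alias: {entry!r}")
        committers.append(Committer(**entry))
    return committers


# Hypothetical entries, for illustration only.
entries = [
    {"alias": "alice", "name": "Alice Example", "role": "owner",
     "affiliation": "Example Corp"},
    {"alias": "bob"},  # a bare alias is enough
]
committers = parse_committers(entries)
print([c.alias for c in committers])
```

Keeping everything except the alias optional means the simplest form (a plain list of aliases) and the richer Arrow-style form can share one file format.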
What are people's thoughts?