RFC: Automate or abandon forks

zephraph commented 4 years ago

Proposal:

Completely abandon fork workflows in favor of branches

OR

Automate privileged forks to reduce friction in those workflows

Reasoning

We've had a long, colorful history with forks. We've got a smattering of projects that disallow fork PRs via peril. We have had (still have?) github tokens for commenting exposed to the public intentionally because danger ran some important tasks that we wanted to make sure there were comments posted on the PR for.

The friction is that we can't (currently) share privileged information (tokens) with forks and thus they break a lot of tooling/automation. This is something we can surmount technically though. We could have automation that listens for new opened fork PRs and automatically creates a tag or branch that's pushed upstream. With certain naming conventions, either of those strategies could directly report status directly back to the fork branch. It'll take some iteration to get right, but this is automation we could do once to allow folks to work however they want.

We could also just stop using forks entirely. That'll change how some people work, but it'll completely eliminate the need for any special tooling.

So those are (as I see it) our options. I'd welcome other options if I missed them.

How is this RFC resolved?

We decide on an approach and take action towards it.

Resolution

Majority support to this point has been for removing forks from our workflow. To that end I'll mark this RFC as resolved and begin the following actions:

PR an update to our peril automation to remind folks to use branches over forks. We can merge this when we feel ready to fully proceed or incrementally roll it out.
Experiment with protected branches in a low traffic repo to ensure it's compatible with our current automation
Slowly rollout the process to our tier 1 projects to ensure there's plenty of time for feedback about the process.

Thanks for participating folks. If you have any more thoughts/comments, feel free to drop them below.

jonallured commented 4 years ago

and thus they break a lot of tooling/automation

I could use some further explanation on this point - what breaks?

zephraph commented 4 years ago

Anything that needs tokens or credentials to run. Deploying canaries, commenting on PRs, doing stuff in slack, etc.

dblandin commented 4 years ago

@zephraph Thanks for this RFC :100:

I'm in favor of Artsy GitHub organization members using branches within the organization's repositories instead of forks.

For external contributors to our open-source projects, I think any privileged tasks could skip or degrade so that the core function of building and running tests continues for fork PRs.

A nice-to-have would be to post some message if we detect a PR running from a fork and the author is in the Artsy GitHub organization. It's difficult to roll out process changes like this. A little nudge when appropriate and acceptanting gradual adoption might be easier than a more aggressive mandate.

ashfurrow commented 4 years ago

We have had (still have?) github tokens for commenting exposed to the public intentionally ...

Yeah, this is not ideal but Danger's docs touch on this, and we have used the suggested workaround in the past. Let me know what I can clarify here – the @ArtsyOpenSource credentials are in 1Password and I think it might still be configured for a few repos' Danger installs.

We could also just stop using forks entirely.

Sounds great! It would simplify OSS repo workflows and improve security (since we no longer need to configure CI to pass secrets to fork PR builds).

For external contributors to our open-source projects, I think any privileged tasks could skip or degrade so that the core function of building and running tests continues for fork PRs.

For OSS PRs, I typically examine the commits and then pull them into a branch PR to trigger the CI run. It's a bit of a workaround but it works well and OSS PRs requiring CI runs don't happen too often. Here someone opened a PR, and here is where I created a branch PR to run the tests.

A nice-to-have would be to post some message if we detect a PR running from a fork and the author is in the Artsy GitHub organization

This would be a great idea – it would be trivial to extend the existing Peril rule.

ashfurrow commented 4 years ago

FYI, dB touched on the reasoning Artsy used forks in the first place in this blog post: https://artsy.github.io/blog/2012/01/29/how-art-dot-sy-uses-github-to-build-art-dot-sy/

damassi commented 4 years ago

As a cautionary tale, I noticed a dev accidentally commit directly to Gravity master just a few days ago. If we were to abandon forks en masse a first step would be to audit and lock down all of our repos so that kind of thing can't occur. I think a lot of us here heavily rewrite our git history through rebases and so on.

zephraph commented 4 years ago

Circleci does have an article about how they’ve handled this: https://circleci.com/blog/triggering-trusted-ci-jobs-on-untrusted-forks

They’ve also said that they use https://bors.tech.

The technical solution here is still a little fuzzy to me.

anandaroop commented 4 years ago

I'm fine with moving to a branch approach, with one caveat.

One benefit of the fork approach is that the upstream repo does not become cluttered with developers' stale feature branches. If we do adopt the branch approach, I'd love for us to figure out a solution for that.

Currently GH will delete a branch upon a PR being merged, but non-merged branches hang around forever. GH will mark them as stale after 3 months but take no further action because it can't make any assumptions about what that inactivity means.

Maybe we can automate deletion of stale branches after some threshold (whether 3 months or greater, and subject to some allowlisting for long-lived branches like master and release)?

dleve123 commented 4 years ago

👍 on this RFC

As a cautionary tale, I noticed a dev accidentally commit directly to Gravity master just a few days ago. If we were to abandon forks en masse a first step would be to audit and lock down all of our repos so that kind of thing can't occur. I think a lot of us here heavily rewrite our git history through rebases and so on.

Luckily, protected branches can solve this problem, so while this will require some configuration (and testing), I think this is a solvable problem!

I think there's a lot of conversation here about the technical reasons to switch to branches over forks. I'll add another one:

About to ship automation in Force will allow devs to build review apps by pushing to certain branches.

It's critical that this only runs on artsy/force. As such, it'll be a bit annoying to have a workflow where feature work occurs at (e.g.) dleve123/review-app-feature-work, but needs to be mirrored at artsy/review-app-feature-work. Not the end of the word, but food for thought.

With that said, I think there are many non-technical drawbacks of our forking based approach:

Business/Legal: Forks are dubious from an intellectual property perspective.
Dev Culture: I think forks discourages, from a systems-perspective, collective code ownership and collaboration. If I'm pairing with someone, the fact that we have to choose which fork to push to, IMO, suggests slightly more ownership by 1 person. Some (most) work does have a clear owner, but our tooling decisions should be flexible enough here to reflect the disparate (and valuable) ways we work.
Dev Process: It raises the bar to git-ing well: The mere existence of another upstream present in a git workflow, I think, increases the cognitive complexity (and fear) of rebasing, ensuring that the master you are branching off of it up-to-date, etc. git-ing well is hard enough already!

I think forking provides yet another digital folder and means to organize things – I think it's easy to cut yourself on the folder - the cons outway the pros to me.

Year-old Slack thread on this topic: https://artsy.slack.com/archives/C03J4L2KK/p1552490693081200

bhoggard commented 4 years ago

I definitely prefer branches with protected branches over trying to automate forks. The latter doesn't feel like a good use of developer time. Also 👍 to clearing out old branches automatically after 3 months.

joeyAghion commented 4 years ago

I'm attached to the forking model, but happy to try out something different if the tooling and safeguards are that much better.

Personally, I don't think forks raise the bar to git usage unnecessarily. Understanding git's distributed model is so helpful that I consider it more of a baseline expectation for effective contribution. It actually mitigates the risk or fall-out from any one contributor's messy branch practices or rebasing/force-pushing. One fear I have is that our shared repo will end up as messy as the least-git-disciplined contributor's does today.

My intuition is against depending on a lot of automation or tooling to mirror or sync forked PRs to an org repo. It seems like that could get complex and overly-custom quickly, and be misaligned with most "standard" tooling.

zephraph commented 4 years ago

@joeyAghion would your suggestion be to keep the status quo? @jonallured also curious to hear your thoughts.

jonallured commented 4 years ago

I'm torn and don't have super strong feelings tbh. I guess one question I have is whether we need to have a policy here that covers all projects. Like, if forks are causing issues in MP and Reaction or whatever, then maybe we should have a different approach there but does that mean we want to abandon forks everywhere? I'm not as close to this stuff as you are @zephraph so ultimately I feel like deferring to the people in the trenches is the right approach and I trust your judgement. ❤️ 🤗

zephraph commented 4 years ago

That's a great point. My explicit intention in this RFC isn't to advocate one solution over the other but hopefully help the team converge around a single solution.

You're absolutely right in that we can have different approaches in different repositories. That is, to a large extent, how we function today. In repos where we have automation that needs authentication we have two primary mechanisms of dealing with that.

Simply not running CI on forks and having automated notifications via Peril that forks aren't supported. This is the defacto in Eigen for example. (@ashfurrow, feel free to correct me if I'm misspeaking).
Gracefully degrade for a fork and don't run the auth'd automation but run everything else that we can.

While these two solutions have provided us with a "good enough" environment, it still belies major complexity and workflow friction.

If you're used to a fork workflow and open a PR on a repo that doesn't support forks it can be a bit of a jarring/frustrating experience to have the repo reject your PR because it doesn't support forks. It's fine, you can just close that PR and open a new one from a branch, but it is a bit of added friction.

For 2, the same underlying scenario exists if you need the auth'd automation. You open a PR, the thing you need doesn't run, then you have to close the PR and open a new one from a branch. Assuming you don't (the automation is nice-to-have and not need-to-have) someone had to go through and engineering that CI robustly to not fail for your usecase. While that's fine for the most part there's a not-insignificant complexity cost to adding all the branching logic of how to gracefully handle those failures.

I see an overall trend of us as a team trending towards more consolidated technical strategies and workflows. While individual workflows will always vary, any consistency that we can add (while trying to account for or reduce friction) increases our ability to collaborate more efficiently and effectively.

A general observation from this conversation... collectively we don't consistently use forks or branches. We do both, depending on the individual contributor.

To @anandaroop's point... there are 49 branches on reaction currently. ~36 of those branches are stale. Upstream repos are already getting cluttered with stale branches. Occasionally one of us will go through and delete old branches. That'd be a great thing to have automation around. It's also indeed something that wouldn't be an issue if we were all on forks.

I think all of the concerns raised with the branching model we already face today. Because not everyone uses forks. Likewise, forks are generally treated as second class citizens when it comes to CI.

I think we're largely experiencing the worst of all these tradeoffs because we're not owning our solution, but rather letting it emerge from team preference.

In general, my position is this

I'm fine with forks. I'm fine with branches. As a team, I'd like to see us consolidate on an approach and invest in making it solid. No approach will be perfect, but if we consolidate and work together we can make it a great experience.

Final entry to my monologue. Consolidation might not be the right approach either. I could be thinking about this much too rigidly. I hope it's clear though that there's solid room for improvement here. I welcome all feedback and solutions, even if it's not in line with the suggestions of this RFC.

dleve123 commented 4 years ago

Can someone provide a concrete example of the negative value of stale branches in a repository?

To be totally transparent, I too, dislike, stale branches as it clashes with my drive to be a digital minimalist (with some things), but I struggle to see (and have never experienced) a concrete, velocity-impacting correlation between the presence of stale branches and my ability to get work done.

I think one key proposed "pro" to the forks side of this conversation is that it avoids the clutter of stale branches – just think we should motivate that pro before using it to calculate a decision here.

brainbicycle commented 4 years ago

Don't have too strong of feelings either way but have a preference for branches just for ease of workflow compared to keeping forks up to date. As far as negative impact of stale branches for me it is just clutter and makes it maybe slightly more difficult to find where active development is occurring, pretty minor admittedly and would be hard pressed to find an example where it actually affected velocity.

ds300 commented 4 years ago

Having worked mostly in eigen recently (which uses branches by necessity), branch clutter has cost me a few seconds once or twice while searching for old branches I forgot the name of. Compared to issues with forking it's negligible IMO.

joeyAghion commented 4 years ago

Sounds like a strong branch-naming convention (like <githubusername>-<topic>) could mitigate some of the clutter, or at least make it clear who should clean up.

pvinis commented 4 years ago

I've been doing pavlos/the-thing-i-add for a while when I know a branch might live for a while and I don't want to clutter the branch space. It helps because then / becomes a "dir" in git that I can fold/unfold.

Also, just my opinion too, I prefer branches. Forks create overhead and complications.

damassi commented 4 years ago

Oh nice @pvinis, i like the use-your-name convention a lot.

zephraph commented 4 years ago

Majority support to this point has been for removing forks from our workflow. To that end I'll mark this RFC as resolved and begin the following actions:

PR an update to our peril automation to remind folks to use branches over forks. We can merge this when we feel ready to fully proceed or incrementally roll it out.
Experiment with protected branches in a low traffic repo to ensure it's compatible with our current automation
Slowly rollout the process to our tier 1 projects to ensure there's plenty of time for feedback about the process.

Thanks for participating folks. If you have any more thoughts/comments, feel free to drop them below.

joeyAghion commented 4 years ago

I've noticed a lot more development happening on branches within repos that formerly relied on forks, presumably as a result of this RFC. Two outcomes:

CI capacity has been more scarce, as WIP commits trigger builds more often (minor).
Branches have started accumulating. Some are named in helpful ways (with ticket numbers or developer names or prefixes describing the type of change), but they're still inconsistent and it's unclear which can be cleaned up.

I think we should recommend the following to mitigate this:

Work on forks or branches according to whatever is suggested in a project's README. Absent a specific recommendation, work on branches (for the reasons outlined in this RFC).
When working on branches, name branches like <githubusername>/<whatever-topic>. Even if the actual work is a collaboration among multiple developers, the unambiguous naming makes clear who is creating [and should destroy] the branch.
Delete branches once work is complete.

damassi commented 4 years ago

Delete branches once work is complete

I spend each day deleting branches merged from PRs, i'm not sure why folks have a hard time remembering this step. Do you have any ideas about how to encourage / automate @joeyAghion?

pvinis commented 4 years ago

I'm pretty sure you can do this with git remote prune origin (test it with git remote prune origin --dry-run). Also a source

anandaroop commented 4 years ago

Any reason for us not to enable this config (per repo):

Screen Shot 2020-08-07 at 1 44 05 PM

Or does that conflict with the PR-based deploy approach?

ds300 commented 4 years ago

CI capacity has been more scarce, as WIP commits trigger builds more often (minor).

This can be mitigated in the Circle dashboard by turning off builds on branches that do not have an open PR, which we did in eigen as a necessity when moving our danger script from travis to circle.

joeyAghion commented 4 years ago

@anandaroop exactly: I think that setting would delete the staging branch that our release pipeline depends on. Maybe the automatic deletion rule be enabled in combination with another rule preventing deletion of staging in particular?

damassi commented 4 years ago

Doesn't staging automatically get regenerated during our CI flow?

bhoggard commented 4 years ago

Doesn't staging automatically get regenerated during our CI flow?

No, our CI flow expects it to be there.

joeyAghion commented 4 years ago

I actually think projects using this orb step will successfully recreate a staging branch if it doesn't exist. Not sure what else might break down if it gets frequently deleted, but we could at least try this on a project.

damassi commented 4 years ago

Yeah, I've noticed the staging branch get deleted probably dozens of times with no issues 👍

zephraph commented 4 years ago

I've noticed the staging branch get deleted probably dozens of times with no issues

Yeah, I kinda do that accidentally quite frequently. Disabling builds except for PRs might be helpful so long as that doesn't prevent our process working (would release branches still build?).

artsy / README