gitpod-io / gitpod

The developer platform for on-demand cloud development environments to create software faster and more securely.
https://www.gitpod.io
GNU Affero General Public License v3.0

Epic(experiment): Better support for multi-repo set ups #7249

Closed loujaybee closed 2 years ago

loujaybee commented 2 years ago

Summary

A discovery epic for investments into documentation and product improvements related to understanding and documenting use cases for multi-repository setups. The epic is concerned with documenting existing approaches rather than implementing product features.

Context

Currently, Gitpod is built around some key product assumptions:

This idea works great for single-repo projects (mono-repos with multiple components in them, or monoliths), but is not straightforward for projects that consist of multiple git repositories.

That said, there are currently two main solutions for allowing Gitpod to work well for multi-repo set-ups.

:a: Multiple Workspaces

Depending on how many repositories a project consists of and how the components communicate with each other, the preferred solution is to configure each repository individually, then start workspaces for the repositories you need and bridge the network between them.

Example: A project consists of two repositories: "Backend" and "Frontend". Each repository has its own .gitpod.yml. The backend repository doesn't even have a dependency on frontend code, so people can code on it without starting a workspace for Frontend if they wish. If you work on both, you start two workspaces, one for "Frontend" and one for "Backend", and code on them through two distinct IDE windows, as devs would do locally. The network bridging makes sure your Frontend workspace uses the services exposed by the "Backend" workspace.

Caveat: If you have a large number of small services, starting a large number of workspaces and maintaining the automatic workspace timeout is inconvenient.

Benefits: This approach nicely keeps your repositories separated including their configuration (no big single dev env image but a small dedicated one per repo). Also, it allows starting workspaces on repository contexts for all your components.

:b: Meta Repository

Introducing one main meta-repository that contains the configuration for a single large dev environment that clones all the repos. This kind of mimics a mono-repo, and if possible I'd go down the mono-repo route instead, because of the drawbacks mentioned below:

Caveat: Starting a dev environment from the sub repositories doesn't work, because they don't have a configuration (you can add links to the Readme, though). Contexts don't work, i.e. starting a dev environment on a branch or PR.

Benefits: This is much the same as working locally, where one machine contains all the dependencies for the various components. It is more convenient than individual workspaces when you have a highly distributed application with lots of small services sitting in individual repositories.

Hypothesis

If we better document / show users how to convert from their current workflows (specifically more "advanced" use-cases like cross-repo and multi-container) to the Gitpod workflow, users are more likely to adopt, retain, and use Gitpod for doing significant project work.

Value

Acceptance Criteria

Growth Area

Measurement

Persona(s)

This epic is concerned with helping to define user personas rather than impacting existing ones.

In Scope

Out Of Scope

Internal slack conversation [1] Internal slack conversation [2] Internal slack conversation [3]

shaal commented 2 years ago

TLDR: DrupalPod uses one repo for the Gitpod environment setup, and another repo for the Drupal code that is being developed.

The DrupalPod project was created to allow starting Drupal contributions in 1 click.

Setting up a Drupal project is a complex task that can take a few hours for first-timers.

DrupalPod uses the following technologies to achieve its goal:

rfay commented 2 years ago

ddev-gitpod-launcher uses its own repo with environment variables to load another specified repo. That way everything is all set up for the other repo. It's a pretty convoluted technique, https://github.com/drud/ddev-gitpod-launcher/blob/main/.gitpod/start_repo.sh. But amazing that it's do-able.

axonasif commented 2 years ago

Here's an idea:

Users will be able to specify a bunch of sub-repos in their .gitpod.yml (similar pattern to specifying extensions:). The sub-repos should get cloned under /workspace after cloning parent repo. The sub-repos can have their own .gitpod.yml. Sub-repos will be listed in the VSCODE sidebar, from which the user can open another VSCODE instance in a new browser-tab (similar to Remote Explorer mechanism)

============

Here is something I made to achieve a similar thing but nothing like having this feature built-in and well integrated 🤟 https://github.com/axonasif/scripts4gp/tree/main/multi-repo

Thanks!

hobgoblina commented 2 years ago

+1 for including repo declarations in .gitpod.yml and having them cloned into /workspace. That's basically the pattern I'm using for a meta-repository. Just using a separate repos.yml with a simple script to clone the repos before sending a sync-done that allows tasks to run for the cloned repositories - then run code --add /workspace/* during startup. (put up a lil example here)

Ofc would need a more-robust/less-dependent implementation than that example, but it'd be great to just declare repo objects in a repositories section in .gitpod.yml. And maybe as an alternative to cd-ing to the repo's directory for tasks, being able to declare tasks within those repository objects, or add repo/directory params to task objects, or running .gitpod.ymls in the cloned repositories, if they exist... why not all three?
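
A minimal sketch of the clone-then-open pattern described in this comment, written in Python rather than as the shell script the comment refers to. The repository names, URLs, and the `build_setup_commands` helper are hypothetical; a plain list stands in for repos.yml, and the commands are only built, not executed.

```python
from pathlib import PurePosixPath

# Hypothetical stand-in for the contents of a repos.yml file.
REPOS = [
    {"name": "api", "url": "https://github.com/example-org/api.git"},
    {"name": "web", "url": "https://github.com/example-org/web.git"},
]


def build_setup_commands(repos, workspace="/workspace"):
    """Return the commands that would clone each repo and add it to the editor."""
    root = PurePosixPath(workspace)
    commands = []
    for repo in repos:
        target = str(root / repo["name"])
        commands.append(["git", "clone", repo["url"], target])
    # `code --add <folder>...` adds folders to the last active VS Code window.
    commands.append(["code", "--add"] + [str(root / r["name"]) for r in repos])
    return commands


if __name__ == "__main__":
    for command in build_setup_commands(REPOS):
        print(" ".join(command))
```

Returning argument lists instead of running them keeps the sketch side-effect free; a real startup script would pass each list to something like `subprocess.run` and signal completion afterwards.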

ThePaulMcBride commented 2 years ago

This sounds like a great idea. Where I work, our app is powered by a handful of separate repos: a Next.js front end, a Ruby on Rails server, a background worker, and a few other microservices. When adding a new feature, or especially when modifying an existing feature, we have to work on more than one of those repos at a time.

Maybe a configuration that works a bit like docker compose where you can specify other repos that your current project depends on?

For example, I could work on the Rails app, which depends on a database, without needing our frontend app running. But any non-trivial work on the frontend app depends on the Rails API, which in turn depends on a database.

Right now, some of our apps are dockerised and some are not. I would expect to have to dockerise them to get this to work. Local development at the minute is a case of spinning up each of the parts of the app we need, and allowing them to speak to each other by configuring their addresses/ports as environment variables, i.e. we tell the front end that the Rails app is at localhost:3000 and our uploader service is at localhost:5000, etc.
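
The dependency chain in this comment (the frontend needs the Rails API, which needs a database) can be modelled as a small graph. The sketch below is not a Gitpod feature; the service names and the `DEPENDS_ON` mapping are hypothetical and only illustrate how a compose-like "depends on" declaration could be resolved into the set of services to start, in order.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency declarations: service -> services it needs.
DEPENDS_ON = {
    "frontend": ["rails-api"],
    "rails-api": ["database"],
    "worker": ["rails-api"],
    "database": [],
}


def start_order(target, depends_on):
    """Return the services needed for `target`, dependencies first."""
    needed = {}
    stack = [target]
    while stack:
        service = stack.pop()
        if service not in needed:
            needed[service] = depends_on.get(service, [])
            stack.extend(needed[service])
    # static_order() yields each node only after all of its dependencies.
    return list(TopologicalSorter(needed).static_order())
```

Asking for `"rails-api"` leaves the frontend out entirely, which matches the point that backend work should not require the frontend to be running.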

JohannesLandgraf commented 2 years ago

I think there is a shorter, simpler solution to the problem. I suppose that in the status quo, if you have multiple repos for frontend and backend, you also have multiple IDE windows open in your local setup. Both services can talk to each other because they run on the same machine.

In the Gitpod world you can spin up workspaces based on the individual repos, which also essentially become two IDE windows. However, as the workloads are running in the cloud, having the services speak with each other becomes a networking problem.

Instead of using submodules, creating meta repos and other git acrobatics, the easy solution would be to leverage Tailscale:

cc @svenefftinge because we spoke about this today

ThePaulMcBride commented 2 years ago

If this is the kind of thing that could be configured as a project and automated, that would be amazing. Locally they are in different repos and are worked on in separate VSCode windows.

loujaybee commented 2 years ago

Added an issue to update the website docs to call out the current solutions more clearly for users: https://github.com/gitpod-io/website/issues/1461

svenefftinge commented 2 years ago

If this is the kind of thing that could be configured as a project and automated, that would be amazing. Locally they are in different repos and are worked on in separate VSCode windows.

That is how it would work with Gitpod as well. I.e. you would start a workspace for each of the repos you need for a certain change, and have dedicated IDE windows for them as you would have locally. If those components interact through networking, you initially set them up using Tailscale and they would just work as they do locally.

The benefit over a "meta-repo" (i.e. one parent repo that contains a .gitpod.yml to clone all subrepos into one workspace) is that all the supported Gitpod contexts still work. I.e. you can start workspaces from any project, branch, PR, or issue. Also, if you don't need all the components, e.g. you change something on a backend component without changing frontend code, then you would just start what is needed.
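
As a small illustration of "contexts still work": a workspace can be started by prefixing a repository, branch, PR, or issue URL with `https://gitpod.io/#`. The sketch below builds such links for a change that touches two repositories; the organisation, repository names, PR number, and branch are hypothetical.

```python
GITPOD_PREFIX = "https://gitpod.io/#"


def workspace_url(context_url):
    """Prefix a repository, branch, PR, or issue URL to get a workspace start link."""
    return GITPOD_PREFIX + context_url


# Hypothetical contexts for one change that touches two repositories.
CONTEXTS = [
    "https://github.com/example-org/backend/pull/42",
    "https://github.com/example-org/frontend/tree/feature-x",
]

if __name__ == "__main__":
    for context in CONTEXTS:
        print(workspace_url(context))
```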

hobgoblina commented 2 years ago

Def into the Tailscale solution, but also think that an integrated alternative to metarepos & git submodules for multi-root workspaces would be great to have. We have a few groups of APIs and interfaces that can be highly coupled within the groups and may be loosely coupled between groups, so it can make sense to work on multiple repos from one IDE instance, at least within the groups.

I may switch to using Tailscale for between-group connections so the metarepo I'm using isn't so monolithic (and hopefully reduce build time?), but wouldn't want to have a workspace for each individual service.

hobgoblina commented 2 years ago

With the pattern in the example repo that I posted, being able to declare the git branch for multiple repos in the repos.yml, and provide a single link or branch name to other devs so they can spin up a workspace with all repos at the right commit is... just nice.

I also think it's important to maintain the "all gitpod contexts can work" thing @svenefftinge mentioned. I currently have .gitpod.ymls in each individual repo, but of course they don't get used by the metarepo. Is why I mentioned in my first comment that it'd be nice to be able to include additional repos in a .gitpod.yml - but then run that added repo's .gitpod.yml after cloning it to /workspace.

ThePaulMcBride commented 2 years ago

Set them up using tailscale

In practice that means I need to understand what Tailscale is, how it works and how it should be used for my project. Honestly, it looks like it is probably the right solution, though. It would be ideal if this could be configured through the Gitpod UI or through the config file in a way that isn't a bash script that someone in the team needs to take responsibility for understanding and keeping working.

I would love to be able to say "This workspace needs to speak to this workspace" and have gitpod make that happen for me.

svenefftinge commented 2 years ago

It would be ideal if this could be configured through the Gitpod UI or through the config file in a way that isn't a bash script that someone in the team needs to take responsibility for understanding and make sure it works.

Yes, we are working on getting to that ideal. In the meantime, it is super helpful to have folks like you trying this in order to make sure this is a good direction. 🙏

svenefftinge commented 2 years ago

@loujaybee I've updated the epic description a bit. Made it clearer we are talking about multi-repo setups and added descriptions for the two common approaches.

shaal commented 2 years ago

Is there an existing/working example in Gitpod of 2 separate repos that can work together using Tailscale?

svenefftinge commented 2 years ago

I have written down an RFC for how we could better support this. Feedback would be awesome!

hobgoblina commented 2 years ago

Love it - the webhooks for triggering multi-repo builds from sub-repositories is the piece I haven't fleshed out for our setup yet. The tricky bit seems to be accounting for branches. Could end up with situations where changes to a sub-repo branch would trigger builds for multiple parent-repo branches, which would mess with the ability to launch the multi-repo workspace from a sub-repo (maybe what you meant by "keep it simple"? tho could still happen from default branches)

That could potentially be handled by an addition to the browser plugin (ie, if there's multiple parent-repo branches that include a given sub-repo branch, the Gitpod button could be a dropdown). But I also don't really see a problem with only being able to launch the multi-repo workspace from the parent repo - especially if you're running sub-repo .gitpod.ymls to build them. There may still be use cases for working from a single-repo workspace for some features but multi-repo workspaces for others... I currently do that, and it's nice to have shorter build times in those situations. 😛

stemount commented 2 years ago

Multi-Repo GitPod

To go big brain there's a few things to be answered:

Having each .gitpod.yml in isolation, similar to "multiple docker compose files" playing nicely using Docker bridged networking, would be great. For example, you could check out the frontend and pull in the API only when a feature needs it. However, if you are working on some small frontend bug, are you not best off having a mocked API instead, using Nuxt/Next/nginx/nock (pick your poison)?

It would need to "mix" all these dependencies together, however, to make sure they all play nicely with each other, and that could get quite complicated, especially if you clone a repository but for whatever reason you can't access the API.

The disadvantage of cloning multiple Git repositories would be the size of the GitPod workspace would be getting bigger and bigger, right? The more repositories I clone, with YYYY amount of depth - I then need to run all the build scripts or start their Docker services working in tandem - could quickly spiral out into running several Express APIs that you don't really need, throw in some microservices, let's start some nginx box for static files too.


GitPod Multiple Workspace Networking

I know this ticket has sorta gone off course into the "networking" problem rather than ways to predictably set up a workspace of many "dependencies" so I'll keep these thoughts brief:

CORS-gate will be real.

  1. I'm a GitPod user and I'm able to both have a separated frontend and backend in two GitPod workspaces.
  2. I'm running a Next.js/Nuxt app and an Express API in two GitPod Workspaces. GitPod helpfully listens for new services from both these repos and provides me a simple UI with a Private or Publicly available URL - I can go and find a place to put in my environment variables, code, or config files.
  3. I configure my API URL on my frontend to be the one that GitPod provides for Workspace B. It's listening on HTTPS, which avoids a lot of problems, e.g. CORS, mixed HTTP/HTTPS problems, strict referer headers in Chrome.
  4. Life is good.

Now, I am feeling quite spicy as Person A in Scenario 1 and think "hmm, yes, I could try Tailscale Bridge Mode as ".

  1. I know what I've gotta do, similarly to running workspaces in parallel, I can use my new tailnet to connect to service YYYY - I just need to use my tailscale DNS, magic DNS, or IP somehow.
  2. Ah, wait, there's nothing on port 80/443 of course. After pondering what I'm doing wrong, silly me realises I also need the port number (Gitpod kinda tidies up nicely by putting the port number in the hostname), so no insane .env or similar configurations such as http://100.100.2.3:4000 are needed, which would require a "frontend browser" to also be on the tailnet locally to test (if serving).
  3. Cool, new config looks legit, save, restart, whatever.
  4. I look at my frontend and it's as red as a fire engine with CORS, Origin/Referer Strict mode shouting at me, Mixed HTTP/HTTPS Origin problems?
  5. Cry and wonder what happened to the magic?!
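
The difference between the two scenarios can be sketched with two checks a browser effectively makes. This is an illustration only: the workspace hostnames below are made-up placeholders that follow the "port number in the hostname" remark above, and `http://100.100.2.3:4000` is the tailnet address from step 2.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def gitpod_port_url(port, workspace_host):
    """Build an HTTPS URL with the port number placed in the hostname."""
    return f"https://{port}-{workspace_host}"


def origin(url):
    """Return the (scheme, host, port) triple that defines a web origin."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(parts.scheme))


def is_mixed_content(page_url, request_url):
    """True when an HTTPS page requests a plain-HTTP resource."""
    return urlsplit(page_url).scheme == "https" and urlsplit(request_url).scheme == "http"


def is_cross_origin(page_url, request_url):
    """True when the origins differ, so CORS rules apply to the request."""
    return origin(page_url) != origin(request_url)


# Hypothetical workspace hosts for the frontend and backend workspaces.
FRONTEND = gitpod_port_url(3000, "frontend-abc123.ws-eu01.gitpod.io")
BACKEND = gitpod_port_url(4000, "backend-def456.ws-eu01.gitpod.io")
TAILNET_API = "http://100.100.2.3:4000"
```

Both setups are cross-origin, so the API needs CORS headers either way; only the plain-HTTP tailnet address additionally counts as mixed content when the frontend page is served over HTTPS.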

(I suspect GitPod will tie up this in the future where you could tailscale + GitPod the "shared" TLS, which would be epic)

svenefftinge commented 2 years ago

I've built a PoC for supporting multi-repository workspaces and recorded a video. Would be super helpful to learn if such a solution would help and how much, i.e. what kind of issues you could imagine, etc. 🙏

hobgoblina commented 2 years ago

I've built a PoC

Awesome! To confirm, opening a workspace from an issue would create/checkout a branch named after the issue on all repos? If so, are the branches published to origin on issue creation?

Also, are all of the sub-repo's .gitpod.ymls being run or are they all built from the parent's?

Can parent repos be children of other repos? (e.g., A ⊂ B ⊂ C, where opening a workspace from C would add B to the workspace, which in turn adds A.) Would be nice to have it support multiple layers, and would think it possible as long as there aren't circular dependencies.
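
A minimal sketch of the multi-layer idea, assuming a hypothetical mapping from each repo to the sub-repos it declares. It is not how the PoC works; it only shows that nested declarations can be flattened and that a circular dependency can be detected and rejected.

```python
# Hypothetical declarations: repo -> sub-repos it pulls into the workspace.
SUB_REPOS = {"C": ["B"], "B": ["A"], "A": []}


def resolve(repo, sub_repos, _path=()):
    """Return `repo` and all nested sub-repos, parents first; reject cycles."""
    if repo in _path:
        cycle = " -> ".join(_path + (repo,))
        raise ValueError(f"circular dependency: {cycle}")
    resolved = [repo]
    for child in sub_repos.get(repo, []):
        for name in resolve(child, sub_repos, _path + (repo,)):
            if name not in resolved:
                resolved.append(name)
    return resolved
```

Opening a workspace from C would then include B and A, while a declaration where A and B include each other raises an error instead of recursing forever.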

joepurdy commented 2 years ago

The proposed solution shown in that Loom would largely solve what we're doing with our application. We have the case where the backend services live in a separate repository from the front-end JavaScript app, and a third repository exists for a specific public API of ours. This three-repository world has made replacing local development with Gitpod a bit tricky.

We'd definitely be able to implement the proposed parent-repository and sub-repositories config to streamline things a bit. Right now we have a "meta" repo called dev-environment that has various scripts and a .gitpod.yml file that executes those scripts to clone in the other repos and set everything up. Obviously this is less than ideal since we lose the context-aware benefits Gitpod offers.

With this solution we'd likely be able to make our current meta repo the parent-repository and each of the other source repos sub-repositories. That way it's an easy migration and brings in context for our main repositories without having to decide which one becomes the parent that stores the gitpod config.

In time we'd likely migrate away from the meta-repo altogether once we decide where to put the shared gitpod config info.

csweichel commented 2 years ago

cc @konne - this might be interesting for you

svenefftinge commented 2 years ago

Closing this discovery epic. Let's continue on the concrete solution https://github.com/gitpod-io/gitpod/issues/7608.