devcontainers / spec

Development Containers: Use a container as a full-featured development environment.
https://containers.dev

Orchestrator interop #10

Open Chuxel opened 2 years ago

Chuxel commented 2 years ago

Unlike some developer-centric formats, a non-goal for devcontainer.json is to become yet another multi-container orchestrator format. Instead, its goal is to enable development in containers regardless of how they are orchestrated. There's a single-container / non-orchestrated build property, but the dockerComposeFile property is representative of this desire today (along with support for the "attach" scenario in Remote - Containers, though that's not directly related to the dev container spec). Native Kubernetes integration is one long-requested example of another format (https://github.com/microsoft/vscode-remote-release/issues/12), but clouds have their own formats, we can expect more to evolve over time, and there's no doubt we could improve interop with Docker Compose.

With that in mind, there are two related proposals to consider:

  1. Introducing an orchestrator property, much like customizations (for #1), that we keep in mind for the reference implementation (#9) and where orchestrator-specific properties can live.

    • We also need to consider whether we'd support connecting to multiple services from the same devcontainer.json given #8. Today you connect to multiple orchestrated containers using separate devcontainer.json files as described in the VS Code docs, but a feature needs to be able to be applied to multiple orchestrated containers when their images are built and they are spun up. This naturally leads to whether a better (or additional) model is to...
  2. Introduce an extension to the spec that would describe how to "embed" devcontainer.json in an orchestrator format. For example, an automated json <=> yaml conversion that enables it to exist in Docker Compose x-* extension attributes (https://github.com/docker/compose/issues/7200).
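
A rough sketch of how proposal 2 could look for Docker Compose is below. The x-devcontainer key and the property placement under it are illustrative only (mirroring devcontainer.json), not a settled schema:

services:
  web:
    build:
      context: .
      dockerfile: Dockerfile
    # Hypothetical extension attribute carrying devcontainer.json metadata
    x-devcontainer:
      workspaceFolder: /workspace
      postCreateCommand: npm install
      extensions:
        - ms-python.python

A tool supporting this would translate these YAML properties back into the equivalent devcontainer.json structure before processing.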

While each new orchestrator would still necessitate updates to the reference implementation (#9) and/or the orchestrator's code, documenting how this could be achieved would help guide the implementation and keep everyone positioned to avoid unexpected issues as things change down the road. For example, if the Docker Compose spec ended up having first-class support for certain existing devcontainer.json properties, there would still be a known path for those that were not.

Chuxel commented 2 years ago

Another possibility here would be to allow labels to set devcontainer.json properties as a general mechanism, and then specific embedded models for scenarios that warrant or support it. This would also allow encoding of this information in images to improve distribution of pre-built images. When the dev container CLI is used to build the image, these labels would be added automatically, but we can support them as straight labels as well (whether in a Dockerfile or an orchestrator format).

The JSON-based nature of devcontainer.json would make this fairly straightforward. Common and less complex properties could be referenced directly. A modified array syntax could be supported that does not require quoting (if there's no comma in a value) to make the use of some of them less complicated.

LABEL com.microsoft.devcontainer.userEnvProbe="loginInteractiveShell"
LABEL com.microsoft.devcontainer.vscode.extensions="[ms-python.python,ms-toolsai.jupyter]"

More complex, any-type properties could then be encoded as JSON. You see this commonly in a number of places, such as images generated via the pack CLI for Buildpacks.

LABEL com.microsoft.devcontainer.vscode.settings="{\"some.setting\": \"some-value\"}"

In the common case, these labels would be automatically added to the image by the dev container CLI when it is pre-built (devcontainer build ...), but manual entry would enable these additions to be embedded in orchestrator formats as well.

services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
      labels:
        - "com.microsoft.devcontainer.userEnvProbe=loginInteractiveShell"
        - "com.microsoft.devcontainer.vscode.extensions=[ms-python.python,ms-toolsai.jupyter]"

Any tool that supports the dev container spec would then look for these image labels - regardless of whether they are on a pre-built image or one built by an orchestrator. The reference implementation would then illustrate how to make this happen.

We should be able to make this work with any general, lifecycle, or tool-specific property.

For dev container features, however, I'd also propose another property, added to images only, that indicates whether the build step has already been done for the image.

LABEL com.microsoft.devcontainer.features.built="[docker-in-docker,github-cli]"

The features metadata label can then continue to be present to provide visibility to what was already applied.

Furthermore, I think we should render out the devcontainer.json properties that tie to the feature in the resulting image. For example, extensions would include those that were indicated by the feature. Properties like capAdd we've talked about as higher-level properties, so these could be handled the same way.
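
As an illustration only (the feature and values here are hypothetical, following the naming pattern above), pre-building with a feature that contributes an extension and a capability might render out as:

# Hypothetical labels rendered into the image by the pre-build step
LABEL com.microsoft.devcontainer.features.built="[some-feature]"
LABEL com.microsoft.devcontainer.vscode.extensions="[some-publisher.some-extension]"
LABEL com.microsoft.devcontainer.capAdd="[SYS_PTRACE]"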

Processing then is always: build (optional), read labels with metadata, run. You can use devcontainer.json as you use it today, but the labels could also come from somewhere else.

Net-net, the resulting image should have labels on it that explain how it should be set up. Orchestrator formats can influence processing where they support adding properties. In all cases, the dev container CLI will inspect the image and make the needed adjustments.

Finally - this should help with the one-to-many problem. For pre-building, each image can have a separate devcontainer.json. When you're not pre-building an image, you can consolidate in an orchestrator format instead.

Thoughts @chrmarti @jkeech @joshspicer @edgonmsft @bamurtaugh ?

jkeech commented 2 years ago

This would also allow encoding of this information in images to improve distribution of pre-built images. When the dev container CLI is used to build the image, these labels would be added automatically, but we can support them as straight labels as well (whether in a Dockerfile or an orchestrator format).

This makes sense to me. I was actually thinking through this same scenario a bit yesterday and was leaning towards the same solution. We will need some mechanism to embed feature devcontainer.json contributions in container images (ideally through metadata such as labels) so that you can prebuild an image that used features during its construction and have the rest of the feature contributions kick in at runtime without the end user having to specify those transitive features in their repo's devcontainer.json.

As an example, suppose we replace the codespaces-linux "kitchensink" image definition with the ubuntu base + a collection of many features (different versions of python, node, go, etc.). When we prebuild the kitchensink image, we want users to be able to directly reference that image tag in their devcontainer and get everything included which those built-in features provide. The Dockerfile contributions would obviously be in the image layers already, but those features might provide non-Dockerfile contributions, such as VS Code extensions/settings, lifecycle script hooks, runArgs, etc. The devcontainer CLI will need to discover the feature metadata from the prebuilt image and apply everything at runtime. The end user does not need to be aware of which features were used in the construction of the image they are referencing.
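
To make that concrete, the end user's devcontainer.json in that scenario could shrink to little more than an image reference (image name illustrative); everything else would be discovered from the metadata embedded in the prebuilt image:

{
  "image": "mcr.microsoft.com/vscode/devcontainers/universal:latest"
}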

Chuxel commented 2 years ago

@jkeech Yep, exactly! I agree that there's value in doing this even for the single-container case, to get to the point where you can just reference an image there.

bamurtaugh commented 2 years ago

Is the primary goal of the labels to allow different orchestrators to properly build and connect to containers? i.e.

enable development in containers regardless of how they are orchestrated

a feature needs to be able to be applied to multiple orchestrated containers when their images are built and they are spun up

Based on:

We will need some mechanism to embed feature devcontainer.json contributions in container images (ideally through metadata such as labels) so that you can prebuild an image that used features during its construction and have the rest of the feature contributions kick in at runtime

Is a secondary goal of labels to aid in general prebuilding? Or is it more specifically to allow prebuilding across any orchestrator?

I'm trying to understand the main goal(s) of the labels proposal, and if it's essentially an "option 3" (options 1 and 2 to achieve the orchestrator interop goal are listed in original issue), or if it's a pivot/tangent/sub-component of the main orchestrator goal.

Chuxel commented 2 years ago

Is the primary goal of the labels to allow different orchestrators to properly build and connect to containers? i.e. ... Is a secondary goal of labels to aid in general prebuilding? Or is it more specifically to allow prebuilding across any orchestrator?

@bamurtaugh Yeah the genesis for the proposal was thinking through how we could better:

  1. Simplify integration into various orchestrator formats
  2. Reduce the number of files to manage when working with orchestrator formats (e.g., Compose is an example of one that has an existing extensibility model we could use, but not every format will have that).
  3. Support injecting features (#8) into multiple orchestrated containers. Right now, features are only supported for the primary container. This decouples that process from the actual orchestration itself so that it is no longer a direct concern.

However, in considering this, there are other problems we could solve:

  1. Simplified full-config sharing. Everything can be in the pre-built image. This simplifies reuse across projects and tools.
  2. Reduce the risk of missing a config step that an image requires to run because of a missing field in devcontainer.json.
  3. For features (#8), remove the need to reference the feature again in a devcontainer.json if the feature is pre-built into an image - which helps startup perf and again helps with the missing configuration problem. (I keep hitting this particular one myself.)
  4. Largely eliminate the need for devcontainer.json at all for many single container cases - which is a variation of the benefit of having fewer files to manage.
  5. It also allows any potential container-centric pre-processor tools, frameworks, or services to inject these things themselves using just an image rather than having to be aware of devcontainer.json.

To some extent, we could move this to a more general proposal given the breadth of benefits as I think about it.

bamurtaugh commented 2 years ago

That makes a lot of sense, thanks for the great detail @chuxel!

To some extent, we could move this to a more general proposal given the breadth of benefits as I think about it.

That'd make sense to me 👍. If others also think the labels approach makes sense / is worth exploring further, it feels like it could encompass this topic + the variety of others you've mentioned, and this issue could pivot to focus on it, or we could open another one.

Chuxel commented 2 years ago

To make this all a bit more concrete, I took the "save June" sample from the codespaces-contrib org and created a few branches that step into this:

  1. Current state - You can use a multi-stage Dockerfile along with two devcontainer.json files to do multi-container attach now. Clone, then open the "web" and "worker" folders in a container using Remote - Containers to see what this looks like.
  2. Current state with features - This simplifies the Dockerfiles and still works like the current state, but the critical thing is that you need to set runServices to only the service referenced in devcontainer.json (see the sketch after this list). This is needed so that each container is built separately with the features in its devcontainer.json.
  3. Single devcontainer.json - This brings both devcontainer.json files into a single spot, which can also help with the runServices issue. I added a script called fake-it.sh that runs on macOS and Linux and mocks up this behavior as if it already existed. You need to install the dev container CLI via the VS Code command (not the npm package) for it to work. If you then run the script, it will spin up two VS Code windows.
  4. Embedded - A much cleaner implementation that adds x-devcontainer properties to docker-compose.devcontainer.yml, mirroring the devcontainer.json metadata structure but omitting things like service. Here again, there's a fake-it.sh script that can be used to try it out if you have the dev container CLI from the VS Code extension installed. There's no devcontainer.json file, but the metadata from the spec is still available.
  5. Embedded with labels - This thins out the contents of the docker-compose.devcontainer.yml file by moving a few properties to the Dockerfile for each service. It illustrates how even a hybrid model that mixes embedded with labels can simplify things further. As described there, part of the idea here is that, if you used the dev container CLI to pre-build an image with a devcontainer.json file, all of these properties would automatically be in the image, which would even further reduce what needs to be in the docker-compose.devcontainer.yml file. I did not create a fake-it script here yet.
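
For reference (per 2 above), each per-service devcontainer.json in the current-state branches ends up looking roughly like this; the service and path names are placeholders:

{
  "dockerComposeFile": "../docker-compose.yml",
  "service": "web",
  "runServices": ["web"],
  "workspaceFolder": "/workspace"
}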

Comparing 3 and 4, you can see the advantages of embedding, and then how 5 has the potential to thin out what would even need to be in the docker-compose.devcontainer.yml file to those things that are truly specific to the orchestrator scenario.

chrmarti commented 2 years ago

Having a variation of the devcontainer.json that can configure multiple dev containers in a Docker Compose setup makes sense (3).

Moving all of the dev container configuration to a docker-compose.yml removes our "go to" file for dev containers (4). Not sure this is not also a disadvantage.

On using image labels (5): This seems to make sense in a broader scope. Should this be in a separate issue? (One thing of note: This probably works best with configuration that is easy to merge, i.e., when the devcontainer.json also touches on the same part of the configuration.)

Chuxel commented 2 years ago

Moving all of the dev container configuration to a docker-compose.yml removes our "go to" file for dev containers (4). Not sure this is not also a disadvantage.

@chrmarti Yeah, this would be one example and not to the exclusion of devcontainer.json per se. Ideally this is part of the orchestrator integration code as we get to the point where this is a bit more abstracted. The point here being that we can converge with any format with first-class support in places that are natural for those using said format. It wasn't a lot of effort to support (as you can see in the fake-it code -- it's a straight conversion to JSON). The devcontainer.json file still has value both as a way to pre-build images and for the single container scenario it already handles.

I always think about this in terms of where devs would be coming from in a given scenario. If you're already using an orchestrator for multi-container setups, you'll be more inclined to add a few values to what you have rather than learn an entirely new format. If you're coming in cold, right now you need to learn two things rather than focusing on the orchestrator with a few additions.

On using image labels (5): This seems to make sense in a broader scope. Should this be in a separate issue? (One thing of note: This probably works best with configuration that is easy to merge, i.e., when the devcontainer.json also touches on the same part of the configuration.)

Happy to fork it off. It definitely has broader use than multi-container scenarios.

bamurtaugh commented 2 years ago

The devcontainer.json file still has value both as a way to pre-build images and for the single container scenario it already handles.

My impression from 4) was that it had disadvantages when prebuilding:

As before, if you pre-build the devcontainer image and store it in an image registry for performance, its devcontainer.json configuration is still completely disconnected. This makes it very easy to forget something and effectively adds a fourth thing to track in addition to the .devcontainer.json files and docker-compose.devcontainer.yml file.

Does the above mean that a user may add a devcontainer.json to this style of repo to aid in pre-building, but it'd be disconnected from the rest of the config? What are the "the .devcontainer.json files" that need to be tracked (as I only see Compose files), and how do they differ from the other devcontainer.json that'd need to be added?

Chuxel commented 2 years ago

My impression from 4) was that it had disadvantages when prebuilding ... Does the above mean that a user may add a devcontainer.json to this style of repo to aid in pre-building, but it'd be disconnected from the rest of the config? What are the "the .devcontainer.json files" that need to be tracked (as I only see Compose files), and how do they differ from the other devcontainer.json that'd need to be added?

Since you can pre-build using the dev container CLI, and you can already pre-build using docker compose, there's not really a disadvantage for pre-building per se. The same problems that exist here exist when using devcontainer.json.

We can, however, make things better with what is described in 5), since you can pre-build the image separately and just reference it either in devcontainer.json or the docker compose file (or other orchestrator file). Put another way, you can better decouple pre-building images from using them. Pre-building the image can happen in a completely separate repository - even a common one maintained by an ops team. People using the images do not need to be aware of the devcontainer.json content used to create them.

At that point, fewer unique properties need to be added to devcontainer.json / docker-compose / another orchestrator format when you are just referencing the image directly. This also helps with sharing config, since it's all in the image. Multiple repositories can reference the same image directly with little to no config being present.

So, in summary, currently a pre-built image and a devcontainer.json file have to go together - and they can version completely independently of one another. If instead these properties are part of the image's label metadata, we can determine what to do purely from the image.
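
As a sketch of that end state (the image name and mount are placeholders), a consuming repo's compose file could then be as thin as the following, with the dev container behavior coming from the labels baked into the image:

services:
  app:
    # Placeholder reference to an image pre-built elsewhere with its labels
    image: ghcr.io/some-org/prebuilt-devcontainer:latest
    volumes:
      - .:/workspace:cached
    command: sleep infinity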

Chuxel commented 2 years ago

I broke the label part of this proposal out into #18.

Chuxel commented 2 years ago

Just to update this issue, label support (#18) is now in. Keeping this particular proposal open to cover broader integrations and overall improved support for multi-container scenarios. https://github.com/devcontainers/spec/issues/10#issuecomment-1067002392 includes some example options.

bhack commented 1 year ago

Native Kubernetes integration is one long-requested example of another format (https://github.com/microsoft/vscode-remote-release/issues/12), but clouds have their own formats

I don't know what the state of the art is now, but I have not found a clear paradigm for code versioning and the developer environment in a remote setup.

The VS Code Kubernetes extensions let you use VS Code Remote to attach to an existing container/pod, but then you need to have the git repository in the container (built into the image? mounted as a volume? a git-sync initContainer with a deploy key?) if you want to edit and commit code from the remote setup (and is that a plausible, secure paradigm with multiple developers on the same cluster?).

The Google Cloud Code VS Code extension relies on Skaffold to build, deploy, and sync files (but still only one way: https://github.com/GoogleContainerTools/skaffold/issues/2492). It is hard to use inside a dev container, since building the image locally requires a Docker-in-Docker setup. Also, since the sync is one way, you need to edit locally and use a remote terminal to run things and debug on the pod after the file-watch sync. You can start Skaffold without being in a dev container, but then, as you edit files locally, you are not in the same (mirrored) dev environment as the pod.

So today I still see many problems around how and where to handle versioning, and eventually how to use a dev container with VS Code Remote, especially for Kubernetes orchestration of the container.

What is your point of view?