devcontainers / spec

Development Containers: Use a container as a full-featured development environment.
https://containers.dev
Creative Commons Attribution 4.0 International

Distributing features & templates #7

Closed - chrmarti closed this issue 2 years ago

chrmarti commented 2 years ago

The goal is to distribute definitions so that the maintenance load is spread out. We currently have 3 buckets:

For the community definitions there are various levels of self-service; where useful, we can use the JS & TS definitions to help dogfood whichever approach we decide to take. The approaches I can think of are:

  1. Community submits PRs (like today). In a separate repository (one for all community definitions).
    • Initially the product teams would review the PRs. In the future a team of volunteers from the community might do that (e.g., the DefinitelyTyped project does that).
  2. Community pushes updates themselves. In a separate repository (one for all community definitions).
    • New contributors might first contribute through PRs.
  3. Community contributors can have their own repository for one or several definitions.
    • a) We maintain a static registry that collects the definitions at build time, later at runtime.
    • b) We have a dynamic registry where definitions are discovered at runtime; new contributors can register definitions themselves.

A dynamic registry where new contributors can register definitions themselves maximizes self-serviceability, but it also raises security concerns. VS Code extensions are using such a model and it is an open issue to add support for cryptographically signing extensions, so authorship can be verified (https://github.com/microsoft/vscode-vsce/issues/191). There is work being done on this that very much looks like it could be used for other types of artifacts too, but it is too early to tell if we could use it for definitions.

I suggest we take the following steps (progressing from 1. towards 3. above):

Open questions:

/cc @2percentsilk @bamurtaugh

Chuxel commented 2 years ago

@chrmarti Another open question related to the script topic is dev container features - particularly 1st party ones. I'm kind of assuming they'd live in the same location as the scripts, but technically that would not have to be the case.

joshspicer commented 2 years ago

I would elect for approach (3) above, allowing users to create their own repos which they maintain themselves on GitHub.

I think it would be really powerful for the extension(s) to dynamically search GitHub for templates given a search query. Perhaps some simple naming convention or repo topic could be devised to filter the repos that show up in the search.

This is how you pick a codespace - I can imagine a similar search mechanism for the extension.

We'd still show our "default" templates without needing to search, allowing for a user to quickly pick from a list of "good starting points".

bamurtaugh commented 2 years ago

I like the idea of searching. I have some initial questions on it, if we go with users all having their own repos vs one central community repo:

I'm thinking of the VS Code core extension search experience. Users may initially select an extension they weren't intending to, but in the Extensions view they can check the publisher and description to understand where exactly the extension is coming from and whether it's the one they want. I wonder if we'd want to link an easy way for users to view the repo on GH before selecting it?

Chuxel commented 2 years ago

@bamurtaugh I think we would want to surface who "published" the definition in all cases. Right now there are only two modes: "Microsoft" and "Everyone else". I'm sure this prevents some people from contributing, since everything looks unofficial unless Microsoft created it (e.g. see https://github.com/microsoft/vscode-dev-containers/pull/1238). Given our desire to be open to alternate sources, many of these may also be "official" if viewed from the perspective of the technology in the definition rather than the tool. For example, Google or Amazon, or the core language teams for Rust or Swift, could opt to add their own official definitions - they're really no less official than the Microsoft-maintained ones.

The question then is how you reliably identify the "publisher". Right now for public dev container features, the model in preview uses a public GitHub org - which is a pretty good proxy.

Search, then, is really a separate question. For a tool / service agnostic model, we could start by using GitHub topics, which then allows you to search within the topic. This is how Actions is set up: https://github.com/topics/github-action GitHub then layered in a service-specific marketplace UX (https://github.com/marketplace?category=&query=&type=actions&verification=), but I believe the APIs would allow anyone to create their own index if they so choose (including for VS Code or something else).

VS Code and Codespaces UX is really beyond the scope of dev container spec discussions though, so we could iterate on what we'd like to present there elsewhere. I don't think a single vs. multi-repo path really affects it.

chrmarti commented 2 years ago

While I agree with much of what is being discussed, in a first step I would like to keep producing the list of available definitions and ensuring their authorship simple (discoverability and trustworthiness). The proposal is to start with 3 repositories (Codespaces, community and VS Code) and collect the definitions at build time.

As we increase self-serviceability from there, we will need to think more about security. (@Chuxel you mention a GitHub org as proof of origin - if a personal account counts as an "org", that wouldn't be strong, so maybe you have corporate accounts in mind?)

On container features: I think we can split these up the same way as definitions for now. For features having a way to update between releases is more important than for definitions I think, so we should also continue productization of that existing effort.

Chuxel commented 2 years ago

As we increase self-serviceability from there, we will need to think more about security. (@Chuxel you mention a GitHub org as proof of origin - if a personal account counts as an "org", that wouldn't be strong, so maybe you have corporate accounts in mind?)

@chrmarti Totally agree - we need proof of origin. The nice thing about an org is that individuals cannot easily pretend to be an "official" org for a given language/product/team. Users also cannot pretend to be another user (since the ID is tied to your GitHub profile). If I publish something, the org will be chuxel - and github.com/chuxel tells you what you need to know. Something official would be expected to be in an org owned by a company, OSS project, team of enthusiasts, etc. - e.g. aws for Amazon, rust-lang for official Rust projects, etc. Certainly there are situations where an individual has something that could look official (e.g. github.com/vscode), but GitHub also has the concept of verified orgs already, so we can use that as well.

chrmarti commented 2 years ago

From a security point of view, it matters that anyone can create a GitHub account. It seems 'verified' orgs are only available for company accounts, but not personal accounts (from what I could find in the documentation). Verified accounts would help, but we would need this for all contributions of definitions.

Note that I'm not suggesting we shouldn't do this, but we need to think about the security implications (and get these reviewed). While this requires more discussion and investigation, my proposal above is to start distributing definitions in a way that avoids the need for more sophisticated security measures.

Chuxel commented 2 years ago

From a security point of view, it matters that anyone can create a GitHub account. It seems 'verified' orgs are only available for company accounts, but not personal accounts (from what I could find in the documentation). Verified accounts would help, but we would need this for all contributions of definitions.

Ack - yeah, it looks like you can only be "verified" with the enterprise tier... which isn't corporate per se, but the price point is high enough that teams and OSS projects may not have it - let alone individuals. Drat! 😭

elaine-jackson commented 2 years ago

An issue to consider: once the list gets large enough, it's going to be incredibly tedious to scroll through, even with the search functionality. I'd like to propose menus.

For example, you might first select Java and then see a menu of possible configurations: Java 8, Java Standard (e.g. 11 or 17), Java with MariaDB, Java with Redis, etc. This would reduce menu clutter and differentiate between a language and the optional features you may need, such as a database or cache.

Chuxel commented 2 years ago

@irlcatgirl Thanks for the UX input! Certainly each entry point that supports these definitions will need to have a scalable user experience. I agree though that we'd need to support the idea of tagging for both filtering and automated detection much as you see in the VS Code extension marketplace and GitHub Actions marketplace today.

elaine-jackson commented 2 years ago

@irlcatgirl Thanks for the UX input! Certainly each entry point that supports these definitions will need to have a scalable user experience. I agree though that we'd need to support the idea of tagging for both filtering and automated detection much as you see in the VS Code extension marketplace and GitHub Actions marketplace today.

Personally, I would have each programming language, plus special cases like Azure and Docker, in the root list; then when you click a language you'd get options like Java, Java 8, Java with Database, etc. It doesn't make sense to have Language + Database at the top level, because ultimately we are going to want premade configs for the most popular databases, which means we get issues like https://github.com/microsoft/vscode-dev-containers/issues/1288 that will not scale when we do the same for every language.

Chuxel commented 2 years ago

@irlcatgirl Thanks for the input! To be clear, we're not discussing user experience in this issue, but rather how the definitions are stored and contributed along with any needed associated metadata. Given definitions can also cross language or runtime, having tags that you can then anchor off of should achieve the goal I think you are trying to get at, correct? How these are presented then becomes an aspect of the product that is implementing the dev container specification. That said, agree with what you are saying.

Even today, there are tags in these definitions, they're just parsed out of the README.md file. @chrmarti @joshspicer - What are your thoughts on where we should be maintaining this kind of metadata going forward? Should we formalize definition-manifest.json as a place to store this metadata in a more machine readable form?

joshspicer commented 2 years ago

Even today, there are tags in these definitions, they're just parsed out of the README.md file. @chrmarti @joshspicer - What are your thoughts on where we should be maintaining this kind of metadata going forward? Should we formalize definition-manifest.json as a place to store this metadata in a more machine readable form?

@edgonmsft and I were discussing this today actually! Ed has a bit more info in a doc exploring this, but I put together a first pass at a devcontainer-templates.json - very similar to how we have our devcontainer-features.json:

Something like....

// devcontainer-templates.json
// Example for https://github.com/microsoft/vscode-dev-containers/tree/main/containers/ruby
{
  "templates": [
    {
      "name": "MyRuby",
      "description": "A super cool template for you Ruby devs!",
      "publisher": "GitHub",   // Added by GitHub Action during packaging
      "version": "v0.0.1",     // Added by GitHub Action during packaging
          "sourceRepo": "https://github.com/codspace/my-ruby-devcontainer-template",
      "categories": [
        "Core",
        "Languages"
      ],
      "architectures": [
        "x86-64",
        "arm64"
      ],
      "includeExampleCode": true, // If true, don't just copy the .devcontainer folder, but all code (may include example code, etc...)
      "baseOS": "Debian", // ?
      "options": {
        "variant": {
          "type": "string",
          "enum": [
            "3",
            "2.7",
            "3-bullseye",
            "2.7-bullseye"
          ],
          "default": "3",
          "description": "Select variant of the vscode/devcontainers/ruby image to be set as the base image of this Dockerfile."
        },
        "node_version": {
          "type": "string",
          "proposals": [
            "lts",
            "16",
            "14",
            "10",
            "none"
          ],
          "default": "16",
          "description": "Specify version of node, or 'none' to skip node installation."
        }
      }
    },
    {
      // ...more templates...
    }
  ]
}

joshspicer commented 2 years ago

I'm thinking we can extend the features GitHub Action to be more generic, and also extend our extensions to read/write metadata to this file.

Already been toying around with template packaging on my fork here: https://github.com/joshspicer/devcontainer-features-action

edgonmsft commented 2 years ago

Yeah, @joshspicer and I have been thinking of creating this devcontainer-templates.json file that would include all the metadata about the template.

The idea would be to split the information: a README file that is more end-user facing, and the JSON file containing the information we show in the search results.

The main things to include in the file would be:

The action mentioned by Josh above could create the release and also release a copy of this file with things like contributors, origin repository, org/user, and commit.

We might want to define a starting list for those keywords to eventually allow different searches:

Possible search paradigms

Possible Keywords:

As for the structure of the file, I'm thinking something similar to the above with these values:

Simple JSON format with the data needed for all uses.
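
To make the shape of this more concrete, here is a minimal sketch of what one packaged entry might look like. The field names are illustrative and not settled; the publishing-related fields at the bottom are the ones the action would stamp in at release time:

// Illustrative sketch of a packaged devcontainer-templates.json entry (field names not final)
{
  "templates": [
    {
      "name": "MyRuby",
      "version": "v0.0.1",
      "keywords": ["ruby", "debian"],
      // Stamped in by the packaging action at release time:
      "contributors": ["a-contributor"],
      "sourceRepository": "https://github.com/codspace/my-ruby-devcontainer-template",
      "owner": "codspace",
      "commit": "abc1234"
    }
  ]
}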

elaine-jackson commented 2 years ago

@irlcatgirl Thanks for the input! To be clear, we're not discussing user experience in this issue, but rather how the definitions are stored and contributed along with any needed associated metadata. Given definitions can also cross language or runtime, having tags that you can then anchor off of should achieve the goal I think you are trying to get at. How these are presented then becomes an aspect of the product that is implementing the dev container specification. That said, agree with what you are saying.

Even today, there are tags in these definitions, they're just parsed out of the README.md file. @chrmarti @joshspicer - What are your thoughts on where we should be maintaining this kind of metadata going forward? Should we formalize definition-manifest.json as a place to store this metadata in a more machine readable form?

A tagging system sounds good. We might still need to change the repo's file layout in terms of keeping development scalable. A list of 1000s of folders to scroll through would be just as hard to maintain.

E.g. /tree/main/containers shouldn't have 14+ folders that start with azure. Rather they should all be grouped into an Azure folder. This is in terms of keeping dev containers organized for the people maintaining the actual containers.

Chuxel commented 2 years ago

@joshspicer @edgonmsft Love the idea of making this and features.json as similar as makes sense.

One note is that we've normalized on the term "definition" for these rather than "template" (since we have those too - e.g. vscode-remote-try-node is more of a full template, and I'm assuming we'll do more of these). So naming-wise, devcontainer-definitions.json might be better. We've said the definition is a devcontainer.json + all needed assets that "define" the container - this includes runtime settings, etc., so it's broader than an image. We'll want to be deliberate about changing the name given the number of places it's referred to in that way - both in our content and blog posts, etc.

To some extent, given definitions have options, they are a lighter-weight variation of a yeoman "generator" if I think about another public example. We could also allow for dynamic scripts over time - though that may not be needed.

The other thing we probably want to add to both this and features.json is an arbitrary metadata property (of type any). We could then move the current contents of definition-manifest.json under that section - things like dependencies are very specific to how we manage images, so having those as full parts of the spec probably doesn't make sense.

This probably also makes sense in devcontainer.json for free-form content that is not part of the spec.
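
As a rough sketch of the kind of free-form block being suggested (the property name "metadata" and its contents are placeholders, not a settled part of the spec):

// Hypothetical free-form metadata property in devcontainer-templates.json (names illustrative)
{
  "templates": [
    {
      "name": "MyRuby",
      // Arbitrary, tool- or publisher-specific content that is not part of the spec:
      "metadata": {
        "dependencies": {
          "image": "mcr.microsoft.com/vscode/devcontainers/ruby"
        }
      }
    }
  ]
}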

Chuxel commented 2 years ago

A tagging system sounds good. We might still need to change the repo's file layout in terms of keeping development scalable. A list of 1000s of folders to scroll through would be just as hard to maintain.

E.g. /tree/main/containers shouldn't have 14+ folders that start with azure. Rather they should all be grouped into an Azure folder. This is in terms of keeping dev containers organized for the people maintaining the actual containers.

@irlcatgirl Yeah agreed - that said, I think the idea over time is that these can each be in completely separate repositories. So the "grouping" would be the repository itself. So, you could have azure/dev-containers-definitions, github/dev-containers-definitions, aws/dev-containers-definitions, rust-lang/dev-container-definitions, a-github-profile/my-dev-containers-definitions, etc. That also has the added benefit of keeping the source code with the people that are actually maintaining it.

What I think we want to avoid is what has happened with DefinitelyTyped's repository. There are 6000+ folders here: https://github.com/DefinitelyTyped/DefinitelyTyped/tree/master/types

So any "community" repository mentioned above would be a step towards a multi-repository plan. That mesh with your thinking as well @joshspicer @chrmarti @bamurtaugh @2percentsilk ?

elaine-jackson commented 2 years ago

I am concerned that if we use separate repositories, we open up the possibility of a rogue maintainer pushing a malicious update affecting all VS Code users. I would prefer that any repository is maintained by Microsoft so there is code review before it hits all VS Code users. I limit the extensions I install for the same reason. A Microsoft-run repository linking to non-Microsoft-maintained repos makes me extremely anxious. Linking to third-party Git repos seems like a bad idea.

I propose a Microsoft-run community definitions repo so the Microsoft-maintained definitions are kept separate.

Chuxel commented 2 years ago

I am concerned that if we use separate repositories, we open up the possibility of a rogue maintainer pushing a malicious update affecting all VS Code users. I would prefer that any repository is maintained by Microsoft so there is code review before it hits all VS Code users. I limit the extensions I install for the same reason. A Microsoft-run repository linking to non-Microsoft-maintained repos makes me extremely anxious. Linking to third-party Git repos seems like a bad idea.

I propose a Microsoft-run community definitions repo so the Microsoft-maintained definitions are kept separate.

@irlcatgirl One of the key goals with opening up the spec is to encourage use outside of just VS Code. As with all things, there will be things Microsoft maintains and things that it does not. We want a plan that enables both. As mentioned earlier in the thread, a definition AWS publishes for AWS services should be on equal footing with something Microsoft publishes for Azure from a trust perspective. So visibility into the publisher is key here, as with any other community contribution. Think about this as the evolution of a marketplace, but being deliberate about how we get there to address concerns like the ones you mention here.

elaine-jackson commented 2 years ago

What prevents a definition maintainer from inserting a malicious template string into a title after being approved and then injecting arbitrary code into VS Code affecting any user with the remote containers extension installed? What safeguards will be in place from a security context?

Chuxel commented 2 years ago

What prevents a definition maintainer from inserting a malicious template string into a title after being approved and then injecting arbitrary code into VS Code affecting any user with the remote containers extension installed? What safeguards will be in place from a security context?

@irlcatgirl This gets a bit into tool-specific implementation rather than the overall specification - so ideally we'd like to center this issue on the spec topic. That said, each implementing tool needs to do a threat analysis no matter what. Even if a devcontainer.json file were in an application repository, there is a description, and you wouldn't want what you're describing to happen there either. That's going to be required no matter what the model ends up being. Clearly this is true for Remote - Containers and Codespaces or any other service or tool that follows the specification. Many of these safeguards already exist due to the heavy use of JSON throughout the VS Code product, but we definitely need to keep an eye on it regardless.

Beyond this, the publisher denotes who approved the definition - but a provision for having your own private set also makes sense for people who do not want to trust public sources.

edgonmsft commented 2 years ago

Taking into account the comments that have been made here, I wanted to write a proposal that would help us give clearer feedback on what we want.

For that proposal I took into account the following:

The proposal is in this PR: https://github.com/microsoft/dev-container-spec/pull/15

Additionally, we are looking for a security review of it, and the proposal can change according to those comments and the broader comments on it.

Rabadash8820 commented 2 years ago

Forgive me if this is already documented somewhere, but why do we have both dev container definitions and features? Seems to me that a fully fledged feature ecosystem would obviate definitions. I can imagine selecting the base OS/architecture for my dev container, and then for everything else just searching/selecting features. As an example, why should users select the Node.js definition and then check the various features that they want, like Azure CLI, when they could just select their base OS/architecture and then select features for both Node.js and Azure CLI? The feature metadata could include which OSs and architectures that feature supports, at which point I see no reason for the concept of container definitions at all. Said another way, it seems like a dev container "definition" is just the union of all features within that container. Any additional settings for filesystem permissions, networking, etc. that might be specified in a definition would just be specified in the install script for the feature that needs those settings.

jkeech commented 2 years ago

Forgive me if this is already documented somewhere, but why do we have both dev container definitions and features? Seems to me that a fully fledged feature ecosystem would obviate definitions. I can imagine selecting the base OS/architecture for my dev container, and then for everything else just searching/selecting features. As an example, why should users select the Node.js definition and then check the various features that they want, like Azure CLI, when they could just select their base OS/architecture and then select features for both Node.js and Azure CLI? The feature metadata could include which OSs and architectures that feature supports, at which point I see no reason for the concept of container definitions at all. Said another way, it seems like a dev container "definition" is just the union of all features within that container. Any additional settings for filesystem permissions, networking, etc. that might be specified in a definition would just be specified in the install script for the feature that needs those settings.

This is how I'm thinking about it 👍. In the end, almost everything should be defined as a feature or collection of features. A definition should just be a devcontainer.json with a base image and set of features for a specific scenario. Of course we aren't there yet with the set of features that exist, but that's the direction I'd like to move in: focus on building a comprehensive ecosystem of features that people can search, drop into their project, and build on top of, like many other package ecosystems.

Versioning and patching should also be much simpler with features -- just bump the version in your config file. No need to tweak Dockerfiles or anything else. All of the logic related to how to install the new version is self-contained in the feature, which should make it much easier for people both to add it initially and to maintain it in their repo over time.
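
For example, a definition along these lines - where the base image and the feature references are illustrative, since the exact reference syntax was still being worked out - would be updated simply by bumping the pinned feature versions:

// Illustrative devcontainer.json: a "definition" as a base image plus features
{
  "image": "mcr.microsoft.com/vscode/devcontainers/base:debian",
  "features": {
    // Updating to a newer release is just a version bump on these references:
    "devcontainers/features/node@v1": { "version": "lts" },
    "devcontainers/features/azure-cli@v1": {}
  }
}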

Rabadash8820 commented 2 years ago

This is how I'm thinking about it 👍. In the end, almost everything should be defined as a feature or collection of features.

Glad I'm thinking along the right lines. 🙂

A definition should just be a devcontainer.json with a base image and set of features for a specific scenario.

Well, now that you say that, I guess I do kind of see the value in having a curated list of devcontainer.json definitions for these "specific scenarios". For example, I'm working on adding a feature for the AWS CLI (microsoft/vscode-dev-containers#1326), but my actual use case is more specific; I want to make a serverless Alexa skill on AWS. So my devcontainer will need the AWS CLI, the CDK CLI, probably the SAM CLI, my chosen programming SDK (Java), and a couple preinstalled libraries (particularly the Alexa Skills Kit and AWS SDK). It would be really nice if there were just a Tools for Alexa skill definition that was ready out of the box.

However, such a list of curated definitions would grow exponentially. Even for this Alexa use case, there'd probably need to be one for each programming language that AWS Lambda supports...and if someone wants their backend hosted in a different cloud, then there'd need to be definitions for each language that Azure supports, GCP, Alibaba, etc. etc. So idk, I guess your "collection of features" idea is most appropriate. I.e., if there's already going to be a "feature marketplace" similar to extensions, then some of the features in that marketplace would just be bundles of other features, e.g. Tools for serverless Java Alexa skill on AWS Lambda.
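
As a sketch of that "composition of features" idea for the Alexa-skill scenario above - every feature name here is hypothetical, since none of these existed as published features - the same setup could be expressed without a curated definition at all:

// Hypothetical devcontainer.json composing the tooling described above from individual features
{
  "image": "mcr.microsoft.com/vscode/devcontainers/base:debian",
  "features": {
    "aws-cli": {},
    "aws-cdk": {},
    "aws-sam-cli": {},
    "java": { "version": "11" }
  }
}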

joshspicer commented 2 years ago

I've been working on providing a repeatable way to encode the information from all our existing definitions to JSON for better portability/editability. Details here: https://github.com/microsoft/vscode-dev-containers/pull/1332

@edgonmsft

Chuxel commented 2 years ago

As an FYI, there's now a PR on this with a more formal proposal coming together: https://github.com/devcontainers/spec/pull/40

This PR also covers the distribution of features tied to the dev container features proposal - which is a related topic.

alexdima commented 2 years ago

We agree:

We have identified the following blocking conceptual problem with the latest proposal:

joshspicer commented 2 years ago

we would like that the reference CLI maintains a cache of downloaded features.

Is the idea here that we continuously "re-hydrate" the cache automatically as new releases are published? This is the behavior we should strive for within Codespaces due to the slower release cycles.

alexdima commented 2 years ago

Is the idea here that we continuously "re-hydrate" the cache automatically as new releases are published? This is the behavior we should strive for within Codespaces due to the slower release cycles.

Yes, the cache would just be a local file cache to avoid downloading a specific .tgz again and again. For example, if someone uses the feature devcontainers/features/node@v1 in their devcontainer.json, that would first have to be resolved to the latest released version of the feature according to semver rules - let's say that would be devcontainers/features/node@v1.2.3. The cache would be used before downloading the .tgz: a local folder could be checked to see if that exact .tgz was downloaded before, and if so it would be used straight from the cache. IIRC we argued that we don't consider a built-in/prefilled cache necessary, so the cache would be empty initially.
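
A small illustration of the flow described above; the reference syntax, cache location, and file names are purely illustrative:

// devcontainer.json references a feature with a semver-style range:
{
  "features": {
    "devcontainers/features/node@v1": {}
  }
}
// 1. "@v1" is resolved against published releases per semver rules, e.g. to v1.2.3.
// 2. A local cache folder (e.g. ~/.devcontainers/cache/) is checked for the matching
//    archive, e.g. devcontainers-features-node-v1.2.3.tgz.
// 3. Only on a cache miss is the .tgz downloaded; the cache starts out empty.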

bamurtaugh commented 2 years ago

Characteristics:

Options overview:

Options (in more detail):

Next steps:

joshspicer commented 2 years ago

Backing Mediums for dev container features

The candidate backing mediums compared were:

  • GitHub Releases
  • Existing Package Manager (npm)
  • Clone Git Repo (Tags/Refs)
  • OCI Registry

The comparison criteria were:

  • Vendor Neutrality
  • Public/Anonymous Access ⚠️ (APIs are heavily rate limited when anonymous)
  • Option for Private Access
  • Free Tier for publishing ⚠️ (public only) ⚠️ (500MB of private storage in GHCR)
  • "Natural Fit"
  • Existing Searchable "Marketplace"
  • Integrity hashes built-in
  • Built-in "Report Abuse"
  • Built-in Code Scanning (depends)
  • Immutable
  • Alignment with Actions

joshspicer commented 2 years ago

Criteria for first-time discoverability of feature collections

Author

Generally

joshspicer commented 2 years ago

Converged on using OCI as the primary backing storage for features (and probably other dev container assets in the future).

joshspicer commented 2 years ago

A specification has been merged with 'proposed' status. Changes can be brought up as new issues, and then as PRs to update the specification.