facebook / dotslash

Simplified executable deployment
https://dotslash-cli.com
Apache License 2.0

[RFC] Custom Providers #7

Open bolinfest opened 7 months ago

bolinfest commented 7 months ago

Proposal for Custom Providers

Today, the docs state:

At the time of this writing, there is no way to add custom providers without forking DotSlash.

But we have already seen interest in custom providers, so it seems like we should start discussing possible solutions. Note this will likely require some sort of configuration file, which, as a reminder, we would like to avoid having to read in the case of a cache hit.

While the design for the configuration file is still under discussion, let's assume for the moment that at least two locations for provider-specific data are supported:

Provider is an executable

Today, the things a provider needs to know are:

One option would be to pass everything the provider needs to the executable via a single JSON argument and then stream the stdout from the provider invocation directly to the path where the artifact should be written. This way, the provider does not get any direct knowledge about the layout of the $DOTSLASH_CACHE.
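To make the single-JSON-argument idea concrete, here is a rough sketch of what the DotSlash side of that invocation could look like. It is written in Python purely for illustration (DotSlash itself is Rust). The function name fetch_with_provider and the shape of the request are assumptions, not part of the proposal. The temporary-file step is also an assumption, added so that a failed fetch does not leave a partial artifact behind.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def fetch_with_provider(provider_cmd, request, dest):
    """Run a provider with one JSON argument and stream its stdout to dest.

    provider_cmd: argv prefix for the provider (hypothetical, e.g. ["my-custom-cas"]).
    request: dict serialized as standard JSON; its fields are an assumption here.
    dest: path where the artifact should end up. The provider never sees this path.
    """
    dest = Path(dest)
    # Stream into a temporary file next to dest so that a failed fetch
    # never leaves a partially written artifact at the final path.
    with tempfile.NamedTemporaryFile(dir=dest.parent, delete=False) as tmp:
        tmp_path = Path(tmp.name)
        result = subprocess.run(
            [*provider_cmd, json.dumps(request)],
            stdout=tmp,  # provider stdout goes straight to the file
            check=False,  # stderr is inherited, so the user sees provider errors
        )
    if result.returncode != 0:
        tmp_path.unlink()
        raise RuntimeError(f"provider exited with status {result.returncode}")
    tmp_path.replace(dest)
    return dest
```

With this shape, the provider only receives the JSON request and writes bytes to stdout. Everything about where the bytes land stays inside DotSlash.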

Because the provider can be an executable, it makes sense for the provider to be a DotSlash file. For example, we could have:

$XDG_STATE_HOME/dotslash/provider/<provider-name>

where <provider-name> is the name of the DotSlash file, which must also match the "type" used in the "providers" section of a DotSlash file. (The "name" in the DotSlash file should probably also be required to match.) Note that this file will always be executed by DotSlash itself, so there is no need for any special Windows stuff.

How to install a provider

A simple option is to support a subcommand like dotslash -- install-provider URL_TO_PROVIDER that would fetch the specified URL, verify it contains a DotSlash file, and then write it to $XDG_STATE_HOME/dotslash/provider/<provider-name>, as appropriate.
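As a sketch of the verify-and-write half of such a subcommand (the download itself is left out), something like the following could work. The assumptions are labeled in the comments: the provider name is passed in by the caller rather than read from the file, the "is this a DotSlash file" check only looks at the #!/usr/bin/env dotslash shebang line, and the ~/.local/state fallback comes from the XDG Base Directory spec rather than from this proposal.

```python
import os
from pathlib import Path

# Assumption: a DotSlash file is recognized by its shebang line only.
SHEBANG = b"#!/usr/bin/env dotslash"


def provider_dir(env=os.environ):
    """Directory that would hold installed providers under this proposal."""
    # XDG Base Directory spec: use ~/.local/state when XDG_STATE_HOME is unset.
    state_home = env.get("XDG_STATE_HOME") or str(Path.home() / ".local" / "state")
    return Path(state_home) / "dotslash" / "provider"


def looks_like_dotslash_file(data):
    """Cheap check on already-downloaded bytes; a real check would also parse the body."""
    return data.startswith(SHEBANG + b"\n") or data.startswith(SHEBANG + b"\r\n")


def install_provider(name, data, env=os.environ):
    """Write downloaded bytes to <provider_dir>/<name> after basic validation."""
    # Reject names that could escape the provider directory.
    if not name or "/" in name or "\\" in name or name in (".", ".."):
        raise ValueError(f"invalid provider name: {name!r}")
    if not looks_like_dotslash_file(data):
        raise ValueError("downloaded content does not look like a DotSlash file")
    dest = provider_dir(env) / name
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(data)
    return dest
```

A real implementation would presumably also check that the "name" inside the file matches the provider name, per the requirement suggested above.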

Another option (we'll call this the "DotSlash Inception" option) would be to enable a DotSlash file to include metadata about how to obtain a provider referenced in the file. Example:

{
  "name": "example-cli",
  "providers": {
    "my-custom-cas": {
      "size": 40660307,
      "hash": "blake3",
      "digest": "6e2ca33951e586e7670016dd9e503d028454bf9249d5ff556347c3d98c347c34",
      // Must be a single DotSlash file?
      "format": "gz",
      // No need to specify path because it must be my-custom-cas?
      "providers": [
        {
          "url": "https://example.com/my-custom-cas"
        }
      ]
    }
  },
  "platforms": {
    "linux-x86_64": {
      /* size, hash, digest, format, path */
      "providers": [
        {
          "type": "my-custom-cas",
          "id": "72b81fc3a30b7bedc1a09a3fafc4478a1b02e5ebf0ad04ea15d23b3e9dc89212"
        }
      ]
    }
  }
}

The idea is that when example-cli is run for the first time, DotSlash sees that it should use the my-custom-cas provider. If the user does not have it installed, DotSlash can use the information in the providers section to install the provider first and then use it to fetch example-cli.
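The first-run check described here could be sketched roughly as follows. The input is assumed to be the already-parsed DotSlash file as a plain dict. The function name providers_to_install is made up for illustration, as is the set of built-in provider types and the rule that an entry without a "type" is an HTTP provider.

```python
# Assumption: provider types that DotSlash handles natively.
BUILTIN_PROVIDERS = {"http", "github-release"}


def providers_to_install(dotslash_file, platform, installed):
    """Return {provider_type: install_metadata} for custom providers that
    the given platform entry references but that are not installed yet."""
    # Top-level "providers" section: how to obtain each custom provider.
    definitions = dotslash_file.get("providers", {})
    needed = {}
    for entry in dotslash_file["platforms"][platform]["providers"]:
        # Assumption: an entry without "type" is a plain HTTP provider.
        provider_type = entry.get("type", "http")
        if provider_type in BUILTIN_PROVIDERS or provider_type in installed:
            continue
        if provider_type not in definitions:
            raise KeyError(f"no install metadata for provider {provider_type!r}")
        needed[provider_type] = definitions[provider_type]
    return needed
```

For the example file above, with nothing installed, this would return the my-custom-cas entry from the top-level "providers" section. Once the provider is installed, it returns an empty dict and DotSlash can go straight to the fetch.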

There are a lot of questions on how strict we might be on the requirements for a provider. There are also questions around how to know when to install a new version of a provider, or what to do if multiple DotSlash files try to provide different implementations of a provider (particularly with respect to defending against attackers).

alilleybrinker commented 7 months ago

Thanks for drafting this initial pass! One thought that occurs to me is that, if the single parameter is JSON, it should be standard JSON, and not the loose JSON accepted by the dotslash tool. The loose format isn't well standardized, and supporting it across languages may cause parsing problems for people writing providers (which may or may not be written in Rust).
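A quick illustration of the concern, using Python's standard json module as a stand-in for "whatever JSON parser the provider author has on hand". The helper name and the sample payloads are only for illustration.

```python
import json


def parses_with_stdlib_json(text):
    """True if Python's standard json module accepts the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# A payload with a comment, in the style DotSlash files allow.
loose = '{\n  // which provider to use\n  "type": "my-custom-cas"\n}'

# The same kind of payload serialized as standard JSON.
strict = json.dumps({
    "type": "my-custom-cas",
    "id": "72b81fc3a30b7bedc1a09a3fafc4478a1b02e5ebf0ad04ea15d23b3e9dc89212",
})

print(parses_with_stdlib_json(loose))   # the comment is rejected
print(parses_with_stdlib_json(strict))
```

A provider author would hit the first case with most off-the-shelf parsers, so sticking to standard JSON for the argument keeps providers easy to write in any language.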

Another thought: streaming the output to stdout so that the provider does not need (or get) to know the cache layout is a nice idea! Taking the draft GitLab provider as an example, I think this would mean redirecting the output of the subcommand to stdout, which isn't too terrible. The one constraint is that we'd probably want some protocol for handling failures. If we assume providers may basically delegate to a subcommand (like the GitHub and GitLab providers), a poorly written subcommand might write error messages to stdout. It may be good to specify that providers should write errors to stderr, so they can be shown to the dotslash user, and should use non-zero exit codes properly in the provider binary.
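To sketch what that convention could look like on the provider side: artifact bytes go only to stdout, diagnostics go only to stderr, and the exit code signals success or failure. The names run_provider and fetch, and the specific exit codes 1 and 2, are assumptions for illustration, not an agreed protocol.

```python
import json
import sys


def run_provider(argv, fetch, stdout, stderr):
    """Provider entry point; returns the process exit code.

    argv: full argument vector; argv[1] is the single standard-JSON request.
    fetch: callable taking the parsed request and yielding chunks of bytes
           (hypothetical; this is where a subcommand would be wrapped).
    stdout: binary stream that receives artifact bytes and nothing else.
    stderr: text stream that receives human-readable errors.
    """
    if len(argv) != 2:
        print("usage: provider <json-request>", file=stderr)
        return 2
    try:
        request = json.loads(argv[1])
        for chunk in fetch(request):
            stdout.write(chunk)
        stdout.flush()
    except Exception as exc:
        # Never write diagnostics to stdout: it would corrupt the artifact.
        print(f"provider error: {exc}", file=stderr)
        return 1
    return 0


# A real provider script would end with something like:
#   sys.exit(run_provider(sys.argv, my_fetch, sys.stdout.buffer, sys.stderr))
```

One caveat: a failure partway through can still leave partial bytes on stdout, so the caller would need to discard whatever it received when the exit code is non-zero.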

bolinfest commented 7 months ago

if the single parameter is JSON, it should be standard JSON

Yes, we would absolutely do that. Relatedly, I just updated the docs last night to list "experimental commands":

https://dotslash-cli.com/docs/flags/

FYI, today you can do:

dotslash -- parse DOTSLASH_FILE

to get the "pure JSON" of a DotSlash file, if that's helpful for any sort of tooling you build around DotSlash.