cnabio / cnab-spec

Cloud Native Application Bundle Specification
https://cnab.io

Air-gap scenarios (IoT/Security/Edge/Zero Network/etc.) #147

Open technosophos opened 5 years ago

technosophos commented 5 years ago

At the March 20 dev meeting, I committed to getting these documented for reference.

The following is the list of scenarios I have been presented with for the case I'm generally calling "air gap". Note that I don't know the details in most of these cases, but they come from customers or existing documented past cases.

There are several lower-security variations of the SCIF case (the "R&D Lab" case is the main one -- no requirement for write-once media, but otherwise the same).

There are numerous exotic variations of the IoT cases that I am not sure we want to solve. But one that I think would be noble to try to solve is the case where the invocation image itself cannot be packaged as either an OCI image or a container image, because the hub device cannot run either. I was looking at MSIX as a possibility.

The "oil rig" and "submarine" cases are both examples of "edge computing" (for those of you wondering why I didn't have a case called "edge computing"). The other case is the "windmill" case, which is basically a highly constrained-bandwidth environment (3G or EDGE in some cases), but with a fairly high-powered device. That case seems relatively straightforward to solve with thick OR thin bundles, though.

technosophos commented 5 years ago

Runtimes in these scenarios:

GreenCee commented 5 years ago

A cruise ship or airplane would be a more common, more universal version of the submarine scenario. In each of those cases, every time a cruise ship enters a port it will upload signed thick bundles to update its movie selection or the software for an entertainment system. Preconfigured bundles are mailed to the appropriate ports. The use case would be entertainment packages that are specific to the weather they anticipate seeing: maybe it's going to rain, so they need additional in-suite entertainment options. The media updates they receive are a good analog and a very well understood problem, but the challenge is that for every offering they have to have their physical server. The ship isn't in port long enough to receive OTA updates, and OTA bandwidth is limited. Additionally, they want to be able to upgrade dynamically without

The SCIF use case can be expanded to any compliance boundary. The defining feature is that software must be delivered and deployed without internet access in or out of the enclave. Nodes inside this network boundary (either strictly controlled ingress/egress or a physical air gap) are purposely unable to reach out to internet sources for updates, opting instead to use monitored internal mirrors. This allows for greater separation in security layers.

The last three are a little soapbox-oriented and might not be proper use cases:

Product certification and delivery: the ability to deliver the actual VMs/images for an application. That would allow a company to deliver a certified, known-working configuration, reducing the need for installation services as well as support costs for applications that don't fit a typical SaaS pattern.

A thick bundle would provide a level of assurance at deployment time. Instead of not knowing how long downloading updates will take, install timing would depend only on the LAN. For a franchise, thick bundles can be trickle-downloaded, and maintenance windows would be more predictable and less risky. JIT administrator access is now becoming a common pattern, but install windows are rarely based on fact and are just opened to the max window just in case. Tighter JIT window = better security posture.

simonferquel commented 5 years ago

I think the interesting question is: in all these cases, do we expect that the airgapped environment has the following requirements:

- for docker/oci images: is there a container registry within the airgapped environment?
- for raw executables and VMs: is there a file share of some sort within the airgapped environment?

If both are true, I think we can ditch the Thick Bundle case from the spec and rely on an external tool capable of relocation. If we take the example of the SCIF case (which seems the most complex) with cnab-to-oci, what we can do is just add the capability of "dumping" an OCI Index / Docker manifest list with all dependent blobs into a tarball (without having to interpret it as a CNAB) and restoring it in a new registry. The "cd-rom" (or more probably DVD-ROM or Blu-ray) will contain:
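
The dump/restore step could be sketched roughly like this (a Python stand-in for what a cnab-to-oci deep export might do; the file layout follows the OCI image-layout convention, but the function names and verification policy are illustrative assumptions, not actual tool behavior):

```python
import hashlib
import json
import os
import tarfile
import tempfile

def sha256_digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()

def write_blob(layout: str, data: bytes) -> str:
    """Store a blob under blobs/sha256/<hex> and return its digest."""
    digest = sha256_digest(data)
    path = os.path.join(layout, "blobs", "sha256", digest.split(":", 1)[1])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return digest

def make_layout(root: str) -> None:
    """Build a minimal OCI image layout: one blob referenced by index.json."""
    blob = json.dumps({"architecture": "amd64"}).encode()
    digest = write_blob(root, blob)
    with open(os.path.join(root, "oci-layout"), "w") as f:
        json.dump({"imageLayoutVersion": "1.0.0"}, f)
    with open(os.path.join(root, "index.json"), "w") as f:
        json.dump({"schemaVersion": 2,
                   "manifests": [{"digest": digest, "size": len(blob)}]}, f)

def export_layout(root: str, archive: str) -> None:
    """The 'dump': the whole layout, blobs included, goes into one tarball."""
    with tarfile.open(archive, "w") as tar:
        tar.add(root, arcname=".")

def restore_layout(archive: str, dest: str) -> dict:
    """The 'restore': unpack, then verify each blob against its index digest."""
    with tarfile.open(archive) as tar:
        tar.extractall(dest)
    with open(os.path.join(dest, "index.json")) as f:
        index = json.load(f)
    for m in index["manifests"]:
        path = os.path.join(dest, "blobs", "sha256", m["digest"].split(":", 1)[1])
        with open(path, "rb") as f:
            assert sha256_digest(f.read()) == m["digest"], "blob corrupted in transit"
    return index

# demo: round-trip a layout through a tarball, simulating the removable media
src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
archive = os.path.join(tempfile.mkdtemp(), "layout.tar")
make_layout(src)
export_layout(src, archive)
restored = restore_layout(archive, dst)
```

Because everything in the layout is content-addressed, the restoring side can prove the media wasn't tampered with before pushing anything into the internal registry.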

For VMs/executables, I think having support for image relocation (in the way @glyn is currently prototyping in Duffle) should be sufficient:

WDYT?

glyn commented 5 years ago

If both are true, I think we can ditch the Thick Bundle case from the spec and rely on an external tool capable of relocation.

The downside of an external tool is that it would be non-standard. I'd like to be able to choose a single runtime and know that it can relocate any thick bundle. Conversely, I'd like to be able to ship a thick bundle knowing that arbitrary runtimes will be able to relocate it.

GreenCee commented 5 years ago

+1 on raising this core question. I'll share some experiences to frame it: I've built a couple of snowflakes with air-gap and similar requirements. I definitely want to hear whether chaining different tools is the right way to approach the outcome, rather than rolling it into the CNAB spec itself. I'll echo an earlier comment that I'm expressing the UX I want, but it needs to be tempered with reality and its implications.

I think the interesting question is: in all these cases, do we expect that the airgapped environment has the following requirements: for docker/oci images, is there a container registry within the airgapped environment? For raw executables and VMs, is there a file share of some sort within the airgapped environment? If both are true, I think we can ditch the Thick Bundle case from the spec and rely on an external tool capable of relocation.

In my experience I would say we cannot expect these, nor should we, as it would break the self-contained aspect.

A 'relocate' function is, I think, worth calling out as an optional action that will "distribute files/OCI images and mutate the bundle.json appropriately". So, thin or thick, it would be 'bundle relocate | bundle install'. Every install I've touched in Gov/Finance has required pulling images to a private registry and only running images from that registry, even when internet-connected, so there's value in thin bundles having a concept of relocate, though it may be less implementation work to just do a closed export->relocate->install action.
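
A minimal sketch of what such a relocate action could do to bundle.json (Python for illustration; the field names follow the bundle.json draft's invocationImages/images sections, and the host-rewrite rule is a deliberately naive assumption):

```python
import copy

def rewrite(ref: str, registry: str) -> str:
    """Swap the registry host in an image reference, keeping repo and tag/digest.
    Naive on purpose: assumes the reference starts with an explicit host."""
    _, _, rest = ref.partition("/")
    return f"{registry}/{rest}"

def relocate(bundle: dict, target_registry: str) -> dict:
    """Return a copy of bundle.json with every image reference pointed at a
    private registry, leaving the original document untouched."""
    out = copy.deepcopy(bundle)
    for img in out.get("invocationImages", []):
        img["image"] = rewrite(img["image"], target_registry)
    for img in out.get("images", {}).values():
        img["image"] = rewrite(img["image"], target_registry)
    return out
```

A runtime could run this ahead of install, as in the 'bundle relocate | bundle install' pipe above, so the invocation image only ever sees references it can pull inside the boundary.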

If Kubernetes, or anything else using OCI, is involved, a registry would definitely be needed, but it could be deployed (Quay VM, ARM->ACR) as part of the bundle. Similarly, if the install requires a fileshare (VM-to-VM) to complete, I would want that called out as a required param to allow for planning installs, or eventually for dependencies on other bundles that would be brought in.

A current shortcoming of today's tooling is that it relies on a fileshare being stood up, which becomes a dumping ground and is a nightmare, as it allows poor practices like mounting and running executables directly.

I do think 'relocate' needs to be called out as an optional action, with assumptions about its behavior and the requirements levied on the runtime.

An alternative we talked about in #95 was exposing the VM bits via mount and putting the burden on the invocation image to know how to relocate.

GreenCee commented 5 years ago

Wanted to pull on this, and discuss further in the community call.

@simonferquel registry legend

Ignoring the ORAS label on the right: does this capture what you're describing, where the CNAB artifact type would effectively be an OCI index, with the method of handling being to pull all referenced images during an export action?

On the right was the contrast I had, which was taking the on-disk representation of the thick bundle in its entirety, similar to ORAS. Which would be... an OCI Index pointing to binary blobs, at which point I think I've just described a dumbed-down version of the CNAB-to-OCI implementation that prevents the registry from treating it intelligently.

So a CNAB artifact registry type is created: an Index pointing to all the referenced invocation images and images that would make up the thick bundle, with the handling defined as being able to support a full load/unload operation.

I'm actually liking that approach. Assuming there isn't a registry on the target system (no K8s or other stuff that needs it), 'cnab load' runs and all the artifacts get jammed into the local image store. Is there a way to represent those on the filesystem at the same time, maybe just file links? It would lean heavier into https://github.com/deislabs/cnab-spec/commit/783c3524ae1ae88496de6a7d5e68282cf0d8bb87

The end result would be to streamline the CNAB meta-lifecycle to mimic that of docker images, with load/import/export, either directly through CNAB-to-OCI (as a separate tool or just a library), or by just defining that behavior in the spec.

Would like to tease out the use cases that break in this case. I do think that I intermix driver vs. runtime, though.

GreenCee commented 5 years ago

Here's a real VM-centric example of what a thick bundle might look like mechanically. It's Azure-specific, but it draws out the transport process, which is a subset of CNAB. As for feedback from customers: I've pointed them at CNAB and docker-app. They want to transport artifacts and logic (invocationImage) all together between Azure Stacks, onsite, cloud, etc. In short: portability. Today it's ARM+zipfile or DockerCompose+script to export the referenced images.

https://docs.microsoft.com/en-us/azure/azure-stack/azure-stack-download-azure-marketplace-item#disconnected-or-a-partially-connected-scenario
PS code that walks the dependencies and downloads them: https://github.com/Azure/AzureStack-Tools/blob/master/Syndication/readme.md

simonferquel commented 5 years ago

@GreenCee I am not sure I fully understand what you describe here, but I think we are aligned. My major opinion here is that we should not require anything from the invocation image to support air-gapped scenarios. That is, as long as the invocation image is aware of the injected image map, it should work exactly the same as in a non-airgapped environment.

For a container-based CNAB, I see it working with the following actions:

  1. create a "thick" archive with cnab-to-oci, or any tool vendoring it (note that I don't call it a thick bundle; I see it as a deep export of the OCI index backing the CNAB, more than something standardized in the CNAB spec). Note that cnab-to-oci does not yet support this, but that is quite an easy addition to the current state of the tool.
  2. [optional] deploy a registry to the target air-gapped cluster: there is a bit of a chicken-and-egg problem here: e.g. if I want to install a registry as a kubernetes workload, how do I provision the registry image? Need help with strategies to cover this step. Maybe a valid scenario is to just provide the registry as a binary that you'd run outside of the cluster?
  3. hydrate the registry: just load the deep OCI index export created in the first step (this could be handled either as a separate step, or implicitly by a tool vendoring cnab-to-oci)
  4. run the invocation image with the correct image map (just like non-airgapped image relocation)
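
For step 4, the image map injected into the invocation image might look something like this (a hypothetical shape for illustration only; the field names, the truncated digest, and the injection mechanism are assumptions, not settled by the spec):

```json
{
  "web": {
    "originalImage": "docker.io/acme/web@sha256:c0ffee...",
    "image": "registry.airgap.local:5000/acme/web@sha256:c0ffee...",
    "imageType": "docker"
  }
}
```

The invocation image reads its relocated references from this map instead of hard-coding registry hosts, which is what lets the same bundle behave identically inside and outside the air gap.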

I am convinced we can have a cnab runtime implementation that does all that without having a specification for thick bundles.

For step 2, my Docker bias makes me see air-gapped clusters as things with a similar layout to Docker EE (a cluster with some kind of registry - DTR in the Docker world - already part of it), but I am very interested in understanding the need to support creating a registry backing the CNAB invocation and component images, reachable from the target cluster. Anyone with real-world examples of this?

The case of VM-based deployments seems a bit more tricky though.

Now for some comments:

I'm actually liking that approach. Assuming there isn't a registry on the target system(No K8s or other stuff that needs it), so the 'cnab load' runs, all the artifacts get jammed into the local image store(sp?). Is there a way to represent those on the files at the same time, maybe just file links? It would lean heavier into this 783c352

The problem with this approach is that when hydrating a docker daemon with a docker image load, we lose the registry digest information, so we would need a different way to reference images (by tag, maybe? how do we guarantee immutability then?)
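
The digest concern is the crux of immutability: a registry digest is content-addressed, while a tag is a mutable pointer that can be repointed after the fact. A toy illustration (not real registry code; the canonicalization here is simplified compared to the actual OCI scheme):

```python
import hashlib
import json

def manifest_digest(manifest: dict) -> str:
    """Registries name a manifest by the sha256 of its canonical bytes."""
    data = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + hashlib.sha256(data).hexdigest()

m1 = {"schemaVersion": 2, "layers": [{"digest": "sha256:aaa"}]}
m2 = {"schemaVersion": 2, "layers": [{"digest": "sha256:bbb"}]}

# A tag like "myapp:1.0" could point at either manifest at different times;
# the digests can never collide: change the content, change the name.
assert manifest_digest(m1) != manifest_digest(m2)
```

After a plain docker image load only tags survive, so a bundle that pinned images by digest has nothing trustworthy left to verify against.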

An alternative we talked about in #95 was exposing the VM bits via mount and putting the burden on the invocation image to know how to relocate.

Wouldn't that break one of my opinions about how thick bundles should work (i.e. exactly the same as in non-airgapped)? Not sure about it; I lack some context about how VM-centric bundles are built/used.

GreenCee commented 5 years ago

Will work on a response on the rest, but wanted to concur on this:

I am not sure I understand fully what you describe here, but I think we are aligned. My major opinion here, is that we should not require anything from the invocation image to support air-gapped scenarios. That is, as soon as the invocation image is aware of the injected image-map, it should work exactly the same as in non-airgapped.

This, 110%. Relocation and rewrites seem to be the tooling we're surfacing to fulfil this goal. I'm sure other parts will come up, and I'm all for minimalism on which ones make it into the core CNAB spec vs. which ones are common patterns.

GreenCee commented 5 years ago

This tool came up at RH Summit: https://github.com/RedHatOfficial/odie/blob/master/README.adoc

In this case ODIE aims to provide a 100% batteries included environment to the point of being a bootable DVD.

For CNAB, I understand the goal as being less of a 'bootstrap from nothing' and more of using parameters, rewrites, etc. for a streamlined experience.

So a CNAB could contain a VM to provide things like a container registry, yum server and such, but the runtime is not required to host a registry, NFS server, etc.

technosophos commented 5 years ago

ODIE looks very interesting as a way of traversing that particular security boundary.

jlegrone commented 5 years ago

I think it would be great for the spec to define a path on the filesystem where json-schema documents are located in thick bundles. Right now, I believe implementations would need network access to fetch schemas.
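
A sketch of what a spec-defined local schema path could enable (the schemas/ directory name and the file-naming convention here are hypothetical, invented for illustration):

```python
import json
import os
import tempfile

def load_schema(bundle_dir: str, name: str) -> dict:
    """Resolve a JSON schema from a well-known path inside a thick bundle,
    so that validation needs no network access inside the air gap.
    The 'schemas/' layout is hypothetical, not defined by the spec."""
    path = os.path.join(bundle_dir, "schemas", name + ".schema.json")
    if not os.path.exists(path):
        raise FileNotFoundError(
            f"schema {name!r} is not packaged in the bundle; "
            "an online runtime would fall back to fetching it")
    with open(path) as f:
        return json.load(f)

# demo: package a schema alongside the bundle contents, then resolve it offline
bundle = tempfile.mkdtemp()
os.makedirs(os.path.join(bundle, "schemas"))
with open(os.path.join(bundle, "schemas", "bundle.schema.json"), "w") as f:
    json.dump({"$id": "https://cnab.io/schemas/bundle.schema.json"}, f)
schema = load_schema(bundle, "bundle")
```

With a well-known path like this, an air-gapped runtime validates against the packaged copy and never needs to reach the `$id` URL at all.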

jeremyrickard commented 5 years ago

@jlegrone We should open a follow-up to track this one. I'll drop one in.

technosophos commented 5 years ago

Removing this from the 1.0 milestone. We believe it has been addressed for CNAB Core 1.0, and now it needs to be attached to the registry spec for related work there.