datawire / forge

Define and run multi-container apps in Kubernetes
http://forge.sh
Apache License 2.0

Add support for imagebuilder as an alternative builder for docker images #137

Closed mumoshu closed 6 years ago

mumoshu commented 6 years ago

imagebuilder is an alternative to docker build for building docker images that are more secure (it supports volume mounts inside the build context, so there is no need to leave secrets in the image or its metadata when you run bundle install, npm install, etc. to install packages hosted in private Git repos), smaller (w/ squashing), and faster to build (no context upload).

This PR adds imagebuilder support to forge, so that it can leverage imagebuilder for secure, fast, small image builds.

Spec

A new setting key, image_builder, is added under containers in service.yaml; it can be omitted or set to either docker or imagebuilder.

If the value is omitted or set to docker, docker build is used for building docker images. This is the existing default behavior, both before and after this change.

If the value is set to imagebuilder, imagebuilder is used instead.
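
For example, a service.yaml fragment opting into imagebuilder would look like this (a minimal sketch, with unrelated keys omitted):

containers:
- image_builder: imagebuilder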

Notes

The only gotcha is that supporting --build-args with imagebuilder requires https://github.com/openshift/imagebuilder/pull/58 until it gets merged.

mumoshu commented 6 years ago

Just wrote a lengthy issue at #138 trying to explain why I (and possibly you) need this. @rhs Would you mind reviewing this? Thanks!

rhs commented 6 years ago

Hi,

First off, I think the use case makes a lot of sense. Thanks for taking the time to describe it in such detail, and thanks for contributing a PR. I have a couple of general comments/questions I'll put here, and then I'll follow up with any more specific comments in the code.

I'm wondering if there are any other approaches to solving this problem that you've considered? Maybe using a multi-stage docker build? I ask for a couple of reasons:

1) I'm curious as I expect we will need to solve this also. ;-)

2) A big goal of forge is to work consistently across as many different environments as possible. For this reason I'm generally cautious about adding external dependencies in a way that can make forge behavior brittle/environmentally dependent. Right now the primary external dependencies forge has are kubectl and docker, which are both relatively stable/known quantities, so I have to ask the question of whether or not the use case can be reasonably accommodated without expanding the external dependency footprint. Alternatively, if it can't be, then I'd like to understand what imagebuilder stability is like, e.g. can we reasonably expect everyone to have a certain minimum version? Should we have a version check in place so if there are incompatibilities we fail fast? Is it still an early project in rapid development? (FYI, I'm not personally familiar with imagebuilder, so I'd appreciate any context you could give on these questions.)

My only other big comment is on tests. FWIW, I'm aware the tests don't run properly if you aren't in the datawire organization. I'm working on improving that, hopefully soon. I'd like it to be a) easy to run the full test suite and b) easy to contribute tests. I need to finish vetting the credentials we use from CI and write a bit more documentation to make this a reality, but for now, if you could supply a sample project (could be a tarball/zip/etc., or a pointer to a git repo) and a simple transcript of usage along with the expected output, I can turn that into a test quite easily.

mumoshu commented 6 years ago

@rhs Hi, thanks for reviewing!

I'm wondering if there are any other approaches to solving this problem you've considered?

I've updated the lengthy description at #138.

The only other approach in my mind is the one I've mentioned in item 4: an unreadable, relatively complicated script to form a docker-image-build pipeline.

It is occasionally implemented w/ lengthy bash scripts and/or Makefiles chaining docker-runs and docker-builds to finally produce the docker image for your app.

Maybe using a multi stage docker build?

I've tried it and then abandoned it for my business apps (note that it is still useful for easily building efficient docker images for OSS).

Please see Why not docker multi-stage builds? in #138 for more info.

In a nutshell, it requires a trade-off between slow builds and leaking secrets into image layers/metadata. IMHO this is an unnecessary trade-off, hence this PR.

mumoshu commented 6 years ago

@rhs

Some more alternatives I remember:

  1. Use https://github.com/mdsol/docker-ssh-exec

It doesn't work everywhere though.

As it requires access to the host running the docker daemon that performs your docker-build, it won't work w/ CircleCI 2.0 or Travis CI's container executor, for example. It would work w/ the respective machine executors, which run your CI build on a dedicated VM, but that delays the startup of the CI job in the first place. Again, IMHO this is an unnecessary trade-off. Just use client-side docker image building.

  2. Use an external secret management system like Vault, fetching secrets from within docker-build contexts.

It would be an option if you ran everything, from CI/CD to apps, on your own VMs.

If you use a SaaS like CircleCI for CI, it implies that you have to expose Vault to the Internet, directly or indirectly (via VPN, for example), so that your docker-build contexts running on CircleCI's infra can access it. This is a big no-no for me. The smaller the attack surface, the better the security.

Also note that both alternatives require you to install a command into the docker image in order to obtain secrets from the docker-ssh-exec server or Vault. It slightly complicates the Dockerfile.

And again, this is an unnecessary complication IMHO. With imagebuilder, for example, you can just mount a volume w/ secrets into the docker-build context, which leaves nothing behind in the resulting image.

mumoshu commented 6 years ago

@rhs

A big goal of forge is to work consistently across as many different environments as possible. For this reason I'm generally cautious about adding external dependencies in a way that can make forge behavior brittle/environmentally dependent.

I definitely agree and understand that.

Right now the primary external dependencies forge has are kubectl and docker, which are both relatively stable/known quantities, so I have to ask the question of whether or not the use case can be reasonably accommodated without expanding the external dependency footprint.

Ack. I'm ok with just introducing something like:

containers:
- image:
    builder:
      command: your-preferred-builder

so that forge calls your-preferred-builder -f $dockerfile -t $img $build_args $dir instead of the default docker build. The user supplies a docker-build-compatible script/command and instructs forge to run it instead of docker build. How about that?
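
For instance, a hypothetical sketch of this proposal, using imagebuilder itself as the user-supplied command:

containers:
- image:
    builder:
      command: imagebuilder

With this config, forge would run imagebuilder -f $dockerfile -t $img $build_args $dir instead of docker build.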

Alternatively, if it can't be, then I'd like to understand what imagebuilder stability is like, e.g. can we reasonably expect everyone to have a certain minimum version? Should we have a version check in place so if there are incompatibilities we fail fast?

I wish we had one but no - it doesn't even have a semantic version yet.

Is it still an early project in rapid development? (FYI, I'm not personally familiar with imagebuilder, so I'd appreciate any context you could give on these questions.)

Honestly I'm also new to the party but AFAICS:

mumoshu commented 6 years ago

if you could supply a sample project (could be tarball/zip/etc, or a pointer to a git repo) and a simple transcript of usage along with expected output, I can turn that into a test quite easily.

You can just add the following snippet to the service.yaml of your hello-world-python example:

#snip

containers:
- image_builder: imagebuilder

#snip

mumoshu commented 6 years ago

I've implemented the alternative in https://github.com/datawire/forge/pull/137#discussion_r163450263

A config now looks like:

containers:
- image:
    builder: imagebuilder

If you still don't want imagebuilder as a forge dependency, I can also remove it and try this:

containers:
- image:
    builder:
      command: your-preferred-builder

wdyt?

rhs commented 6 years ago

Thanks for detailing the alternatives, that's super useful info to have!

I'm fine adding the imagebuilder dependency, I'm in the process of pulling it into the dev container now. If it presents a stability problem we can cross that bridge when we come to it. I'll try to get this into a release by tomorrow or maybe early next week if I hit any issues.

mumoshu commented 6 years ago

@rhs Thanks again for taking your time! Although I have not yet managed to run the whole test suite because it seems to require a running kubernaut env, I've added some unit tests to test_service.py and test_schema.py while fixing a regression. Hope you like it.

mumoshu commented 6 years ago

FWIW, test_schema.py and test_service.py are passing on my machine.

rhs commented 6 years ago

I added an end-to-end test with imagebuilder and merged this into master, so it should go out in the next release. Please note that I tweaked the schema a little bit. I was worried that having image: and name: would be confusing, and I wasn't sure what the point of the extra object was, so I pulled builder: up to the top level. If I missed something here and there is a good reason for the extra object, then please let me know.
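
Concretely, a container entry would now look roughly like this (the name value is just illustrative):

containers:
- name: hello-world-python
  builder: imagebuilder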

mumoshu commented 6 years ago

@rhs Hi, thanks for your efforts! Regarding that point, let me clarify a bit. I'm not arguing, but I thought it would be good for us to sync up.

We probably have slightly different concepts in mind when referring to the same terms.

For me, containers[].name sounds like it's solely the container's name, irrelevant to the container image name. I'd even expect that we could customize the image name regardless of the container name, e.g. via containers[].image.name.

However, once I understand forge's "convention" of naming the container image the same as the container (which is a different concept from a container image), it is understandable.

From my perspective though, containers[].builder is misleading. I'm not building a "container", but a "container image". That's why containers[].image.builder or containers[].image_builder sounded more natural to me.