Feature proposal: Services whose source/Dockerfile is not at the root of the repo

camjackson commented 6 years ago

Suppose I have a project like this:

my-project/
  frontend/
    src/
    Dockerfile
    package.json
  backend/
    src/
    Dockerfile
    package.json

They're two 'services' which run independently of each other, each with its own source code, Dockerfile, build process etc. But because there's a high degree of coupling between them (a web frontend that consumes a RESTful API in this case), it's convenient for them to live in the same repo.

What I'd like Cage to support is:

- These are specified as two separate services with independent configuration
- The build field for the two of them points to the same git repo
- There is some way to specify that the source code for each service lives in a certain subdirectory of the git repo
- cage source mount my-project would mount the source for both services, same as the existing behaviour for services that share a repo
- For the source volume, the host path would be src/my-project/[frontend|backend].
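
To make the last point concrete, here is a rough sketch of the host-path rule. It is written in Python purely for illustration; `host_source_path` and its parameters are hypothetical names, not anything that exists in cage.

```python
import posixpath
from typing import Optional


def host_source_path(repo_alias: str, subdir: Optional[str] = None) -> str:
    """Return the host path to mount for a service's source volume.

    Hypothetical helper: `repo_alias` is the checkout name under src/
    (e.g. "my-project") and `subdir` is the service's directory inside
    the repo (e.g. "frontend"), or None if the service lives at the root.
    """
    base = posixpath.join("src", repo_alias)
    # Services at the repo root keep the existing behaviour.
    return posixpath.join(base, subdir) if subdir else base


print(host_source_path("my-project", "frontend"))
print(host_source_path("my-project", "backend"))
```

Both services still share a single checkout under src/my-project; only the mounted subdirectory differs.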

I can think of two ways to specify this extra piece of information:

- Tacked onto the end of the git URL, which is how Terraform supports a similar feature. However, the docker-compose docs on specifying build as a git URL are somewhat limited, so I'm not sure whether something like this would be compatible.
- Specify it as a label, similar to io.fdy.cage.srcdir. E.g. io.fdy.cage.repo_subdir.

What do you think of this as a feature? Is it a good/bad/terrible idea?

camjackson commented 6 years ago

OK here's a quick prototype before I go to bed. I haven't tested this, manually or otherwise: https://github.com/faradayio/cage/compare/master...camjackson:repo_subdir?expand=1

camjackson commented 6 years ago

Ok I've just tested this out and it actually worked, first try! 🎉

To see how this looks, here is a repo containing two services (backend/frontend), and here is a pod using the io.fdy.cage.repo_subdir label.

Ok so this is definitely possible, and in fact not too difficult to implement. I guess the more relevant question is whether or not it's a good idea, and if so, whether it's a sensible design.

emk commented 6 years ago

I think the "standard" way to handle this in docker-compose is using the dockerfile: parameter, maybe?

dockerfile: subdir/Dockerfile

It might make sense to extract subdir from there? I honestly haven't thought this through.

In general, if there's a way to do something using standard Docker syntax, we're very happy to support it. But adding new io.fdy.cage.* labels needs to meet a much higher design standard. So is there any way to support this in a Docker-compatible fashion?

camjackson commented 6 years ago

Yeah I agree, adding new labels is definitely the easy way out, and not the right way when it can be avoided. I'll experiment with dockerfile to see if that could work. Thanks!

emk commented 6 years ago

Ah, wait. It looks like dockerfile doesn't do what we want. Sorry. :-(

It looks like we would need to figure out that syntax for in-repo paths in Git URLs, or something like that, for maximum compatibility. Maybe it's worth experimenting with docker-compose.yml and seeing if anything works / is documented on some obscure issue?

camjackson commented 6 years ago

Yeah I was coming to the same conclusion about dockerfile.

I had a pretty good look around for how to specify a subdirectory in a build git url and turned up nothing, but I didn't actually try anything. I'll have a play.

At the moment I'm struggling to get cage build or even docker-compose build to work with a git URL at all. It always fails with 'Host key verification failed.' It looks like this whole feature might be totally broken in docker-compose at the moment? 😕

camjackson commented 6 years ago

OK I found it! After digging through the source code for docker-compose, and then the underlying docker-py library, I eventually figured out that Docker itself is responsible for figuring out what to do with git URLs. So it's not actually a docker-compose-specific thing, which is why I couldn't find it previously.

Anyway, the relevant docs are here. In short, the correct syntax is myrepo.git#mybranch:myfolder, or just myrepo.git#:myfolder if the branch is master.

So with this, we could stay compatible with the underlying docker-compose/Docker configuration, and avoid adding another label. Should I go ahead and try parsing a subdirectory out of the build URL?
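
A minimal sketch of what that parsing might look like, assuming the fragment rule described above (everything after `#` is `ref:subdir`, and either part may be empty). It is in Python only for illustration; `BuildContext` and `parse_git_build_url` are hypothetical names, not part of cage or compose_yml.

```python
from typing import NamedTuple, Optional


class BuildContext(NamedTuple):
    """Hypothetical container for the pieces of a git build URL."""
    repo: str               # URL with the fragment removed
    ref: Optional[str]      # branch/tag; None means "let Docker pick the default"
    subdir: Optional[str]   # directory inside the repo; None means the repo root


def parse_git_build_url(url: str) -> BuildContext:
    # Split on '#' first: SSH-style URLs such as git@host:user/repo.git
    # contain a ':' of their own that must not be mistaken for the
    # ref/subdirectory separator.
    repo, _, fragment = url.partition("#")
    # Inside the fragment, the first ':' separates the ref from the folder.
    ref, _, subdir = fragment.partition(":")
    # Empty parts are reported as None rather than "".
    return BuildContext(repo, ref or None, subdir or None)


print(parse_git_build_url("myrepo.git#mybranch:myfolder"))
print(parse_git_build_url("myrepo.git#:myfolder"))
```

Leaving `ref` as None when it is omitted keeps the default-branch decision with Docker instead of hard-coding it here.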

emk commented 6 years ago

Yes, this would be an excellent and 100% acceptable solution!

In general, any time we can implement a feature by improving our support for docker-compose and docker features, it's got a 95% chance of being OKed very quickly. :-) It's the stuff where we do something different from Docker that we need to spend a lot more time considering the design.

Even internally at Faraday, there's a constant temptation to add one-off features to cage to solve immediate problems, but we've learned that that is generally a poor idea in the long run.

And thank you for spearheading this!

(Oh, and good news: Amazon ECS now supports Docker 17.06, which means that we can finally use --build-arg in FROM lines, and we have access to multi-stage builds. And we need both badly at Faraday, so I'm going to want to look into these in cage during the next few months. There's a bunch of features we could add to cage that would make it much easier to work with compiled apps, static linking, and build args, but I need to get experience with those features on a real project first.)

camjackson commented 6 years ago

Oh cool! I'm not using ECS for anything at the moment, but those features would definitely be useful in Cage!

emk commented 6 years ago

Yeah, we have a mix of ECS and Kubernetes on the backend, but we usually wait until new Docker features are deployable on ECS to really start experimenting with them. And we like to experiment with new features manually before baking them into cage. Basically, cage is a collection of "best practices" or at least "plausibly reasonable practices."

camjackson commented 6 years ago

Sounds like a good tag line: "Cage: Plausibly Reasonable Practices" 😆

I've made some basic progress on this. A quick question before I finish up for the night - do you think that the logic for parsing the directory from the URL belongs in compose_yml? I was going to implement it directly in cage, but it feels like maybe it should be a helper method on dc::GitUrl?

emk commented 6 years ago

Yeah, definitely in dc::GitUrl. The goal of compose_yml is to be the sole authority for everything related to parsing docker-compose.yml files.

This is another lesson we learned the hard way: It's tempting to let docker-compose.yml logic proliferate across a half-dozen scripts that kinda-sorta understand what's going on. But it really all belongs in one, centralized place that's shared by all tools. For cage, compose_yml is that place. :-)

faradayio / cage

Feature proposal: Services whose source/Dockerfile is not at the root of the repo #81