containers / image

Work with containers' images
Apache License 2.0
861 stars 375 forks source link

Decouple library from system's state #293

Closed aybabtme closed 5 years ago

aybabtme commented 7 years ago

Hey there,

When attempting to use this library, I realize that it isn't very "pure" in how it works:

A few examples: OCI image references make direct use of package os to read/write files and create directories, without exposing a clear way of changing where that happens (hardcoded expectations about file locations). The ostree package does a lot of fork/exec'ing with user-provided strings. Package openshift looks up environment variables.

This makes a controlled (and safe!) usage of the library very difficult, since it comes with a lot of baggage and expectations about what my system looks like, how my application is running and how my it works. It also uses what eventually user input to shell out to commands, which is a huge security risk.

Meanwhile, I have the expectation that a package that knows how to inspect and manipulate image formats and talk with various registries should be able to do all this in somewhat "pure" ways.

Btw, I don't mean to just throw shade on the project, I just want to know if y'all are in agreement with what I'm saying and would be interested in patches that try to decouple things up a bit more.

runcom commented 7 years ago

I would be super happy to have patches for this, I kind of agree with you on every point :)

mtrmac commented 7 years ago

I’d very much like to hear what exactly are you trying to build, instead of discussing this in abstract.

In rough points, my current thinking (open to change):

So, again, what specifically does your application need to do, or what risks does it want to avoid?

cyphar commented 7 years ago

Just quickly chiming in with my $0.02.

  • it writes to stderr and stdout
  • it logs (usage of logrus, related to previous point)

logrus allows you to specify programmatically the output buffer for logs. Personally I want libraries to give logs because otherwise debugging them is almost impossible without recompiling source or spending a long time trying to figure out where a bug might be. There is a valid point to be made about errors being logged inside a library, but that's not enough of an argument to say that logging should not be allowed.

it reads config files from the filesystem it reads environment variables to enable/disable features.

I agree with these points.

it writes files to the filesystem, instead of taking io.Writer or a pluggable "filesystem" interface. it creates directories on the filesystem

This is actually a language problem, not a fault of this library. Inside umoci I've spent a lot of time trying to make it possible to swap out the filesystem interface (to implement unprivileged operations through userspace emulation of CAP_DAC_OVERRIDE). It's very difficult because Go's filesystem API is very spread out and it's difficult to swap out the internal implementation of the standard library if it uses filesystem interfaces directly (like filepath.Walk).

In addition, the OCI format is defined as a directory with files inside it. So you can't just have an io.Writer you need to have a full filesystem interface. And while you can just hack one together, it makes the interfaces significantly more ugly without a very clear benefit. I do actually agree that it would be great if you could swap out the filesystem APIs, but it's much harder to solve than it might initially sound.

it fork/exec's processes (tar, ostree and such)

I don't believe this is true for most of the things in this library, but it's not clear what we should do instead if there isn't a library version of a thing we're using. Should we reimplement it in Go? I understand this concern from a purist point of view (hell, I hate libraries that spin up threads behind my back too -- though the concern there is not shared with shelling out to processes). But I think this is a bit extreme.

it uses networked clients that are not pluggable, or that have hardcoded URLs

The network client stuff is the same issue as the filesystem stuff, it's non-trivial to make them pluggable while also maintaining an API that is agnostic to the type of the endpoints. As for hardcoded URLs, the Docker distribution API basically requires you to have hardcoded URLs if you want to match Docker's semantics.

runcom commented 7 years ago

it reads config files from the filesystem it reads environment variables to enable/disable features. I agree with these points.

So, by setting the configuration in the appropriate SystemContext, these auto-reads files and envs is disabled. We should document this further.

it uses networked clients that are not pluggable, or that have hardcoded URLs The network client stuff is the same issue as the filesystem stuff, it's non-trivial to make them pluggable while also maintaining an API that is agnostic to the type of the endpoints. As for hardcoded URLs, the Docker distribution API basically requires you to have hardcoded URLs if you want to match Docker's semantics.

right, we don't provide generic clients. The docker client for instance doesn't make sense to have urls passed down, the library itself provides those urls to talk to a registry

mtrmac commented 7 years ago

So, by setting the configuration in the appropriate SystemContext, these auto-reads files and envs is disabled.

Most importantly note RootForImplicitAbsolutePaths.

OTOH the general caveat to SystemContext is that any time new functionality is added (e.g. recently /etc/docker/certs.d support) some applications may want to customize even more than before. if they want to customize stuff to be mostly different from the system configuration.

Hence I am very curious to hear in which cases an application would want to extensively diverge from the system configuration; it's clear enough that an application would want to override a specific individual item (e.g. to provide a password or a private key), but are there ever cases when an application would need to override everything or almost everything? What are they?

rhatdan commented 5 years ago

@aybabtme @mtrmac @vrothberg What is the state of this issue. It is two years old, can we close it?

vrothberg commented 5 years ago

I think we can close it since the use-cases motivating the issue remain unclear. Feel free to reopen, if necessary.