containers / storage

Container Storage Library

idea: implement an IPFS driver #1983

Open goern opened 1 week ago

goern commented 1 week ago

As a user of the container storage library, I want to store layers as objects on IPFS, so that I can benefit from the distributed object storage of IPFS.

Rationale for Implementing IPFS in Container Image Storage Driver

Implementing the InterPlanetary File System (IPFS) for storing container image layers offers significant advantages in a distributed computing environment. By leveraging IPFS, each container image layer can be stored as an independent object across a decentralized network. This approach allows for the assembly of container images from multiple IPFS servers using layer references, enhancing flexibility and scalability.

A key IPFS feature is pinning, which controls where container image layers are kept and how they are replicated. This improves data availability and reliability and helps optimize storage management. Utilizing IPFS can also distribute storage costs more equitably, since it enables more accurate accounting of the resources used for container image storage, which is particularly attractive for organizations that want cost-effective and transparent storage.
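
As a minimal sketch of what the write path could look like, assuming a local IPFS daemon and the go-ipfs-api client (nothing like this exists in c/storage today, and layer.tar.gz is a placeholder):

```go
package main

import (
	"fmt"
	"os"

	shell "github.com/ipfs/go-ipfs-api"
)

func main() {
	// Talk to a local IPFS daemon's HTTP API (default port 5001).
	sh := shell.NewShell("localhost:5001")

	// A compressed layer blob; the path is illustrative.
	f, err := os.Open("layer.tar.gz")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Add the blob; the returned CID is the layer's content address,
	// usable as a layer reference when assembling an image.
	cid, err := sh.Add(f)
	if err != nil {
		panic(err)
	}

	// Pin it so this node keeps the blocks available to the network.
	if err := sh.Pin(cid); err != nil {
		panic(err)
	}
	fmt.Println("layer stored and pinned as", cid)
}
```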

References

https://docs.ipfs.tech/concepts/

goern commented 1 week ago

Cc: @sallyom @rhatdan @vpavlin

vpavlin commented 1 week ago

Hmm, I am not sure I understand this - the containers/storage repo is for storing images locally, or not? Using IPFS for a container registry makes sense to me, and there is already an implementation of that: https://github.com/ipdr/ipdr

Or is the use case you have in mind more towards a Kubernetes cluster where each k8s node also has an IPFS node running, and hence images (layers) can be distributed among the nodes without each of them pulling from an external registry (potentially incurring unnecessary cost)?

goern commented 1 week ago
  1. I was under the impression that storage drivers also handle pulling blobs from remote locations?!
  2. ipdr is nice, but I think we could completely eliminate the requirement for having a registry, as we could have the manifest itself on ipfs (see the sketch after this list)
  3. yes, your last paragraph summarizes one of the use cases.
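
For illustration, a hypothetical pull path where the manifest itself lives on IPFS and references layers by CID; the manifest shape and everything else here is invented for the sketch:

```go
package ipfspull

import (
	"encoding/json"
	"io"
	"os"

	shell "github.com/ipfs/go-ipfs-api"
)

// ipfsManifest is a made-up manifest shape: config and layers are
// referenced by CID instead of by registry URL plus digest.
type ipfsManifest struct {
	Config string   `json:"config"` // CID of the image config
	Layers []string `json:"layers"` // CIDs of compressed layer blobs
}

// pullFromIPFS resolves a manifest CID and fetches every referenced layer,
// with no registry involved at any point.
func pullFromIPFS(sh *shell.Shell, manifestCID string) error {
	rc, err := sh.Cat(manifestCID)
	if err != nil {
		return err
	}
	defer rc.Close()

	var m ipfsManifest
	if err := json.NewDecoder(rc).Decode(&m); err != nil {
		return err
	}

	for _, layerCID := range m.Layers {
		lr, err := sh.Cat(layerCID)
		if err != nil {
			return err
		}
		out, err := os.Create(layerCID + ".tar.gz")
		if err != nil {
			lr.Close()
			return err
		}
		_, err = io.Copy(out, lr)
		lr.Close()
		out.Close()
		if err != nil {
			return err
		}
	}
	return nil
}
```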
vpavlin commented 1 week ago
  1. I was under the impression that storage drivers also handle pulling blobs from remote locations?!

Ah, maybe, I actually have no clue :D

2. ipdr is nice, but I think we could completely eliminate the requirement for having a registry, as we could have the manifest itself on ipfs

Pardon my ignorance and that I did not see this immediately - this is true! Really cool idea!

rhatdan commented 4 days ago

Containers/storage supports additional image stores and additional layer stores, which can be backed by networked storage.

When it comes to pulling images, we use containers/image.

rhatdan commented 4 days ago

@mtrmac @nalind @giuseppe @saschagrunert Thoughts?

mtrmac commented 4 days ago

I didn’t try to build this and I have no numbers, but I just can’t see any end-user benefit.


Users who don’t want to place permanent files on nodes, and want to somehow deal with IPFS-located data, can already mount an IPFS filesystem, or, I don’t know, have an in-memory-only IPFS client. It’s not necessary to change anything about the infrastructure for that.

So we are talking about container-image content.


Then there are https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing items 1-3: we would never want to entirely replace the local extracted-layer-filesystem storage with a distributed system, in general; the distributed systems (incl. the operators that drive operation of the network components interconnecting the nodes) are built on top of container images running on nodes!


Replacing registries (which don’t store individual layer files separately, but as compressed tarballs) by individual compute nodes acting as distributed layer stores is not obviously beneficial either. (Compare also https://github.com/spegel-org/spegel , IPFS is not inherently the only way to do that.)

First, there’s no direct benefit to co-locating the compressed layer versions and extracted layer filesystems; the two are not direct substitutes. (containerd does co-locate them, making Spegel possible at low additional cost, but c/storage does not.)

Second, if a massively-parallel application were deployed across 100 nodes, would that mean we have 100 computers storing and serving the compressed layer over this filesystem? The 100 copies would be a completely unnecessary waste of storage.

So, for distributing layers, it seems to me much better to have an “ordinary” clustered registry deployment (with an admin-controllable number of replicas), on top of … well, any clustered filesystem, really. That could be IPFS, or it might not be, but either way there’s no need to involve c/storage at all.


Fine; suppose we forget about distributing images over nodes, and just talk about not using registries at all, having “pull” operations directly interact with some object store (IPFS or not). Sure, that is plausible, but it is also an ecosystem-wide feature addition.

And in the end, I don’t see that this is really any better than having a registry which, when asked for a blob, issues an HTTP redirect to a CDN (where the CDN can be backed by whatever filesystem you choose). That works today, and is widely deployed in practice.
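
A toy sketch of that redirect pattern, with cdn.example.com and the path handling as placeholders rather than any real registry’s code:

```go
package main

import (
	"log"
	"net/http"
	"strings"
)

func main() {
	// Answer registry blob requests with a redirect to a CDN instead of
	// serving the bytes directly.
	http.HandleFunc("/v2/", func(w http.ResponseWriter, r *http.Request) {
		if i := strings.Index(r.URL.Path, "/blobs/sha256:"); i >= 0 {
			digest := r.URL.Path[i+len("/blobs/"):]
			http.Redirect(w, r, "https://cdn.example.com/"+digest,
				http.StatusTemporaryRedirect)
			return
		}
		http.NotFound(w, r)
	})
	log.Fatal(http.ListenAndServe(":5000", nil))
}
```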


What am I missing?

sallyom commented 4 days ago
mtrmac commented 4 days ago
  • With decentralized storage, where data is stored becomes irrelevant

It’s not really irrelevant; failure domains are something that needs to be accounted for and designed for. Unless the data were that excessively ubiquitous (similarly to how we think of internet packet loss as irrelevant, because we have “good enough” hardware and ample extra bandwidth to retransmit). That excessive ubiquity might be true in the future but, almost certainly, isn’t true for container images at the moment.

all that matters is who can access it. Access is managed cryptographically rather than by server-side access controls.

I’m IPFS-ignorant and I don’t know what that means. Right now users have registry credentials. How does one go from registry credentials to “managed cryptographically” access?

Are you saying that anyone who knows the digest of a layer is assumed to be able to access a copy? That’s… not been an assumption in the current systems. It might be fine but it’s not obvious. E.g. Red Hat, on https://catalog.redhat.com , is publishing image manifest digests, AFAICS without requiring any login. Does that mean that if the images were stored on IPFS, that would make all of the image content public? (And if it does, “whose fault is that”?)

It is a fundamental shift, but decoupling location from access control has benefits - flexibility, resiliency, consistency.

I don’t know what that “flexibility, resiliency, consistency” means at a code level.

I am imagining a server with one or two outbound internet links, comparatively low-bandwidth and possibly congested, and a local network that is fully managed and much higher-bandwidth. Where does IPFS enter the picture? Over the congested link, or locally? If the former, that’s not sufficiently resilient. If the latter, that needs to be explicitly managed to contain everything necessary, and it’s basically “just a mirror“.

  • With centralized servers, security and access are dependent on the infrastructure. If the registry server goes down, you lose access to your images. P2P networks eliminate the single point of failure.

What is the economic model? With BitTorrent and illegally pirated movies, everyone involved has some interest in keeping the movies around, and there is maybe a bit of a reputation economy and a moral imperative to share alike between participants.

With container images … why would company A want to host a mirror of company B’s images? (Yes, I realize that model works for freely-distributable Linux distributions.)

  • If podman could access images directly from IPFS as if they were stored locally, this would eliminate the need for traditional image pulls.

Are you talking about replacing the compressed layer representation or the extracted-filesystem representation? This seems to be the latter; and as detailed above, that seems unnecessary (for application-managed data) or outright undesirable (for application files of infrastructure containers) to me.

mtrmac commented 4 days ago

(One possible response for the desire to avoid pulling entirely, and to access individual files from some external source, is that the Additional Layer Store is an out-of-process FUSE interface, and nothing prevents anyone from writing a new backend. Uh … except that the ABI of that interface has effectively changed in the last 2 months.)

mtrmac commented 4 days ago

Second, if a massively-parallel application were deployed across 100 nodes, would that mean we have 100 computers storing and serving the compressed layer over this filesystem? The 100 copies would be a completely unnecessary waste of storage.

And, au contraire, if a cluster-critical operator is not performance-critical and runs in a single deployment, would that mean that we only have a single copy? Then the admins would still need to manage an explicit mirroring operation that ensures that every image has a sufficient number of replicas to achieve the desired HA properties.

I can see how “Nodes should just pull from each other without any need to manage mirrors and replicas” sounds attractive, but AFAICS the need to manage just isn’t avoidable.

giuseppe commented 4 days ago

Last time I looked into IPFS, I did not find a way to use files from other sources, in our case from the containers storage.

That is a big disadvantage, because we'd need to keep the images around twice: one copy in the containers storage and one extra copy in the IPFS cache, so it effectively doubles the amount of storage required to store an image.

vpavlin commented 1 day ago

IPFS is problematic from an economic perspective, as there is no explicit incentive for anyone to hold data other than their own - that is why we get centralized pinning gateways where you pay for a node where your CID is pinned (i.e. the content is stored). I agree that the altruistic approach does not work very well here (unlike with BitTorrent). Filecoin, and in the future Codex, solve these issues by adding a monetary incentive for node operators to host other people's data.

I'd say controlling access to data is a more complex topic in peer-to-peer networks, as access cannot simply be "gated" by an AuthN/AuthZ proxy; it must be based on encryption - i.e. you either have or do not have the right key to decrypt the data blob. There are existing solutions, like various MPC (Multi-Party Computation) networks, that allow you to prove you are eligible to decrypt (based on access rules bound to some cryptographic material - e.g. proving ownership of a particular private key) and then generate decryption keys for you. But yes, it complicates the system significantly:)
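
As a toy illustration of "you either have or do not have the right key": a layer blob could be sealed with AES-256-GCM before being added to the network. A sketch only, not a proposal for the actual scheme:

```go
package seal

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
)

// sealBlob encrypts a layer blob so that only holders of key can recover
// the plaintext, regardless of which peers store or relay the ciphertext.
func sealBlob(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Prepend the nonce so the recipient can decrypt.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}
```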

I feel like this issue quickly turned away from an "open source and free software idea" into an "enterprise solution exercise":) And I am not 100% sure whether a completely centralized enterprise entity could benefit from integrating IPFS - given the complexities of access control, the extra bandwidth consumption, and generally just the mindset and inherent centralization of everything, the extra potential resilience might not be a big enough incentive to deal with this.

On the other hand, in the open source and free software world this exploration still has significant merit - being able to access data privately, securely, and without explicitly relying on a third-party gateway is something that is increasingly gaining interest (at least in my social bubble:) ). Even curl added support for IPFS:)

It would be an interesting opt-in feature where I don't have to rely on a centralized registry, but communicate directly with other nodes running podman and serve the image layers I personally use as well. Maybe there are layers marked as private which are not broadcast on the p2p network, as they were pulled from a private registry? Maybe if some layer is not found on the p2p network, it can be pulled from a centralized registry and then broadcast (if not private)? Maybe IPFS is not the right solution - it is the most commonly used, but with the lack of any assurance of persistence of the data in the network, it might not be feasible for any real-world deployment?
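
A rough sketch of that opt-in flow; every helper here is hypothetical:

```go
package p2ppull

import "errors"

// Hypothetical transports; real implementations would wrap an IPFS (or
// other p2p) client and a registry client respectively.
func fetchFromPeers(cid string) ([]byte, error) {
	return nil, errors.New("not found on p2p network")
}
func fetchFromRegistry(cid string) ([]byte, error) { return []byte("blob"), nil }
func announceToPeers(cid string, blob []byte)      {}

// pullLayer tries the p2p network first, falls back to a centralized
// registry, and re-broadcasts non-private layers so other nodes can fetch
// them peer-to-peer next time.
func pullLayer(cid string, private bool) ([]byte, error) {
	if blob, err := fetchFromPeers(cid); err == nil {
		return blob, nil
	}
	blob, err := fetchFromRegistry(cid)
	if err != nil {
		return nil, err
	}
	if !private {
		announceToPeers(cid, blob)
	}
	return blob, nil
}
```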

goern commented 1 day ago

I think Vasek is heading in the right direction: this request is not only about a new storage driver/technology and the technical use cases we should solve, it's about an update to the 'container image storage ecosystem'.

mtrmac commented 1 day ago

Ah, the C word. Have fun, but keep me out, and don’t @ me.


Container image encryption exists, but it has significant limitations right now (it doesn’t encrypt the image config, and it doesn’t really work for digest references); also deploying it is harder because the ecosystem (e.g. K8s Pod objects) is set up to distribute credentials, not keys.


I think “being able to access data privately … without explicitly relying” is completely disconnected from the nature of container images, which are opaque blobs on the order of hundreds of megabytes, practically impossible to black-box inspect WRT what the binary does, and which inherently imply a trust relationship to the producer of the images, WRT the existence, identity, and trustworthiness of that producer. (And/or reproducible builds, I guess. Still a trust relationship to whoever makes the reproducibility claim, because image users are not going to be re-building the image on every use.)

cgwalters commented 1 day ago

(One possible response for the desire to avoid pulling entirely, and to access individual files from some external source, is that the Additional Layer Store is an out-of-process FUSE interface, and nothing prevents anyone from writing a new backend. Uh … except that the ABI of that interface has effectively changed in the last 2 months.)

Doesn't have to be FUSE, just any mountable network filesystem, right? At least one of my employer's important customers is doing this with an AFS-mounted additional image store, for reference. One notable thing here, though, is that what we really want with something like this is a mechanism to tell the underlying filesystem that these aren't files, they're objects, and hence can be cached for an arbitrary lifetime without worrying about cache invalidation. But I think people doing this today are OK with fscache.
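
For reference, additional image stores are configured in storage.conf; the mount path below is a placeholder for a network-mounted location like the AFS mount described above:

```toml
# /etc/containers/storage.conf
[storage]
driver = "overlay"

[storage.options]
# Read-only image stores searched before the primary store;
# /mnt/afs/images stands in for the network-mounted path.
additionalimagestores = ["/mnt/afs/images"]
```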

vpavlin commented 22 hours ago

Ah, the C word. Have fun, but keep me out, and don’t @ me.

the C word? Cryptography? Cypherpunk? Centralised? Control? Sorry, I am lost here, can you please help me understand?

and inherently imply a trust relationship to the producer of the images

Correct, we are in agreement here - I never said you are not relying on the producer, or that anything decentralized is trustless - trust is always involved. The goal, though, is to move the trust away from intermediaries. I said without explicitly relying on a third party gateway - i.e. avoiding the current flow where I produce something, push it to a registry, and then pull it elsewhere.

The thing we are talking about is to avoid pushing to a registry. And again, happy to reiterate - this is not for everyone, I did not even come up with this idea originally, but AFAU @goern's goal was to investigate if such a system would be possible, feasible and useful to some.

mtrmac commented 10 hours ago

(One possible response for the desire to avoid pulling entirely, and to access individual files from some external source, is that the Additional Layer Store is an out-of-process FUSE interface

Doesn't have to be FUSE, just any mountable network filesystem, right? At least one of my employer's important customers is doing this with an AFS mounted additional image store

It’s confusing, but Additional Layer Store and Additional Image Store are two quite different c/storage features. With e.g. https://github.com/containers/storage/blob/52b643e1ff51ae0a05693adf8fae5a134b32b478/drivers/overlay/overlay.go#L2661 , it seems to me that FUSE, or some other custom not-just-content filesystem, is the only way to operate an ALS, at least without triggering warnings during normal operation. But I didn’t look too deeply.

mtrmac commented 10 hours ago

Ah, the C word. Have fun, but keep me out, and don’t @ me.

the C word?

Cryptocurrency.

The goal though is to move the trust away from intermediaries.

That’s what end-to-end signatures provide. They exist today.