confidential-containers / cloud-api-adaptor

Ability to create Kata pods using cloud provider APIs aka the peer-pods approach
Apache License 2.0

Container images are pulled twice #371

Open bpradipt opened 1 year ago

bpradipt commented 1 year ago

Container images are getting pulled twice: on the worker node as well as inside the pod VM. The pull inside the pod VM is by design.

However, pulling the image on the worker node is not desirable. Creating this issue as a tracker.

@littlejawa @snir911 @yoheiueda

littlejawa commented 1 year ago

In my understanding, the workflow is:

The runtime (kata in our case) is not involved in all this. This is all handled inside containerd and crio. The only way to prevent the image from being downloaded on the worker node is then to modify their behaviour to ignore or relay the PullImageRequest.

The only alternative I can imagine is if we can find a way to convince kubelet to NOT send the PullImageRequest. I don't know how the decision is made on kubelet's side though, and it's probably going to be hard to pretend an image is there for peerpods, and not for other runtimes. :-/
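As a rough illustration of the kubelet-side decision mentioned above, here is a minimal Go sketch of how the standard Kubernetes `imagePullPolicy` values (`Always`, `IfNotPresent`, `Never`) combine with the image's presence on the node. The helper `shouldSendPull` and its parameters are hypothetical names for illustration, not kubelet's actual code. In this simplified model the runtime class is not an input at all, which is why it would be hard to treat peerpods differently from other runtimes.

```go
package main

import "fmt"

// PullPolicy mirrors the three Kubernetes imagePullPolicy values.
type PullPolicy string

const (
	PullAlways       PullPolicy = "Always"
	PullIfNotPresent PullPolicy = "IfNotPresent"
	PullNever        PullPolicy = "Never"
)

// shouldSendPull is a hypothetical helper (an assumption for illustration).
// presentOnNode stands for what the CRI runtime reports about its
// node-local image store.
func shouldSendPull(policy PullPolicy, presentOnNode bool) bool {
	switch policy {
	case PullAlways:
		// Always ask the CRI runtime to pull, even if the image is cached.
		return true
	case PullIfNotPresent:
		// Pull only when the node-local store does not have the image.
		return !presentOnNode
	default:
		// PullNever: never send a pull request.
		return false
	}
}

func main() {
	policies := []PullPolicy{PullAlways, PullIfNotPresent, PullNever}
	for _, policy := range policies {
		for _, present := range []bool{false, true} {
			fmt.Println(policy, present, shouldSendPull(policy, present))
		}
	}
}
```

Under this model, the only ways to suppress the request are `Never` or an image that already appears present on the node, and both apply to every runtime on that node.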

bpradipt commented 1 year ago

Am I correct in interpreting that once proper offload support for PullImage is added to containerd and crio for Kata, we won't have this issue?

snir911 commented 1 year ago

I think this is correct (once it is supported and we remove the explicit image pull call from caa).

littlejawa commented 1 year ago

> Am I correct in interpreting that once proper offload support for PullImage is added to containerd and crio for Kata, we won't have this issue?

Yes, exactly. Once crio/containerd relay the PullImageRequests, they won't download the image on the node anymore, and everything will be done on the agent side.

And yes, as @snir911 says, we will have to remove the call from caa, otherwise it will be done twice on the agent side :-)
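The three situations discussed so far can be sketched as a toy Go model. It is not the real containerd, crio, or cloud-api-adaptor code: `simulate`, `forwardPull`, and `explicitPullInCAA` are hypothetical names, and the model only records where a pull would happen for a single container start.

```go
package main

import "fmt"

// Location says where an image pull takes place.
type Location string

const (
	WorkerNode Location = "worker-node"
	PodVM      Location = "pod-vm"
)

// simulate returns, in order, where pulls happen for one container start.
// forwardPull models crio/containerd relaying the CRI PullImage request
// instead of handling it locally. explicitPullInCAA models the explicit
// image pull call that cloud-api-adaptor currently makes.
func simulate(forwardPull, explicitPullInCAA bool) []Location {
	var pulls []Location

	// Step 1: kubelet sends a PullImage request to the CRI runtime.
	if forwardPull {
		pulls = append(pulls, PodVM) // relayed to the agent in the pod VM
	} else {
		pulls = append(pulls, WorkerNode) // handled by the CRI runtime itself
	}

	// Step 2: CreateContainer reaches cloud-api-adaptor.
	if explicitPullInCAA {
		pulls = append(pulls, PodVM)
	}
	return pulls
}

func main() {
	fmt.Println("today:", simulate(false, true))
	fmt.Println("forwarding, explicit call kept:", simulate(true, true))
	fmt.Println("forwarding, explicit call removed:", simulate(true, false))
}
```

In this model, only the last combination yields a single pull inside the pod VM. Keeping the explicit call after forwarding lands moves the duplicate pull from the worker node to the agent side.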

surajssd commented 1 year ago

Correct me if I am wrong, but this could be a problem on normal kata-cc as well, right? Unless the shim is doing some magic that the remote shim does not do yet?

huoqifeng commented 1 year ago

I think image pull gets called twice, but only the 2nd pull takes effect. The 1st case is https://github.com/confidential-containers/cloud-api-adaptor/blob/staging/pkg/adaptor/proxy/service.go#L211, which comes from the CRI PullImage API but does nothing. The 2nd case is https://github.com/confidential-containers/cloud-api-adaptor/blob/staging/pkg/adaptor/proxy/service.go#L130, which comes from the CRI CreateContainer API and takes effect. Please correct me if I'm wrong.

littlejawa commented 1 year ago

> I think image pull gets called twice, but only the 2nd pull takes effect. The 1st case is https://github.com/confidential-containers/cloud-api-adaptor/blob/staging/pkg/adaptor/proxy/service.go#L211, which comes from the CRI PullImage API but does nothing. The 2nd case is https://github.com/confidential-containers/cloud-api-adaptor/blob/staging/pkg/adaptor/proxy/service.go#L130, which comes from the CRI CreateContainer API and takes effect. Please correct me if I'm wrong.

I think the issue is that the first one is not called at all: the PullImage request from CRI is not forwarded to the shim, so it never reaches cloud-api-adaptor. The code to do that in containerd was not merged upstream, as far as I understand, so containerd just processes the CRI PullImage by itself. That is why the image is pulled once on the worker node before CreateContainer arrives, and the second PullImage is done on the remote agent.

This issue is a side effect of not having the forwarding.