Open bpradipt opened 1 year ago
In my understanding, the workflow is:
The runtime (kata in our case) is not involved in all this. This is all handled inside containerd and crio. The only way to prevent the image from being downloaded on the worker node is then to modify their behaviour to ignore or relay the PullImageRequest.
The only alternative I can imagine is if we can find a way to convince kubelet to NOT send the PullImageRequest. I don't know how the decision is made on kubelet's side though, and it's probably going to be hard to pretend an image is there for peerpods, and not for other runtimes. :-/
Am I correct in interpreting that once proper offload support for PullImage is added to containerd and crio for Kata, we won't have this issue?
I think this is correct (once it is supported and we remove the explicit image pull call from caa).
Am I correct in interpreting that once proper offload support for PullImage is added to containerd and crio for Kata, we won't have this issue?
Yes, exactly. Once crio/containerd relay the PullImageRequests, they won't download the image on the node anymore, and everything will be done on the agent side.
And yes, as @snir911 says, we will have to remove the call from caa, otherwise it will be done twice on the agent side :-)
Correct me if I am wrong, but this could be a problem on normal kata-cc as well, right? Unless the shim is doing some magic that the remote shim does not do yet?
I think the image pull gets called twice, but only the 2nd pull takes effect.
The 1st case is https://github.com/confidential-containers/cloud-api-adaptor/blob/staging/pkg/adaptor/proxy/service.go#L211, which comes from the CRI PullImage API but does nothing.
The 2nd case is https://github.com/confidential-containers/cloud-api-adaptor/blob/staging/pkg/adaptor/proxy/service.go#L130, which comes from the CRI CreateContainer API and takes effect.
Please correct me if I'm wrong.
I think the image pull gets called twice, but only the 2nd pull takes effect. The 1st case is https://github.com/confidential-containers/cloud-api-adaptor/blob/staging/pkg/adaptor/proxy/service.go#L211, which comes from the CRI PullImage API but does nothing. The 2nd case is https://github.com/confidential-containers/cloud-api-adaptor/blob/staging/pkg/adaptor/proxy/service.go#L130, which comes from the CRI CreateContainer API and takes effect. Please correct me if I'm wrong.
I think the issue is that the first one is not called at all: the PullImage request from CRI is not forwarded to the shim, so it never reaches cloud-api-adaptor. The code to do that in containerd was not merged upstream, as far as I understand, so containerd just processes the CRI PullImage by itself. That is why the image is pulled once on the worker node before CreateContainer comes, and the second pull is then done by the remote agent.
This issue is a side effect of not having the forwarding.
Container images are getting pulled twice: on the worker node as well as inside the pod VM. The image getting pulled inside the pod VM is by design.
However, the image getting pulled on the worker node is not desirable. Creating this issue as a tracker.
@littlejawa @snir911 @yoheiueda