Open stevenhorsman opened 4 days ago
Looking in the kata-agent log it has the info message:
{"msg":"pull image \"docker.io/library/nginx@sha256:9700d098d545f9d2ee0660dfb155fe64f4447720a0a763a93f2cf08997227279\", bundle path \"/run/kata-containers/3a9d18335128ca98c7d1f9d86aaad6922c063eeff135ab977ea164fa5ff60dcf/images\"","level":"INFO","ts":"2024-06-26T13:08:03.03219308Z","name":"kata-agent","subsystem":"image","source":"agent","pid":"810","version":"0.1.0"}
But we never get anything back from image-rs's pull image and then after 60s the container fails with context deadline exceeded. Unfortunately image-rs doesn't seem to have any logging, so I'm not sure how to get more information on what is going wrong 😞
If it's using in-guest image pull, can you try increasing the remote hypervisor timeout and the create container timeout - https://github.com/kata-containers/kata-containers/blob/main/src/runtime/config/configuration-remote.toml.in#L298 ?
Yeah, that's a good idea, but just pulling nginx shouldn't take more than 60s, and in the past when I've seen the timeout it's only been on the containerd side, so the kata-agent has still come back for the image pull afterwards, which doesn't seem to be happening here.
Okay - I stand corrected. It appears that the nginx pull took over 2mins:
Jun 26 13:51:30 podvm-nginx-55954c7c66-vptr5-bc08413b kata-agent[811]: {"msg":"pull image \"docker.io/library/nginx@sha256:9700d098d545f9d2ee0660dfb155fe64f4447720a0a763a93f2cf08997227279\", bundle path \"/run/kata-containers/3c0fc9e0c3634183117f4078d7be48cd3fbb70a8ecc0ea4243cf7cbdf5613aff/images\"","level":"INFO","ts":"2024-06-26T13:51:30.399304775Z","version":"0.1.0","name":"kata-agent","pid":"811","source":"agent","subsystem":"image"}
...
Jun 26 13:53:44 podvm-nginx-55954c7c66-vptr5-bc08413b kata-agent[811]: {"msg":"pull and unpack image \"sha256:dd6c8d4a8748039368f97fd52156d3fadf0ee481dc97d3063d74d9bc38681757\", cid: \"3c0fc9e0c3634183117f4078d7be48cd3fbb70a8ecc0ea4243cf7cbdf5613aff\" succeeded.","level":"INFO","ts":"2024-06-26T13:53:44.042495884Z","name":"kata-agent","version":"0.1.0","pid":"811","source":"agent","subsystem":"image"}
So I might not have waited long enough, or the containerd request cancelled it or something? So we have an ibmcloud performance issue, rather than a functional one. Thanks for nudging me into trying the timeout Pradipta!
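For anyone wanting to check the pull duration from their own logs, here is a minimal sketch that reads the "ts" field from two kata-agent journal lines and computes the elapsed time. The helper names (parse_ts, entry_ts, elapsed_seconds) are made up for illustration, and the sample lines are trimmed copies of the ones above (hostname shortened, most fields dropped). It assumes the timestamps are UTC with a trailing Z, as in the logs in this thread, and truncates the nanosecond fraction to microseconds.

```python
import json
import re
from datetime import datetime, timezone

TS_PATTERN = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z")


def parse_ts(ts: str) -> datetime:
    """Parse a UTC timestamp such as 2024-06-26T13:51:30.399304775Z.

    The fractional part may have up to nanosecond precision; it is
    truncated to microseconds because datetime cannot hold more.
    """
    match = TS_PATTERN.fullmatch(ts)
    if match is None:
        raise ValueError(f"unexpected timestamp format: {ts!r}")
    base = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    fraction = (match.group(2) or "")[:6].ljust(6, "0")
    return base.replace(microsecond=int(fraction), tzinfo=timezone.utc)


def entry_ts(line: str) -> datetime:
    """Return the "ts" of a journal line whose payload is a JSON object."""
    payload = line[line.index("{"):]  # skip the journald prefix
    return parse_ts(json.loads(payload)["ts"])


def elapsed_seconds(start_line: str, end_line: str) -> float:
    """Seconds between the timestamps of two kata-agent log lines."""
    return (entry_ts(end_line) - entry_ts(start_line)).total_seconds()


if __name__ == "__main__":
    # Trimmed sample lines; only the "ts" values are taken from the thread.
    start = ('Jun 26 13:51:30 podvm kata-agent[811]: '
             '{"msg":"pull image","ts":"2024-06-26T13:51:30.399304775Z"}')
    end = ('Jun 26 13:53:44 podvm kata-agent[811]: '
           '{"msg":"pull and unpack image succeeded.",'
           '"ts":"2024-06-26T13:53:44.042495884Z"}')
    print(f"{elapsed_seconds(start, end):.3f}")  # prints 133.643
```

That works out to a bit over 2 minutes 13 seconds between the pull starting and the unpack succeeding, well past the 60s at which the container create was failing.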
When creating an ibmcloud setup with a self-managed cluster with both s390x and amd64 architectures, the tests fail.
The pod describe looks like:
and the CAA log shows an error during the CreateContainer (which includes the pull image step):
I need to dig into the kata-agent logs and see if there is any more information about this.