checkpoint-restore / criu

Checkpoint/Restore tool
criu.org

Checkpointing a container running in a k8s pod #2241

Closed (neskandani closed this issue 1 year ago)

neskandani commented 1 year ago

Hi,

I am trying to checkpoint a container running in a pod using the following command:

sudo curl -X POST "https://localhost:10250/checkpoint/default/pod_name/container_name" --insecure --key /etc/kubernetes/pki/apiserver-kubelet-client.pem --cacert /etc/kubernetes/pki/ca.crt --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt

I was expecting to get a zip file containing the images generated by CRIU. Unfortunately, I get a "404 page not found" error. I also tried the URL with the "pod_ID and container_ID" and "pod_name and container_ID" combinations, but got the same error.
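For readers trying the same request: the URL in the command above has the shape /checkpoint/<namespace>/<pod>/<container> on the kubelet port. Below is a minimal sketch of a helper that assembles it; the function name `checkpoint_url` and the default of `localhost:10250` are assumptions for illustration, not part of any Kubernetes tooling. As far as I know, the kubelet does not return the archive in the response body but writes it to the node's disk and reports its path, so it is worth checking the response text before expecting a file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the kubelet checkpoint URL for a container.
# Arguments: namespace, pod name, container name, optional kubelet host:port.
checkpoint_url() {
    local namespace="$1" pod="$2" container="$3"
    # Assumption: the kubelet listens on localhost:10250, as in the command above.
    local kubelet="${4:-localhost:10250}"
    printf 'https://%s/checkpoint/%s/%s/%s\n' "$kubelet" "$namespace" "$pod" "$container"
}
```

It could then be used as `sudo curl -X POST "$(checkpoint_url default pod_name container_name)"` followed by the same certificate options as above.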

I am running the experiments on a k8s cluster (v1.27.4) with two Ubuntu 22.04.2 nodes (kernel version: 5.15.0-1042-azure). The container runtime on the nodes is cri-o://1.26.3. I have also enabled CRIU support for the runtime (enable_criu_support = true) and enabled the container checkpoint feature gate for the kube-apiserver (--feature-gates=ContainerCheckpoint=true).

rst0git commented 1 year ago

@neskandani Could you confirm that pod_name and container_name are correct and that you are running curl on the node where the Pod is running?

For example, you can use the following scripts to show information about all Pods and containers in your cluster:

kubectl get pods --all-namespaces -o wide

pods=$(kubectl get pods --all-namespaces --field-selector 'metadata.namespace!=kube-system' -o jsonpath='{range .items[*]}{.metadata.namespace} {.metadata.name}{"\n"}{end}')

while read -r pod; do
    namespace=$(echo "$pod" | awk '{print $1}')
    pod_name=$(echo "$pod" | awk '{print $2}')

    # Get the containers in the pod
    containers=$(kubectl get pod -n "$namespace" "$pod_name" -o jsonpath='{range .spec.containers[*]}{.name}{"\n"}{end}')

    # Print pod information
    echo "Pod: $pod_name"
    echo "Namespace: $namespace"
    echo "Containers:"

    # Loop through each container in the pod
    while read -r container; do
        echo "- $container"
    done <<< "$containers"

    echo "---------------------"
done <<< "$pods"

rst0git commented 1 year ago

> I have also enabled the CRIU support for the runtime (enable_criu_support = true) and enabled the container checkpoint feature gate for the kube-apiserver (--feature-gates=ContainerCheckpoint=true).

Note that you need to enable the ContainerCheckpoint feature gate for the kubelet.

output_file="/etc/default/kubelet"
echo 'KUBELET_EXTRA_ARGS="--feature-gates=ContainerCheckpoint=true --anonymous-auth=true --authorization-mode=AlwaysAllow"' | sudo tee "$output_file" >/dev/null
sudo systemctl restart kubelet.service

neskandani commented 1 year ago

Yes, I am running the command on the node where the pod is running. The pod name and container name are also correct.

neskandani commented 1 year ago

Thanks for the suggestion. I ran the commands, but I am still getting the "404 page not found" error.

neskandani commented 1 year ago

Running these commands on all the nodes in the cluster did the job! Thanks a lot! :)
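Since the fix only took effect once it was applied on every node, a quick per-node check may help others who hit the same 404. This is a minimal sketch, assuming the kubelet flags live in /etc/default/kubelet as in the commands above; the function name `checkpoint_gate_enabled` is hypothetical, and the check is a plain text match that does not account for commented-out lines or for gates set in a kubelet config file instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the given kubelet defaults file
# enables the ContainerCheckpoint feature gate via --feature-gates.
checkpoint_gate_enabled() {
    # Assumption: kubelet extra args are stored in /etc/default/kubelet.
    local file="${1:-/etc/default/kubelet}"
    # Match ContainerCheckpoint=true anywhere inside a --feature-gates value,
    # including comma-separated lists such as Foo=true,ContainerCheckpoint=true.
    grep -Eq -- '--feature-gates=[^" ]*ContainerCheckpoint=true' "$file" 2>/dev/null
}
```

Run on each node, for example as `checkpoint_gate_enabled || echo "ContainerCheckpoint not enabled on $(hostname)"`, and restart the kubelet after changing the file.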