fabric8io / kubernetes-client

Java client for Kubernetes & OpenShift
http://fabric8.io
Apache License 2.0
3.4k stars 1.46k forks source link

Uploading a file in an init-container is very slow #5782

Open loicmathieu opened 6 months ago

loicmathieu commented 6 months ago

Describe the bug

Uploading a file in an init-container is very slow, it took more than 15s (Minikube kind to launch a Kuberenes cluster).

I send a little reproducer project, in this project there is an init-container and a container. If I try to upload a file in the container, it is done almost instantly, if I try to do the same in the init-container it took more than 15s.``

Fabric8 Kubernetes Client version

SNAPSHOT

Steps to reproduce

The following example can be used:

import io.fabric8.kubernetes.api.model.*;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

import java.io.File;
import java.io.IOException;
import java.util.UUID;

public class Main {
    public static void main(String[] args) throws InterruptedException, IOException {
        PodSpec spec = new PodSpecBuilder()
            .addToContainers(new ContainerBuilder().withName("shell").withImage("busybox").withCommand("sleep", "infinity").build())
            .addToInitContainers(
                new ContainerBuilder().withName("init").withImage("busybox").withCommand("sleep", "30").build()
            )
            .build();
        ObjectMeta metadata = new ObjectMetaBuilder().withName(UUID.randomUUID().toString().toLowerCase()).build();

        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            var ref = client
                .pods()
                .inNamespace("default")
                .resource(new PodBuilder()
                    .withMetadata(metadata)
                    .withSpec(spec)
                    .build()
                );

            ref.create();

            // wait for the pod to be ready
            Thread.sleep(1000);

            // upload a file
            File file = new File("/tmp/test.txt");
            file.createNewFile();
            ref
                .inContainer("init")
                .file(file.getAbsolutePath())
                .upload(file.toPath());
        }
    }
}

This code runs in 18s locally using a Kubernetes cluster with Minikube.

If you remove the init container and change the code to upload the same file in the regular container, the code runs in 3s.

Expected behavior

Uploaded a file is as quick in an init-container than in a regular container.

Runtime

minikube

Kubernetes API Server version

1.25.3@latest

Environment

Linux

Fabric8 Kubernetes Client Logs

17:17:16.161 [main] DEBUG io.fabric8.kubernetes.client.Config -- Trying to configure client from Kubernetes config...
17:17:16.164 [main] DEBUG io.fabric8.kubernetes.client.Config -- Found for Kubernetes config at: [/home/loic/.kube/config].
17:17:16.479 [main] DEBUG io.fabric8.kubernetes.client.utils.HttpClientUtils -- Using httpclient io.fabric8.kubernetes.client.okhttp.OkHttpClientFactory factory
17:17:18.166 [main] DEBUG io.fabric8.kubernetes.client.informers.impl.DefaultSharedIndexInformer -- Ready to run resync and reflector for v1/namespaces/default/pods with resync 0
17:17:18.167 [main] DEBUG io.fabric8.kubernetes.client.informers.impl.DefaultSharedIndexInformer -- Resync skipped due to 0 full resync period for v1/namespaces/default/pods
17:17:18.190 [OkHttp Dispatcher] DEBUG io.fabric8.kubernetes.client.informers.impl.cache.Reflector -- Listing items (1) for v1/namespaces/default/pods at v46807
17:17:18.190 [OkHttp Dispatcher] DEBUG io.fabric8.kubernetes.client.informers.impl.cache.Reflector -- Starting watcher for v1/namespaces/default/pods at v46807
17:17:18.192 [OkHttp Dispatcher] DEBUG io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager -- Watching https://192.168.49.2:8443/api/v1/namespaces/default/pods?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dc4305bd7-cea8-4bf5-928d-8bee419f6ef0&resourceVersion=46807&timeoutSeconds=600&watch=true...
17:17:18.255 [OkHttp https://192.168.49.2:8443/...] DEBUG io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener -- WebSocket successfully opened
17:17:18.256 [OkHttp https://192.168.49.2:8443/...] DEBUG io.fabric8.kubernetes.client.informers.impl.cache.Reflector -- Watch started for v1/namespaces/default/pods
17:17:18.955 [-1971283804-pool-1-thread-1] DEBUG io.fabric8.kubernetes.client.informers.impl.cache.Reflector -- Event received MODIFIED Pod resourceVersion v46819 for v1/namespaces/default/pods
17:17:23.179 [main] DEBUG io.fabric8.kubernetes.client.informers.impl.cache.Reflector -- Stopping watcher for v1/namespaces/default/pods at v46819
17:17:23.179 [main] DEBUG io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager -- Force closing the watch
17:17:23.179 [main] DEBUG io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager -- Closing the current watch
17:17:23.181 [-1971283804-pool-1-thread-1] DEBUG io.fabric8.kubernetes.client.informers.impl.cache.Reflector -- Watch gracefully closed for v1/namespaces/default/pods
17:17:23.181 [OkHttp https://192.168.49.2:8443/...] DEBUG io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener -- WebSocket close received. code: 1000, reason: 
17:17:23.181 [OkHttp https://192.168.49.2:8443/...] DEBUG io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager -- Ignoring already closed/closing connection
17:17:23.183 [main] DEBUG io.fabric8.kubernetes.client.utils.internal.PodOperationUtil -- Timed out waiting for Pod to become Ready: Timed out waiting for [5000] milliseconds for [Pod] with name:[c4305bd7-cea8-4bf5-928d-8bee419f6ef0] in namespace [null]., will still proceed
17:17:23.270 [OkHttp https://192.168.49.2:8443/...] DEBUG io.fabric8.kubernetes.client.dsl.internal.ExecWebSocketListener -- Exec Web Socket: On Close with code:[1000], due to: []
17:17:23.271 [-1971283804-pool-1-thread-1] DEBUG io.fabric8.kubernetes.client.dsl.internal.ExecWebSocketListener -- Exec Web Socket: completed with a null exit code - no status was received prior to onClose
17:17:23.272 [main] DEBUG io.fabric8.kubernetes.client.informers.impl.DefaultSharedIndexInformer -- Ready to run resync and reflector for v1/namespaces/default/pods with resync 0
17:17:23.272 [main] DEBUG io.fabric8.kubernetes.client.informers.impl.DefaultSharedIndexInformer -- Resync skipped due to 0 full resync period for v1/namespaces/default/pods
17:17:23.277 [OkHttp Dispatcher] DEBUG io.fabric8.kubernetes.client.informers.impl.cache.Reflector -- Listing items (1) for v1/namespaces/default/pods at v46824
17:17:23.278 [OkHttp Dispatcher] DEBUG io.fabric8.kubernetes.client.informers.impl.cache.Reflector -- Starting watcher for v1/namespaces/default/pods at v46824
17:17:23.278 [OkHttp Dispatcher] DEBUG io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager -- Watching https://192.168.49.2:8443/api/v1/namespaces/default/pods?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dc4305bd7-cea8-4bf5-928d-8bee419f6ef0&resourceVersion=46824&timeoutSeconds=600&watch=true...
17:17:23.330 [OkHttp https://192.168.49.2:8443/...] DEBUG io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener -- WebSocket successfully opened
17:17:23.330 [OkHttp https://192.168.49.2:8443/...] DEBUG io.fabric8.kubernetes.client.informers.impl.cache.Reflector -- Watch started for v1/namespaces/default/pods
17:17:28.273 [main] DEBUG io.fabric8.kubernetes.client.informers.impl.cache.Reflector -- Stopping watcher for v1/namespaces/default/pods at v46824
17:17:28.273 [main] DEBUG io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager -- Force closing the watch
17:17:28.273 [main] DEBUG io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager -- Closing the current watch
17:17:28.273 [-1971283804-pool-1-thread-1] DEBUG io.fabric8.kubernetes.client.informers.impl.cache.Reflector -- Watch gracefully closed for v1/namespaces/default/pods
17:17:28.273 [main] DEBUG io.fabric8.kubernetes.client.utils.internal.PodOperationUtil -- Timed out waiting for Pod to become Ready: Timed out waiting for [5000] milliseconds for [Pod] with name:[c4305bd7-cea8-4bf5-928d-8bee419f6ef0] in namespace [null]., will still proceed
17:17:28.274 [OkHttp https://192.168.49.2:8443/...] DEBUG io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener -- WebSocket close received. code: 1000, reason: 
17:17:28.274 [OkHttp https://192.168.49.2:8443/...] DEBUG io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager -- Ignoring already closed/closing connection
17:17:28.390 [OkHttp https://192.168.49.2:8443/...] DEBUG io.fabric8.kubernetes.client.dsl.internal.ExecWebSocketListener -- exec message received 5 bytes on channel stdOut
17:17:28.391 [OkHttp https://192.168.49.2:8443/...] DEBUG io.fabric8.kubernetes.client.dsl.internal.ExecWebSocketListener -- exec message received on channel errorChannel: {"metadata":{},"status":"Success"}
17:17:28.395 [OkHttp https://192.168.49.2:8443/...] DEBUG io.fabric8.kubernetes.client.dsl.internal.ExecWebSocketListener -- Exec Web Socket: On Close with code:[1000], due to: []
17:17:28.395 [main] DEBUG io.fabric8.kubernetes.client.informers.impl.DefaultSharedIndexInformer -- Ready to run resync and reflector for v1/namespaces/default/pods with resync 0
17:17:28.395 [main] DEBUG io.fabric8.kubernetes.client.informers.impl.DefaultSharedIndexInformer -- Resync skipped due to 0 full resync period for v1/namespaces/default/pods
17:17:28.401 [OkHttp Dispatcher] DEBUG io.fabric8.kubernetes.client.informers.impl.cache.Reflector -- Listing items (1) for v1/namespaces/default/pods at v46827
17:17:28.401 [OkHttp Dispatcher] DEBUG io.fabric8.kubernetes.client.informers.impl.cache.Reflector -- Starting watcher for v1/namespaces/default/pods at v46827
17:17:28.401 [OkHttp Dispatcher] DEBUG io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager -- Watching https://192.168.49.2:8443/api/v1/namespaces/default/pods?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dc4305bd7-cea8-4bf5-928d-8bee419f6ef0&resourceVersion=46827&timeoutSeconds=600&watch=true...
17:17:28.451 [OkHttp https://192.168.49.2:8443/...] DEBUG io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener -- WebSocket successfully opened
17:17:28.451 [OkHttp https://192.168.49.2:8443/...] DEBUG io.fabric8.kubernetes.client.informers.impl.cache.Reflector -- Watch started for v1/namespaces/default/pods
17:17:33.396 [main] DEBUG io.fabric8.kubernetes.client.informers.impl.cache.Reflector -- Stopping watcher for v1/namespaces/default/pods at v46827
17:17:33.397 [main] DEBUG io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager -- Force closing the watch
17:17:33.397 [main] DEBUG io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager -- Closing the current watch
17:17:33.397 [main] DEBUG io.fabric8.kubernetes.client.utils.internal.PodOperationUtil -- Timed out waiting for Pod to become Ready: Timed out waiting for [5000] milliseconds for [Pod] with name:[c4305bd7-cea8-4bf5-928d-8bee419f6ef0] in namespace [null]., will still proceed
17:17:33.397 [OkHttp https://192.168.49.2:8443/...] DEBUG io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener -- WebSocket close received. code: 1000, reason: 
17:17:33.397 [OkHttp https://192.168.49.2:8443/...] DEBUG io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager -- Ignoring already closed/closing connection
17:17:33.397 [-1971283804-pool-1-thread-1] DEBUG io.fabric8.kubernetes.client.informers.impl.cache.Reflector -- Watch gracefully closed for v1/namespaces/default/pods
Disconnected from the target VM, address: 'localhost:33285', transport: 'socket'
17:17:33.512 [OkHttp https://192.168.49.2:8443/...] DEBUG io.fabric8.kubernetes.client.dsl.internal.ExecWebSocketListener -- exec message received on channel errorChannel: {"metadata":{},"status":"Success"}
17:17:33.513 [main] DEBUG io.fabric8.kubernetes.client.impl.BaseClient -- The client and associated httpclient io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl have been closed, the usage of this or any client using the httpclient will not work after this
17:17:33.513 [OkHttp https://192.168.49.2:8443/...] DEBUG io.fabric8.kubernetes.client.dsl.internal.ExecWebSocketListener -- Exec Web Socket: On Close with code:[1000], due to: []

Additional context

No response

rohanKanojia commented 6 months ago

@loicmathieu : Are you able to reproduce the same issue with kubectl? I'm not 100% sure if KubernetesClient does anything different when uploading to a container or initContainer (need to check).

loicmathieu commented 6 months ago

@rohanKanojia just tested it and uploading via kubectk is immediate so the issue is only with the KubernetesClient

manusa commented 6 months ago

It might have to do with the newer enforced readiness waits we added for Pods (#4637 / #4695 / #5229 / #5245).

Have you noticed if this is a fixed time. I'm assuming that it's always ~10s~ 5s plus the time needed for the effective upload.

I've faced similar issues while trying to retrieve the logs too and not being able to do so until the pod became ready (which sometime never happens in case of crash).

Could you try to set .withReadyWaitTimeout(0) before the call and see if it fixes it?

loicmathieu commented 6 months ago

@manusa using .withReadyWaitTimeout(0) fixes the issue, the upload is no immediate and I verify the file is present. It's more 15s delay than 5s

manusa commented 6 months ago

It's more 15s delay than 5s

Probably there's a retry involved

using .withReadyWaitTimeout(0) fixes the issue

We need to properly document this in the Javadocs.

shawkins commented 6 months ago

I've faced similar issues while trying to retrieve the logs too and not being able to do so until the pod became ready (which sometime never happens in case of crash).

At least in the latest versions the only condition the logic should wait on is if the Pod is stuck in Pending, otherwise it should let the operation continue.

Probably there's a retry involved

It's likely that 3 separate operations are being performed that trigger a wait. These waits are soft - that is we just log and keep going if they are exceeded.

We need to properly document this in the Javadocs.

You can also consider setting the default to 0 for release 7 and let the behavior be opt-in instead. We could also optimize the 0 wait case so that it doesn't start a watch.

manusa commented 6 months ago

We can consider switching the defaults for v7.

As a pending task, we need to add this information to the current Javadocs.

stale[bot] commented 3 months ago

This issue has been automatically marked as stale because it has not had any activity since 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions!