GoogleContainerTools / kpt-config-sync

Config Sync - used to sync Git, OCI and Helm charts to your clusters.
Apache License 2.0

BUG: Root reconciler failing in clusters with proper user mapping (OpenShift). #1273

Closed: robertosgm closed this issue 2 weeks ago.

robertosgm commented 3 weeks ago

When running on systems with correct user mapping from RunAsUser (like OpenShift), the root reconciler crashes with this error:

$ kubectl logs -n config-management-system deploy/root-reconciler -c reconciler --previous
I0613 16:14:23.856361       1 setup.go:31] Build Version: v1.18.1-rc.1-0-g5d50947e
I0613 16:14:23.858500       1 main.go:216] Starting reconciler for: root
F0613 16:14:23.858592       1 reconciler.go:151] Error creating rest config: failed to build rest config: reading local kubeconfig: loading REST config from "/.kube/config": stat /.kube/config: no such file or directory

This worked on v1.16.1 but fails on v1.17.x and v1.18.x. The cause is that, in this setup, the new code never reaches NewFromInClusterConfig and jumps straight to loading a kubeconfig file:

func NewRestConfig(timeout time.Duration) (*rest.Config, error) {
    var cfg *rest.Config
    // Detect kubectl config file
    path, err := KubeConfigPath()
    if err != nil {
        // Build from k8s downward API
        cfg, err = NewFromInClusterConfig()
        if err != nil {
            return nil, fmt.Errorf("failed to build rest config: kubeconfig not found: reading in-cluster config: %w", err)
        }
    } else {
        // Build from local config file
        cfg, err = NewFromConfigFile(path)
        if err != nil {
            return nil, fmt.Errorf("failed to build rest config: reading local kubeconfig: %w", err)
        }
    }
    // Set timeout, if specified.
    if timeout != 0 {
        cfg.Timeout = timeout
    }
    klog.V(7).Infof("Config: %#v", *cfg)
    return cfg, nil
}

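The reason is that KubeConfigPath() only returns an error when the user mapping is broken (for example, when user 1000 is not a proper user in the container). OpenShift maps the user correctly, so KubeConfigPath() succeeds and returns a path (here /.kube/config) even though no file exists there. The in-cluster branch is therefore skipped, and NewFromConfigFile fails.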

A more robust implementation would try NewFromInClusterConfig first and fall back to the kubeconfig file only if that fails. Alternatively, it could retry NewFromInClusterConfig after NewFromConfigFile fails.

sdowell commented 3 weeks ago

It looks like this behavior changed with https://github.com/GoogleContainerTools/kpt-config-sync/pull/986.

Another possible fix: check whether a file exists at the path returned by KubeConfigPath, and only use NewFromConfigFile if it does. Otherwise, use NewFromInClusterConfig.