InftyAI / llmaz

☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
Apache License 2.0

fix: helm-install cmd #210

Closed. googs1025 closed this 2 days ago

googs1025 commented 3 days ago

What this PR does / why we need it

root@VM-0-2-ubuntu:/home/ubuntu/llmaz# kubectl  get pods -A | grep llmaz
default                  llmaz-controller-manager-6b86d8df79-jqr4x        1/2     CrashLoopBackOff   1 (11s ago)   13s

root@VM-0-2-ubuntu:/home/ubuntu/llmaz# kubectl logs -f llmaz-controller-manager-6b86d8df79-jqr4x
2024-11-26T11:58:03Z    INFO    setup   starting manager
2024-11-26T11:58:03Z    INFO    setup   waiting for the cert generation to complete
2024-11-26T11:58:03Z    INFO    controller-runtime.metrics  Starting metrics server
2024-11-26T11:58:03Z    INFO    controller-runtime.metrics  Serving metrics server  {"bindAddress": "127.0.0.1:8080", "secure": false}
2024-11-26T11:58:03Z    INFO    starting server {"name": "health probe", "addr": "[::]:8081"}
2024-11-26T11:58:03Z    INFO    cert-rotation   starting cert rotator controller
2024-11-26T11:58:03Z    INFO    Starting EventSource    {"controller": "cert-rotator", "source": "kind source: *v1.Secret"}
2024-11-26T11:58:03Z    INFO    Starting EventSource    {"controller": "cert-rotator", "source": "kind source: *unstructured.Unstructured"}
2024-11-26T11:58:03Z    INFO    Starting EventSource    {"controller": "cert-rotator", "source": "kind source: *unstructured.Unstructured"}
2024-11-26T11:58:03Z    INFO    Starting Controller {"controller": "cert-rotator"}
I1126 11:58:03.335457       1 leaderelection.go:254] attempting to acquire leader lease default/fbb36db9.llmaz.io...
2024-11-26T11:58:03Z    ERROR   cert-rotation   could not refresh cert on startup   {"error": "acquiring secret to update certificates: Secret \"llmaz-webhook-server-cert\" not found", "errorVerbose": "Secret \"llmaz-webhook-server-cert\" not found\nacquiring secret to update certificates\ngithub.com/open-policy-agent/cert-controller/pkg/rotator.(*CertRotator).refreshCertIfNeeded.func1\n\t/go/pkg/mod/github.com/open-policy-agent/cert-controller@v0.11.0/pkg/rotator/rotator.go:317\nk8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection\n\t/go/pkg/mod/k8s.io/apimachinery@v0.31.1/pkg/util/wait/wait.go:145\nk8s.io/apimachinery/pkg/util/wait.ExponentialBackoff\n\t/go/pkg/mod/k8s.io/apimachinery@v0.31.1/pkg/util/wait/backoff.go:461\ngithub.com/open-policy-agent/cert-controller/pkg/rotator.(*CertRotator).refreshCertIfNeeded\n\t/go/pkg/mod/github.com/open-policy-agent/cert-controller@v0.11.0/pkg/rotator/rotator.go:350\ngithub.com/open-policy-agent/cert-controller/pkg/rotator.(*CertRotator).Start\n\t/go/pkg/mod/github.com/open-policy-agent/cert-controller@v0.11.0/pkg/rotator/rotator.go:278\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/manager/runnable_group.go:226\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695"}
github.com/open-policy-agent/cert-controller/pkg/rotator.(*CertRotator).Start
    /go/pkg/mod/github.com/open-policy-agent/cert-controller@v0.11.0/pkg/rotator/rotator.go:279
sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/manager/runnable_group.go:226
2024-11-26T11:58:03Z    INFO    cert-rotation   stopping cert rotator controller
2024-11-26T11:58:03Z    INFO    Stopping and waiting for non leader election runnables
2024-11-26T11:58:03Z    ERROR   controller-runtime.source.EventHandler  failed to get informer from cache   {"error": "Timeout: failed waiting for *unstructured.Unstructured Informer to sync"}
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind[...]).Start.func1.1
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/source/kind.go:76
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1
    /go/pkg/mod/k8s.io/apimachinery@v0.31.1/pkg/util/wait/loop.go:53
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext
    /go/pkg/mod/k8s.io/apimachinery@v0.31.1/pkg/util/wait/loop.go:54
k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel
    /go/pkg/mod/k8s.io/apimachinery@v0.31.1/pkg/util/wait/poll.go:33
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind[...]).Start.func1
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/source/kind.go:64
2024-11-26T11:58:03Z    ERROR   controller-runtime.source.EventHandler  failed to get informer from cache   {"error": "Timeout: failed waiting for *unstructured.Unstructured Informer to sync"}
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind[...]).Start.func1.1
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/source/kind.go:76
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1
    /go/pkg/mod/k8s.io/apimachinery@v0.31.1/pkg/util/wait/loop.go:53
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext
    /go/pkg/mod/k8s.io/apimachinery@v0.31.1/pkg/util/wait/loop.go:54
k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel
    /go/pkg/mod/k8s.io/apimachinery@v0.31.1/pkg/util/wait/poll.go:33
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind[...]).Start.func1
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/source/kind.go:64
2024-11-26T11:58:03Z    ERROR   Could not wait for Cache to sync    {"controller": "cert-rotator", "error": "failed to wait for cert-rotator caches to sync: failed to get informer from cache: Timeout: failed waiting for *unstructured.Unstructured Informer to sync"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.1
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:200
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:231
sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/manager/runnable_group.go:226
2024-11-26T11:58:03Z    INFO    Stopping and waiting for leader election runnables
2024-11-26T11:58:03Z    INFO    Stopping and waiting for caches
2024-11-26T11:58:03Z    INFO    Stopping and waiting for webhooks
2024-11-26T11:58:03Z    ERROR   error received after stop sequence was engaged  {"error": "failed to wait for cert-rotator caches to sync: failed to get informer from cache: Timeout: failed waiting for *unstructured.Unstructured Informer to sync"}
sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).engageStopProcedure.func1
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/manager/internal.go:512
2024-11-26T11:58:03Z    INFO    Stopping and waiting for HTTP servers
2024-11-26T11:58:03Z    INFO    shutting down server    {"name": "health probe", "addr": "[::]:8081"}
2024-11-26T11:58:03Z    INFO    controller-runtime.metrics  Shutting down metrics server with timeout of 1 minute
2024-11-26T11:58:03Z    INFO    Wait completed, proceeding to shutdown the manager
2024-11-26T11:58:03Z    ERROR   setup   problem running manager {"error": "acquiring secret to update certificates: Secret \"llmaz-webhook-server-cert\" not found", "errorVerbose": "Secret \"llmaz-webhook-server-cert\" not found\nacquiring secret to update certificates\ngithub.com/open-policy-agent/cert-controller/pkg/rotator.(*CertRotator).refreshCertIfNeeded.func1\n\t/go/pkg/mod/github.com/open-policy-agent/cert-controller@v0.11.0/pkg/rotator/rotator.go:317\nk8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection\n\t/go/pkg/mod/k8s.io/apimachinery@v0.31.1/pkg/util/wait/wait.go:145\nk8s.io/apimachinery/pkg/util/wait.ExponentialBackoff\n\t/go/pkg/mod/k8s.io/apimachinery@v0.31.1/pkg/util/wait/backoff.go:461\ngithub.com/open-policy-agent/cert-controller/pkg/rotator.(*CertRotator).refreshCertIfNeeded\n\t/go/pkg/mod/github.com/open-policy-agent/cert-controller@v0.11.0/pkg/rotator/rotator.go:350\ngithub.com/open-policy-agent/cert-controller/pkg/rotator.(*CertRotator).Start\n\t/go/pkg/mod/github.com/open-policy-agent/cert-controller@v0.11.0/pkg/rotator/rotator.go:278\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/manager/runnable_group.go:226\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695"}
main.main
    /workspace/cmd/main.go:128
runtime.main
    /usr/local/go/src/runtime/proc.go:271

Which issue(s) this PR fixes

Fixes #

Special notes for your reviewer

Does this PR introduce a user-facing change?

googs1025 commented 3 days ago

/kind bug

kerthcet commented 2 days ago

Thanks @googs1025 for the PR, but I'm sorry to say this is intended: we need to support installing llmaz in different namespaces, and if we hack it into the Makefile, that becomes impossible. See the doc here: https://github.com/InftyAI/llmaz/blob/main/docs/installation.md#install-in-a-different-namespace

Feel free to close this if you agree. Thanks.
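
Since the discussion turns on which namespace llmaz is installed into, a small shell helper can make that visible when triaging a crash like the one in the PR description. This is only a sketch: the function name `crashing_pods` is hypothetical, and it assumes the default column layout of `kubectl get pods -A` (NAMESPACE, NAME, READY, STATUS, ...) as seen in the output above. It does not reproduce anything from the llmaz Makefile or chart.

```shell
#!/bin/sh
# crashing_pods (hypothetical helper): read `kubectl get pods -A` output on
# stdin and print "namespace/name" for every pod whose STATUS column is
# CrashLoopBackOff. Assumes the default column order:
#   NAMESPACE NAME READY STATUS RESTARTS AGE
crashing_pods() {
  awk '$4 == "CrashLoopBackOff" { print $1 "/" $2 }'
}

# Typical use against a live cluster (not executed here):
#   kubectl get pods -A | crashing_pods
```

For the pod listed in the PR description this would print `default/llmaz-controller-manager-6b86d8df79-jqr4x`, showing that the controller is running in the `default` namespace. Whether that is the namespace the install expects is covered by the installation doc linked above.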

googs1025 commented 2 days ago

Got it. I thought this had been missed.

googs1025 commented 2 days ago

/close