OpenUnison / openunison-k8s-login-oidc

Kubernetes login portal for both kubectl and the dashboard using OpenID Connect. Use groups from your assertion in RBAC policies to control access to your cluster. Supports impersonation and OpenID Connect integration with your API server.
https://www.tremolosecurity.com/kubernetes/
Apache License 2.0

OpenUnison resource reports Failed state after upgrade to 1.0.21, but otherwise works #37

Closed (dkulchinsky closed 3 years ago)

dkulchinsky commented 3 years ago

Hey @mlbiam,

I just upgraded a test cluster to the latest OpenUnison, 1.0.21, which was released yesterday. The upgrade seemed to go well: the Pods redeployed and everything seems to work as expected.

However, I noticed the following error in the operator:

[openunison-operator-dbd778f5d-mn678] Error on watch - {<JSON BLOB>}
[openunison-operator-dbd778f5d-mn678] javax.script.ScriptException: TypeError: Cannot read property "spec" from undefined in <eval> at line number 417
[openunison-operator-dbd778f5d-mn678]   at jdk.nashorn.api.scripting.NashornScriptEngine.throwAsScriptException(NashornScriptEngine.java:470)
[openunison-operator-dbd778f5d-mn678]   at jdk.nashorn.api.scripting.NashornScriptEngine.invokeImpl(NashornScriptEngine.java:392)
[openunison-operator-dbd778f5d-mn678]   at jdk.nashorn.api.scripting.NashornScriptEngine.invokeFunction(NashornScriptEngine.java:190)
[openunison-operator-dbd778f5d-mn678]   at com.tremolosecurity.kubernetes.artifacts.util.K8sWatcher.processEvent(K8sWatcher.java:318)
[openunison-operator-dbd778f5d-mn678]   at com.tremolosecurity.kubernetes.artifacts.util.K8sWatcher.watchUri(K8sWatcher.java:152)
[openunison-operator-dbd778f5d-mn678]   at com.tremolosecurity.kubernetes.artifacts.run.RunWatch.run(RunWatch.java:38)
[openunison-operator-dbd778f5d-mn678]   at java.lang.Thread.run(Thread.java:748)
[openunison-operator-dbd778f5d-mn678] Caused by: <eval>:417 TypeError: Cannot read property "spec" from undefined
[openunison-operator-dbd778f5d-mn678]   at jdk.nashorn.internal.runtime.ECMAErrors.error(ECMAErrors.java:57)
[openunison-operator-dbd778f5d-mn678]   at jdk.nashorn.internal.runtime.ECMAErrors.typeError(ECMAErrors.java:213)
[openunison-operator-dbd778f5d-mn678]   at jdk.nashorn.internal.runtime.ECMAErrors.typeError(ECMAErrors.java:185)
[openunison-operator-dbd778f5d-mn678]   at jdk.nashorn.internal.runtime.ECMAErrors.typeError(ECMAErrors.java:172)
[openunison-operator-dbd778f5d-mn678]   at jdk.nashorn.internal.runtime.Undefined.get(Undefined.java:157)
[openunison-operator-dbd778f5d-mn678]   at jdk.nashorn.internal.scripts.Script$Recompilation$53$47267$^eval_.manageCertMgrJob(<eval>:417)
[openunison-operator-dbd778f5d-mn678]   at jdk.nashorn.internal.scripts.Script$Recompilation$38$52A$^eval_.on_watch(<eval>:36)
[openunison-operator-dbd778f5d-mn678]   at jdk.nashorn.internal.runtime.ScriptFunctionData.invoke(ScriptFunctionData.java:639)
[openunison-operator-dbd778f5d-mn678]   at jdk.nashorn.internal.runtime.ScriptFunction.invoke(ScriptFunction.java:494)
[openunison-operator-dbd778f5d-mn678]   at jdk.nashorn.internal.runtime.ScriptRuntime.apply(ScriptRuntime.java:393)
[openunison-operator-dbd778f5d-mn678]   at jdk.nashorn.api.scripting.ScriptObjectMirror.callMember(ScriptObjectMirror.java:199)
[openunison-operator-dbd778f5d-mn678]   at jdk.nashorn.api.scripting.NashornScriptEngine.invokeImpl(NashornScriptEngine.java:386)
[openunison-operator-dbd778f5d-mn678]   ... 5 more

Also, the resource status reports as Failed:

❯ k get openunisons.openunison.tremolo.io orchestra -ojsonpath='{.status}'
{"conditions":{"lastTransitionTime":"2021-03-19 04:16:18GMT","reason":"error","status":"True","type":"Failed"},"digest":"A5SZJLzkLvMSV7YGaD38xNdV5K5s/rhTnWhAaBouVJ8="}
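For anyone scripting this check, here is a minimal sketch, assuming Node.js and the status shape shown in the output above. `isFailed` is a hypothetical helper, not part of OpenUnison or the operator, and the array handling is a defensive assumption, since the output above shows `conditions` as a single object.

```javascript
// Status JSON copied verbatim from the kubectl output above.
var statusJson = '{"conditions":{"lastTransitionTime":"2021-03-19 04:16:18GMT","reason":"error","status":"True","type":"Failed"},"digest":"A5SZJLzkLvMSV7YGaD38xNdV5K5s/rhTnWhAaBouVJ8="}';

// Hypothetical helper: true when the status carries an active Failed condition.
function isFailed(status) {
  if (!status || !status.conditions) {
    return false;
  }
  // The output above has a single conditions object; accept an array too (assumption).
  var conditions = Array.isArray(status.conditions)
    ? status.conditions
    : [status.conditions];
  return conditions.some(function (c) {
    return c.type === "Failed" && c.status === "True";
  });
}

console.log(isFailed(JSON.parse(statusJson))); // prints true
```

The same function can be fed the output of the `k get ... -ojsonpath='{.status}'` command instead of the hard-coded string.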

Orchestra itself doesn't have any errors in its logs, we're able to connect to the cluster, and everything seems to work. I couldn't decipher any additional information from the logs.

The cluster is GKE v1.18.16-gke.300.

mlbiam commented 3 years ago

Odd, we didn't make any updates to the operator yesterday. It looks like the operator is trying to inspect the CronJob that updates the certificates. What does your CronJob check-certs-orchestra in the openunison namespace look like?
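For readers following the trace: "Cannot read property "spec" from undefined" is the error JavaScript raises when code reads `.spec` from a value that turned out to be undefined. The operator's script is not shown in this thread and the actual cause was not pinned down, so the following is only a sketch of how such an error can arise and be guarded against. `findCronJob`, `getScheduleUnsafe`, `getScheduleSafe`, and the sample data are all hypothetical, not the operator's code.

```javascript
// Hypothetical stand-in for a lookup that may not find the CronJob.
function findCronJob(items, name) {
  for (var i = 0; i < items.length; i++) {
    if (items[i].metadata.name === name) {
      return items[i];
    }
  }
  return undefined; // nothing matched
}

// Unguarded read: throws a TypeError when the lookup returned undefined.
function getScheduleUnsafe(items, name) {
  return findCronJob(items, name).spec.schedule;
}

// Guarded read: returns null instead of throwing when the object or its spec is missing.
function getScheduleSafe(items, name) {
  var cronJob = findCronJob(items, name);
  if (!cronJob || !cronJob.spec) {
    return null;
  }
  return cronJob.spec.schedule;
}

// Made-up sample data: the list does not contain check-certs-orchestra.
var items = [{ metadata: { name: "other-job" }, spec: { schedule: "0 2 * * *" } }];

try {
  getScheduleUnsafe(items, "check-certs-orchestra");
} catch (e) {
  console.log(e instanceof TypeError); // prints true
}
console.log(getScheduleSafe(items, "check-certs-orchestra")); // prints null
```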

dkulchinsky commented 3 years ago

Thanks @mlbiam. I suspected this was somehow related to changes in the operator. We did upgrade it to latest around mid-January, but I saw there were some changes pushed, so I did the following:

  1. Forced the operator to pull the latest from upstream.
  2. Uninstalled orchestra and deleted the check-certs-orchestra CronJob.
  3. Reinstalled orchestra.

I no longer see the errors in the operator, and the resource status reports Completed:

❯ k get openunisons.openunison.tremolo.io orchestra -ojsonpath='{.status}'
{"conditions":{"lastTransitionTime":"2021-03-19 05:08:02GMT","status":"True","type":"Completed"},"digest":"aV5c6kw56K8oeQ1OxIZbw7cmTDQIkEyJcbJKn9xxt+M="}

dkulchinsky commented 3 years ago

I know we already discussed this in the past, but it would be great if, alongside the latest tag, a versioned tag were also available, so it would be easier for us to track upstream changes to the different components 🙏🏼

mlbiam commented 3 years ago

I understand the desire to manage upstream tags, but we don't support old versions of the operator, because we need to make sure we stay patched. Customers who want specific, granular tag control can import the container into their own registry and run off that imported image. What we can do is add some logging that reports a detailed version, so you can identify the specific version you're running.