akka / akka-management

Akka Management is a suite of tools for operating Akka Clusters.
https://doc.akka.io/docs/akka-management/

SunCertPathBuilderException when using Kubernetes api discovery in OpenShift #947

Closed khujo closed 2 years ago

khujo commented 3 years ago

Versions used

Akka version: 2.6.15
Akka-management version: 1.1.1
Akka-http version: 10.2.6

Expected Behavior

We run an Akka cluster that uses the Kubernetes API as its discovery mechanism. Discovery is configured roughly like this:

akka.discovery {
  kubernetes-api {
    pod-namespace = "some-namespace"
    pod-label-selector = "actorSystemName=someActorSystem"
  }
}

Discovery should work this way in a normal Kubernetes distribution, e.g. AKS, as well as in OpenShift. (It was working in OpenShift in 1.0.10)

Actual Behavior

After upgrading to akka-management 1.1.1 the discovery throws an exception when running in OpenShift CodeReady Containers.

  sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at java.base/sun.security.validator.PKIXValidator.doBuild(Unknown Source)
    at java.base/sun.security.validator.PKIXValidator.engineValidate(Unknown Source)
    at java.base/sun.security.validator.Validator.validate(Unknown Source)
    at java.base/sun.security.ssl.X509TrustManagerImpl.validate(Unknown Source)
    at java.base/sun.security.ssl.X509TrustManagerImpl.checkTrusted(Unknown Source)
    at java.base/sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(Unknown Source)
    at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkServerCerts(Unknown Source)
    at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.onCertificate(Unknown Source)
    at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.consume(Unknown Source)
    at java.base/sun.security.ssl.SSLHandshake.consume(Unknown Source)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(Unknown Source)
    at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(Unknown Source)
    at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(Unknown Source)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask.run(Unknown Source)
    at akka.stream.impl.io.TLSActor.runDelegatedTasks(TLSActor.scala:437)
    at akka.stream.impl.io.TLSActor.doUnwrap(TLSActor.scala:405)
    at akka.stream.impl.io.TLSActor.doInbound(TLSActor.scala:298)
    at akka.stream.impl.io.TLSActor.$anonfun$bidirectional$1(TLSActor.scala:233)
    at akka.stream.impl.Pump.pump(Transfer.scala:203)
    at akka.stream.impl.Pump.pump$(Transfer.scala:201)
    at akka.stream.impl.io.TLSActor.pump(TLSActor.scala:52)
    at akka.stream.impl.BatchingInputBuffer.enqueueInputElement(ActorProcessor.scala:97)
    at akka.stream.impl.BatchingInputBuffer$$anonfun$upstreamRunning$1.applyOrElse(ActorProcessor.scala:148)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:35)
    at akka.stream.impl.SubReceive.apply(Transfer.scala:19)
    at akka.stream.impl.FanIn$InputBunch$$anonfun$subreceive$1.applyOrElse(FanIn.scala:244)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:35)
    at akka.stream.impl.SubReceive.apply(Transfer.scala:19)
    at akka.stream.impl.SubReceive.apply(Transfer.scala:15)
    at scala.PartialFunction.applyOrElse(PartialFunction.scala:189)
    at scala.PartialFunction.applyOrElse$(PartialFunction.scala:188)
    at akka.stream.impl.SubReceive.applyOrElse(Transfer.scala:15)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:244)
    at akka.actor.Actor.aroundReceive(Actor.scala:537)
    at akka.actor.Actor.aroundReceive$(Actor.scala:535)
    at akka.stream.impl.io.TLSActor.aroundReceive(TLSActor.scala:52)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
    at akka.actor.ActorCell.invoke(ActorCell.scala:548)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
    at akka.dispatch.Mailbox.run(Mailbox.scala:231)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
  Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at java.base/sun.security.provider.certpath.SunCertPathBuilder.build(Unknown Source)
    at java.base/sun.security.provider.certpath.SunCertPathBuilder.engineBuild(Unknown Source)
    at java.base/java.security.cert.CertPathBuilder.build(Unknown Source)
    ... 47 more

Relevant logs

The exception is thrown while requesting the pods from the k8s api.

Failed k8s api request

```
2021-09-01T12:22:04.525+00:00|INFO|akka.discovery.kubernetes.KubernetesApiServiceDiscovery|Querying for pods with label selector: [actorSystemName=someActorSystem]. Namespace: [some-namespace]. Port: [None]
2021-09-01T12:22:04.614+00:00|DEBUG|akka.http.impl.engine.client.PoolId|Creating pool.
2021-09-01T12:22:04.768+00:00|DEBUG|com.zaxxer.hikari.pool.HikariPool|db - Added connection ConnectionID:1 ClientConnectionId: 96a60596-e9f9-4d7c-8e63-3f7ce58d3a28
2021-09-01T12:22:04.768+00:00|DEBUG|com.zaxxer.hikari.pool.HikariPool|db - After adding stats (total=1, active=1, idle=0, waiting=1)
2021-09-01T12:22:04.791+00:00|DEBUG|akka.http.impl.engine.client.PoolId|Dispatching request [GET /api/v1/namespaces/some-namespace/pods Empty] to pool
2021-09-01T12:22:04.800+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (Unconnected)]Dispatching request [GET /api/v1/namespaces/some-namespace/pods Empty]
2021-09-01T12:22:04.818+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (Unconnected)]Before event [onNewRequest] In state [Unconnected] for [53 ms]
2021-09-01T12:22:04.827+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (Unconnected)]Establishing connection
2021-09-01T12:22:04.830+00:00|DEBUG|com.zaxxer.hikari.pool.HikariPool|db - Added connection ConnectionID:2 ClientConnectionId: dbfed338-38e7-4eac-9cbf-ed113af52428
2021-09-01T12:22:04.931+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (Connecting)]After event [onNewRequest] State change [Unconnected] -> [Connecting]
2021-09-01T12:22:05.048+00:00|DEBUG|akka.io.TcpOutgoingConnection|Resolving 10.217.4.1 before connecting
2021-09-01T12:22:05.056+00:00|DEBUG|akka.persistence.typed.internal.EventSourcedBehaviorImpl|Replaying events: from: 1, to: 9223372036854775807
2021-09-01T12:22:05.104+00:00|DEBUG|akka.io.SimpleDnsManager|Resolution request for 10.217.4.1 from Actor[akka://someActorSystem/system/IO-TCP/selectors/$a/3#-1300459899]
2021-09-01T12:22:05.161+00:00|DEBUG|akka.io.InetAddressDnsResolver|Request for [10.217.4.1] was not yet cached
2021-09-01T12:22:05.181+00:00|DEBUG|akka.io.TcpOutgoingConnection|Attempting connection to [/10.217.4.1:443]
2021-09-01T12:22:05.190+00:00|DEBUG|akka.io.TcpOutgoingConnection|Connection established to [10.217.4.1:443]
2021-09-01T12:22:05.219+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (Connecting)]Connection attempt succeeded
2021-09-01T12:22:05.220+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (Connecting)]Before event [onConnectionAttemptSucceeded] In state [Connecting] for [290 ms]
2021-09-01T12:22:05.220+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (Connecting)]Slot connection was established
2021-09-01T12:22:05.220+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (PushingRequestToConnection)]After event [onConnectionAttemptSucceeded] State change [Connecting] -> [PushingRequestToConnection]
2021-09-01T12:22:05.221+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (PushingRequestToConnection)]Before event [onRequestDispatched] In state [PushingRequestToConnection] for [0 ms]
2021-09-01T12:22:05.222+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (WaitingForResponse)]After event [onRequestDispatched] State change [PushingRequestToConnection] -> [WaitingForResponse]
2021-09-01T12:22:05.334+00:00|DEBUG|akka.actor.ActorSystemImpl|Outgoing request stream error javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
2021-09-01T12:22:05.335+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (WaitingForResponse)]Connection failed
2021-09-01T12:22:05.335+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (WaitingForResponse)]Before event [onConnectionFailed] In state [WaitingForResponse] for [113 ms]
2021-09-01T12:22:05.335+00:00|DEBUG|akka.http.impl.engine.client.PoolId|[0 (WaitingForResponse)]Ongoing request [GET /api/v1/namespaces/some-namespace/pods Empty] is failed because of [connection failure]: [PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target]
2021-09-01T12:22:05.335+00:00|DEBUG|akka.http.impl.engine.client.PoolId|Request [GET /api/v1/namespaces/some-namespace/pods Empty] has 5 retries left, retrying...
```

This is strange, because the CA certificate used to sign the API server's certificate is mounted at /var/run/secrets/kubernetes.io/serviceaccount/ca.crt. Akka Management should pick it up to verify the certificate.

To verify this, I enabled Java SSL debug logging. With that I could see that a certificate is added to the trust store.

CA certificate is added to trust store

```
javax.net.ssl|DEBUG|01|main|2021-09-06 06:44:39.673 GMT|X509TrustManagerImpl.java:79|adding as trusted certificates (
  "certificate" : {
    "version" : "v3",
    "serial number" : "49 D0 54 71 73 AB EA 65",
    "signature algorithm": "SHA256withRSA",
    "issuer" : "CN=kube-apiserver-lb-signer, OU=openshift",
    "not before" : "2021-08-10 04:47:02.000 GMT",
    "not after" : "2031-08-08 04:47:02.000 GMT",
    "subject" : "CN=kube-apiserver-lb-signer, OU=openshift",
    "subject public key" : "RSA",
    "extensions" : [
      {
        ObjectId: 2.5.29.19 Criticality=true
        BasicConstraints:[
          CA:true
          PathLen:2147483647
        ]
      },
      {
        ObjectId: 2.5.29.15 Criticality=true
        KeyUsage [
          DigitalSignature
          Key_Encipherment
          Key_CertSign
        ]
      },
      {
        ObjectId: 2.5.29.14 Criticality=false
        SubjectKeyIdentifier [
        KeyIdentifier [
        0000: A0 5E F7 E1 E6 4A CC C0 5B 9F 81 50 9A 03 84 47 .^...J..[..P...G
        0010: A8 6A 00 59 .j.Y
        ]
        ]
      }
    ]}
)
```

However, when you compare that to the certificates the API server sends during the handshake, you will notice that the server certificate is signed by a different CA.

SSL Certificate handshake

```
javax.net.ssl|DEBUG|1C|ConfigurationCompartment-akka.actor.default-dispatcher-11|2021-09-06 06:44:40.178 GMT|CertificateMessage.java:366|Consuming server Certificate handshake message (
"Certificates": [
  "certificate" : {
    "version" : "v3",
    "serial number" : "2E 1D 6B E4 3E 04 D0 BD",
    "signature algorithm": "SHA256withRSA",
    "issuer" : "CN=kube-apiserver-service-network-signer, OU=openshift",
    "not before" : "2021-09-03 06:15:05.000 GMT",
    "not after" : "2021-10-03 06:15:06.000 GMT",
    "subject" : "CN=10.217.4.1",
    "subject public key" : "RSA",
    "extensions" : [
      {
        ObjectId: 2.5.29.35 Criticality=false
        AuthorityKeyIdentifier [
        KeyIdentifier [
        0000: 57 FB 22 A7 34 9D 84 A9 BB D3 CA 59 86 88 09 B9 W.".4......Y....
        0010: BF E1 D6 43 ...C
        ]
        ]
      },
      {
        ObjectId: 2.5.29.19 Criticality=true
        BasicConstraints:[
          CA:false
          PathLen: undefined
        ]
      },
      {
        ObjectId: 2.5.29.37 Criticality=false
        ExtendedKeyUsages [
          serverAuth
        ]
      },
      {
        ObjectId: 2.5.29.15 Criticality=true
        KeyUsage [
          DigitalSignature
          Key_Encipherment
        ]
      },
      {
        ObjectId: 2.5.29.17 Criticality=false
        SubjectAlternativeName [
          DNSName: kubernetes
          DNSName: kubernetes.default
          DNSName: kubernetes.default.svc
          DNSName: kubernetes.default.svc.cluster.local
          DNSName: openshift
          DNSName: openshift.default
          DNSName: openshift.default.svc
          DNSName: openshift.default.svc.cluster.local
          DNSName: 10.217.4.1
          IPAddress: 10.217.4.1
        ]
      },
      {
        ObjectId: 2.5.29.14 Criticality=false
        SubjectKeyIdentifier [
        KeyIdentifier [
        0000: 0E 95 95 83 1B BB D7 CE F0 EA 35 5D 06 76 3F 18 ..........5].v?.
        0010: 8A 36 FC BD .6..
        ]
        ]
      }
    ]},
  "certificate" : {
    "version" : "v3",
    "serial number" : "7D 84 A7 1F 89 16 76 5D",
    "signature algorithm": "SHA256withRSA",
    "issuer" : "CN=kube-apiserver-service-network-signer, OU=openshift",
    "not before" : "2021-08-10 04:47:02.000 GMT",
    "not after" : "2031-08-08 04:47:02.000 GMT",
    "subject" : "CN=kube-apiserver-service-network-signer, OU=openshift",
    "subject public key" : "RSA",
    "extensions" : [
      {
        ObjectId: 2.5.29.19 Criticality=true
        BasicConstraints:[
          CA:true
          PathLen:2147483647
        ]
      },
      {
        ObjectId: 2.5.29.15 Criticality=true
        KeyUsage [
          DigitalSignature
          Key_Encipherment
          Key_CertSign
        ]
      },
      {
        ObjectId: 2.5.29.14 Criticality=false
        SubjectKeyIdentifier [
        KeyIdentifier [
        0000: 57 FB 22 A7 34 9D 84 A9 BB D3 CA 59 86 88 09 B9 W.".4......Y....
        0010: BF E1 D6 43 ...C
        ]
        ]
      }
    ]}
]
)
```

As you can see, the certificate of the api server is signed by the kube-apiserver-service-network-signer certificate. But the certificate added to the trust store is kube-apiserver-lb-signer.

Then I noticed that in the case of OpenShift, the /var/run/secrets/kubernetes.io/serviceaccount/ca.crt file contains multiple certificates. These are the subjects of the certificates in the order they occur in the file.

So while the API server's certificate is signed by the third certificate in the file, Akka Management only loads the first one. The certificate path therefore cannot be validated, because kube-apiserver-service-network-signer was never added to the trust store.

In case of the AKS cluster we are running, the ca.crt file contains only one certificate. That's why it's working in k8s but not in OpenShift.
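
To check quickly whether a given ca.crt is a multi-certificate bundle, counting the PEM markers is enough. The following is a minimal Java sketch for that check. The class name `PemBlockCounter` and its method are invented for illustration and are not part of akka-management; the code only counts `BEGIN CERTIFICATE` markers and does not parse or validate the certificates.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of akka-management.
final class PemBlockCounter {
    private static final String BEGIN = "-----BEGIN CERTIFICATE-----";

    /** Counts PEM certificate blocks by their BEGIN marker; contents are not validated. */
    static int countCertificates(String pem) {
        int count = 0;
        int from = 0;
        // Walk through the text, jumping from one BEGIN marker to the next.
        while ((from = pem.indexOf(BEGIN, from)) != -1) {
            count++;
            from += BEGIN.length();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Default to the service account CA path mentioned in this issue.
        String path = args.length > 0
                ? args[0]
                : "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt";
        String pem = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        System.out.println(path + " contains " + countCertificates(pem) + " certificate(s)");
    }
}
```

Run inside a pod, a count greater than one would indicate a bundle like the one described for OpenShift, while a count of one matches the AKS case.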

khujo commented 3 years ago

I have a working patch for this issue. I will create a PR as soon as I have my employer's agreement to sign the CLA.

In the meantime, this is the root cause:

The certificate is loaded by the PemManagersProvider.

https://github.com/akka/akka-management/blob/5557e2c3de229efca86451187949b8687b9059b6/management-pki/src/main/scala/akka/pki/kubernetes/PemManagersProvider.scala#L62-L65

certFactory#generateCertificate calls sun.security.provider.X509Factory#engineGenerateCertificate, which reads only the first certificate from the file and returns it. To load every certificate in the file, certFactory#generateCertificates should be called instead.
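
The difference can be sketched in plain Java. This is an illustration of the idea only, not the actual patch and not the code in PemManagersProvider; the class `CaBundleLoader` and its method names are hypothetical. It reads all certificates with `CertificateFactory#generateCertificates` and registers each one in an in-memory trust store.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.List;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper, not part of akka-management.
final class CaBundleLoader {

    /** Reads every certificate from a PEM bundle instead of only the first one. */
    static List<Certificate> loadAll(InputStream pem) throws GeneralSecurityException {
        CertificateFactory factory = CertificateFactory.getInstance("X.509");
        // The plural method keeps parsing until the stream is exhausted.
        return new ArrayList<>(factory.generateCertificates(pem));
    }

    /** Builds a TrustManagerFactory that trusts all of the given CA certificates. */
    static TrustManagerFactory trustManagerFactory(List<Certificate> cas)
            throws GeneralSecurityException, IOException {
        KeyStore keyStore = KeyStore.getInstance(KeyStore.getDefaultType());
        keyStore.load(null, null); // start from an empty in-memory store
        for (int i = 0; i < cas.size(); i++) {
            // Each certificate needs its own alias, otherwise later entries overwrite earlier ones.
            keyStore.setCertificateEntry("ca-" + i, cas.get(i));
        }
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(keyStore);
        return tmf;
    }

    public static void main(String[] args) throws Exception {
        String path = args.length > 0
                ? args[0]
                : "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt";
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            for (Certificate cert : loadAll(in)) {
                // Print the subject of each CA so the bundle order is visible.
                System.out.println(((X509Certificate) cert).getSubjectX500Principal().getName());
            }
        }
    }
}
```

With a bundle like the OpenShift one described above, `loadAll` would be expected to return all entries, so kube-apiserver-service-network-signer ends up in the trust store alongside kube-apiserver-lb-signer.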