When running several galasactl commands in succession against prod1, some commands encountered an internal server error. Internal server errors have also been seen in CLI builds.
Looking at the API server logs, this is happening when the auth filter validates the JWT provided in a request's auth header. When the JWT's signature is being validated, the auth filter sends a request to Dex to find the public JSON Web Key (JWK) used to sign the JWT.
The response for this request from Dex (or the nginx Ingress) is returning a GOAWAY signal:
17/04/2024 12:18:58.991 ERROR qtp474933596-26 dev.galasa.framework.api.authentication.JwtAuthFilter.doFilter - {"error_code":5000,"error_message":"GAL5000E: Error occured when trying to access the endpoint. Report the problem to your Galasa Ecosystem owner."}
java.io.IOException: /10.38.128.13:47946: GOAWAY received
at jdk.internal.net.http.HttpClientImpl.send(HttpClientImpl.java:565) ~[?:?]
at jdk.internal.net.http.HttpClientFacade.send(HttpClientFacade.java:119) ~[?:?]
at dev.galasa.framework.api.authentication.internal.OidcProvider.getJsonWebKeysFromIssuer(OidcProvider.java:203) ~[?:?]
at dev.galasa.framework.api.authentication.internal.OidcProvider.getJsonWebKeyFromIssuerByKeyId(OidcProvider.java:217) ~[?:?]
at dev.galasa.framework.api.authentication.internal.OidcProvider.getRSAPublicKeyFromIssuer(OidcProvider.java:266) ~[?:?]
at dev.galasa.framework.api.authentication.internal.OidcProvider.isJwtValid(OidcProvider.java:238) ~[?:?]
at dev.galasa.framework.api.authentication.JwtAuthFilter.doFilter(JwtAuthFilter.java:86) ~[?:?]
at org.apache.felix.http.base.internal.jakartawrappers.FilterWrapper.doFilter(FilterWrapper.java:58) ~[-1713345991126:?]
...
Caused by: java.io.IOException: /10.38.128.13:47946: GOAWAY received
at jdk.internal.net.http.Http2Connection.handleGoAway(Http2Connection.java:999) ~[?:?]
at jdk.internal.net.http.Http2Connection.handleConnectionFrame(Http2Connection.java:867) ~[?:?]
at jdk.internal.net.http.Http2Connection.processFrame(Http2Connection.java:738) ~[?:?]
at jdk.internal.net.http.frame.FramesDecoder.decode(FramesDecoder.java:155) ~[?:?]
at jdk.internal.net.http.Http2Connection$FramesController.processReceivedData(Http2Connection.java:232) ~[?:?]
at jdk.internal.net.http.Http2Connection.asyncReceive(Http2Connection.java:663) ~[?:?]
at jdk.internal.net.http.Http2Connection$Http2TubeSubscriber.processQueue(Http2Connection.java:1292) ~[?:?]
at jdk.internal.net.http.common.SequentialScheduler$SynchronizedRestartableTask.run(SequentialScheduler.java:175) ~[?:?]
at jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:147) ~[?:?]
at jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:198) ~[?:?]
at jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:271) ~[?:?]
at jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:224) ~[?:?]
at jdk.internal.net.http.Http2Connection$Http2TubeSubscriber.runOrSchedule(Http2Connection.java:1310) ~[?:?]
at jdk.internal.net.http.Http2Connection$Http2TubeSubscriber.onNext(Http2Connection.java:1336) ~[?:?]
at jdk.internal.net.http.Http2Connection$Http2TubeSubscriber.onNext(Http2Connection.java:1270) ~[?:?]
at jdk.internal.net.http.common.SSLTube$DelegateWrapper.onNext(SSLTube.java:202) ~[?:?]
at jdk.internal.net.http.common.SSLTube$SSLSubscriberWrapper.onNext(SSLTube.java:484) ~[?:?]
at jdk.internal.net.http.common.SSLTube$SSLSubscriberWrapper.onNext(SSLTube.java:287) ~[?:?]
at jdk.internal.net.http.common.SubscriberWrapper$DownstreamPusher.run1(SubscriberWrapper.java:318) ~[?:?]
at jdk.internal.net.http.common.SubscriberWrapper$DownstreamPusher.run(SubscriberWrapper.java:261) ~[?:?]
at jdk.internal.net.http.common.SequentialScheduler$SynchronizedRestartableTask.run(SequentialScheduler.java:175) ~[?:?]
at jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:147) ~[?:?]
at jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:198) ~[?:?]
at jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:271) ~[?:?]
at jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:224) ~[?:?]
at jdk.internal.net.http.common.SubscriberWrapper.outgoing(SubscriberWrapper.java:234) ~[?:?]
at jdk.internal.net.http.common.SubscriberWrapper.outgoing(SubscriberWrapper.java:200) ~[?:?]
at jdk.internal.net.http.common.SSLFlowDelegate$Reader.processData(SSLFlowDelegate.java:403) ~[?:?]
at jdk.internal.net.http.common.SSLFlowDelegate$Reader$ReaderDownstreamPusher.run(SSLFlowDelegate.java:264) ~[?:?]
at jdk.internal.net.http.common.SequentialScheduler$SynchronizedRestartableTask.run(SequentialScheduler.java:175) ~[?:?]
at jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:147) ~[?:?]
at jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:198) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
... 1 more
Steps to reproduce
Perform several galasactl properties get, galasactl runs get, or galasactl runs download commands
You should occasionally encounter a 500 internal server error
Expected behavior
The internal server error should not be thrown and the API server should be more stable.
Analysis so far...
The API server is getting a REST call of some sort.
It then sends a request to DEX to get decryption keys for the JWT
Dex then replies "GO AWAY!" because we've used that call too many times in a short space of time.
We need the keys from DEX to be up-to-date, as one of them can decode the JWT
We work through the list of keys to find one which works.
Proposed Solution
Add an in-memory cache of these public keys.
The cache refreshes every seconds.
When the list of keys is required, the cache is consulted.
If the cache is too old, it goes to dex, refreshes the cache, and returns the new keys
else it returns and uses the cached keys.
We need to store a timestamp and compare it to 'now' to see if the cache is out of date.
That should make it faster too, and take most of the burden off the dex service.
Tasks
[x] Add the cache + test
[x] Find out the time between dex publishing the new public key in the list, and the delay before it starts signing jwts with that key. Set the default to be lower/more frequent than that.
Describe the bug
When running several galasactl commands in succession against prod1, some commands encountered an internal server error. Internal server errors have also been seen in CLI builds.
Looking at the API server logs, this is happening when the auth filter validates the JWT provided in a request's auth header. When the JWT's signature is being validated, the auth filter sends a request to Dex to find the public JSON Web Key (JWK) used to sign the JWT.
The response for this request from Dex (or the nginx Ingress) is returning a GOAWAY signal:
Steps to reproduce
galasactl properties get
,galasactl runs get
, orgalasactl runs download
commandsExpected behavior
The internal server error should not be thrown and the API server should be more stable.
Analysis so far...
Proposed Solution
Tasks