galasa-dev / projectmanagement

Project Management repo for Issues and ZenHub
7 stars 4 forks source link

Getting 500 Server error when doing Galasactl runs prepare and passing many overrides #2000

Open Tom-Slattery opened 1 week ago

Tom-Slattery commented 1 week ago

Describe the bug

When passing in many override parameters to a Galasactl runs submit command we are seeing the Ecosystem API server return 500 Server Error.

2024/09/19 08:03:14 Galasa CLI - Submit tests (Remote)
2024/09/19 08:03:14 Galasa home is '/workspace/git-workspace/.galasa'
2024/09/19 08:03:14 GetAuthenticatedAPIClient entered.
2024/09/19 08:03:14 GetBearerToken entered.
2024/09/19 08:03:14 environment variable 'GALASA_TOKEN' over-rides a value from file '/workspace/git-workspace/.galasa/galasactl.properties'
2024/09/19 08:03:14 Retrieving bearer token from file '**** BEARER TOKEN FILE ****'
2024/09/19 08:03:14 Retrieved bearer token from file '**** BEARER TOKEN FILE ****' OK
2024/09/19 08:03:14 JwtCache: Get. Returning jwt of length 792. err: <nil>
2024/09/19 08:03:14 GetBearerToken exiting. length of bearerToken: 792 err:<nil>
2024/09/19 08:03:14 GetAuthenticatedAPIClient exiting. err: <nil>
2024/09/19 08:03:14 NewRemoteLauncher(***************) entered.
2024/09/19 08:03:14 Using group name 'a5ec0d63-66b3-475d-9c26-941d654c930a' for test run submission
2024/09/19 08:03:14 GetRunsByGroup(a5ec0d63-66b3-475d-9c26-941d654c930a) entered.
2024/09/19 08:03:14 buildListOfRunsToSubmit - portfolio &{v1alpha galasa.dev/testPortfolio {adhoc} [{com.ibm.cics.tests.example.cemt com.ibm.cics.tests.example.cemt.TestBasic main  map[] } {com.ibm.cics.tests.example.cemt com.ibm.cics.tests.example.cemt.TestCemt main  map[] }]}, runOverrides map[****** OVER 100 OVERRIDES *******]
2024/09/19 08:03:14 Added test main/com.ibm.cics.tests.example.cemt/com.ibm.cics.tests.example.cemt.TestBasic to the ready queue
2024/09/19 08:03:14 Added test main/com.ibm.cics.tests.example.cemt/com.ibm.cics.tests.example.cemt.TestCemt to the ready queue
2024/09/19 08:03:14 submitRun - com.ibm.cics.tests.example.cemt/com.ibm.cics.tests.example.cemt.TestBasic, CLI
2024/09/19 08:03:14 RemoteLauncher.SubmitTestRuns : using requestor root
2024/09/19 08:03:14 Failed to submit test com.ibm.cics.tests.example.cemt/com.ibm.cics.tests.example.cemt.TestBasic - 500 Server Error
2024/09/19 08:03:14 GAL1093E: Failed to submit test (bundle: com.ibm.cics.tests.example.cemt, class: com.ibm.cics.tests.example.cemt.TestBasic). Reason is: 500 Server Error
GAL1093E: Failed to submit test (bundle: com.ibm.cics.tests.example.cemt, class: com.ibm.cics.tests.example.cemt.TestBasic). Reason is: 500 Server Error
2024/09/19 08:03:14 submitRun - com.ibm.cics.tests.example.cemt/com.ibm.cics.tests.example.cemt.TestCemt, CLI
2024/09/19 08:03:14 RemoteLauncher.SubmitTestRuns : using requestor root
2024/09/19 08:03:14 Failed to submit test com.ibm.cics.tests.example.cemt/com.ibm.cics.tests.example.cemt.TestCemt - 500 Server Error
2024/09/19 08:03:14 GAL1093E: Failed to submit test (bundle: com.ibm.cics.tests.example.cemt, class: com.ibm.cics.tests.example.cemt.TestCemt). Reason is: 500 Server Error
2024/09/19 08:03:14 GetRunsByGroup(a5ec0d63-66b3-475d-9c26-941d654c930a) entered.
GAL1093E: Failed to submit test (bundle: com.ibm.cics.tests.example.cemt, class: com.ibm.cics.tests.example.cemt.TestCemt). Reason is: 500 Server Error
Error: GAL1093E: Failed to submit test (bundle: com.ibm.cics.tests.example.cemt, class: com.ibm.cics.tests.example.cemt.TestCemt). Reason is: 500 Server Error
2024/09/19 08:03:15 GAL1093E: Failed to submit test (bundle: com.ibm.cics.tests.example.cemt, class: com.ibm.cics.tests.example.cemt.TestCemt). Reason is: 500 Server Error
2024/09/19 08:03:15 Exit code is 1

Steps to reproduce

  1. Run a galasactl automation run with over 100 overrides passed from an overrides file
  2. The server will return 500 server error.

Expected behavior

We will regularly need to test changes to our CPS properties file by passing the entire file as overrides to a test run. The API server will need to be able to cope with large numbers of overrides being passed.

techcobweb commented 1 week ago

How large is large please ?

Tom-Slattery commented 1 week ago

Seems to be for any more than 79 overrides we are getting the 500 server error

techcobweb commented 1 week ago

I have reproduced this issue. It doesn't happen for local tests, but it does happen when the overrides are sent over to the server.

eamansour commented 3 days ago

Update: etcd defaults the maximum number of operations in a transaction to 128, which we're exceeding (see https://github.com/etcd-io/etcd/blob/68e712276bb8a8f35f7f077add143b28422e8d66/server/etcdmain/help.go#L86).

Managed to capture the error that comes up in the API server when this happens:

30/09/2024 13:59:29.378 ERROR qtp203048480-23 dev.galasa.framework.api.runs.RunsServlet.processRequest - {"error_code":5021,"error_message":"GAL5021E: Error occurred when trying to submit run 'dev.galasa.inttests/dev.galasa.inttests.core.local.CoreLocalJava11Ubuntu'."}
dev.galasa.framework.api.common.InternalServletException: {"error_code":5021,"error_message":"GAL5021E: Error occurred when trying to submit run 'dev.galasa.inttests/dev.galasa.inttests.core.local.CoreLocalJava11Ubuntu'."}
    at dev.galasa.framework.api.runs.common.GroupRuns.scheduleRun(GroupRuns.java:116) ~[?:?]
    at dev.galasa.framework.api.runs.routes.GroupRunsRoute.handlePostRequest(GroupRunsRoute.java:70) ~[?:?]
    at dev.galasa.framework.api.common.BaseServlet.handleRoute(BaseServlet.java:146) ~[?:?]
    at dev.galasa.framework.api.common.BaseServlet.processRoutes(BaseServlet.java:121) ~[?:?]
    at dev.galasa.framework.api.common.BaseServlet.processRequest(BaseServlet.java:82) ~[?:?]
    at dev.galasa.framework.api.common.BaseServlet.doPost(BaseServlet.java:62) ~[?:?]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:681) ~[-1727450275929:?]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:764) ~[-1727450275929:?]
    at org.apache.felix.http.base.internal.jakartawrappers.ServletWrapper.service(ServletWrapper.java:74) ~[-1727450275940:?]
    at org.apache.felix.http.base.internal.handler.ServletHandler.handle(ServletHandler.java:127) ~[-1727450275940:?]
    at org.apache.felix.http.base.internal.dispatch.InvocationChain.doFilter(InvocationChain.java:86) ~[-1727450275940:?]
    at org.apache.felix.http.base.internal.javaxwrappers.FilterChainWrapper.doFilter(FilterChainWrapper.java:45) ~[-1727450275940:?]
    at dev.galasa.framework.api.authentication.JwtAuthFilter.doFilter(JwtAuthFilter.java:100) ~[?:?]
    at org.apache.felix.http.base.internal.jakartawrappers.FilterWrapper.doFilter(FilterWrapper.java:58) ~[-1727450275940:?]
    at org.apache.felix.http.base.internal.handler.FilterHandler.handle(FilterHandler.java:145) ~[-1727450275940:?]
    at org.apache.felix.http.base.internal.dispatch.InvocationChain.doFilter(InvocationChain.java:81) ~[-1727450275940:?]
    at org.apache.felix.http.base.internal.dispatch.Dispatcher$1.doFilter(Dispatcher.java:153) ~[-1727450275940:?]
    at org.apache.felix.http.base.internal.whiteboard.WhiteboardManager.invokePreprocessors(WhiteboardManager.java:930) ~[-1727450275940:?]
    at org.apache.felix.http.base.internal.dispatch.Dispatcher.dispatch(Dispatcher.java:95) ~[-1727450275940:?]
    at org.apache.felix.http.base.internal.dispatch.DispatcherServlet.service(DispatcherServlet.java:49) ~[-1727450275940:?]
    at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:587) ~[-1727450275929:?]
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:764) ~[-1727450275940:?]
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:529) ~[-1727450275940:?]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:221) ~[-1727450275940:?]
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1570) ~[-1727450275940:?]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:221) ~[-1727450275940:?]
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1380) ~[-1727450275940:?]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176) ~[-1727450275940:?]
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:484) ~[-1727450275940:?]
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1543) ~[-1727450275940:?]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:174) ~[-1727450275940:?]
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1302) ~[-1727450275940:?]
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129) ~[-1727450275940:?]
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:149) ~[-1727450275940:?]
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122) ~[-1727450275940:?]
    at org.eclipse.jetty.server.Server.handle(Server.java:563) ~[-1727450275940:?]
    at org.eclipse.jetty.server.HttpChannel.lambda$handle$0(HttpChannel.java:505) ~[-1727450275940:?]
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:762) ~[-1727450275940:?]
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:497) ~[-1727450275940:?]
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:282) ~[-1727450275940:?]
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:314) ~[-1727450275940:?]
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100) ~[-1727450275940:?]
    at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53) ~[-1727450275940:?]
    at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:416) ~[-1727450275940:?]
    at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:385) ~[-1727450275940:?]
    at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:272) ~[-1727450275940:?]
    at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.lambda$new$0(AdaptiveExecutionStrategy.java:140) ~[-1727450275940:?]
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:411) [-1727450275940:?]
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:934) [-1727450275940:?]
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1078) [-1727450275940:?]
    at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: dev.galasa.framework.spi.FrameworkException: Put Swap failed
    at dev.galasa.framework.FrameworkRuns.submitRun(FrameworkRuns.java:321) ~[?:?]
    at dev.galasa.framework.api.runs.common.GroupRuns.scheduleRun(GroupRuns.java:106) ~[?:?]
    ... 50 more
Caused by: dev.galasa.framework.spi.DynamicStatusStoreException: Put Swap failed
    at dev.galasa.cps.etcd.internal.Etcd3DynamicStatusStore.putSwap(Etcd3DynamicStatusStore.java:224) ~[?:?]
    at dev.galasa.framework.internal.dss.FrameworkDynamicStoreKeyAccess.putSwap(FrameworkDynamicStoreKeyAccess.java:125) ~[?:?]
    at dev.galasa.framework.FrameworkRuns.storeRun(FrameworkRuns.java:399) ~[?:?]
    at dev.galasa.framework.FrameworkRuns.submitRun(FrameworkRuns.java:295) ~[?:?]
    at dev.galasa.framework.api.runs.common.GroupRuns.scheduleRun(GroupRuns.java:106) ~[?:?]
    ... 50 more
Caused by: java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: etcdserver: too many operations in txn request
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396) ~[?:?]
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073) ~[?:?]
    at dev.galasa.cps.etcd.internal.Etcd3DynamicStatusStore.putSwap(Etcd3DynamicStatusStore.java:221) ~[?:?]
    at dev.galasa.framework.internal.dss.FrameworkDynamicStoreKeyAccess.putSwap(FrameworkDynamicStoreKeyAccess.java:125) ~[?:?]
    at dev.galasa.framework.FrameworkRuns.storeRun(FrameworkRuns.java:399) ~[?:?]
    at dev.galasa.framework.FrameworkRuns.submitRun(FrameworkRuns.java:295) ~[?:?]
    at dev.galasa.framework.api.runs.common.GroupRuns.scheduleRun(GroupRuns.java:106) ~[?:?]
    ... 50 more
Caused by: java.util.concurrent.ExecutionException: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: etcdserver: too many operations in txn request
    at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:566) ~[?:?]
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:527) ~[?:?]
    at io.etcd.jetcd.ClientConnectionManager.lambda$execute$3(ClientConnectionManager.java:352) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    ... 1 more
Caused by: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: etcdserver: too many operations in txn request
    at io.grpc.Status.asRuntimeException(Status.java:535) ~[?:?]
    at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:533) ~[?:?]
    at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) ~[?:?]
    at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) ~[?:?]
    at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) ~[?:?]
    at io.etcd.jetcd.ClientConnectionManager$AuthTokenInterceptor$1$1.onClose(ClientConnectionManager.java:395) ~[?:?]
    at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:553) ~[?:?]
    at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:68) ~[?:?]
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:739) ~[?:?]
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:718) ~[?:?]
    at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[?:?]
    at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    ... 1 more