go/py client intermittent CI failures #6330

Closed: niloc132 closed this issue 2 weeks ago

niloc132 commented 2 weeks ago

To my knowledge, this only occurs in CI, and only when a full check run takes place; running just :go:check or :py-client:check does not reproduce it consistently enough to rely on the results. Observed failure rates in CI:

- py-client:check: failed 0 / 96 tries
- go:check: failed 6 / 96 tries
- check (full run): failed 11 / 32 tries

Failure messages in go: https://github.com/niloc132/deephaven-core/actions/runs/11668272387/job/32487558122#step:10:4368

=== RUN   Example_fetchTable
--- FAIL: Example_fetchTable (0.35s)
got:
error when opening table: rpc error: code = Unavailable desc = error reading from server: EOF
Successfully opened the old table!
want:
Successfully opened the old table!

https://github.com/niloc132/deephaven-core/actions/runs/11668272387/job/32487562349#step:10:4174

=== RUN   TestOpenTable
    test_tools.go:67: OpenTable error rpc error: code = Unavailable desc = error reading from server: EOF
--- FAIL: TestOpenTable (0.02s)

https://github.com/niloc132/deephaven-core/actions/runs/11668272387/job/32487563983#step:10:4376

=== RUN   Example_runScript
--- FAIL: Example_runScript (0.36s)
got:
error when opening table: rpc error: code = Unavailable desc = error reading from server: EOF
want:
Got table snapshot!
It has 5 rows and 2 columns

https://github.com/niloc132/deephaven-core/actions/runs/11668272387/job/32487564477#step:10:3958

=== RUN   TestFieldSync
    test_tools.go:67: OpenTable error rpc error: code = Unavailable desc = error reading from server: EOF
--- FAIL: TestFieldSync (0.06s)

https://github.com/niloc132/deephaven-core/actions/runs/11668272387/job/32487565020#step:10:4174

=== RUN   TestFieldSync
    test_tools.go:67: OpenTable error rpc error: code = Unavailable desc = error reading from server: EOF
--- FAIL: TestFieldSync (0.09s)
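
All of these Go failures come down to the same call: the client connects fine, and then OpenTable gets a transport-level Unavailable/EOF back instead of a table. For reference, the failing examples and tests reduce to roughly this shape (a minimal sketch, not the actual example or test_tools.go code; the NewClient arguments, the WithConsole option, and TableHandle.Release are my assumptions about the go/pkg/client API and may not match it exactly):

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/deephaven/deephaven-core/go/pkg/client"
)

func main() {
	ctx := context.Background()

	// Connect to the test server; host, port, and console type are illustrative,
	// the real tests pick these up from the CI environment.
	cl, err := client.NewClient(ctx, "localhost", "10000", client.WithConsole("python"))
	if err != nil {
		log.Fatalf("NewClient error: %s", err.Error())
	}
	defer cl.Close()

	// This is the call that intermittently fails in CI with
	// "rpc error: code = Unavailable desc = error reading from server: EOF".
	tbl, err := cl.OpenTable(ctx, "my_table")
	if err != nil {
		log.Fatalf("error when opening table: %s", err.Error())
	}
	defer tbl.Release(ctx)

	fmt.Println("Successfully opened the old table!")
}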

https://github.com/deephaven/deephaven-core/pull/6318 was an attempt at mitigating this by commenting out Example_runScript, but the other failures still happen. After that PR, we've seen both

=== RUN   TestFieldSync
    test_tools.go:67: OpenTable error rpc error: code = Unavailable desc = error reading from server: EOF
--- FAIL: TestFieldSync (0.08s)

and

=== RUN   TestOpenTable
    test_tools.go:67: OpenTable error rpc error: code = Unavailable desc = error reading from server: EOF
--- FAIL: TestOpenTable (0.02s)
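
Both of the remaining failures are the same gRPC Unavailable/EOF coming out of the OpenTable helper in test_tools.go. If we need a stopgap that goes further than commenting tests out, one option (purely a sketch, not something PR 6318 does) would be for the test helper to retry operations that fail with codes.Unavailable a couple of times before reporting failure:

package testtools

import (
	"context"
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// retryOnUnavailable is a hypothetical helper: it retries op a few times when the
// error is a transport-level UNAVAILABLE (e.g. "error reading from server: EOF"),
// and gives up immediately on any other error.
func retryOnUnavailable(ctx context.Context, attempts int, op func(context.Context) error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(ctx); err == nil {
			return nil
		}
		if status.Code(err) != codes.Unavailable {
			return err // not a transport flake, fail fast
		}
		// Brief backoff before retrying the flaky call.
		select {
		case <-ctx.Done():
			return err
		case <-time.After(250 * time.Millisecond):
		}
	}
	return err
}

That would only paper over whatever is happening on the server/Jetty side, but it would keep a single transient EOF from failing an entire check run.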
niloc132 commented 2 weeks ago

An odd result in the logs from one of the CI runs - probably not related, but noting it here for future me:

2024-11-04T06:22:03.674Z | heduler-Concurrent-3 |  WARN | g.s.j.ServletServerStream | [{ServletAdapter<77>}] Exception writing message
org.eclipse.jetty.io.EofException: Closed
    at org.eclipse.jetty.server.HttpOutput.checkWritable(HttpOutput.java:753)
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:777)
    at io.grpc.servlet.jakarta.AsyncServletOutputStreamWriter.lambda$writeBytes$4(AsyncServletOutputStreamWriter.java:108)
    at io.grpc.servlet.jakarta.AsyncServletOutputStreamWriter.runOrBuffer(AsyncServletOutputStreamWriter.java:183)
    at io.grpc.servlet.jakarta.AsyncServletOutputStreamWriter.writeBytes(AsyncServletOutputStreamWriter.java:105)
    at io.grpc.servlet.jakarta.ServletServerStream$Sink.writeFrame(ServletServerStream.java:261)
    at io.grpc.internal.AbstractServerStream.deliverFrame(AbstractServerStream.java:118)
    at io.grpc.internal.MessageFramer.commitToSink(MessageFramer.java:354)
    at io.grpc.internal.MessageFramer.close(MessageFramer.java:330)
    at io.grpc.internal.AbstractStream.endOfMessages(AbstractStream.java:85)
    at io.grpc.internal.AbstractServerStream.close(AbstractServerStream.java:127)
    at io.grpc.internal.ServerCallImpl.closeInternal(ServerCallImpl.java:222)
    at io.grpc.internal.ServerCallImpl.close(ServerCallImpl.java:208)
    at io.grpc.PartialForwardingServerCall.close(PartialForwardingServerCall.java:48)
    at io.grpc.ForwardingServerCall.close(ForwardingServerCall.java:22)
    at io.grpc.ForwardingServerCall$SimpleForwardingServerCall.close(ForwardingServerCall.java:44)
    at io.deephaven.server.session.SessionServiceGrpcImpl$InterceptedCall.close(SessionServiceGrpcImpl.java:297)
    at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onCompleted(ServerCalls.java:395)
    at io.deephaven.extensions.barrage.util.GrpcUtil.lambda$safelyComplete$1(GrpcUtil.java:62)
    at io.deephaven.extensions.barrage.util.GrpcUtil.safelyExecuteLocked(GrpcUtil.java:33)
    at io.grpc.internal.AbstractServerStream.close(AbstractServerStream.java:133)
    at io.grpc.internal.ServerCallImpl.closeInternal(ServerCallImpl.java:222)
    at io.grpc.internal.ServerCallImpl.close(ServerCallImpl.java:208)
    at io.grpc.PartialForwardingServerCall.close(PartialForwardingServerCall.java:48)
    at io.grpc.ForwardingServerCall.close(ForwardingServerCall.java:22)
    at io.grpc.ForwardingServerCall$SimpleForwardingServerCall.close(ForwardingServerCall.java:44)
    at io.deephaven.server.session.SessionServiceGrpcImpl$InterceptedCall.close(SessionServiceGrpcImpl.java:297)
    at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onCompleted(ServerCalls.java:395)
    at io.deephaven.extensions.barrage.util.GrpcUtil.lambda$safelyComplete$1(GrpcUtil.java:62)
    at io.deephaven.extensions.barrage.util.GrpcUtil.safelyExecuteLocked(GrpcUtil.java:33)
    at io.deephaven.extensions.barrage.util.GrpcUtil.safelyComplete(GrpcUtil.java:60)
    at io.deephaven.server.table.ops.TableServiceGrpcImpl.lambda$oneShotOperationWrapper$9(TableServiceGrpcImpl.java:677)
    at io.deephaven.server.session.SessionState$ExportObject.doExport(SessionState.java:995)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at io.deephaven.server.runner.scheduler.SchedulerModule$ThreadFactory.lambda$newThread$0(SchedulerModule.java:100)
    at java.base/java.lang.Thread.run(Thread.java:1583)
...
2024-11-04T06:22:03.677Z | heduler-Concurrent-3 | ERROR | i.g.i.SerializingExecutor | Exception while executing runnable io.grpc.servlet.jakarta.AsyncServletOutputStreamWriter$$Lambda/0x00007f0b6c57e180@19a8e894
java.lang.IllegalStateException: AsyncContext completed and/or Request lifecycle recycled
    at org.eclipse.jetty.server.AsyncContextState.state(AsyncContextState.java:47)
    at org.eclipse.jetty.server.AsyncContextState.complete(AsyncContextState.java:67)
    at io.grpc.servlet.jakarta.AsyncServletOutputStreamWriter.lambda$new$1(AsyncServletOutputStreamWriter.java:97)
    at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
    at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31)
    at io.grpc.internal.SerializingExecutor.schedule(SerializingExecutor.java:102)
    at io.grpc.internal.SerializingExecutor.execute(SerializingExecutor.java:95)
    at io.grpc.servlet.jakarta.ServletServerStream$ServletTransportState.runOnTransportThread(ServletServerStream.java:144)
    at io.grpc.servlet.jakarta.AsyncServletOutputStreamWriter.lambda$new$2(AsyncServletOutputStreamWriter.java:94)
    at io.grpc.servlet.jakarta.AsyncServletOutputStreamWriter.runOrBuffer(AsyncServletOutputStreamWriter.java:183)
    at io.grpc.servlet.jakarta.AsyncServletOutputStreamWriter.complete(AsyncServletOutputStreamWriter.java:127)
    at io.grpc.servlet.jakarta.ServletServerStream$Sink.writeTrailers(ServletServerStream.java:293)
    at io.grpc.internal.AbstractServerStream.close(AbstractServerStream.java:133)
    at io.grpc.internal.ServerCallImpl.closeInternal(ServerCallImpl.java:222)
    at io.grpc.internal.ServerCallImpl.close(ServerCallImpl.java:208)
    at io.grpc.PartialForwardingServerCall.close(PartialForwardingServerCall.java:48)
    at io.grpc.ForwardingServerCall.close(ForwardingServerCall.java:22)
    at io.grpc.ForwardingServerCall$SimpleForwardingServerCall.close(ForwardingServerCall.java:44)
    at io.deephaven.server.session.SessionServiceGrpcImpl$InterceptedCall.close(SessionServiceGrpcImpl.java:297)
    at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onCompleted(ServerCalls.java:395)
    at io.deephaven.extensions.barrage.util.GrpcUtil.lambda$safelyComplete$1(GrpcUtil.java:62)
    at io.deephaven.extensions.barrage.util.GrpcUtil.safelyExecuteLocked(GrpcUtil.java:33)
    at io.deephaven.extensions.barrage.util.GrpcUtil.safelyComplete(GrpcUtil.java:60)
    at io.deephaven.server.table.ops.TableServiceGrpcImpl.lambda$oneShotOperationWrapper$9(TableServiceGrpcImpl.java:677)
    at io.deephaven.server.session.SessionState$ExportObject.doExport(SessionState.java:995)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at io.deephaven.server.runner.scheduler.SchedulerModule$ThreadFactory.lambda$newThread$0(SchedulerModule.java:100)
    at java.base/java.lang.Thread.run(Thread.java:1583)

These exceptions showed up in at least a few server logs after a failure (we have no comparable data for runs where the tests pass), and might imply that something is getting lost on Jetty's side. That said, this kind of error seems to indicate that the connection has already ended, so it would not itself have caused the client's EOF error...
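
To make that correlation easier next time, it might be worth having the Go test helper log UTC timestamps around the call that fails, so a client-side EOF can be lined up against server-side entries like the 06:22:03.674Z Jetty warning above. A trivial sketch (hypothetical helper, not existing test code):

package testtools

import (
	"context"
	"log"
	"time"
)

// timedCall is a hypothetical wrapper that logs RFC3339 UTC timestamps before and
// after an operation, so client-side failures can be matched against server log lines.
func timedCall(ctx context.Context, name string, op func(context.Context) error) error {
	log.Printf("%s starting %s", time.Now().UTC().Format(time.RFC3339Nano), name)
	err := op(ctx)
	log.Printf("%s finished %s (err=%v)", time.Now().UTC().Format(time.RFC3339Nano), name, err)
	return err
}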