Open swankjesse opened 3 years ago
I'm encountering this in a real environment; any ideas what could be causing it? My current theory is that it's a keep-alive configuration issue.
java.io.IOException: gRPC transport failure (HTTP status=200, grpc-status=null, grpc-message=null)
Some more logs from the CI build with extra okhttp logging added:
INFO: [0 ms] callStart: Request{method=POST, url=https://127.0.0.1:39961/test/GetDestinationWarehouse, headers=[te:trailers, grpc-trace-bin:, grpc-accept-encoding:gzip, grpc-encoding:gzip], tags={class com.squareup.wire.GrpcMethod=com.squareup.wire.GrpcMethod@20376e8d}}
INFO: [0 ms] proxySelectStart: https://127.0.0.1:39961/
INFO: [0 ms] proxySelectEnd: [DIRECT]
INFO: [0 ms] dnsStart: 127.0.0.1
INFO: [0 ms] dnsEnd: [/127.0.0.1]
INFO: [0 ms] connectStart: /127.0.0.1:39961 DIRECT
INFO: [1 ms] secureConnectStart
INFO: [17 ms] secureConnectEnd: Handshake{tlsVersion=TLS_1_2 cipherSuite=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 peerCertificates=[CN=misk-server, OU=Server, O=Misk, L=San Francisco, ST=CA, C=US] localCertificates=[]}
INFO: [23 ms] connectEnd: h2
INFO: [23 ms] connectionAcquired: Connection{127.0.0.1:39961, proxy=DIRECT hostAddress=/127.0.0.1:39961 cipherSuite=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 protocol=h2}
INFO: [24 ms] requestHeadersStart
INFO: [24 ms] requestHeadersEnd
INFO: [24 ms] requestBodyStart
INFO: [27 ms] requestBodyEnd: byteCount=30
INFO: [27 ms] responseHeadersStart
INFO: [27 ms] responseHeadersEnd: Response{protocol=h2, code=200, message=, url=https://127.0.0.1:39961/test/GetDestinationWarehouse}
INFO: [27 ms] responseBodyStart
INFO: [27 ms] responseFailed: okhttp3.internal.http2.StreamResetException: stream was reset: CANCEL
INFO: [30 ms] connectionReleased
INFO: [30 ms] callFailed: okhttp3.internal.http2.StreamResetException: stream was reset: CANCEL
INFO: [0 ms] callStart: Request{method=POST, url=http://127.0.0.1:43027/RobotLocator/SayHello, headers=[te:trailers, grpc-trace-bin:, grpc-accept-encoding:gzip, grpc-encoding:gzip], tags={class com.squareup.wire.GrpcMethod=com.squareup.wire.GrpcMethod@2421a0a5}}
INFO: [5 ms] proxySelectStart: http://127.0.0.1:43027/
INFO: [10 ms] proxySelectEnd: [DIRECT]
INFO: [11 ms] dnsStart: 127.0.0.1
INFO: [11 ms] dnsEnd: [/127.0.0.1]
INFO: [12 ms] connectStart: /127.0.0.1:43027 DIRECT
[2021-05-05 05:37:30] >> CONNECTION 505249202a20485454502f322e300d0a0d0a534d0d0a0d0a
[2021-05-05 05:37:30] >> 0x00000000 6 SETTINGS
[2021-05-05 05:37:30] >> 0x00000000 4 WINDOW_UPDATE
[2021-05-05 05:37:30] Q10035 scheduled after 0 µs: OkHttp 127.0.0.1
[2021-05-05 05:37:30] Q10035 starting : OkHttp 127.0.0.1
INFO: [14 ms] connectEnd: h2_prior_knowledge
[2021-05-05 05:37:30] Q10029 scheduled after 0 µs: OkHttp ConnectionPool
[2021-05-05 05:37:30] Q10029 starting : OkHttp ConnectionPool
[2021-05-05 05:37:30] Q10029 run again after 300 s : OkHttp ConnectionPool
[2021-05-05 05:37:30] Q10029 finished run in 489 µs: OkHttp ConnectionPool
INFO: [18 ms] connectionAcquired: Connection{127.0.0.1:43027, proxy=DIRECT hostAddress=/127.0.0.1:43027 cipherSuite=none protocol=h2_prior_knowledge}
INFO: [19 ms] requestHeadersStart
[2021-05-05 05:37:30] >> 0x00000003 122 HEADERS END_HEADERS
INFO: [20 ms] requestHeadersEnd
[2021-05-05 05:37:30] << 0x00000000 24 SETTINGS
INFO: [21 ms] requestBodyStart
[2021-05-05 05:37:30] Q10032 scheduled after 0 µs: OkHttp 127.0.0.1 applyAndAckSettings
[2021-05-05 05:37:30] << 0x00000000 4 WINDOW_UPDATE
[2021-05-05 05:37:30] Q10032 starting : OkHttp 127.0.0.1 applyAndAckSettings
[2021-05-05 05:37:30] << 0x00000000 0 SETTINGS ACK
[2021-05-05 05:37:30] Q10034 scheduled after 0 µs: OkHttp 127.0.0.1 onSettings
[2021-05-05 05:37:30] >> 0x00000000 0 SETTINGS ACK
[2021-05-05 05:37:30] Q10034 starting : OkHttp 127.0.0.1 onSettings
[2021-05-05 05:37:30] Q10034 finished run in 84 µs: OkHttp 127.0.0.1 onSettings
[2021-05-05 05:37:30] Q10032 finished run in 5 ms: OkHttp 127.0.0.1 applyAndAckSettings
[2021-05-05 05:37:30] >> 0x00000003 31 DATA
[2021-05-05 05:37:30] << 0x00000003 77 HEADERS END_HEADERS
[2021-05-05 05:37:30] << 0x00000003 16 DATA
[2021-05-05 05:37:30] << 0x00000003 12 HEADERS END_STREAM|END_HEADERS
[2021-05-05 05:37:30] << 0x00000003 4 RST_STREAM
INFO: [41 ms] requestBodyEnd: byteCount=31
INFO: [41 ms] responseHeadersStart
INFO: [42 ms] responseHeadersEnd: Response{protocol=h2_prior_knowledge, code=200, message=, url=http://127.0.0.1:43027/RobotLocator/SayHello}
INFO: [46 ms] responseBodyStart
INFO: [51 ms] responseFailed: okhttp3.internal.http2.StreamResetException: stream was reset: CANCEL
[2021-05-05 05:37:30] Q10029 canceled : OkHttp ConnectionPool
[2021-05-05 05:37:30] >> 0x00000000 8 GOAWAY
[2021-05-05 05:37:30] Q10035 finished run in 44 ms: OkHttp 127.0.0.1
INFO: [59 ms] connectionReleased
INFO: [59 ms] callFailed: okhttp3.internal.http2.StreamResetException: stream was reset: CANCEL
I don't see anything useful here, unfortunately, but I'll put the logs in anyway.
I looked into this error a bit, and when the tests flake, it's because the server explicitly cancelled the stream. This is happening in Http2Stream#cancelStreamIfNecessary(). Sometimes it thinks it needs to cancel the stream, but when it does, the client gets an unexpected CANCEL and fails the test. The cancel happens because source.finished
== false but source.closed == true
. When the tests pass, we get source.finished == false && source.closed == false
then source.finished == true && source.closed == true
, neither of which triggers the CANCEL.
My current theory is that there's a race between FramingSource#close()
and Http2Stream#receiveHeaders()
. If I so much as add a non-suspending breakpoint to log some state at the top of FramingSource#close()
, then it loses the race and everything is fine. In the failure case, #cancelStreamIfNecessary()
executes before receiveHeaders()
, so closed == true
but finished == false
and then we try to CANCEL.
This is super-subtle, and I'm getting tired, so I'll revisit later in the week.
Success:
GrpcConnectivityTest > serviceThrowsException() STARTED
Breakpoint reached at okhttp3.internal.http2.Http2Stream.cancelStreamIfNecessary$okhttp(Http2Stream.kt:502)
cancel=false source.finished=false source.closed=false source=okhttp3.internal.http2.Http2Stream$FramingSource@67b5e90e
Breakpoint reached at okhttp3.internal.http2.Http2Stream$FramingSource.close(Http2Stream.kt:478)
source=okhttp3.internal.http2.Http2Stream$FramingSource@67b5e90e
Breakpoint reached at okhttp3.internal.http2.Http2Stream.receiveHeaders(Http2Stream.kt:295)
inFinished=true source=okhttp3.internal.http2.Http2Stream$FramingSource@67b5e90e
Breakpoint reached at okhttp3.internal.http2.Http2Stream.cancelStreamIfNecessary$okhttp(Http2Stream.kt:502)
cancel=false source.finished=true source.closed=true source=okhttp3.internal.http2.Http2Stream$FramingSource@67b5e90e
Disconnected from the target VM, address: 'localhost:55361', transport: 'socket'
GrpcConnectivityTest > serviceThrowsException() PASSED
Failure:
GrpcConnectivityTest > serviceThrowsException() STARTED
Breakpoint reached at okhttp3.internal.http2.Http2Stream.cancelStreamIfNecessary$okhttp(Http2Stream.kt:502)
cancel=false source.finished=false source.closed=false source=okhttp3.internal.http2.Http2Stream$FramingSource@690cfa45
Breakpoint reached at okhttp3.internal.http2.Http2Stream.receiveHeaders(Http2Stream.kt:295)
inFinished=true source=okhttp3.internal.http2.Http2Stream$FramingSource@690cfa45
Breakpoint reached at okhttp3.internal.http2.Http2Stream.cancelStreamIfNecessary$okhttp(Http2Stream.kt:502)
cancel=true source.finished=false source.closed=true source=okhttp3.internal.http2.Http2Stream$FramingSource@690cfa45
Disconnected from the target VM, address: 'localhost:55713', transport: 'socket'
GrpcConnectivityTest > serviceThrowsException() FAILED
7f4249625960b7f10dc61aae4d1896c2e126f538 introduced an optimization that may be connected to this issue.
Stack trace:
close:240, Http2Stream (okhttp3.internal.http2)
cancelStreamIfNecessary$okhttp:506, Http2Stream (okhttp3.internal.http2)
close:488, Http2Stream$FramingSource (okhttp3.internal.http2)
close:34, ForwardingSource (okio)
close:309, Exchange$ResponseBodySource (okhttp3.internal.connection)
close:392, RealBufferedSource (okio)
closeQuietly:491, Util (okhttp3.internal)
close:192, ResponseBody (okhttp3)
serviceThrowsException:125, GrpcConnectivityTest (misk.grpc)
invoke0:-1, NativeMethodAccessorImpl (jdk.internal.reflect)
invoke:62, NativeMethodAccessorImpl (jdk.internal.reflect)
invoke:43, DelegatingMethodAccessorImpl (jdk.internal.reflect)
invoke:566, Method (java.lang.reflect)
invokeMethod:686, ReflectionUtils (org.junit.platform.commons.util)
proceed:60, MethodInvocation (org.junit.jupiter.engine.execution)
proceed:131, InvocationInterceptorChain$ValidatingInvocation (org.junit.jupiter.engine.execution)
intercept:149, TimeoutExtension (org.junit.jupiter.engine.extension)
interceptTestableMethod:140, TimeoutExtension (org.junit.jupiter.engine.extension)
interceptTestMethod:84, TimeoutExtension (org.junit.jupiter.engine.extension)
apply:-1, 134748208 (org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor$$Lambda$157)
lambda$ofVoidMethod$0:115, ExecutableInvoker$ReflectiveInterceptorCall (org.junit.jupiter.engine.execution)
apply:-1, 302813371 (org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall$$Lambda$158)
lambda$invoke$0:105, ExecutableInvoker (org.junit.jupiter.engine.execution)
apply:-1, 1743561709 (org.junit.jupiter.engine.execution.ExecutableInvoker$$Lambda$298)
proceed:106, InvocationInterceptorChain$InterceptedInvocation (org.junit.jupiter.engine.execution)
proceed:64, InvocationInterceptorChain (org.junit.jupiter.engine.execution)
chainAndInvoke:45, InvocationInterceptorChain (org.junit.jupiter.engine.execution)
invoke:37, InvocationInterceptorChain (org.junit.jupiter.engine.execution)
invoke:104, ExecutableInvoker (org.junit.jupiter.engine.execution)
invoke:98, ExecutableInvoker (org.junit.jupiter.engine.execution)
lambda$invokeTestMethod$6:205, TestMethodTestDescriptor (org.junit.jupiter.engine.descriptor)
execute:-1, 224454165 (org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor$$Lambda$444)
execute:73, ThrowableCollector (org.junit.platform.engine.support.hierarchical)
invokeTestMethod:201, TestMethodTestDescriptor (org.junit.jupiter.engine.descriptor)
execute:137, TestMethodTestDescriptor (org.junit.jupiter.engine.descriptor)
execute:71, TestMethodTestDescriptor (org.junit.jupiter.engine.descriptor)
lambda$executeRecursively$5:135, NodeTestTask (org.junit.platform.engine.support.hierarchical)
execute:-1, 1332801233 (org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$230)
execute:73, ThrowableCollector (org.junit.platform.engine.support.hierarchical)
lambda$executeRecursively$7:125, NodeTestTask (org.junit.platform.engine.support.hierarchical)
invoke:-1, 1146843363 (org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$229)
around:135, Node (org.junit.platform.engine.support.hierarchical)
lambda$executeRecursively$8:123, NodeTestTask (org.junit.platform.engine.support.hierarchical)
execute:-1, 270036154 (org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$228)
execute:73, ThrowableCollector (org.junit.platform.engine.support.hierarchical)
executeRecursively:122, NodeTestTask (org.junit.platform.engine.support.hierarchical)
execute:80, NodeTestTask (org.junit.platform.engine.support.hierarchical)
accept:-1, 756519367 (org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService$$Lambda$234)
forEach:1540, ArrayList (java.util)
invokeAll:38, SameThreadHierarchicalTestExecutorService (org.junit.platform.engine.support.hierarchical)
lambda$executeRecursively$5:139, NodeTestTask (org.junit.platform.engine.support.hierarchical)
execute:-1, 1332801233 (org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$230)
execute:73, ThrowableCollector (org.junit.platform.engine.support.hierarchical)
lambda$executeRecursively$7:125, NodeTestTask (org.junit.platform.engine.support.hierarchical)
invoke:-1, 1146843363 (org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$229)
around:135, Node (org.junit.platform.engine.support.hierarchical)
lambda$executeRecursively$8:123, NodeTestTask (org.junit.platform.engine.support.hierarchical)
execute:-1, 270036154 (org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$228)
execute:73, ThrowableCollector (org.junit.platform.engine.support.hierarchical)
executeRecursively:122, NodeTestTask (org.junit.platform.engine.support.hierarchical)
execute:80, NodeTestTask (org.junit.platform.engine.support.hierarchical)
accept:-1, 756519367 (org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService$$Lambda$234)
forEach:1540, ArrayList (java.util)
invokeAll:38, SameThreadHierarchicalTestExecutorService (org.junit.platform.engine.support.hierarchical)
lambda$executeRecursively$5:139, NodeTestTask (org.junit.platform.engine.support.hierarchical)
execute:-1, 1332801233 (org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$230)
execute:73, ThrowableCollector (org.junit.platform.engine.support.hierarchical)
lambda$executeRecursively$7:125, NodeTestTask (org.junit.platform.engine.support.hierarchical)
invoke:-1, 1146843363 (org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$229)
around:135, Node (org.junit.platform.engine.support.hierarchical)
lambda$executeRecursively$8:123, NodeTestTask (org.junit.platform.engine.support.hierarchical)
execute:-1, 270036154 (org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$228)
execute:73, ThrowableCollector (org.junit.platform.engine.support.hierarchical)
executeRecursively:122, NodeTestTask (org.junit.platform.engine.support.hierarchical)
execute:80, NodeTestTask (org.junit.platform.engine.support.hierarchical)
submit:32, SameThreadHierarchicalTestExecutorService (org.junit.platform.engine.support.hierarchical)
execute:57, HierarchicalTestExecutor (org.junit.platform.engine.support.hierarchical)
execute:51, HierarchicalTestEngine (org.junit.platform.engine.support.hierarchical)
execute:108, EngineExecutionOrchestrator (org.junit.platform.launcher.core)
execute:88, EngineExecutionOrchestrator (org.junit.platform.launcher.core)
lambda$execute$0:54, EngineExecutionOrchestrator (org.junit.platform.launcher.core)
accept:-1, 430960721 (org.junit.platform.launcher.core.EngineExecutionOrchestrator$$Lambda$194)
withInterceptedStreams:67, EngineExecutionOrchestrator (org.junit.platform.launcher.core)
execute:52, EngineExecutionOrchestrator (org.junit.platform.launcher.core)
execute:96, DefaultLauncher (org.junit.platform.launcher.core)
execute:75, DefaultLauncher (org.junit.platform.launcher.core)
processAllTestClasses:99, JUnitPlatformTestClassProcessor$CollectAllTestClassesExecutor (org.gradle.api.internal.tasks.testing.junitplatform)
access$000:79, JUnitPlatformTestClassProcessor$CollectAllTestClassesExecutor (org.gradle.api.internal.tasks.testing.junitplatform)
stop:75, JUnitPlatformTestClassProcessor (org.gradle.api.internal.tasks.testing.junitplatform)
stop:61, SuiteTestClassProcessor (org.gradle.api.internal.tasks.testing)
invoke0:-1, NativeMethodAccessorImpl (jdk.internal.reflect)
invoke:62, NativeMethodAccessorImpl (jdk.internal.reflect)
invoke:43, DelegatingMethodAccessorImpl (jdk.internal.reflect)
invoke:566, Method (java.lang.reflect)
dispatch:36, ReflectionDispatch (org.gradle.internal.dispatch)
dispatch:24, ReflectionDispatch (org.gradle.internal.dispatch)
dispatch:33, ContextClassLoaderDispatch (org.gradle.internal.dispatch)
invoke:94, ProxyDispatchAdapter$DispatchingInvocationHandler (org.gradle.internal.dispatch)
stop:-1, $Proxy2 (com.sun.proxy)
stop:133, TestWorker (org.gradle.api.internal.tasks.testing.worker)
invoke0:-1, NativeMethodAccessorImpl (jdk.internal.reflect)
invoke:62, NativeMethodAccessorImpl (jdk.internal.reflect)
invoke:43, DelegatingMethodAccessorImpl (jdk.internal.reflect)
invoke:566, Method (java.lang.reflect)
dispatch:36, ReflectionDispatch (org.gradle.internal.dispatch)
dispatch:24, ReflectionDispatch (org.gradle.internal.dispatch)
dispatch:182, MessageHubBackedObjectConnection$DispatchWrapper (org.gradle.internal.remote.internal.hub)
dispatch:164, MessageHubBackedObjectConnection$DispatchWrapper (org.gradle.internal.remote.internal.hub)
run:414, MessageHub$Handler (org.gradle.internal.remote.internal.hub)
onExecute:64, ExecutorPolicy$CatchAndRecordFailures (org.gradle.internal.concurrent)
run:48, ManagedExecutorImpl$1 (org.gradle.internal.concurrent)
runWorker:1128, ThreadPoolExecutor (java.util.concurrent)
run:628, ThreadPoolExecutor$Worker (java.util.concurrent)
run:56, ThreadFactoryImpl$ManagedThreadRunnable (org.gradle.internal.concurrent)
run:834, Thread (java.lang)