MarathonLabs / marathon

Cross-platform test runner
https://docs.marathonlabs.io
GNU General Public License v2.0
565 stars 121 forks source link

Marathon crash with `Unexpected string length` #917

Open tabletenniser opened 2 months ago

tabletenniser commented 2 months ago

Describe the bug We are running Marathon in Jenkins CI for a while, recently, we started seeing crashes with the following stack trace:

00:04:00.685  Exception in thread "DeviceMonitor @coroutine#7" com.malinskiy.adam.exception.RequestRejectedException: Unexpected string length: 
00:04:00.685    at com.malinskiy.adam.extension.SocketKt.readProtocolString(Socket.kt:106)
00:04:00.685    at com.malinskiy.adam.extension.SocketKt$readProtocolString$1.invokeSuspend(Socket.kt)
00:04:00.685    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
00:04:00.685    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
00:04:00.685    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
00:04:00.685    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
00:04:00.685    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
00:04:00.685    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
00:04:00.685    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
00:04:00.685    at java.base/java.lang.Thread.run(Thread.java:840)
00:04:00.685    Suppressed: kotlinx.coroutines.DiagnosticCoroutineContextException: [CoroutineId(7), "coroutine#7":StandaloneCoroutine{Cancelling}@1ad16e30, java.util.concurrent.ScheduledThreadPoolExecutor@8ae1b27[Running, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 34]]

To Reproduce It roughly happens ~2% of the runs, couldn't find any consistent way to reproduce it

Expected behavior Tests pass without the failures

Logs and reports There's also warnings like the following in the log, not sure if it's relevant:

00:03:44.154  W 17:02:04.608 [AndroidDevice - execution - 127.0.0.1:5037:emulator-5554-1 @coroutine#13] <com.malinskiy.adam.request.Request> adb server rejected command 0016shell:getprop;echo x$?
00:03:44.154  Exception in thread "DeviceMonitor @coroutine#13" com.malinskiy.adam.exception.RequestRejectedException: closed
00:03:44.154    at com.malinskiy.adam.request.Request.handshake$suspendImpl(Request.kt:50)
00:03:44.154    at com.malinskiy.adam.request.Request$handshake$1.invokeSuspend(Request.kt)
00:03:44.154    at (Coroutine boundary.()
00:03:44.154    at com.malinskiy.marathon.android.adam.AdamDeviceProvider$initialize$5$2$1$job$1.invokeSuspend(AdamDeviceProvider.kt:157)
00:03:44.154    Suppressed: kotlinx.coroutines.DiagnosticCoroutineContextException: [CoroutineId(13), "coroutine#13":StandaloneCoroutine{Cancelling}@e8151c2, java.util.concurrent.ScheduledThreadPoolExecutor@8ae1b27[Running, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 23]]
00:03:44.154  Caused by: com.malinskiy.adam.exception.RequestRejectedException: closed
00:03:44.154    at com.malinskiy.adam.request.Request.handshake$suspendImpl(Request.kt:50)
00:03:44.154    at com.malinskiy.adam.request.Request$handshake$1.invokeSuspend(Request.kt)
00:03:44.154    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
00:03:44.154    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
00:03:44.154    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
00:03:44.154    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
00:03:44.154    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
00:03:44.154    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
00:03:44.154    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
00:03:44.154    at java.base/java.lang.Thread.run(Thread.java:840)

Devices (please complete the following information):

Any idea what might be causing this or any suggested potential work around will be very helpful!

Malinskiy commented 2 months ago

I've seen this when running large test runs. Unfortunately nothing much can be done from marathon's side. To limit the blast radius i suggest running mutiple adb servers.

My suspicion is something happened with adb server code or emulator adbd that introduced this problem

Malinskiy commented 2 months ago

Did you update any of the components recently? I know API 13/14 are more prone to this sort of behavior

tabletenniser commented 2 months ago

Thanks for the response! I guess we will grep the error string and if it contains Unexpected string length, we will do some retries on the caller side in that case then...

Did you update any of the components recently? I know API 13/14 are more prone to this sort of behavior We did not, we are actually running on google-30 which corresponds to API 11 if I read this correctly.

Malinskiy commented 2 months ago

One thing that can help you understand why this happens is to run tcpdump (probably with filters) and check which side is closing the socket here for adb connection