Describe the bug
The run crashes with IllegalStateException: Pod requests queue is empty => the emulator deployment is leaked
How to reproduce
instrumentationUi is started
8 of 14 pods are Running with emulators (the other 6 have insufficient cpu)
20 minutes later cpu resources become available => the deployment creates new pods => crash with IllegalStateException: Pod requests queue is empty
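One way such an exception can arise is when more "pod acquired" events arrive than there are recorded pod requests. The sketch below only illustrates that failure mode. ReservationStateSketch, podsRequested and the timestamp queue are hypothetical names, not the actual code of KubernetesReservationState, and it is not confirmed that this is what happens in this run.

```kotlin
import java.util.ArrayDeque

// Hypothetical stand-in for the reservation state; NOT the avito-android implementation.
class ReservationStateSketch {
    // Timestamps (ms) of pod requests that are still waiting for a pod.
    private val podRequests = ArrayDeque<Long>()

    fun podsRequested(count: Int, nowMs: Long) {
        repeat(count) { podRequests.add(nowMs) }
    }

    // Returns how long the oldest request waited; throws if no request is pending.
    fun podAcquired(nowMs: Long): Long {
        val requestedAt = podRequests.poll()
            ?: throw IllegalStateException("Pod requests queue is empty")
        return nowMs - requestedAt
    }
}

fun main() {
    val state = ReservationStateSketch()
    state.podsRequested(count = 2, nowMs = 0)

    println(state.podAcquired(nowMs = 1_000)) // 1000
    println(state.podAcquired(nowMs = 2_000)) // 2000

    // A third pod appears although only two requests were recorded.
    val failure = runCatching { state.podAcquired(nowMs = 3_000) }.exceptionOrNull()
    println(failure?.message) // Pod requests queue is empty
}
```

In this sketch the exception is thrown from metrics bookkeeping, which matches the shape of the stack trace (onPodAcquired -> podAcquired), but the real cause may differ.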
Expected behavior
No deployment leak and no failed instrumentationUi in this scenario
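The "no deployment leak" part could be sketched as cleanup that runs even when the test run throws. DeploymentCleaner and withDeployment are hypothetical names used only for illustration, not an avito-android API. The sketch uses blocking code and the Kotlin standard library only; in coroutine code a suspending cleanup would likely also need to run in withContext(NonCancellable), because the parent job is already cancelling at that point.

```kotlin
// Hypothetical abstraction over deployment removal; NOT an avito-android API.
fun interface DeploymentCleaner {
    fun delete(deploymentName: String)
}

// Runs the block and deletes the deployment afterwards,
// whether the block succeeded or threw.
fun <T> withDeployment(
    deploymentName: String,
    cleaner: DeploymentCleaner,
    block: () -> T
): T {
    try {
        return block()
    } finally {
        cleaner.delete(deploymentName)
    }
}

fun main() {
    val deleted = mutableListOf<String>()
    val cleaner = DeploymentCleaner { name -> deleted += name }

    // Simulate a run that fails the same way as in this report.
    val result = runCatching {
        withDeployment<Unit>("emulator-deployment", cleaner) {
            throw IllegalStateException("Pod requests queue is empty")
        }
    }

    println(result.exceptionOrNull()?.message) // Pod requests queue is empty
    println(deleted)                           // [emulator-deployment]
}
```

The failure still propagates to the caller, so the task fails, but the deployment is deleted first.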
Environment
Version: 2021.36 (fork)
2 worker nodes (OpenStack VMs), 12 CPUs each
Additional context
Logs:
[StatsDSender@:app:instrumentationUi] time:consumerapp.testrunner.app.ui.reservation.pod.queue:1653
[RemoteDeviceProvider@:app:instrumentationUi] Found new pod: default-462db226-8a78-4421-91ab-f3b0af6152fa-6d8768786f-t527c
2021-10-27T16:52:42.650782489+03:00 [StatsDSender@:app:instrumentationUi] time:consumerapp.testrunner.app.ui.reservation.pod.queue:1653
[RemoteDeviceProvider@:app:instrumentationUi] Found new pod: default-462db226-8a78-4421-91ab-f3b0af6152fa-6d8768786f-v9596
kotlinx.coroutines.JobCancellationException: ScopeCoroutine is cancelling; job=ScopeCoroutine{Cancelling}@16296bdd
Caused by: java.lang.IllegalStateException: Pod requests queue is empty
at com.avito.android.runner.devices.internal.kubernetes.KubernetesReservationState.podAcquired(KubernetesReservationState.kt:42)
at com.avito.android.runner.devices.internal.kubernetes.StatsDKubernetesReservationMetricsSender.onPodAcquired(StatsDKubernetesReservationMetricsSender.kt:25)
at com.avito.android.runner.devices.internal.kubernetes.KubernetesReservationClaimer$initializeDevices$2$1.invokeSuspend(KubernetesReservationClaimer.kt:101)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:56)
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:738)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665)
2021-10-27T16:52:42.650893088+03:00 [TestRunner@:app:instrumentationUi] Test run finished with error
kotlinx.coroutines.JobCancellationException: Parent job is Cancelling; job=StandaloneCoroutine{Cancelling}@11922229
Caused by: java.lang.IllegalStateException: Pod requests queue is empty
at com.avito.android.runner.devices.internal.kubernetes.KubernetesReservationState.podAcquired(KubernetesReservationState.kt:42)
at com.avito.android.runner.devices.internal.kubernetes.StatsDKubernetesReservationMetricsSender.onPodAcquired(StatsDKubernetesReservationMetricsSender.kt:25)
2021-10-27T16:52:42.650935967+03:00 at com.avito.android.runner.devices.internal.kubernetes.KubernetesReservationClaimer$initializeDevices$2$1.invokeSuspend(KubernetesReservationClaimer.kt:101)
2021-10-27T16:52:42.650947013+03:00 at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:56)
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571)
2021-10-27T16:52:42.650969371+03:00 at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:738)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665)
2021-10-27T16:52:42.650991556+03:00 [RemoteDeviceProvider@:app:instrumentationUi] Pod default-462db226-8a78-4421-91ab-f3b0af6152fa-6d8768786f-j8hk8 can't load device. Disconnect and delete.
Check device logs in artifacts: /job/app/app/build/test-runner/4bdd0cc9fa0288878524b47b3e7574a3d2cdb4d9.local-root/ui/devices/10.0.3.134.txt
[StatsDSender@:app:instrumentationUi] time:consumerapp.service.kubernetes.pods_delete.202:34
[RemoteDeviceProvider@:app:instrumentationUi] Pod default-462db226-8a78-4421-91ab-f3b0af6152fa-6d8768786f-j8hk8 is deleted: true
[AbstractDevice@:app:instrumentationUi] Wait device with serial: 10.0.3.137:5555 succeed in 10008 at attempt=1
[AbstractDevice@:app:instrumentationUi] Wait device with serial: 10.0.3.136:5555 succeed in 10010 at attempt=1
kotlinx.coroutines.JobCancellationException: Parent job is Cancelling; job=StandaloneCoroutine{Cancelling}@11922229
Caused by: java.lang.IllegalStateException: Pod requests queue is empty
at com.avito.android.runner.devices.internal.kubernetes.KubernetesReservationState.podAcquired(KubernetesReservationState.kt:42)
at com.avito.android.runner.devices.internal.kubernetes.StatsDKubernetesReservationMetricsSender.onPodAcquired(StatsDKubernetesReservationMetricsSender.kt:25)
at com.avito.android.runner.devices.internal.kubernetes.KubernetesReservationClaimer$initializeDevices$2$1.invokeSuspend(KubernetesReservationClaimer.kt:101)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:56)
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:738)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665)
kotlinx.coroutines.JobCancellationException: Parent job is Cancelling; job=StandaloneCoroutine{Cancelling}@11922229
Caused by: java.lang.IllegalStateException: Pod requests queue is empty
at com.avito.android.runner.devices.internal.kubernetes.KubernetesReservationState.podAcquired(KubernetesReservationState.kt:42)
at com.avito.android.runner.devices.internal.kubernetes.StatsDKubernetesReservationMetricsSender.onPodAcquired(StatsDKubernetesReservationMetricsSender.kt:25)
2021-10-27T16:52:52.650765459+03:00 [RemoteDeviceProvider@:app:instrumentationUi] Pod default-462db226-8a78-4421-91ab-f3b0af6152fa-6d8768786f-t527c can't load device. Disconnect and delete.
Check device logs in artifacts: /job/app/app/build/test-runner/4bdd0cc9fa0288878524b47b3e7574a3d2cdb4d9.local-root/ui/devices/10.0.3.137.txt
at com.avito.android.runner.devices.internal.kubernetes.KubernetesReservationClaimer$initializeDevices$2$1.invokeSuspend(KubernetesReservationClaimer.kt:101)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:56)
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:738)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665)
[RemoteDeviceProvider@:app:instrumentationUi] Pod default-462db226-8a78-4421-91ab-f3b0af6152fa-6d8768786f-v9596 can't load device. Disconnect and delete.
I see this exception 4 times in one run and then:
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':app:instrumentationUi'.
> A failure occurred while executing com.avito.gradle.worker.NonSerializableWork
> Pod requests queue is empty