avito-tech / avito-android

Infrastructure of Avito android
https://avito-tech.github.io/avito-android
MIT License

Pod requests queue is empty => leak of emulator deployment #1260

Open materkey opened 3 years ago

materkey commented 3 years ago

Describe the bug

The pod requests queue is empty, which leads to a leak of the emulator deployment.

How to reproduce

  1. instrumentationUi is started.
  2. 8 of 14 pods are Running with emulators (the other 6 pods cannot be scheduled because of insufficient CPU).
  3. About 20 minutes later CPU resources become available => new pods are created in the deployment => the run crashes with IllegalStateException: Pod requests queue is empty
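
The crash in step 3 is consistent with a FIFO queue of pending pod requests that is polled once per observed pod. The sketch below is only an illustration of that failure mode: `ReservationStateSketch`, `podsRequested`, and the millisecond timestamps are hypothetical names, not the actual `KubernetesReservationState` source. If more pods are observed than requests were enqueued (for example, pods created late by the deployment), the next `podAcquired` call finds the queue empty and throws.

```kotlin
import java.util.ArrayDeque

// Hypothetical stand-in for the reservation state; NOT the real KubernetesReservationState.
class ReservationStateSketch {
    // One timestamp per requested pod that has not been matched to an observed pod yet.
    private val podRequests = ArrayDeque<Long>()

    // Records that `count` pods were requested at `nowMs`.
    fun podsRequested(count: Int, nowMs: Long) {
        repeat(count) { podRequests.add(nowMs) }
    }

    // Matches an observed pod to the oldest pending request and returns its wait time.
    // Fails when a pod is observed but no request is pending.
    fun podAcquired(nowMs: Long): Long {
        val requestedAt = podRequests.poll()
            ?: throw IllegalStateException("Pod requests queue is empty")
        return nowMs - requestedAt
    }
}

fun main() {
    val state = ReservationStateSketch()
    state.podsRequested(count = 2, nowMs = 0L)

    println(state.podAcquired(nowMs = 1_000L)) // 1000
    println(state.podAcquired(nowMs = 2_000L)) // 2000

    // A third pod is observed although only two requests were recorded.
    val failure = runCatching { state.podAcquired(nowMs = 3_000L) }.exceptionOrNull()
    println(failure?.message) // Pod requests queue is empty
}
```

Whether the real code enqueues requests this way is an assumption; the sketch only shows why an unmatched pod would surface as this exact exception.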

Expected behavior

No deployment leak and no failed instrumentationUi in this scenario.
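
One way to get the "no deployment leak" part of this expectation is to tie deployment deletion to a `finally` block, so cleanup runs even when device claiming throws. This is a minimal sketch under stated assumptions: `DeploymentApiSketch`, `InMemoryDeployments`, and `withDeployment` are hypothetical and do not reflect the runner's real Kubernetes client or its coroutine-based code.

```kotlin
// Hypothetical interface; the runner's real Kubernetes client is not shown in this report.
interface DeploymentApiSketch {
    fun create(name: String)
    fun delete(name: String)
}

// In-memory fake used only to observe whether a deployment is left behind.
class InMemoryDeployments : DeploymentApiSketch {
    val active = mutableSetOf<String>()
    override fun create(name: String) { active.add(name) }
    override fun delete(name: String) { active.remove(name) }
}

// Creates the deployment, runs `block`, and always deletes the deployment afterwards,
// whether `block` returns normally or throws.
fun <T> withDeployment(api: DeploymentApiSketch, name: String, block: () -> T): T {
    api.create(name)
    try {
        return block()
    } finally {
        api.delete(name)
    }
}

fun main() {
    val api = InMemoryDeployments()
    val result = runCatching {
        withDeployment<Unit>(api, "emulators") {
            // Simulates the failure from this report happening mid-run.
            throw IllegalStateException("Pod requests queue is empty")
        }
    }
    println(result.isFailure)     // true
    println(api.active.isEmpty()) // true
}
```

In coroutine code the same idea would need cleanup that still runs after cancellation, which this plain-Kotlin sketch does not cover.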

Environment

Version: 2021.36 (fork)
2 worker nodes (OpenStack VMs), each with 12 CPUs

Additional context

Logs:

[StatsDSender@:app:instrumentationUi] time:consumerapp.testrunner.app.ui.reservation.pod.queue:1653
[RemoteDeviceProvider@:app:instrumentationUi] Found new pod: default-462db226-8a78-4421-91ab-f3b0af6152fa-6d8768786f-t527c 2021-10-27T16:52:42.650782489+03:00
[StatsDSender@:app:instrumentationUi] time:consumerapp.testrunner.app.ui.reservation.pod.queue:1653
[RemoteDeviceProvider@:app:instrumentationUi] Found new pod: default-462db226-8a78-4421-91ab-f3b0af6152fa-6d8768786f-v9596
kotlinx.coroutines.JobCancellationException: ScopeCoroutine is cancelling; job=ScopeCoroutine{Cancelling}@16296bdd
Caused by: java.lang.IllegalStateException: Pod requests queue is empty
    at com.avito.android.runner.devices.internal.kubernetes.KubernetesReservationState.podAcquired(KubernetesReservationState.kt:42)
    at com.avito.android.runner.devices.internal.kubernetes.StatsDKubernetesReservationMetricsSender.onPodAcquired(StatsDKubernetesReservationMetricsSender.kt:25)
    at com.avito.android.runner.devices.internal.kubernetes.KubernetesReservationClaimer$initializeDevices$2$1.invokeSuspend(KubernetesReservationClaimer.kt:101)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:56)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:738)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665)
2021-10-27T16:52:42.650893088+03:00 [TestRunner@:app:instrumentationUi] Test run finished with error

kotlinx.coroutines.JobCancellationException: Parent job is Cancelling; job=StandaloneCoroutine{Cancelling}@11922229
Caused by: java.lang.IllegalStateException: Pod requests queue is empty
    at com.avito.android.runner.devices.internal.kubernetes.KubernetesReservationState.podAcquired(KubernetesReservationState.kt:42)
    at com.avito.android.runner.devices.internal.kubernetes.StatsDKubernetesReservationMetricsSender.onPodAcquired(StatsDKubernetesReservationMetricsSender.kt:25)2021-10-27T16:52:42.650935967+03:00 
    at com.avito.android.runner.devices.internal.kubernetes.KubernetesReservationClaimer$initializeDevices$2$1.invokeSuspend(KubernetesReservationClaimer.kt:101)2021-10-27T16:52:42.650947013+03:00 
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:56)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571)2021-10-27T16:52:42.650969371+03:00 
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:738)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665)2021-10-27T16:52:42.650991556+03:00 
[RemoteDeviceProvider@:app:instrumentationUi] Pod default-462db226-8a78-4421-91ab-f3b0af6152fa-6d8768786f-j8hk8 can't load device. Disconnect and delete.
Check device logs in artifacts: /job/app/app/build/test-runner/4bdd0cc9fa0288878524b47b3e7574a3d2cdb4d9.local-root/ui/devices/10.0.3.134.txt
[StatsDSender@:app:instrumentationUi] time:consumerapp.service.kubernetes.pods_delete.202:34
[RemoteDeviceProvider@:app:instrumentationUi] Pod default-462db226-8a78-4421-91ab-f3b0af6152fa-6d8768786f-j8hk8 is deleted: true
[AbstractDevice@:app:instrumentationUi] Wait device with serial: 10.0.3.137:5555 succeed in 10008 at attempt=1
[AbstractDevice@:app:instrumentationUi] Wait device with serial: 10.0.3.136:5555 succeed in 10010 at attempt=1
kotlinx.coroutines.JobCancellationException: Parent job is Cancelling; job=StandaloneCoroutine{Cancelling}@11922229
Caused by: java.lang.IllegalStateException: Pod requests queue is empty
    at com.avito.android.runner.devices.internal.kubernetes.KubernetesReservationState.podAcquired(KubernetesReservationState.kt:42)
    at com.avito.android.runner.devices.internal.kubernetes.StatsDKubernetesReservationMetricsSender.onPodAcquired(StatsDKubernetesReservationMetricsSender.kt:25)
    at com.avito.android.runner.devices.internal.kubernetes.KubernetesReservationClaimer$initializeDevices$2$1.invokeSuspend(KubernetesReservationClaimer.kt:101)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:56)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:738)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665)
kotlinx.coroutines.JobCancellationException: Parent job is Cancelling; job=StandaloneCoroutine{Cancelling}@11922229
Caused by: java.lang.IllegalStateException: Pod requests queue is empty
    at com.avito.android.runner.devices.internal.kubernetes.KubernetesReservationState.podAcquired(KubernetesReservationState.kt:42)
    at com.avito.android.runner.devices.internal.kubernetes.StatsDKubernetesReservationMetricsSender.onPodAcquired(StatsDKubernetesReservationMetricsSender.kt:25)
2021-10-27T16:52:52.650765459+03:00 [RemoteDeviceProvider@:app:instrumentationUi] Pod default-462db226-8a78-4421-91ab-f3b0af6152fa-6d8768786f-t527c can't load device. Disconnect and delete.
Check device logs in artifacts: /job/app/app/build/test-runner/4bdd0cc9fa0288878524b47b3e7574a3d2cdb4d9.local-root/ui/devices/10.0.3.137.txt

    at com.avito.android.runner.devices.internal.kubernetes.KubernetesReservationClaimer$initializeDevices$2$1.invokeSuspend(KubernetesReservationClaimer.kt:101)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:56)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:738)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665)
[RemoteDeviceProvider@:app:instrumentationUi] Pod default-462db226-8a78-4421-91ab-f3b0af6152fa-6d8768786f-v9596 can't load device. Disconnect and delete.

I see this exception 4 times in one run and then:

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':app:instrumentationUi'.
> A failure occurred while executing com.avito.gradle.worker.NonSerializableWork
   > Pod requests queue is empty

dsvoronin commented 3 years ago

Please check whether 2021.37 fixes this issue.