birdthedeveloper / prometheus-android-exporter

Prometheus exporter for android
Apache License 2.0

Intermittent BindException: Address already in use error #1

Open siriak opened 6 months ago

siriak commented 6 months ago

I'm getting a BindException: Address already in use error from time to time, and then the app stops responding to my requests. I'm running a slightly modified version of the app, but it only has a few more metrics added, so I don't believe that is the cause. Here are the logs from when this happens:

---------------------------- PROCESS STARTED (32008) for package com.birdthedeveloper.prometheus.android.exporter ----------------------------
2024-03-28 20:38:01.081 6445-6794 AndroidRuntime pid-6445 E FATAL EXCEPTION: DefaultDispatcher-worker-8
Process: com.birdthedeveloper.prometheus.android.exporter, PID: 6445
java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:442)
    at sun.nio.ch.Net.bind(Net.java:434)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at io.ktor.network.sockets.ConnectUtilsJvmKt.bind(ConnectUtilsJvm.kt:35)
    at io.ktor.network.sockets.TcpSocketBuilder.bind(TcpSocketBuilder.kt:45)
    at io.ktor.network.sockets.TcpSocketBuilder.bind(TcpSocketBuilder.kt:29)
    at io.ktor.server.cio.backend.HttpServerKt$httpServer$acceptJob$1.invokeSuspend(HttpServer.kt:48)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
    at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:115)
    at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:100)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:793)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:697)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:684)

I have the following chart, and you can see there are gaps in it. They occur when the app stops working and stops responding to Prometheus requests.

[image]

As you can see, sometimes it starts responding again after a while (from 1 to about 15 minutes), and sometimes it stops for good (at least that's what it looks like).

You can find a bigger chunk of logs attached. logs.txt

I'm not a Kotlin developer, but I'm willing to do what I can to solve this issue and add more metrics to this tool in the future because I find it useful.

The issue occurs very often on a Samsung SM-A525F (every 10-30 minutes), less often on an SM-A515F, and rarely on a Lenovo YT-J706X (1-2 times a day). I have allowed the app to run unrestricted on all devices, and that has significantly improved uptime (before that it would stop completely within about an hour), but the issue still persists.

birdthedeveloper commented 6 months ago

Hello, thanks for reporting the issue. I think this is caused by the OS restarting the application for some reason, so the port the server binds to is still marked as allocated when the application starts again. Hence the exception "Address already in use". I tried adding some configuration to the setup of the Prometheus server (the embedded Ktor server that runs on the phone and serves the metrics) that could mitigate this issue. Specifically, I set everything related to shutdown timeouts to zero. Unfortunately, I don't currently have time to investigate further. The changes are in the pull request mentioned below; you can try them out if you want to, @siriak.
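To illustrate the failure mode described above, here is a minimal sketch in plain Java that retries a bind while the port is still held, instead of crashing on the first `BindException`. It is not the change made in the pull request (which adjusts the Ktor shutdown timeouts) and it does not use the Ktor API; the class name `BindRetry`, the method `bindWithRetry`, and the retry parameters are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper, not part of the exporter: retry binding a server
// socket while a previous owner of the port has not released it yet.
final class BindRetry {

    /**
     * Tries to bind to the given port up to maxAttempts times, sleeping
     * delayMillis between failed attempts. Rethrows the last BindException
     * if every attempt fails.
     */
    static ServerSocket bindWithRetry(int port, int maxAttempts, long delayMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        BindException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            ServerSocket socket = new ServerSocket(); // created unbound
            try {
                socket.bind(new InetSocketAddress(port));
                return socket;
            } catch (BindException e) {
                // "Address already in use": release this socket and try again later.
                socket.close();
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            } catch (IOException e) {
                // Any other I/O failure is not retried.
                socket.close();
                throw e;
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulate the old process still holding the port for a short time.
        ServerSocket held = new ServerSocket(0); // 0 = any free port
        int port = held.getLocalPort();
        Thread releaser = new Thread(() -> {
            try {
                Thread.sleep(150);
                held.close();
            } catch (Exception ignored) {
                // Nothing to clean up in this demo.
            }
        });
        releaser.start();

        try (ServerSocket server = bindWithRetry(port, 10, 100)) {
            System.out.println("Bound to port " + server.getLocalPort());
        }
        releaser.join();
    }
}
```

A retry like this only helps if the old owner of the port eventually goes away; it does nothing about the app being killed in the first place.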

siriak commented 6 months ago

I'll try them out and get back to you in a few days once I collect some statistics. Thanks!

siriak commented 6 months ago

I have checked the logs, and the exception no longer reproduces; the process simply ends and is started again. I guess it's better now, but the charts look pretty much the same.

I have read more about how Android manages apps, and I'm now pretty sure that the app is closed by the system when it's low on resources. When I keep it open in the background (this might not be the correct terminology, but I mean the state shown in the screenshot below), it's restarted much less often (but still restarted at times).

[Screenshot_20240413_212851_One UI Home]

Is there a way to keep it alive? I have disabled all system optimizations I could find, but that hasn't resolved the issue. I have read that, to get the highest priority, an app must show an ongoing notification to the user. The code tries to do that, but on my phone the notification can still be dismissed (which means it's not ongoing, afaik).

To sum up: the bind exception issue is resolved in #2, but the real issue is that the app is killed by the OS from time to time.