ACINQ / phoenixd

https://phoenix.acinq.co/server
Apache License 2.0

kotlin.IllegalStateException: Process descriptor file table is full after many connection attempts #118

Open · ekzyis opened 2 weeks ago

ekzyis commented 2 weeks ago

Hello, I have been running phoenixd on a VPS for a while now, and I noticed that it keeps crashing after some time with the following error (assuming I remember the error from last time correctly):

Uncaught Kotlin exception: kotlin.IllegalStateException: Process descriptor file table is full
    at 0   phoenixd                            0x55e680           kfun:kotlin.Throwable#<init>(kotlin.String?){} + 32
    at 1   phoenixd                            0x55bece           kfun:kotlin.Exception#<init>(kotlin.String?){} + 30
    at 2   phoenixd                            0x55bf8e           kfun:kotlin.RuntimeException#<init>(kotlin.String?){} + 30
    at 3   phoenixd                            0x55c1ae           kfun:kotlin.IllegalStateException#<init>(kotlin.String?){} + 30
    at 4   phoenixd                            0x4b05bb           kfun:io.ktor.network.sockets.TCPServerSocketNative.$acceptCOROUTINE$0.invokeSuspend#internal + 1371
    at 5   phoenixd                            0x4b09ae           kfun:io.ktor.network.sockets.TCPServerSocketNative#accept#suspend(kotlin.coroutines.Continuation<io.ktor.network.sockets.Socket>){}kotlin.Any + 126
    at 6   phoenixd                            0x4b3ced           kfun:io.ktor.network.sockets.ServerSocket#accept#suspend(kotlin.coroutines.Continuation<io.ktor.network.sockets.Socket>){}kotlin.Any-trampoline + 93
    at 7   phoenixd                            0xb524a4           kfun:io.ktor.server.cio.backend.$httpServer$lambda$3COROUTINE$0.invokeSuspend#internal + 2820
    at 8   phoenixd                            0x5dfc69           kfun:kotlin.coroutines.native.internal.BaseContinuationImpl#invokeSuspend(kotlin.Result<kotlin.Any?>){}kotlin.Any?-trampoline + 25
    at 9   phoenixd                            0x560300           kfun:kotlin.coroutines.native.internal.BaseContinuationImpl#resumeWith(kotlin.Result<kotlin.Any?>){} + 224
    at 10  phoenixd                            0x5dfcf5           kfun:kotlin.coroutines.Continuation#resumeWith(kotlin.Result<1:0>){}-trampoline + 85
    at 11  phoenixd                            0x63a5ea           kfun:kotlinx.coroutines.DispatchedTask#run(){} + 634
    at 12  phoenixd                            0x64be9d           kfun:kotlinx.coroutines.Runnable#run(){}-trampoline + 77
    at 13  phoenixd                            0x649149           kfun:kotlinx.coroutines.MultiWorkerDispatcher.$workerRunLoop$lambda$2COROUTINE$0.invokeSuspend#internal + 777
    at 14  phoenixd                            0x5dfc69           kfun:kotlin.coroutines.native.internal.BaseContinuationImpl#invokeSuspend(kotlin.Result<kotlin.Any?>){}kotlin.Any?-trampoline + 25
    at 15  phoenixd                            0x560300           kfun:kotlin.coroutines.native.internal.BaseContinuationImpl#resumeWith(kotlin.Result<kotlin.Any?>){} + 224
    at 16  phoenixd                            0x5dfcf5           kfun:kotlin.coroutines.Continuation#resumeWith(kotlin.Result<1:0>){}-trampoline + 85
    at 17  phoenixd                            0x63a5ea           kfun:kotlinx.coroutines.DispatchedTask#run(){} + 634
    at 18  phoenixd                            0x64be9d           kfun:kotlinx.coroutines.Runnable#run(){}-trampoline + 77
    at 19  phoenixd                            0x60f4a8           kfun:kotlinx.coroutines.EventLoopImplBase#processNextEvent(){}kotlin.Long + 168
    at 20  phoenixd                            0x64bd53           kfun:kotlinx.coroutines.EventLoop#processNextEvent(){}kotlin.Long-trampoline + 19
    at 21  phoenixd                            0x645f07           kfun:kotlinx.coroutines.BlockingCoroutine.joinBlocking#internal + 167
    at 22  phoenixd                            0x645676           kfun:kotlinx.coroutines#runBlocking(kotlin.coroutines.CoroutineContext;kotlin.coroutines.SuspendFunction1<kotlinx.coroutines.CoroutineScope,0:0>){0§<kotlin.Any?>}0:0 + 598
    at 23  phoenixd                            0x64576e           kfun:kotlinx.coroutines#runBlocking$default(kotlin.coroutines.CoroutineContext?;kotlin.coroutines.SuspendFunction1<kotlinx.coroutines.CoroutineScope,0:0>;kotlin.Int){0§<kotlin.Any?>}0:0 + 94
    at 24  phoenixd                            0x64854d           kfun:kotlinx.coroutines.MultiWorkerDispatcher.workerRunLoop#internal + 109
    at 25  phoenixd                            0x648cc5           kfun:kotlinx.coroutines.MultiWorkerDispatcher.<init>$lambda$1$lambda$0#internal + 21
    at 26  phoenixd                            0x649526           kfun:kotlinx.coroutines.MultiWorkerDispatcher.$<init>$lambda$1$lambda$0$FUNCTION_REFERENCE$5.invoke#internal + 22
    at 27  phoenixd                            0x649579           kfun:kotlinx.coroutines.MultiWorkerDispatcher.$<init>$lambda$1$lambda$0$FUNCTION_REFERENCE$5.$<bridge-UNN>invoke(){}#internal + 25
    at 28  phoenixd                            0x5de1bb           kfun:kotlin.Function0#invoke(){}1:0-trampoline + 75
    at 29  phoenixd                            0x565379           WorkerLaunchpad + 57
    at 30  phoenixd                            0x5f979f           _ZN6Worker19processQueueElementEb + 927
    at 31  phoenixd                            0x5f93bc           _ZN12_GLOBAL__N_113workerRoutineEPv + 76
    at 32  libc.so.6                           0x7f55b70c69ea     0x0 + 140006119991786
    at 33  libc.so.6                           0x7f55b714a7cb     0x0 + 140006120531915
    Suppressed: kotlinx.coroutines.internal.DiagnosticCoroutineContextException: [LazyStandaloneCoroutine{Cancelling}@b2abb600, DefaultDispatcher@b422fce0]
        at 0   phoenixd                            0x55e680           kfun:kotlin.Throwable#<init>(kotlin.String?){} + 32
        at 1   phoenixd                            0x55bece           kfun:kotlin.Exception#<init>(kotlin.String?){} + 30
        at 2   phoenixd                            0x55bf8e           kfun:kotlin.RuntimeException#<init>(kotlin.String?){} + 30
        at 3   phoenixd                            0x649da5           kfun:kotlinx.coroutines.internal.DiagnosticCoroutineContextException#<init>(kotlin.coroutines.CoroutineContext){} + 85
        at 4   phoenixd                            0x638ada           kfun:kotlinx.coroutines.internal#handleUncaughtCoroutineException(kotlin.coroutines.CoroutineContext;kotlin.Throwable){} + 314
        at 5   phoenixd                            0x60d2dc           kfun:kotlinx.coroutines#handleCoroutineException(kotlin.coroutines.CoroutineContext;kotlin.Throwable){} + 124
        at 6   phoenixd                            0x607af5           kfun:kotlinx.coroutines.StandaloneCoroutine.handleJobException#internal + 85
        at 7   phoenixd                            0x64ca63           kfun:kotlinx.coroutines.JobSupport#handleJobException(kotlin.Throwable){}kotlin.Boolean-trampoline + 19
        at 8   phoenixd                            0x611e7c           kfun:kotlinx.coroutines.JobSupport.finalizeFinishingState#internal + 316
        at 9   phoenixd                            0x615be7           kfun:kotlinx.coroutines.JobSupport.continueCompleting#internal + 119
        at 10  phoenixd                            0x6170ed           kfun:kotlinx.coroutines.JobSupport.ChildCompletion.invoke#internal + 141
        at 11  phoenixd                            0x64caf3           kfun:kotlinx.coroutines.JobNode#invoke(kotlin.Throwable?){}-trampoline + 19
        at 12  phoenixd                            0x612cb7           kfun:kotlinx.coroutines.JobSupport.notifyCompletion#internal + 519
        at 13  phoenixd                            0x612577           kfun:kotlinx.coroutines.JobSupport.completeStateFinalization#internal + 295
        at 14  phoenixd                            0x611f3f           kfun:kotlinx.coroutines.JobSupport.finalizeFinishingState#internal + 511
        at 15  phoenixd                            0x615743           kfun:kotlinx.coroutines.JobSupport.tryMakeCompletingSlowPath#internal + 659
        at 16  phoenixd                            0x615460           kfun:kotlinx.coroutines.JobSupport.tryMakeCompleting#internal + 192
        at 17  phoenixd                            0x6152a4           kfun:kotlinx.coroutines.JobSupport#makeCompletingOnce(kotlin.Any?){}kotlin.Any? + 148
        at 18  phoenixd                            0x606f7a           kfun:kotlinx.coroutines.AbstractCoroutine#resumeWith(kotlin.Result<1:0>){} + 106
        at 19  phoenixd                            0x5dfcf5           kfun:kotlin.coroutines.Continuation#resumeWith(kotlin.Result<1:0>){}-trampoline + 85
        at 20  phoenixd                            0x5603ed           kfun:kotlin.coroutines.native.internal.BaseContinuationImpl#resumeWith(kotlin.Result<kotlin.Any?>){} + 461
        at 21  phoenixd                            0x5dfcf5           kfun:kotlin.coroutines.Continuation#resumeWith(kotlin.Result<1:0>){}-trampoline + 85
        at 22  phoenixd                            0x63a54a           kfun:kotlinx.coroutines.DispatchedTask#run(){} + 474
        ... and 22 more common stack frames skipped
Aborted (core dumped)

Before that, I see a lot of these messages:

2024-10-10 14:42:00 connecting to lightning peer...
2024-10-10 14:42:00 disconnected from lightning peer
2024-10-10 14:42:03 connecting to lightning peer...
2024-10-10 14:42:03 disconnected from lightning peer
2024-10-10 14:42:06 disconnected from lightning peer
2024-10-10 14:42:09 connecting to lightning peer...
2024-10-10 14:42:09 disconnected from lightning peer
2024-10-10 14:42:12 connecting to lightning peer...
2024-10-10 14:42:12 disconnected from lightning peer
2024-10-10 14:42:15 connecting to lightning peer...
2024-10-10 14:42:15 disconnected from lightning peer
2024-10-10 14:42:18 connecting to lightning peer...
2024-10-10 14:42:18 disconnected from lightning peer

In fact, this pair of messages has been repeating every 3 seconds for at least an hour. That's as far back as I was able to scroll in my logs, so it might have been going on for much longer.

It could be that my VPS lost internet access at some point, but it definitely had internet access between 1 and 3 pm, so I'm not sure why it wasn't able to connect to the LSP.

But anyway, shouldn't phoenixd retry forever? It could also use exponential backoff.
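
Just to illustrate what I mean (this is only a sketch of the retry behavior I'd expect, not actual phoenixd code):

# sketch only: retry forever, doubling the delay between attempts up to a cap
try_connect() { false; }            # placeholder for the real connection attempt

delay=3
while ! try_connect; do
  echo "retrying in ${delay}s..."
  sleep "$delay"
  delay=$(( delay * 2 ))
  [ "$delay" -gt 300 ] && delay=300   # cap the backoff at 5 minutes
done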

I have restarted it now and it connected. I'll report back when it goes down again.
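
Next time it goes down I can also check whether the reconnect loop is using up file descriptors before the crash, with something along these lines (assuming the process shows up under the name phoenixd):

# count the open file descriptors of the running phoenixd process
pid=$(pgrep -x phoenixd)
ls /proc/"$pid"/fd | wc -l
# list what they actually are (sockets, regular files, /dev/urandom, ...)
ls -l /proc/"$pid"/fd | tail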

ekzyis commented 2 weeks ago

Oh no, actually, the error last time was Could not open /dev/urandom when trying to pay an invoice, and then it crashed.

pm47 commented 1 week ago

Hello,

What kind of host is it, and does it run anything other than phoenixd? What's the output of ulimit -a?

But anyway, shouldn't phoenixd retry forever? It could also use exponential backoff.

We fixed a bug in 0.4.0 that caused the retry loop to hang. Can you reproduce this with phoenixd 0.4.0?

error last time was Could not open /dev/urandom when trying to pay an invoice and then it crashed.

That looks like a different problem. Phoenixd must be able to access a source of randomness.

ekzyis commented 1 week ago

What kind of host is it, and does it run anything other than phoenixd?

It's a Nanode 1 GB from https://cloud.linode.com/. It also runs nginx and gitea.

What's the output of ulimit -a?

$ ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) unlimited
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 127233
max locked memory           (kbytes, -l) 8192
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 127233
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited
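
So the open files limit is only 1024. If the crash really is the process running out of file descriptors, raising that limit for phoenixd might at least work around it; roughly like this (I'd have to adapt it to however I end up running it, plain shell or systemd):

# raise the soft limit in the shell that launches phoenixd, then start it as usual
ulimit -n 8192

# or, if phoenixd runs as a systemd service, set it in the unit file:
# [Service]
# LimitNOFILE=8192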

We fixed a bug in 0.4.0 that caused the retry loop to hang. Can you reproduce this with phoenixd 0.4.0?

Thanks, I will try out 0.4.0. I am currently running version 0.3.2-97025b7.

That looks like a different problem. Phoenixd must be able to access a source of randomness.

Yes, but I wanted to mention it anyway.