Closed rolznz closed 5 months ago
why 8? and shouldn't it be smart enough to know where it runs?
Hardcoding the number of threads is not the best option, in my opinion. By default, Tokio uses the number of available CPU cores to determine this value. If we hardcode a specific number, then we underutilize systems with a larger number of cores, and we send smaller systems to overdrive (as you have witnessed 😉); neither is desirable.
Anyway, the number of worker threads in Tokio should be possible to set using the TOKIO_WORKER_THREADS environment variable. That way, we can avoid hardcoding it entirely.
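For illustration, here is a minimal sketch of the "environment override, otherwise core count" idea, using only the Rust standard library. `resolve_worker_threads` is a hypothetical helper, not part of Tokio, and its lenient fallback on invalid values is an assumption; Tokio's own handling of the variable may be stricter.

```rust
use std::env;
use std::thread;

/// Hypothetical helper: use the override if it is a positive integer,
/// otherwise fall back to the detected core count.
fn resolve_worker_threads(override_value: Option<&str>, cores: usize) -> usize {
    override_value
        .and_then(|v| v.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(cores)
}

fn main() {
    // Number of cores available to this process (1 if it cannot be detected).
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    // Same variable name as discussed in this thread.
    let from_env = env::var("TOKIO_WORKER_THREADS").ok();
    let threads = resolve_worker_threads(from_env.as_deref(), cores);
    println!("cores={cores} worker_threads={threads}");
}
```

If explicit control were wanted, the resulting number could be passed to `worker_threads` on Tokio's runtime `Builder` instead of a hardcoded 8.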
But the bigger issue is that it feels like we only cure the symptoms, not the root cause of app freezes. However, I do not have any better solution ATM.
@rdmitr good points. Can we set this env variable for just building the lib for the RPI? could you try that?
I am hesitant here, shouldn't Tokio be smart enough?
@bumi
> I am hesitant here, shouldn't Tokio be smart enough?
Well, it's sort of smart: it sets the number of worker threads to the number of CPU cores, which is what most similar projects do by default. It's a good enough strategy for most use cases.
However, it seems like there's a deadlock (or a race condition, or both) somewhere in the ldk-node guts. While we are unable to pinpoint the root cause here, Roland has found that increasing the number of worker threads helps work around the issue (or maybe just masks it — we don't know for sure). It's quite a hack of course, but I don't have any better ideas at the moment 🤔
@rolznz
> Can we set this env variable for just building the lib for the RPI? could you try that?
It's a runtime env variable, so it should be enough to set it prior to running the app. I would suggest trying lower values, like 2, first
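To make the "runtime, not build time" point concrete, here is a rough sketch of a launcher that sets the variable only for the process it starts. The helper name `command_with_worker_threads` and the `./albyhub` path are placeholders for illustration, not taken from the project.

```rust
use std::process::Command;

/// Hypothetical launcher: prepares a command whose environment
/// carries the worker-thread override.
fn command_with_worker_threads(program: &str, threads: usize) -> Command {
    let mut cmd = Command::new(program);
    // Only the child process sees this value; the parent is unaffected.
    cmd.env("TOKIO_WORKER_THREADS", threads.to_string());
    cmd
}

fn main() {
    // "./albyhub" is a placeholder path (assumption).
    let cmd = command_with_worker_threads("./albyhub", 2);
    // Inspect the explicitly configured environment without spawning anything.
    for (key, value) in cmd.get_envs() {
        println!("{:?}={:?}", key, value);
    }
}
```

Calling `.status()` or `.spawn()` on the returned command would actually start the program with that environment; no rebuild of the library is involved.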
@rdmitr thanks! I'll try. In that case, I will close this PR for now.
4 threads did not work well:
2024-06-02 06:02:47 ERROR [ldk_node:836] Stopping event handling timed out: deadline has elapsed
2024-06-02 06:02:57 INFO [ldk_node:856] Shutdown complete.
2024-06-02 06:04:28 ERROR [ldk_node:748] Failed to send 'events handling stopped' signal. This should never happen: channel closed
8 threads works. 4 threads would be the default on my RPI because it uses 4 cores. UPDATE: no, it seems random and shutdown still sometimes times out.
@rdmitr as far as I can tell, setting Environment="TOKIO_WORKER_THREADS=8" does nothing. I undid the CPU limits and do not see the 300-400% CPU usage as I did when using the library built from this branch. (But I did see about 130% CPU)
the env variable should work: https://docs.rs/tokio/latest/tokio/runtime/struct.Builder.html#method.worker_threads
but I actually doubt that this is a good path that we're on here. For all the things I did it was never a good idea to throw config options and code at things. And somehow I think if we need to set CPU limits and worker threads for an app like this then we have a problem in the app.
Yeah, but I think it's good to have a workaround to run the RPI without overheating until LDK-node has a proper fix (I don't know how long that will take and I think it would be too difficult for us to do). At least my RPI is running without overheating now with the limits set in the service file.
I will close this again.
This fixes deadlocking issues on the RPI, but also causes high CPU usage (300-400%) which leads to my RPI overheating.
Update: the RPI CPU can be limited with the following change:
sudo nano /etc/systemd/system/albyhub.service
and update the [Service] section