getAlby / ldk-node

A ready-to-go node implementation built using LDK.

fix: set tokio worker threads to 8 #27

Closed rolznz closed 5 months ago

rolznz commented 5 months ago

This fixes deadlocking issues on the RPI, but also causes high CPU usage (300-400%) which leads to my RPI overheating.

Update: the RPI CPU can be limited with the following change:

Run sudo nano /etc/systemd/system/albyhub.service and update the [Service] section:

[Service]
Type=simple
Restart=always
RestartSec=1
User=root
ExecStart=/opt/albyhub/app/nostr-wallet-connect
#https://unix.stackexchange.com/a/495013
CPUWeight=20
CPUQuota=90%
IOWeight=20
MemorySwapMax=0

bumi commented 5 months ago

Why 8? And shouldn't it be smart enough to know where it runs?

rdmitr commented 5 months ago

Hardcoding the number of threads is not the best option, in my opinion. By default, Tokio uses the number of available CPU cores to determine this value. If we hardcode a specific number, then we underutilize systems with a larger number of cores, and we send smaller systems into overdrive (as you have witnessed 😉); neither is desirable.

Anyway, the number of worker threads in Tokio should be possible to set using the TOKIO_WORKER_THREADS environment variable. That way, we can avoid hardcoding it entirely.

But the bigger issue is that it feels like we only cure the symptoms, not the root cause of app freezes. However, I do not have any better solution ATM.
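A minimal sketch of the environment-variable approach described above, using only the standard library. The helpers `resolve_worker_threads` and `worker_threads_from_env` are hypothetical names for illustration and are not part of Tokio or ldk-node. Falling back to the core count on an unparsable or zero value is an illustrative choice and may differ from what Tokio itself does. The resulting number could be passed to `tokio::runtime::Builder::new_multi_thread().worker_threads(n)`.

```rust
use std::env;
use std::thread;

/// Hypothetical helper: pick a worker-thread count from an optional
/// environment value, falling back to the number of available cores
/// when the value is missing, unparsable, or zero.
fn resolve_worker_threads(env_value: Option<&str>, available_cores: usize) -> usize {
    env_value
        .and_then(|raw| raw.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(available_cores.max(1))
}

/// Hypothetical helper: read TOKIO_WORKER_THREADS at runtime and resolve
/// it against the core count reported by the operating system.
fn worker_threads_from_env() -> usize {
    let cores = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    let raw = env::var("TOKIO_WORKER_THREADS").ok();
    resolve_worker_threads(raw.as_deref(), cores)
}
```

Because the value is read when the process starts, nothing has to be decided at build time; a 4-core RPI without the variable set would resolve to 4 under these assumptions.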

rolznz commented 5 months ago

@rdmitr good points. Can we set this env variable for just building the lib for the RPI? could you try that?

bumi commented 5 months ago

I am hesitant here, shouldn't Tokio be smart enough?

rdmitr commented 5 months ago

@bumi

> I am hesitant here, shouldn't Tokio be smart enough?

Well, it's sort of smart: it sets the number of worker threads to the number of CPU cores, which is what most similar projects do by default. It's a good enough strategy for most use cases.

However, it seems like there's a deadlock (or a race condition, or both) somewhere in the ldk-node guts. While we are unable to pinpoint the root cause here, Roland has found that increasing the number of worker threads helps work around the issue (or maybe just masks it — we don't know for sure). It's quite a hack of course, but I don't have any better ideas at the moment 🤔

rdmitr commented 5 months ago

@rolznz

> Can we set this env variable for just building the lib for the RPI? could you try that?

It's a runtime env variable, so it should be enough to set it prior to running the app. I would suggest trying lower values, like 2, first.

rolznz commented 5 months ago

@rdmitr thanks! I'll try. In that case, I will close this PR for now.

rolznz commented 5 months ago

4 threads did not work well:

2024-06-02 06:02:47 ERROR [ldk_node:836] Stopping event handling timed out: deadline has elapsed
2024-06-02 06:02:57 INFO  [ldk_node:856] Shutdown complete.
2024-06-02 06:04:28 ERROR [ldk_node:748] Failed to send 'events handling stopped' signal. This should never happen: channel closed

rolznz commented 5 months ago

8 threads work. 4 threads would be the default on my RPI because it has 4 cores. UPDATE: no, it seems random, and shutdown still sometimes times out.

rolznz commented 5 months ago

@rdmitr as far as I can tell, setting Environment="TOKIO_WORKER_THREADS=8" does nothing. I undid the CPU limits and do not see the 300-400% CPU usage as I did when using the library built from this branch. (But I did see about 130% CPU)

bumi commented 5 months ago

The env variable should work: https://docs.rs/tokio/latest/tokio/runtime/struct.Builder.html#method.worker_threads

But I actually doubt that we're on a good path here. In my experience, it has never been a good idea to throw config options and code at a problem. If we need to set CPU limits and worker threads for an app like this, then I think we have a problem in the app.
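For reference, the linked `Builder::worker_threads` documentation describes `TOKIO_WORKER_THREADS` as overriding the default thread count, so a value set explicitly on the builder would take precedence over the environment. The sketch below models that precedence with a hypothetical helper, `effective_worker_threads`, which is not part of Tokio or ldk-node. Whether this precedence explains the behaviour reported in this thread is not established here.

```rust
/// Hypothetical model of the precedence: an explicit builder value wins,
/// then the environment value, then the number of available cores.
fn effective_worker_threads(
    explicit: Option<usize>,
    env_value: Option<&str>,
    available_cores: usize,
) -> usize {
    explicit
        // Only consult the environment when nothing was set explicitly.
        .or_else(|| env_value.and_then(|raw| raw.trim().parse::<usize>().ok()))
        // Otherwise fall back to the core count.
        .unwrap_or(available_cores)
}
```

Under this model, a library that hardcodes 8 threads would keep 8 regardless of the environment, while a library that leaves the builder at its default would follow the variable.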

rolznz commented 5 months ago

Yeah, but I think it's good to have a workaround to run the RPI without overheating until LDK-node has a proper fix (I don't know how long that will take and I think it would be too difficult for us to do). At least my RPI is running without overheating now with the limits set in the service file.

I will close this again.