getumbrel / umbrel-os

umbrelOS for Raspberry Pi 4 (only). Convert your Raspberry Pi into a home server in one click. For other hardware, check out https://github.com/getumbrel/umbrel
https://umbrel.com
BSD 3-Clause "New" or "Revised" License

Umbrel suddenly starts to crash after running one week without issues #218

Closed marviins87 closed 2 years ago

marviins87 commented 3 years ago

My Umbrel node started crashing every few hours a few days ago. It ran without issues for one week, and then the crashes suddenly started. I'm unable to find the cause. It's a Pi 4 (8GB) with a SanDisk microSD card and a 1TB Samsung Portable SSD T5, and it's powered by the original Pi 3A power supply.

This script could not automatically detect an issue with your Umbrel. Please share the following links and paste it in the Umbrel Telegram group (https://t.me/getumbrel) so we can help you with your problem. https://umbrel-paste.vercel.app/171319d544c968b72d5d0110ece367e2

Does anyone know what is happening with my node, or are there any other troubleshooting steps I should try?

bolaum commented 3 years ago

Maybe it's related to a problem I was having recently. In my case, the board wasn't really crashing; the network buffers were filling up. After a few hours not even ssh worked, but the board itself didn't seem to have crashed. While investigating the logs, I found some weird messages regarding conntrack. I did the following (as root):

echo 'nf_conntrack' >> /etc/modules

Add the following to the end of /etc/sysctl.d/99-sysctl.conf:

net.netfilter.nf_conntrack_max = 1048576
net.netfilter.nf_conntrack_tcp_timeout_established = 600
net.netfilter.nf_conntrack_generic_timeout = 60

Reboot.

Check new values after reboot with:

sysctl net.netfilter.nf_conntrack_max
sysctl net.netfilter.nf_conntrack_tcp_timeout_established
sysctl net.netfilter.nf_conntrack_generic_timeout

This loads the nf_conntrack module at boot and raises the maximum number of tracked connections. It also shortens the timeouts after which idle connections are dropped from the tracking table.
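To judge whether the connection tracking table is actually close to its limit, the current entry count can be compared against the maximum. The following is only a rough sketch: the function name conntrack_usage_pct is made up for illustration, and it assumes the kernel exposes nf_conntrack_count and nf_conntrack_max under /proc/sys/net/netfilter, which is only the case while the nf_conntrack module is loaded.

```shell
#!/bin/sh
# Hypothetical helper (not part of Umbrel): print the integer percentage
# of the conntrack table that is in use, given the entry count and the limit.
conntrack_usage_pct() {
    used=$1
    limit=$2
    # Refuse to divide by zero if the limit is missing or invalid.
    if [ "$limit" -le 0 ]; then
        return 1
    fi
    echo $(( used * 100 / limit ))
}

count_file=/proc/sys/net/netfilter/nf_conntrack_count
max_file=/proc/sys/net/netfilter/nf_conntrack_max

# These files only exist while the nf_conntrack module is loaded.
if [ -r "$count_file" ] && [ -r "$max_file" ]; then
    count=$(cat "$count_file")
    max=$(cat "$max_file")
    pct=$(conntrack_usage_pct "$count" "$max")
    echo "conntrack entries: $count of $max (${pct}%)"
else
    echo "nf_conntrack does not appear to be loaded"
fi
```

If the reported percentage stays near 100 shortly before the network stops responding, a full table is a plausible cause; if it stays low, the problem is probably somewhere else.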

After these changes, it seems stable. Hope it helps.

@louneskmt @AaronDewes do you think this deserves a PR?

AaronDewes commented 3 years ago

There is another network-related issue that is more likely to be the cause here. Yours seems to happen very rarely.

jonsyu commented 2 years ago

Closing due to inactivity