ChainSafe / gossamer

🕸️ Go Implementation of the Polkadot Host
https://chainsafe.github.io/gossamer
GNU Lesser General Public License v3.0

"too many open files" error #1453

Closed noot closed 3 weeks ago

noot commented 3 years ago

Describe the bug

Possible Solution

Log output

2021-03-11T08:53:24.349Z    ERROR   basichost   failed to resolve local interface addresses {"error": "route ip+net: netlinkrib: too many open files"}
WARN[03-11|08:53:25] failed to handle block data              pkg=network start=4332161 end=4332288 error="Create value log file. Path=/home/ubuntu/.gossamer/ksmcc/000275.vlog. Error=open /home/ubuntu/.gossamer/ksmcc/000275.vlog: too many open files" caller=sync.go:542
INFO[03-11|08:53:27] 🔗 imported blocks                     pkg=network from=4332160 to=4332193 hashes="[0x666ff5c507736210ec92d982f0a2ebc02eb46ab6f113636627e810d18035d7fc ... 0x44db2838ed3d071cb4a6c67c95dfc43a7ce210be4acb853ba003fb99d8f32276]" caller=sync.go:271
INFO[03-11|08:53:27] 🚣 currently syncing                   pkg=network goal=6557900 average blocks/second=6.600 overall average=15.362 caller=sync.go:275
// network debug logs omitted
panic: parent state root does not match snapshot state root

goroutine 7771 [running]:
github.com/ChainSafe/gossamer/dot/sync.(*Service).handleBlock(0xc05b816370, 0xc0fa90ee80, 0x10, 0xc088d680e0)
    /home/ubuntu/gossamer/dot/sync/syncer.go:302 +0xa06
github.com/ChainSafe/gossamer/dot/sync.(*Service).ProcessBlockData(0xc05b816370, 0xc0671a6000, 0x2e0, 0x400, 0x4, 0x4)
    /home/ubuntu/gossamer/dot/sync/syncer.go:239 +0x6b6
github.com/ChainSafe/gossamer/dot/network.(*syncQueue).processBlockResponses(0xc03f4f7970)
    /home/ubuntu/gossamer/dot/network/sync.go:540 +0x59a
created by github.com/ChainSafe/gossamer/dot/network.(*syncQueue).start
    /home/ubuntu/gossamer/dot/network/sync.go:153 +0x89

on restart:

INFO[03-11|14:35:22] 🕸️ initializing node services...    pkg=dot name=Kusama id=ksmcc basepath=/home/ubuntu/.gossamer/ksmcc caller=node.go:170
INFO[03-11|14:35:26] detected abnormal node shutdown, restoring from last finalized block pkg=state caller=service.go:280
EROR[03-11|14:35:26] failed to create node services           pkg=cmd error="failed to create state service: failed to start state service: failed to load storage trie from database: failed to find node key=3987f0a45387c3aac2c7819f74c2b9df92edf5f8ce65537007281a9d97ba258c index=1: Key not found" caller=main.go:235
failed to create state service: failed to start state service: failed to load storage trie from database: failed to find node key=3987f0a45387c3aac2c7819f74c2b9df92edf5f8ce65537007281a9d97ba258c index=1: Key not found

Specification

arijitAD commented 3 years ago

Filecoin Lotus also uses badger, and a similar issue was filed there: https://github.com/filecoin-project/lotus/issues/2038 The suggested solution there is to increase the open-file limit (ulimit -n).

At first, I saw it in the libp2p logs, but it eventually showed up in badger as well, and eventually the node crashed due to invalid state.

The error shows up in libp2p for two reasons:

  1. We use badger in libp2p for the persistent peerstore.
  2. On Linux a socket is also a file, so every libp2p connection holds an open file descriptor.
arijitAD commented 3 years ago

https://dgraph.io/docs/badger/faq/

Since badger mentions this in its docs, we should also include it in our docs and explain to users how to delete some blocks using rewind and then restart the node in case of corruption.

timwu20 commented 2 years ago

@danforbes can you take care of this?

timwu20 commented 3 weeks ago

I think we resolved this. I haven't been able to reproduce.