Blockstream / esplora

Explorer for Bitcoin and Liquid
MIT License

electrs crashes while indexing mainnet with "Too many open files" #133

Open prashb94 opened 5 years ago

prashb94 commented 5 years ago

Electrs new-index works fine for testnet, but while syncing mainnet it errors out with:

(Truncated log)

TRACE - skipping block 0000000000000000001a871a0c81fe392e9d90562e702eddd2835e27da815f1d
TRACE - skipping block 0000000000000000001198ed4b9090ef67acebc8ca517bdcd67efc930e554b6c
TRACE - skipping block 0000000000000000001c02b01cb173dc33cd901d0842be6f331037c03b1b1afa
TRACE - skipping block 000000000000000000131227a7c21c0c247b5ee30aeffbd1f9ccba6038d071d5
TRACE - skipping block 0000000000000000000c99cf30cb7609a3d3e1bc6b65c6360b03130e34b2f150
TRACE - fetched 9 blocks
DEBUG - writing 98889 rows to RocksDB { path: "./db/mainnet/newindex/txstore" }, flush=Disable
DEBUG - starting full compaction on RocksDB { path: "./db/mainnet/newindex/txstore" }
DEBUG - finished full compaction on RocksDB { path: "./db/mainnet/newindex/txstore" }
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { message: "IO error: While open a file for random read: ./db/mainnet/newindex/txstore/000938.sst: Too many open files" }', src/libcore/result.rs:997:5
Aborted (core dumped)

Also, the size of ./db is ~325GB. Is this normal?

greenaddress commented 5 years ago

@prashb94 The size of the db is normal; IIRC it can go up to 700+GB before it compacts back down to 495GB (excluding bitcoind; including bitcoind, the total disk requirement once compacted is around 765GB, but you will need more for the initial run).

How much ram do you have on the machine you run this? What OS/distro?

prashb94 commented 5 years ago

Thanks! It's on a t3.xlarge EC2 instance (4 vCPU / 16GB RAM) with 2TB of attached block storage, so memory isn't the issue. Any idea what could be causing it to crash?

Edit: OS - Ubuntu 16.04.5 LTS (GNU/Linux 4.4.0-1088-aws x86_64)

greenaddress commented 5 years ago

@prashb94 I am not too sure, but I think this depends on the OS configuration. You may be able to solve it by changing /etc/security/limits.conf; see https://github.com/romanz/electrs/issues/28 for a similar issue. From https://github.com/romanz/electrs/issues/11 it appears it could also be related to a corrupted bitcoind block file.
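As a sanity check, the limit actually in effect for a running process can be read from /proc (Linux-only; this assumes pidof can find the electrs process):

cat /proc/$(pidof electrs)/limits | grep 'Max open files'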

Are you using any ad-hoc configuration for bitcoind? Is the storage SSD?

Thanks

setpill commented 5 years ago

Running into the same issue; it does not seem to be OS config (or maybe I'm missing something?).

Trace (with RUST_BACKTRACE=full). NB: This occurred on the first run of the service after a reboot.

Only non-comment line in /etc/sysctl.conf: fs.file-max = 500000

Only non-comment lines in /etc/security/limits.conf:

*       soft    nofile      100000
*       hard    nofile      100000
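
(Whether a given shell actually inherited those values can be checked with ulimit -Sn and ulimit -Hn.)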

clarkmoody commented 4 years ago

Seeing the same problem here. I got an interesting result when I deleted the cache directory and re-ran electrs: it tried to open a socket and produced a "too many open files" error, and the error message showed fd: 1023. That hints at a 1024-fd limit on the process somehow. I raised the hard and soft limits on the machine to 500k and double-checked across all users, yet somehow electrs did not get that limit.

Upstream electrs is setting the open files limit manually.
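For reference, a minimal sketch of what raising the limit from inside the process can look like in Rust (an illustration built on the libc crate, not the actual upstream electrs code):

fn raise_fd_limit() -> std::io::Result<()> {
    // Illustrative sketch: raise this process's soft RLIMIT_NOFILE
    // up to its hard limit (assumes the libc crate as a dependency).
    unsafe {
        let mut lim = libc::rlimit { rlim_cur: 0, rlim_max: 0 };
        if libc::getrlimit(libc::RLIMIT_NOFILE, &mut lim) != 0 {
            return Err(std::io::Error::last_os_error());
        }
        lim.rlim_cur = lim.rlim_max; // soft limit -> hard limit
        if libc::setrlimit(libc::RLIMIT_NOFILE, &lim) != 0 {
            return Err(std::io::Error::last_os_error());
        }
    }
    Ok(())
}

Note that an unprivileged process can only raise its soft limit up to the hard limit, which is why the OS-level configuration still matters.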

clarkmoody commented 4 years ago

Relevant lines from my logs. I guess the error happened when trying to connect to bitcoind (port 8332):

Dec 11 06:59:50 - esplora-electrs[28090]: 2019-12-11T06:59:50.344+00:00 - ERROR - server failed: Error: failed to clone TcpStream { addr: V4(127.0.0.1:3000), peer: V4(127.0.0.1:8332), fd: 1023 }
Dec 11 06:59:50 - esplora-electrs[28090]: Caused by: Too many open files (os error 24)

setpill commented 4 years ago

The issue on my system turned out to be caused by systemd overriding system-wide limits with a "sane" default. It was resolved by setting a higher LimitNOFILE value in the electrs service file.
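For example, a drop-in like the following (the unit path and value here are illustrative, adjust to your setup):

# /etc/systemd/system/electrs.service.d/limits.conf
[Service]
LimitNOFILE=65536

followed by systemctl daemon-reload and a restart of the service.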

dongcarl commented 4 years ago

Here's how I got around this on the command line:

sudo prlimit --nofile=65536 sudo -u "#$(id -u)" -g "#$(id -g)" cargo blah blah wtv

The first sudo makes us root so we can raise the file limit; the second sudo drops back to our original user so cargo runs normally.
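If in doubt, the limit the spawned process actually received can be verified with prlimit --nofile --pid <pid> (prlimit is part of util-linux).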

clarkmoody commented 4 years ago

> The issue on my system turned out to be caused by systemd overriding system-wide limits with a "sane" default. It was resolved by setting a higher LimitNOFILE value in the electrs service file.

@setpill Excellent, thanks! Running via systemd here.

Might be nice to make a note of this in the docs 😉