Blockstream / esplora

Explorer for Bitcoin and Liquid

Electrs not finishing hashes #191

Open Naryck opened 4 years ago

Naryck commented 4 years ago

The server is running on AWS: m5a.xlarge (4 vCPUs, 16 GB RAM, 1000 GB SSD), Ubuntu 18. I installed Bitcoin Core and run it with bitcoind -daemon.

I cloned Blockstream's electrs from GitHub, switched to the branch, and started it with: cargo run --release --bin electrs -- -vvvv --daemon-dir ~/.bitcoin --http-addr 0.0.0.0:3001

Electrs was hashing for a very long time, and at one point I hit Ubuntu's 'too many open files' error, but fixed it successfully. The last time I launched it, electrs printed this to the console:

TRACE - skipping block 00000000000000d41e5497f984517d27afcd85dad80277e7b7856418a72c7f69
TRACE - skipping block 00000000000005c35bbadf1bced112ddd3fff6e8e63575832bb3ba788889c866
TRACE - skipping block 000000000000025925f4fab2b7ebe729e7c7d483098863a7cd31bc5c773a5098
TRACE - fetched 0 blocks
DEBUG - writing 0 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00025.dat"
TRACE - parsing 134145055 bytes
TRACE - fetched 1383 blocks
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00026.dat"
TRACE - parsing 134182236 bytes
TRACE - fetched 1345 blocks
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00027.dat"
TRACE - parsing 134150531 bytes
TRACE - fetched 1290 blocks
TRACE - parsing 134189077 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00028.dat"
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00029.dat"
TRACE - parsing 134205573 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00030.dat"
DEBUG - writing 1979996 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 134113675 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00031.dat"
TRACE - fetched 1754 blocks
DEBUG - writing 1994881 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 134169607 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00032.dat"
TRACE - fetched 1443 blocks

And that's it. Nothing has happened for more than an hour. I thought, 'That's it, everything works!' I tried making HTTP requests to the server from the internet (the AWS security groups allow access to port 3001) and from the CLI against localhost itself, but I get this:

ubuntu@ip-172-61-11-224:~/electrs$ curl http://localhost:3001/blocks/tip/height
curl: (7) Failed to connect to localhost port 3001: Connection refused

And by the way, when electrs runs it starts a stats server on port 4224. When I make a request to localhost:4224 I do receive some data, which means that part of electrs works, just not electrs itself.

I followed the instructions step by step and got no errors. What else have I forgotten?
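
By the way, for the 'too many open files' error mentioned above, the usual fix is to raise the open-file limit; something along these lines (the exact values are just an example):

$ ulimit -n                 # show the current per-shell open-file limit
$ ulimit -n 65535           # raise it for this shell before starting electrs
# to make it permanent, add lines like these to /etc/security/limits.conf:
#   ubuntu soft nofile 65535
#   ubuntu hard nofile 65535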

shesek commented 4 years ago

Well, it definitely did not finish; there should be over 2000 blk files and it seems to have stopped at 32.

Is it still running? Has it printed anything else since? What happens if you restart it?

shesek commented 4 years ago

If this happens again following a restart, could you try checking the CPU usage and disk I/O of electrs while it's in this state, to see if it's crunching something or just stuck doing nothing? You can do this with something like htop, which can also show disk usage (I believe that requires running htop as root).
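
Something along these lines should do it (iostat comes with the sysstat package, and iotop typically needs root):

$ htop              # per-process and per-core CPU usage
$ sudo iotop -o     # only show processes currently doing disk I/O
$ iostat -dxm 5     # per-device throughput and utilization, refreshed every 5 seconds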

Naryck commented 4 years ago

OK, I started electrs again with the same command: cargo run --release --bin electrs -- -vvvv --daemon-dir ~/.bitcoin --http-addr 0.0.0.0:3001

It was printing 'skipping block' messages for quite a while; during that phase CPU usage was low (15-25% on every core) and there were no disk writes, only reads.

Then it stopped skipping and started doing something else.

After it stopped skipping, it started writing data to disk; memory usage is now at 10 GB and CPU usage has jumped to 90%+.

The last message in the console output, TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00072.dat", has been sitting there for over 30 minutes now. Electrs is still reading/writing something from/to the instance's drive, but nothing else changes; blk00072.dat is still the last line in the output.

P.S. I have set this up from scratch a couple of times on another server, so don't assume that this time it moved from block file 32 to 72; it may well be a different server with the same problem, stuck at a different block file.

Naryck commented 4 years ago

Now I've noticed that electrs has reached block file 73; that took 2-3 hours. All this time the 4 cores were above 60%, sometimes near 100%, and read/write speeds were around 20 MB/s. After about 4 hours it started block file 74, and 2 hours later, block file 75.

Keeping you informed.

Naryck commented 4 years ago

24 hours later: block file 85 is being processed now. I think something is wrong. I suspect more blocks appeared on the blockchain during this period than a 4-core, 16 GB server with an SSD managed to process.

Any ideas? What can be done or checked?

shesek commented 4 years ago

Which commit are you running? Could you try the latest master to see if anything is different?

Naryck commented 4 years ago

Which commit are you running? Could you try the latest master to see if anything is different?

ubuntu@ip-172-61-11-224:~/electrs$ git show
commit 49cf7a7dffe8dccb3d4c5cb543aa05878abf086b (HEAD -> new-index, origin/new-index, origin/dev, origin/HEAD)

OK, I'll try on a clean instance with the master branch, but it'll take a while.

P.S. Could it be that there isn't enough free space on the instance? I created the instance with a 1000 GB volume, but some of it is taken up by the system. Maybe I need to do it with 1100 or 1200 GB. Would it behave like this when space runs out, or would it throw an error saying 'not enough free space'?

shesek commented 4 years ago

It should issue a "not enough space" warning if that happens, and I believe 1TB should be sufficient.

Btw, there's a new option called --lightmode which reduces storage requirements by about 270GB, which might be useful.
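
If you want to rule out a space problem, you can check the free space on the relevant volumes; --lightmode is just an extra flag on the same command, e.g. (paths as in your setup):

$ df -h ~/.bitcoin ./db     # free space where bitcoind's data and the electrs db live
$ cargo run --release --bin electrs -- -vvvv --daemon-dir ~/.bitcoin --http-addr 0.0.0.0:3001 --lightmode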

Naryck commented 4 years ago

I removed the electrs db directory, switched to master and ran: cargo run --release --bin electrs -- -vvvv --daemon-dir ~/.bitcoin

It was indexing for some time and the log text was blue. Now it shows purple text, so I guess it finished indexing. If I run it again, everything is shown in purple.

Now, how do I check whether the server returns data? I have no idea how to make JSON-RPC requests.

shesek commented 4 years ago

If the electrum rpc server started, the http server should've started too. You can test them with:

$ echo '{"id":1,"method":"server.version","params":[]}' | nc -v localhost 50001
$ curl -v http://localhost:3000/blocks/tip/hash

The "missing size" warning appears to be a bug; I'm working on a fix.

shesek commented 4 years ago

Oh shit, I realized what happened... ugh. :( We're using the new-index branch here as the mainline development branch, not master -- which still points to a very old version of the upstream electrs (which was using a bitcoin core rpc feature that got deprecated in 0.19, which was causing these warnings).

Can you try (yet) again after checking out new-index? Sorry for the trouble!
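
Roughly (adjusting paths to your setup):

$ cd ~/electrs
$ git fetch origin
$ git checkout new-index
$ git pull origin new-index
$ cargo run --release --bin electrs -- -vvvv --daemon-dir ~/.bitcoin --http-addr 0.0.0.0:3001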

Naryck commented 4 years ago

Oh shit, I realized what happened... ugh. :( We're using the new-index branch here as the mainline development branch, not master -- which still points to a very old version of the upstream electrs (which was using a bitcoin core rpc feature that got deprecated in 0.19, which was causing these warnings).

Can you try (yet) again after checking out new-index? Sorry for the trouble!

You told me to try master after new-index failed for me, indexing one block file every 2 hours. I switched to master and it finished indexing. Now I'm doing what you wrote:

ubuntu@ip-172-61-11-224:~$ echo '{"id":1,"method":"server.version","params":[]}' | nc -v localhost 50001
Connection to localhost 50001 port [tcp/*] succeeded!
{"id":1,"jsonrpc":"2.0","result":["RustElectrum 0.1.0","1.2"]}
ubuntu@ip-172-61-11-224:~$ curl -v http://localhost:3000/blocks/tip/hash
*   Trying 127.0.0.1...
* TCP_NODELAY set
* connect to 127.0.0.1 port 3000 failed: Connection refused
* Failed to connect to localhost port 3000: Connection refused
* Closing connection 0
curl: (7) Failed to connect to localhost port 3000: Connection refused

The second command, which failed: is it a request to bitcoind? Should I switch back to the new-index branch?

shesek commented 4 years ago

You told me to try master after new-index failed for me

Yes, I was mistaken; I meant to ask you to update to the most recent version of new-index, not to switch to master.

Should I again switch to new-index branch?

Yes, please do :)

Naryck commented 4 years ago

hmmm, interesting

ubuntu@ip-172-61-11-224:~/electrs$ git branch
  master
* new-index

ubuntu@ip-172-61-11-224:~/electrs$ cargo run --release --bin electrs -- -vvvv --daemon-dir ~/.bitcoin --http-addr 0.0.0.0:3001
    Updating git repository `https://github.com/shesek/rust-elements`
error: failed to resolve patches for `https://github.com/rust-lang/crates.io-index`

Caused by:
  failed to load source for a dependency on `elements`

Caused by:
  Unable to update https://github.com/shesek/rust-elements?rev=bab177bb9a1d7293a71eb5ed78755811995b5688#bab177bb

Caused by:
  revspec 'bab177bb9a1d7293a71eb5ed78755811995b5688' not found; class=Reference (4); code=NotFound (-3)

ubuntu@ip-172-61-11-224:~/electrs$ git rev-parse HEAD
1553c7bbd7768f74ff33c354f3403ebc0e2b47c2

shesek commented 4 years ago

Oh, yes, that is a known issue that I recently fixed in the dev branch but not in new-index. I've applied the fix on new-index too; it should work if you pull and try again.
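
That is, something like:

$ cd ~/electrs
$ git pull origin new-index
$ cargo run --release --bin electrs -- -vvvv --daemon-dir ~/.bitcoin --http-addr 0.0.0.0:3001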

Naryck commented 4 years ago

Yes, it started to compile/build. I'll update you later.

Naryck commented 4 years ago

OK, the txstore stage finished during the night. Now I'm getting this output in the console:

DEBUG - writing 1780189 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 134206438 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00007.dat"
TRACE - fetched 5708 blocks
DEBUG - writing 1915792 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 134203305 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00008.dat"
TRACE - fetched 6403 blocks
DEBUG - writing 1989328 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 134202614 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00009.dat"
TRACE - fetched 7473 blocks

Every block file takes 1-2 hours. Haven't I just come back to what I started with?

Naryck commented 4 years ago

OK, I've been checking the console from time to time, and whenever I saw that a new block file had been processed, I wrote down the time. It isn't millisecond-accurate; I may have noticed the changes 5-15 minutes after they happened, but at least you can see the approximate intervals. There is also a stretch without timestamps: that was during the night, when I wasn't watching.

TRACE - skipping block 0000000000000ec74fd05ac2ef343fe6232cf8d274f71b655d6e6f0473afdc3b
TRACE - skipping block 0000000000000b3b8772fb40bcbacf7e50feb27db3bc44ce69630dc82c69a5e9
TRACE - skipping block 0000000000000863f4b537afbfbc358da536e377fa2d4192640509496f2aeee2
TRACE - skipping block 000000000000029a8bfa52e84c68467c03405f272e4a514eabc4139ffa684d3a
TRACE - skipping block 0000000000000a67a0e95ce776686499ab02261dd7ccad0a5b506d8b6cee5e59
TRACE - skipping block 0000000000000ad67cb8ea84d4fe60774188846fba6128b3501c9e18fec556e3
TRACE - skipping block 0000000000000aeee6503f63ddc6ff3db97c056a56c98fdc9f10c9f97c1eeee7
TRACE - skipping block 00000000000005beb5017215d712c498566125e36210b3d889bf737e1d2a0875
TRACE - skipping block 000000000000045ae08080ab205f50bdd5d9c7419d20f832f11b8fb264d48ea4
TRACE - skipping block 000000000000037f030ddb477e04a7e3f8508b060f4d89ef893495dd47839aff
TRACE - fetched 0 blocks
DEBUG - writing 0 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - fetched 7218 blocks
TRACE - fetched 6233 blocks
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00009.dat"
TRACE - parsing 134185314 bytes
TRACE - fetched 6117 blocks
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00010.dat"
TRACE - parsing 134203978 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00011.dat"
TRACE - parsing 134021138 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00012.dat"
18:50
DEBUG - writing 1877736 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 134118150 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00013.dat"
TRACE - fetched 3957 blocks
19:38
DEBUG - writing 1908169 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 134213364 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00014.dat"
TRACE - fetched 1516 blocks
19:50
DEBUG - writing 1922379 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 133971929 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00015.dat"
TRACE - fetched 1542 blocks
20:47
DEBUG - writing 1921005 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 134200568 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00016.dat"
TRACE - fetched 1377 blocks
21:56
DEBUG - writing 1894647 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 134107841 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00017.dat"
TRACE - fetched 1089 blocks
22:42
DEBUG - writing 1920965 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 134161940 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00018.dat"
TRACE - fetched 1787 blocks
DEBUG - writing 1944204 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 134094937 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00019.dat"
TRACE - fetched 1848 blocks
DEBUG - writing 1952684 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 134175298 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00020.dat"
TRACE - fetched 1402 blocks
DEBUG - writing 1965537 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 133980193 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00021.dat"
TRACE - fetched 1550 blocks
DEBUG - writing 1958276 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 134205298 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00022.dat"
TRACE - fetched 1528 blocks
DEBUG - writing 1957164 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 134126642 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00023.dat"
TRACE - fetched 1196 blocks
DEBUG - writing 1947934 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 134116659 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00024.dat"
TRACE - fetched 1531 blocks
DEBUG - writing 1974021 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 134102394 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00025.dat"
TRACE - fetched 1316 blocks
DEBUG - writing 1933852 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 134145055 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00026.dat"
TRACE - fetched 1516 blocks
DEBUG - writing 1966494 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 134182236 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00027.dat"
TRACE - fetched 1600 blocks
07:56
DEBUG - writing 1958320 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 134150531 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00028.dat"
TRACE - fetched 1383 blocks
08:43
DEBUG - writing 1969844 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 134189077 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00029.dat"
TRACE - fetched 1345 blocks
09:50
DEBUG - writing 1962387 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 134205573 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00030.dat"
TRACE - fetched 1290 blocks
10:48
DEBUG - writing 1979996 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 134113675 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00031.dat"
TRACE - fetched 1754 blocks
11:23
DEBUG - writing 1994881 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 134169607 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00032.dat"
TRACE - fetched 1443 blocks
12:22
DEBUG - writing 1981361 rows to RocksDB { path: "./db/mainnet/newindex/history" }, flush=Disable
TRACE - parsing 134217280 bytes
TRACE - reading "/home/ubuntu/.bitcoin/blocks/blk00033.dat"
TRACE - fetched 1212 blocks
13:11

Naryck commented 4 years ago

Is there anything else that can be checked or changed?

greenaddress commented 4 years ago

@Naryck we can't reproduce this on our on-site hardware (Ryzen Threadripper CPU) or on Google Cloud.

I am not sure if it will help, but maybe you can try the latest version of master; we merged a number of improvements (though it may not make a difference for the issue at hand).

Naryck commented 4 years ago

OK, I'll now try a clean server from scratch.

Naryck commented 4 years ago

I may be doing something wrong, but there is no master branch.

Naryck commented 4 years ago

And I think I know how you can try to reproduce it: don't create an instance with a 'system disk + data disk' setup, attaching a 1 TB volume to the instance; instead, create the instance from the start with a single 1 TB disk, and try to do everything with only that one volume.

shesek commented 4 years ago

Wait, are you using an HDD or an SSD?

And @greenaddress meant the new-index branch, not master (but you should already be up-to-date with all the recent optimizations, so I'm not sure that would help)

Naryck commented 4 years ago

An SSD, of course.

P.S. What I meant: don't use multiple SSD volumes; just create one 1 TB volume and try to do everything with it. P.P.S. The completely new server, which I set up from scratch yesterday, ended up with the same problem: it finished hashing newindex/txstore yesterday, has been hashing newindex/history the whole night (about 8 hours already), and is still only on block file blk00012.dat.

Naryck commented 4 years ago

OK, it doesn't matter now. I took a volume snapshot from another account and transferred it to mine, and now electrs works. Still, if you're interested in what happened: try doing it on an instance with only one SSD volume, which contains the system and everything else.

youngqqcn commented 4 years ago

OK, it doesn't matter now. I took a volume snapshot from another account and transferred it to mine, and now electrs works. Still, if you're interested in what happened: try doing it on an instance with only one SSD volume, which contains the system and everything else.

Hi, I have the same problem. How did you resolve it?

Netzhangheng commented 4 years ago

OK, it doesn't matter now. I took a volume snapshot from another account and transferred it to mine, and now electrs works. Still, if you're interested in what happened: try doing it on an instance with only one SSD volume, which contains the system and everything else.

Hi, I have the same problem. How did you resolve it?

I also have this problem. Electrs uses 5% CPU and 8 GB RAM during history indexing; it used 100% CPU and 6 GB RAM during txstore indexing. It seems to produce about one history SST file per hour.

shesek commented 4 years ago

@Netzhangheng are you using an SSD?

The first txstore indexing stage doesn't involve any database reads, while the second history stage reads heavily. If you're not using an SSD, the second stage will be incredibly slow.
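
If you want to sanity-check the disk itself, a quick 4k random-read benchmark makes the difference obvious: an HDD typically manages only a few hundred IOPS here, while an SSD does tens of thousands. Assuming fio is installed, something like this (it creates a 1 GB test file; delete it afterwards):

$ fio --name=randread --filename=./db/fio-test --size=1G --rw=randread --bs=4k --direct=1 --runtime=30 --time_based
$ rm ./db/fio-test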

Netzhangheng commented 4 years ago

@shesek I used an HDD before.

I will try again with an SSD. Thank you.

mrtiggles5 commented 2 years ago

@shesek When you say that the second stage reads heavily, do you mean it reads heavily from the block .dat files or from /txstore?

slice-prakhar commented 1 year ago

@Naryck Actually, I am doing exactly what you were trying: the same kind of AWS setup, a single 1.5 TB volume and all.

Previously it did not work for me; it was downloading blocks into the ./db folder.

Now when I run the cargo command (cargo run --release --bin electrs -- -vvvv --daemon-dir ~/.bitcoin), it gives me the following error:

 Running `target/release/electrs -vvvv --daemon-dir /home/bitcoin/.bitcoin`
Config { log: StdErrLog { verbosity: Trace, quiet: false, show_level: true, timestamp: Off, modules: [], writer: "stderr", color_choice: Auto }, network_type: Bitcoin, db_path: "./db/mainnet", daemon_dir: "/home/bitcoin/.bitcoin", blocks_dir: "/home/bitcoin/.bitcoin/blocks", daemon_rpc_addr: 127.0.0.1:8332, cookie: None, electrum_rpc_addr: 127.0.0.1:50001, http_addr: 127.0.0.1:3000, http_socket_file: None, monitoring_addr: 127.0.0.1:4224, jsonrpc_import: false, light_mode: false, address_search: false, index_unspendables: false, cors: None, precache_scripts: None, utxos_limit: 500, electrum_txs_limit: 500, electrum_banner: "Welcome to electrs-esplora 0.4.1" }
thread 'main' panicked at 'failed to start monitoring HTTP server at 127.0.0.1:4224', src/metrics.rs:71:33
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Aborted

So it is basically aborting; I'm not sure why this is happening now. P.S. My OS is Ubuntu.
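
Could it be that something else is already bound to port 4224 (e.g. a leftover electrs process)? I guess I can check with something like:

$ ss -ltnp | grep 4224     # is anything already listening on the monitoring port?
$ pgrep -a electrs         # is a leftover electrs process still running?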