FATAL: Caught exception: It appears that Fulcrum was forcefully killed in the middle of committing a block...

ghost commented 4 years ago

Great SPV server! It is fast and just what I needed.

I've encountered a problem when restarting Fulcrum:

FATAL: Caught exception: It appears that Fulcrum was forcefully killed in the middle of committng a block to the db. We cannot figure out where exactly in the update process Fulcrum was killed, so we cannot undo the inconsistent state caused by the unexpected shutdown. Sorry!

After I restarted the server (via systemd) and created an image snapshot of it, this error appeared.

Having used Electrum Cash for years, I never had this problem with it before, even though I would restart or terminate instances abruptly.

Is there any way to prevent this problem, or to recover from it gracefully? Thank you!

cculianu commented 2 years ago

Yeah, I really need to set aside some time to make the fast-sync option auto-detect this situation on OSes such as Linux that overcommit memory, and prune the cache down if that happens. I will definitely work on this soon!

jonscoresby commented 2 years ago

> Thanks for the feedback @jonscoresby ... Perhaps I need to go back and see if I can make --fast-sync more resilient to such conditions. Just curious: were you using swap at all or did you have swap disabled?

Sorry, I didn't see this earlier. I do not have swap enabled.

cculianu commented 2 years ago

Ah I see. Thanks for getting back to me.

Yeah, I have a hypothesis that this is more likely to happen in the "no swapfile" situation. I am not sure why it became fashionable to ship Linux installs with no swap these days. I remember a time when every Linux install had a default swapfile set up; at some point that changed. Anyway, I think that in the no-swapfile case, memory usage can temporarily get out of hand with --fast-sync and rocksdb both gobbling up RAM. And of course, if there's no swap, then when you run out of RAM something must die. And that thing is Fulcrum.
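Since this hypothesis hinges on whether swap is configured at all, it helps to check that first. Interactively, `swapon --show` or `free -h` will tell you. For scripts, here is a minimal sketch assuming a Linux system: the hypothetical `has_swap` helper (not part of Fulcrum) reads the `SwapTotal` line of `/proc/meminfo`, which the kernel reports in kB, and succeeds only if it is non-zero.

```shell
#!/bin/sh
# has_swap [MEMINFO_FILE]
# Hypothetical helper (not part of Fulcrum). Succeeds (exit 0) if the
# SwapTotal line in /proc/meminfo (or the given file) is greater than 0 kB.
# Fails (exit 1) otherwise, including when the SwapTotal line is missing.
has_swap() {
    awk '/^SwapTotal:/ { found = 1; total = $2 }
         END { if (found && total > 0) exit 0; exit 1 }' "${1:-/proc/meminfo}"
}
```

Usage: `has_swap && echo "swap is enabled" || echo "no swap configured"`.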

I can't fully control memory usage, because the rocksdb library does its own thing and sometimes temporarily overallocates memory even when you tell it not to. I can, however, mitigate this by detecting the situation and controlling the --fast-sync memory usage: if it looks like we are reaching the system limit, I can just temporarily prune the cache to be smaller than what the user specified, or something like that.
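The mitigation described above (shrinking the cache below the user's setting when the system is near its memory limit) can be illustrated with a small shell sketch. This is not Fulcrum's code: the `available_mb` and `clamp_cache_mb` helpers, the use of the kernel's `MemAvailable` estimate, and the default 50% headroom rule are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed mitigation (not Fulcrum's actual logic).

# available_mb [MEMINFO_FILE]: print MemAvailable (reported in kB) in MB.
available_mb() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# clamp_cache_mb REQUESTED_MB AVAILABLE_MB [PERCENT]
# Print the requested cache size, unless it exceeds PERCENT% (default 50)
# of the available memory; in that case print that capped value instead,
# leaving headroom for other allocations (e.g. rocksdb's own).
clamp_cache_mb() {
    cap=$(( $2 * ${3:-50} / 100 ))
    if [ "$1" -gt "$cap" ]; then
        echo "$cap"
    else
        echo "$1"
    fi
}
```

Usage: `clamp_cache_mb 8192 "$(available_mb)"` prints either 8192 or the capped size. Using `MemAvailable` rather than `MemTotal` is a design choice here: it accounts for memory other processes already hold.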

RequestPrivacy commented 2 years ago

I also have no swapfile on my Linux system.

Let me know if I should test anything once you have a possible solution.

chrisguida commented 1 year ago

Please, please, please fix this. We are trying to package Fulcrum for embassyOS, and this makes the otherwise amazing experience very painful. It can take several days to build the index on a low-resource device in Docker, and being told that you have to do it all over again is enough to make the user want to simply delete it and switch back to electrs.

cculianu commented 1 year ago

I will fix it in a future release; that's the plan.

Please don't use --fast-sync: it eats memory and is experimental. It's not really suited for systems with low memory and no swap. Initial sync shouldn't ever crash as often as it does, and I noticed everybody is using that option, which is probably what's leading to the OOM kills. I should have named it differently...

craigraw commented 1 year ago

> I will fix it in a future release; that's the plan.

That's great to hear. I've also noticed that --fast-sync is often configured with values that are far too high for the system. Perhaps Fulcrum should warn if it's set to, say, more than 20% of system memory?
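The suggested warning is simple arithmetic. For example, on a machine with 16384 MB of RAM, 20% is 3276 MB (integer arithmetic), so fast-sync = 8192 would trigger it. Here is a minimal shell sketch of such a pre-flight check, run before starting Fulcrum. The `fast_sync_warning` and `total_ram_mb` names and the 20% default are assumptions taken from this suggestion; this is not an existing Fulcrum feature.

```shell
#!/bin/sh
# fast_sync_warning FAST_SYNC_MB TOTAL_RAM_MB [PERCENT]
# Hypothetical pre-flight check (not an existing Fulcrum feature): print a
# warning to stderr and return 1 if FAST_SYNC_MB exceeds PERCENT% (default 20)
# of TOTAL_RAM_MB; return 0 otherwise.
fast_sync_warning() {
    limit=$(( $2 * ${3:-20} / 100 ))
    if [ "$1" -gt "$limit" ]; then
        echo "warning: fast-sync = $1 MB exceeds ${3:-20}% of RAM ($limit of $2 MB)" >&2
        return 1
    fi
    return 0
}

# total_ram_mb: print MemTotal from /proc/meminfo, converted from kB to MB.
total_ram_mb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}
```

Usage: `fast_sync_warning 8192 "$(total_ram_mb)"`.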

That said, I do see this issue mentioned more frequently not for the initial sync, but for accidental power loss or other ungraceful shutdown conditions.

chrisguida commented 1 year ago

> I will fix it in a future release; that's the plan.

Excellent, great to hear!

> Please don't use --fast-sync: it eats memory and is experimental.

This problem does not only present during initial sync. We have already experienced corrupted databases on a couple of devices that were already synced.

MattDHill commented 1 year ago

Any update on this issue and #155? Start9 is still very excited to get Fulcrum onto StartOS, but not as long as ungraceful shutdowns necessitate resyncs.

Is there any update on that issue as well as the issues related to "fast-sync" discussed above?

greenm01 commented 1 year ago

I lost power this morning and my Fulcrum database is now corrupted. It took several days to sync on my SSD. For the time being I will switch back to electrs until this issue is resolved.

fabiolameira commented 8 months ago

Hello 👋

I ran into this problem when trying to synchronize my Fulcrum server. The process consumed so much RAM that it was killed by the OOM killer (Out-Of-Memory killer), which forcefully terminated the program and corrupted my fulcrum_db.
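To confirm that it really was the OOM killer, check the kernel log, which records each kill: `journalctl -k` or `sudo dmesg` will show it. The hypothetical `fulcrum_oom_kills` filter below keeps only OOM-related lines that mention Fulcrum. The exact wording of these kernel messages varies between kernel versions, so the match patterns are an assumption.

```shell
#!/bin/sh
# fulcrum_oom_kills [FILE...]
# Hypothetical filter: read kernel log text (from FILEs, or stdin if none)
# and print OOM-killer related lines that mention the Fulcrum process.
# Exits 1 (from the last grep) if no such line is found.
fulcrum_oom_kills() {
    grep -iE 'out of memory|oom-kill|killed process' "$@" | grep -i 'fulcrum'
}
```

Usage: `journalctl -k | fulcrum_oom_kills`.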

I tried the following settings in fulcrum.conf:

fast-sync = 8192 | 4096 | 2048 | 1024 | 512
db_max_open_files = 400 | 200 | 100 | 50 | 40

And it always ended up failing and corrupting the db.

For context, this is my setup:
OS: Ubuntu Server 22.04.3 LTS
Processor: i5-6500
RAM: 16GB
Disk: 2TB SSD

I compiled Fulcrum myself following the instructions in the project's README.md, and I didn't understand why this was happening: it's not the first time I've synchronized a Fulcrum server, and this had never happened to me before.

On previous occasions I had used the project's prebuilt images and this had never happened, so I suspected it was related to the way I had compiled the project.

It was then that I noticed this:

$ Fulcrum -v
Fulcrum 1.9.8 (Release d4b3fa1)
Protocol: version min: 1.4, version max: 1.5.2
compiled: gcc 11.4.0
jemalloc: unavailable
Qt: version 5.15.3
rocksdb: version 6.14.6-ed43161
simdjson: version 0.6.0
ssl: OpenSSL 3.0.2 Mar 15, 2021
zmq: libzmq version: 4.3.4, cppzmq version: 4.7.1

Running Fulcrum -v reports "jemalloc: unavailable". Since jemalloc was not installed on the system, Fulcrum had been built to use the system memory allocator instead of jemalloc. I immediately suspected this was related to the problem, since the system allocator might not manage RAM usage as well as expected under this load.

To solve the problem, I installed jemalloc with the following command:

$ sudo apt update
$ sudo apt install libjemalloc-dev

I verified the installation by running:

$ pkg-config --modversion jemalloc

Then I verified that pkg-config reports the linker flag for jemalloc:

$ pkg-config --cflags --libs jemalloc

This should print something like -ljemalloc.

Then I recompiled the project. To do this, I ran the following commands:

# Generate a Makefile that links against jemalloc
$ qmake LIBS+=-ljemalloc

This should print something like this:

Project MESSAGE: CLI overrides: LIBS=-ljemalloc
Project MESSAGE: ZMQ version: 4.3.4
Project MESSAGE: rocksdb: using static lib
Project MESSAGE: jemalloc: using CLI override
Project MESSAGE: Including embedded secp256k1
Project MESSAGE: Installation dir prefix is /usr/local

Then I ran make to build the project:

# Build with one parallel job per available CPU core
$ make -j $(nproc)

Then just run:

# Install Fulcrum into /usr/local/bin (writing there requires root privileges)
$ sudo make install

Finally, to check that Fulcrum is now built with jemalloc, run this command again:

$ Fulcrum -v

And you should see something like:

Fulcrum 1.9.8 (Release d4b3fa1)
Protocol: version min: 1.4, version max: 1.5.2
compiled: gcc 11.4.0
jemalloc: version 5.2.1-0-gea6b3e9
Qt: version 5.15.3
rocksdb: version 6.14.6-ed43161
simdjson: version 0.6.0
ssl: OpenSSL 3.0.2 Mar 15, 2021
zmq: libzmq version: 4.3.4, cppzmq version: 4.7.1
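To make this check scriptable (for example in a build or deployment script), the jemalloc: line can be tested directly. Here is a minimal sketch. The `has_jemalloc` helper is hypothetical, and the usage below merges stderr into stdout with `2>&1` because it is not certain which stream Fulcrum -v writes to.

```shell
#!/bin/sh
# has_jemalloc
# Hypothetical helper: read `Fulcrum -v` output on stdin and succeed only if
# a "jemalloc:" line is present and does not say "unavailable".
has_jemalloc() {
    awk '/^jemalloc:/ { found = 1; if ($2 != "unavailable") ok = 1 }
         END { if (found && ok) exit 0; exit 1 }'
}
```

Usage: `Fulcrum -v 2>&1 | has_jemalloc && echo "built with jemalloc"`.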

Since switching my Fulcrum build to jemalloc as the memory allocator, I haven't had any more problems with the OOM killer, either during synchronization or during normal use after the sync completed.

Hope this helps 🙏

cculianu / Fulcrum

FATAL: Caught exception: It appears that Fulcrum was forcefully killed in the middle of committing a block... #41