cculianu / Fulcrum

A fast & nimble SPV Server for BCH, BTC, and LTC

Improvement Proposal: Prevent syncing to start from 0 in case of an error #195

Open ghost opened 1 year ago

ghost commented 1 year ago

I love Fulcrum, as soon as the index is built, it's super fast and fun to use.

Unfortunately, Fulcrum fails pretty often, resulting in a corrupt database, which means you have to resync from scratch and can't use your Fulcrum instance for days.

I ended up running 4 instances locally (one on x86_64, two on aarch64, one on a Mac M1), so in case one instance fails, I can still copy over the db from another instance, which is done in minutes. Unfortunately, the newer rocksdb version on the Mac creates a different db format, so I cannot use that index on my other machines (which is a pain, because the M1 is by far the fastest computer I have).

It would be great if Fulcrum saved a state, let's say every 100,000 blocks, from which it could pick up index creation in case anything after those "breakpoints" fails. Usually only one of the latest operations performed results in a corrupt db; the first 99% of the index should be fine.

I don't know enough about the internal structure of the stored data to say whether that's possible, but a configurable option for this would be awesome. A full index on my machine is already ~130 GB; if those "breakpoints" take up another few GBs or so, I'd be fine. Hell, even if it wrote a simple db clone to disk that needed another 130 GB, I'd be happy. But please don't make me wait again and again for operations that have already been performed multiple times.

What do you think?

cculianu commented 1 year ago

I think this is the one huge problem with Fulcrum, and there definitely is a way to address it. It needs to happen.

I am going to see about doing a crypto-based fundraiser soon to add this enhancement.

Unfortunately, the cost of living is skyrocketing, and a change this large would take weeks of development and testing; I need to justify that opportunity cost somehow (since during that time I cannot pursue consulting projects).

So yes this is on the agenda and I will put together a fundraiser drive very soon to finance this!

ghost commented 1 year ago

Ok, I see. For the moment, I'll work with manual backups.

BTW, it's great to see how well Fulcrum makes use of the available system resources on multi-core machines. This is a neat piece of software, my man. 👍🏻

cculianu commented 1 year ago

Thanks, man. I tried to farm work out to a thread pool as much as possible. Glad to know you like it.

Yes, I will add some mechanism to take explicit snapshots periodically, or something similar. You are not the first person to complain about it not being tolerant of being killed unexpectedly...

ghost commented 1 year ago

Strangely, these kills now occur every couple of hours (on my Odroid, my 2nd-fastest machine 😩). It's a 4 GB machine, and I initially started with optimised fast-sync, db_mem, and utxo-cache settings; by now I've gone back to the default values more and more (except fast-sync, which is still set to 1000 MB). You mentioned some shaky behavior for fast-sync in the comments, so if Fulcrum gets killed again (out of nothing really, no error in the debug messages), I'll reset that to its default as well.

fast-sync has caused no problems so far, and the kills occurred not only during the indexing process but also on fully synced instances.

I'll probably set up a nightly cron job which stops Fulcrum, backs up the db, and restarts it afterwards.

cculianu commented 1 year ago

There is nothing in the logs? Anyway, if you only have a 4 GB machine, please enable swap; I suspect it just randomly OOMs. 4 GB is not that much. Enable something like 4 GB of swap as well, just for the rare times memory consumption exceeds 4 GB total on the machine.
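For reference, adding a 4 GB swapfile on a typical Linux box is only a few commands (a system-configuration sketch; the `/swapfile` path and 4 GB size are assumptions, and it requires root):

```shell
# Create and enable a 4 GB swapfile (run as root).
fallocate -l 4G /swapfile    # reserve the space
chmod 600 /swapfile          # swap must not be readable by other users
mkswap /swapfile             # format it as swap
swapon /swapfile             # enable it immediately

# To keep it across reboots, add this line to /etc/fstab:
#   /swapfile none swap sw 0 0
```

`swapon --show` (or `free -h`) confirms the swap is active afterwards.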

ghost commented 1 year ago

Nothing; the CLI just says "Killed" (Fulcrum was started with debug=true and quiet=false), so I assume it's memory-related. I'll try without fast-sync next, and then without swap space.

ghost commented 1 year ago

Here's one of the latest errors:

fulcrum@rpi:/mnt/hdd/user_homes/fulcrum/Fulcrum$ ./Fulcrum fulcrum.config >fulcrum.log
terminate called after throwing an instance of 'DatabaseError'
  what():  Error issuing batch write to scripthash_unspent db for a shunspent update: Corruption: block checksum mismatch: stored = 1679176786, computed = 1105712891  in /mnt/hdd/user_homes/fulcrum/Fulcrum/database/scripthash_unspent/003402.sst offset 10267554 size 4054
Aborted (core dumped)

fulcrum.log:

[2023-09-02 12:23:10.052] (Debug) batch write of 100000 utxoset items took 404.984 msec
[2023-09-02 12:23:11.564] (Debug) batch write of 100000 utxoset items took 482.958 msec
[2023-09-02 12:23:12.118] (Debug) batch write of 83634 utxoset items took 449.329 msec

Let me know if I should open a bug for it.

ghost commented 1 year ago

Update: Fulcrum kept crashing on my Linux machines, so I compiled Fulcrum from source against the latest rocksdb. It works beautifully now: I could use the index data from the M1, and all in all everything is very stable. I had no more killed processes.

ghost commented 11 months ago

Hi,

just a quick note that I've had very good experiences using rocksdb 8.x. It seems most, if not all, crashes were related to the older 5.x version rather than to Fulcrum. I've had it running for several weeks without any db corruption.

Cheers, Martin

cculianu commented 11 months ago

Yeah, good to know! Maybe I will update the official sources to use the 8.x rocksdb series rather than the 6.x series I develop against now. Thanks for the update!


PrinceOfEgypt commented 8 months ago

> I'll probably set up a nightly cron job, which stops Fulcrum, backs up the db and restarts it afterwards.

Did you ever set this up? If so, would you mind sharing the job/script? I'm interested in doing the same thing, preferably as an incremental backup with rsync.