Splitstore: Defaults are not good

Checklist

[X] This is not a security-related bug/issue. If it is, please follow please follow the security policy.
[X] I have searched on the issue tracker and the lotus forum, and there is no existing related issue or discussion.
[X] I am running the Latest release, the most recent RC(release canadiate) for the upcoming release or the dev branch(master), or have an issue updating to any of these.
[ ] I did not make any code changes to lotus.

Lotus component

[X] lotus daemon - chain sync
[ ] lotus fvm/fevm - Lotus FVM and FEVM interactions
[ ] lotus miner/worker - sealing
[ ] lotus miner - proving(WindowPoSt/WinningPoSt)
[ ] lotus JSON-RPC API
[ ] lotus message management (mpool)
[ ] Other

Lotus Version

1.21.0-rc3

Repro Steps

Run a node
Do a pruned import
Set the HotStoreFullGCFrequency = 1 variable to do prunes as often as possible
See that even then, the prune will never happen

Describe the Bug

After some investigation, I figured out that: HotStoreMaxSpaceThreshold

is actually defined as: "The maximum size the current hotstore + the potential new copy can occupy on disk"

Not "When HotStoreMaxSpaceTarget is set Moving GC will be triggered when total moving size exceeds HotstoreMaxSpaceTarget - HotstoreMaxSpaceThreshold" (as the docs state)

and HotstoreMaxSpaceSafetyBuffer should be defined as "the maximum size the new hotstore can be" instead of "Safety buffer to prevent moving GC from overflowing disk when HotStoreMaxSpaceTarget is set. Moving GC will not occur when total moving size exceeds HotstoreMaxSpaceTarget - HotstoreMaxSpaceSafetyBuffer"

Doc issues

The docs state these as defaults:

HotStoreMaxSpaceThreshold = 150000000000
HotstoreMaxSpaceSafetyBuffer = 50000000000

A node running without these values set will actually have these as defaults:

HotStoreMaxSpaceThreshold = 650000000000
HotstoreMaxSpaceSafetyBuffer = 50000000000

GC Hot CLI defaults

The docs state we should run lotus chain prune hot --periodic --threshold 0.00000001 and increase the number. The CLI default is 0.01, not 0.00000001.

Apart from that, its never explained what this threshold is. I now know its some magic badgerBS value, but still no idea what I'm actually setting when I change this value.

Default pruned chain examples

When running a node with a pruned chain, and HotStoreFullGCFrequency = 1, the first time I'm seeing a GC run, we get those logs. Meaning that the defaults make no sense - a freshly pruned chain will always exceed 50000000000 (the new hotstore's expected size is 245681686326)

It will also not trigger because the current size (245681686326 + current 448854471748 >= 650000000000)

Apart from these settings it looks like the prune logic doesn't take diskspace into account. I like that we can set our own thresholds, but in my case I just want 2 things:

Keep the chain as small as possible (without causing block sync delays)
Never make the node run out of diskspace

In my opinion, with using a clearer set of configuration params, we can achieve a nice config setup;

Define the max size you want the chain itself to be (inform the user, in the docs or something, that a "prune" will first take 30% of the chain's size extra)

Tell the system to always keep XX amount of diskspace free. This should be doable, because we know:
- the current size
- the expected shadow copy size
- the current free diskspace

Then we should always know when we're coming close to a point of no return and have to GC.

Default config options could just trigger GC when the system notices we're about to run out of diskspace

Logging Information

json
{"level":"warn","ts":"2023-04-19T15:34:02.290Z","logger":"splitstore","caller":"splitstore/splitstore_compact.go:255","msg":"missing object reference bafy2bzaceapqmsgwyjvurmxgti73xfpnbyakxgyua33yobgqkdgaieuyu6eyq in bafy2bzacec4ltib5nbeudklbcfqtteygv4hxnhjapqeighppqbny6txunwuyy"}
(... a bunch of these "missing object reference" messages ...) 
{"level":"info","ts":"2023-04-19T15:34:19.246Z","logger":"splitstore","caller":"splitstore/splitstore_compact.go:1358","msg":"purged cold objects","purged":36013396,"live":568}
{"level":"info","ts":"2023-04-19T15:34:19.246Z","logger":"splitstore","caller":"splitstore/splitstore_compact.go:814","msg":"purging cold objects from hotstore done","took":258.911799297}
{"level":"info","ts":"2023-04-19T15:34:19.246Z","logger":"splitstore","caller":"splitstore/splitstore_compact.go:950","msg":"ending critical section"}
{"level":"info","ts":"2023-04-19T15:34:19.246Z","logger":"splitstore","caller":"splitstore/splitstore_compact.go:816","msg":"critical section done","total protected size":46828899582,"total marked live size":617650}
{"level":"info","ts":"2023-04-19T15:34:19.247Z","logger":"splitstore","caller":"splitstore/splitstore_gc.go:48","msg":"measured hot store size: 448854471748, approximate new size: 245681686326, should do full true, can do full false"}
{"level":"warn","ts":"2023-04-19T15:34:19.247Z","logger":"splitstore","caller":"splitstore/splitstore_gc.go:54","msg":"Attention! Estimated moving GC size 245681686326 is not within safety buffer 50000000000 of target max 650000000000, performing aggressive online GC to attempt to bring hotstore size down safely"}
{"level":"warn","ts":"2023-04-19T15:34:19.247Z","logger":"splitstore","caller":"splitstore/splitstore_gc.go:55","msg":"If problem continues you can 1) temporarily allocate more disk space to hotstore and 2) reflect in HotstoreMaxSpaceTarget OR trigger manual move with `lotus chain prune hot-moving`"}
{"level":"warn","ts":"2023-04-19T15:34:19.247Z","logger":"splitstore","caller":"splitstore/splitstore_gc.go:56","msg":"If problem continues and you do not have any more disk space you can run continue to manually trigger online GC at aggressive thresholds (< 0.01) with `lotus chain prune hot`"}
{"level":"info","ts":"2023-04-19T15:34:19.247Z","logger":"splitstore","caller":"splitstore/splitstore_gc.go:72","msg":"garbage collecting blockstore"}
{"level":"info","ts":"2023-04-19T15:36:15.119Z","logger":"splitstore","caller":"splitstore/splitstore_gc.go:81","msg":"garbage collecting blockstore done","took":115.87249909}
{"level":"info","ts":"2023-04-19T15:36:15.119Z","logger":"splitstore","caller":"splitstore/splitstore_gc.go:64","msg":"measured hot store size after GC: 454373389774"}
{"level":"info","ts":"2023-04-19T15:36:16.534Z","logger":"splitstore","caller":"splitstore/splitstore_compact.go:160","msg":"compaction done","took":43545.729444289}
{"level":"info","ts":"2023-04-19T15:38:20.772Z","logger":"splitstore","caller":"splitstore/splitstore_compact.go:858","msg":"preparing compaction transaction"}

filecoin-project / lotus