"syncing buffers" seems unnecessary on shutdown in `heap` mode

matthewdarwin commented 2 years ago

Shutting down a nodeos that is running in heap mode takes quite a while. "Syncing buffers" accounts for a large part of that time, and on servers with 1000s of competing processes (running zfs) it takes especially long. It appears to be doing some sort of fsync() here.

Not sure why fsync() is needed?

Most often I'm stopping nodeos to change a configuration parameter, then starting it again. If I am shutting down a server (for reboot), then the OS will take care of syncing to disk before reboot. I guess if you shut down nodeos, do something else, and then a power outage kills your machine, the fsync is a good idea.

There are so many scenarios where nodeos doesn't start and needs a snapshot. Seems we could save time here and (optionally) not fsync(). Maybe for private networks we need to be more careful, but on public networks, there are (not ideal public) snapshots available from which to resume.

spoonincode commented 2 years ago

We want high confidence that the shared_memory.bin is fully intact from a cleanly shut down nodeos. Otherwise it's impossible to say for certain that all its data structures are sane, which can result in undefined behavior (possibly spooky undefined behavior).

The current design is still very legacy "mapped mode"-centric, and in such a mode waiting for the data to be on disk and then flipping the dirty flag works well enough (this fsync() does occur in mapped mode too).
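To make the ordering above concrete, here is a minimal POSIX sketch of "wait for the data to be on disk, then flip the dirty flag". It is an illustration only, not the chainbase or nodeos implementation: the one-byte header layout and the names `set_dirty_flag`, `clean_shutdown` and `is_dirty` are assumptions made up for this example.

```cpp
// Illustrative sketch only. The file layout (one dirty byte at offset 0,
// data after it) and all function names are hypothetical.
#include <cstdint>
#include <string>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

// Write the dirty byte and wait until it is on disk.
inline bool set_dirty_flag(int fd, uint8_t value) {
    return ::pwrite(fd, &value, 1, 0) == 1 && ::fsync(fd) == 0;
}

// Mark the file dirty, write the data, fsync it, and only then mark it clean.
inline bool clean_shutdown(const std::string& path, const std::vector<char>& state) {
    int fd = ::open(path.c_str(), O_RDWR | O_CREAT, 0644);
    if (fd < 0) return false;
    bool ok = set_dirty_flag(fd, 1)
        && ::pwrite(fd, state.data(), state.size(), 1) == static_cast<ssize_t>(state.size())
        && ::fsync(fd) == 0          // the slow step: wait for the data itself
        && set_dirty_flag(fd, 0);    // flip the flag only after the data is durable
    ::close(fd);
    return ok;
}

// A missing or unreadable file is treated as dirty.
inline bool is_dirty(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) return true;
    uint8_t flag = 1;
    bool read_ok = ::pread(fd, &flag, 1, 0) == 1;
    ::close(fd);
    return !read_ok || flag != 0;
}
```

If the process or machine dies anywhere before the final flag write, the file still reads as dirty on the next start, which is the property the fsync() is buying.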

Clearly in heap mode there are better options available to us. For example, when writing out the data, accumulate it in a checksum, and then only fsync() the checksum to the file. But behavior such as that makes it difficult to accommodate the current ability to seamlessly switch between mapped & heap modes.
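A rough sketch of the checksum idea, under stated assumptions: the data is written without an fsync(), a running checksum is accumulated over it, and only the small checksum file is fsync()ed. On the next start the checksum is recomputed and compared. FNV-1a stands in for whatever checksum would really be used, the `.sum` sidecar file is an invented layout, and the names `fnv1a`, `write_state_with_checksum` and `state_is_intact` are hypothetical, not existing nodeos or chainbase APIs.

```cpp
// Illustrative sketch only: hypothetical names and file layout.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

// Streaming FNV-1a 64-bit; pass the previous result as h to continue a run.
inline uint64_t fnv1a(const char* data, size_t len, uint64_t h = 14695981039346656037ULL) {
    for (size_t i = 0; i < len; ++i) {
        h ^= static_cast<unsigned char>(data[i]);
        h *= 1099511628211ULL;
    }
    return h;
}

// Write the state in chunks with no fsync, then fsync only the checksum sidecar.
inline bool write_state_with_checksum(const std::string& path, const std::vector<char>& state) {
    uint64_t sum = 14695981039346656037ULL;
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    const size_t chunk = 4096;
    for (size_t off = 0; off < state.size(); off += chunk) {
        const size_t n = std::min(chunk, state.size() - off);
        out.write(state.data() + off, static_cast<std::streamsize>(n));
        sum = fnv1a(state.data() + off, n, sum);
    }
    out.flush();
    if (!out) return false;
    out.close();

    int fd = ::open((path + ".sum").c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    bool ok = ::write(fd, &sum, sizeof sum) == static_cast<ssize_t>(sizeof sum)
        && ::fsync(fd) == 0;   // the only fsync: a few bytes, not the whole state
    ::close(fd);
    return ok;
}

// Recompute the checksum over the data file and compare it with the sidecar.
inline bool state_is_intact(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    std::vector<char> data((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
    int fd = ::open((path + ".sum").c_str(), O_RDONLY);
    if (fd < 0) return false;
    uint64_t expected = 0;
    bool ok = ::read(fd, &expected, sizeof expected) == static_cast<ssize_t>(sizeof expected);
    ::close(fd);
    return ok && fnv1a(data.data(), data.size()) == expected;
}
```

The trade-off is that verification moves to startup: if the data never fully reached disk, the mismatch is only detected when the file is read back and checksummed.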

There is a proposal out there to eliminate the mapped mode entirely and replace it with a hybrid solution instead. That may be the time to investigate more clever approaches such as the above. In the short term we can improve the performance of this operation.

All that said, leaving it up to the OS to eventually get all the data on disk isn't as surefire as it may sound, because history has shown us that users sometimes run the application within multiple layers of abstraction that interfere with expected clean shutdown semantics. Trusting all those layers to properly drain all their pages before some arbitrary timeout of some other layer may be too optimistic. Users get frustrated when they "cleanly shutdown nodeos" and it still reports its database as dirty.

spoonincode commented 11 months ago

Will use AntelopeIO/leap#1870 for future discussion

eosnetworkfoundation / mandel

"syncing buffers" seems unnecessary on shutdown in `heap` mode #797