EOSIO / eos

An open source smart contract platform
https://developers.eos.io/manuals/eos
MIT License

BlockVault: indexes on BlockData table; table cleanup API #10066

Open cc32d9 opened 3 years ago

cc32d9 commented 3 years ago

As of v2.1.0-rc3, in plugins/blockvault_client_plugin/postgres_backend.cpp:

The vault plugin writes every new block into the BlockData table, performing a SELECT on several fields that are not indexed. This means that before every SELECT, Postgres has to scan the whole table, so the database transaction time grows linearly with the table size. Also, the table is only emptied on snapshots, and we normally don't take snapshots on producer nodes.

So the solution is hardly usable in practice: after a few days of operation, every query has to scan through thousands of rows.

The module should create the indexes along with the table, so that the SELECT query executes as efficiently as possible.
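For illustration, a minimal sketch (in Python with psycopg2, not the plugin's actual C++ code) of the kind of index creation that would avoid the full-table scans. The column name watermark_bn and the connection string are assumptions; the real column names used by the SELECT would have to be taken from postgres_backend.cpp.

```python
# Sketch only: "watermark_bn" is a hypothetical column standing in for the
# non-indexed fields the plugin filters on; check postgres_backend.cpp for the
# actual BlockData schema.
import psycopg2

conn = psycopg2.connect("dbname=blockvault")  # illustrative connection string
with conn, conn.cursor() as cur:
    # CREATE INDEX IF NOT EXISTS is idempotent, so the plugin could issue it
    # right after the CREATE TABLE it already runs at startup.
    cur.execute(
        'CREATE INDEX IF NOT EXISTS blockdata_watermark_idx '
        'ON "BlockData" (watermark_bn)'
    )
conn.close()
```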

There should also be an HTTP API that flushes the table, instead of it being flushed only at snapshot time.

Also, as a wish: a different backend that is better suited to the task. For example, Redis has built-in clustering and is much faster than Postgres because of its small overhead.

b1bart commented 3 years ago

Thank you for the feedback. I do want to point out that, as a developer preview, BlockVault was focused on the correctness of its core guarantee for disaster recovery and double-production prevention, not on performance. I understand this can make it difficult to evaluate on production networks at this time.

In order to accomplish the guarantee that BlockVault can recover your block producer as long as a single copy of the database survives, the system only prunes material data when presented with a snapshot that is the aggregate of that data. Flushing those tables without a snapshot invalidates the core use case of BlockVault and places the burden back on the user to guarantee disaster recovery via another process.

It is worth noting that this snapshot can be provided by any node connected to BlockVault, not just an active BP node submitting blocks.

This core guarantee may also be an issue for a naive in-memory Redis cluster deployment, as we have strict data durability requirements in degraded operation; however, we can add it to the list of backends for evaluation in the future.

We will continue to improve the performance of the queries in coming releases, though I doubt it will ever be performant if the data is left unpruned.

cc32d9 commented 3 years ago

I'm not convinced data persistence is important here. We need to make sure that only one producing node publishes the new block within its schedule and the others do not. If the settlement mechanism is unavailable, none of them should produce a block.

So what is needed is a distributed mutex solution, one that is fault tolerant. Postgres is far from optimal here, especially since it doesn't offer an out-of-the-box clustering solution.

Besides, even with a clustered Postgres: say producer node A talks to Postgres server X, and producer node B talks to Postgres server Y. While X and Y are in sync, everything works. But what happens when the communication between X and Y breaks? Will each behave as a standalone server, or should one of them die? This part of the design is still not covered.
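To make that failure mode concrete, here is a tiny illustrative sketch (my addition, not from the issue) of the usual quorum reasoning: with only two replicas, a symmetric split leaves neither side with a strict majority, so neither side can safely keep handing out the lock.

```python
# Illustration only: the majority (quorum) check typical fault-tolerant lock
# services rely on; a two-node X/Y setup cannot form a majority after a split.
def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    return reachable_nodes > cluster_size // 2

print(has_quorum(1, 2))  # False: two-node cluster split in half, both sides must stop
print(has_quorum(2, 3))  # True:  three-node cluster split 2/1, majority side continues
print(has_quorum(1, 3))  # False: minority side must stop
```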

cc32d9 commented 3 years ago

Also, creating snapshots is something a BP does independently (or they trust the snapshots made by another producer). It looks odd that a high-availability solution depends on snapshotting.

aclark-b1 commented 3 years ago

I think this may have been accidentally closed during cleanup. Reopening and marking as enhancement for tracking purposes.

cc32d9 commented 3 years ago

Thank you. Here is a detailed description of a distributed lock that seems to work better than a Postgres DB: https://redis.io/topics/distlock. It has built-in fault tolerance and predictable convergence time.
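For reference, a minimal Python sketch of the acquisition step described on that page, using redis-py. The host names, key name, and TTL are illustrative assumptions, not anything from the BlockVault code: each replica is asked for the lock with an atomic SET NX PX, and the lock is considered held only if a majority granted it within the validity window.

```python
# Sketch of the Redlock acquisition step from https://redis.io/topics/distlock.
# Replica hosts, key name and TTL below are made-up examples.
import time
import uuid
import redis

REPLICAS = [redis.Redis(host=h) for h in ("redis-a", "redis-b", "redis-c")]
LOCK_KEY = "producer_slot_lock"   # hypothetical key name
TTL_MS = 500                      # must exceed the time needed to produce the block

def try_acquire(token: str) -> bool:
    start = time.monotonic()
    granted = 0
    for r in REPLICAS:
        try:
            # SET ... NX PX is atomic per instance: grant only if the key is free.
            if r.set(LOCK_KEY, token, nx=True, px=TTL_MS):
                granted += 1
        except redis.RedisError:
            pass  # an unreachable replica simply doesn't grant the lock
    elapsed_ms = (time.monotonic() - start) * 1000
    if granted >= len(REPLICAS) // 2 + 1 and elapsed_ms < TTL_MS:
        return True
    release(token)  # roll back a partial acquisition
    return False

def release(token: str) -> None:
    # Delete the key only if it still holds our token (compare-and-delete in Lua).
    lua = ("if redis.call('get', KEYS[1]) == ARGV[1] then "
           "return redis.call('del', KEYS[1]) end return 0")
    for r in REPLICAS:
        try:
            r.eval(lua, 1, LOCK_KEY, token)
        except redis.RedisError:
            pass

token = str(uuid.uuid4())
if try_acquire(token):
    # safe to sign and publish the block for this slot, then release
    release(token)
```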