MinaProtocol / mina

Mina is a cryptocurrency protocol with a constant size blockchain, improving scaling while maintaining decentralization and security.
https://minaprotocol.com
Apache License 2.0

Archive Resilience #7111

Closed · figitaki closed 2 years ago

figitaki commented 3 years ago

Background

Traditional blockchain projects carry their history around with them; that history is the blockchain. Any fully synced node has the information necessary to audit balance transfers on the protocol. The drawback is that most clients don't care about this history. Mina uses zk-SNARKs to compress the blockchain down to a constant size, effectively forgetting the specific balance transfers that occurred at any given time. This is great for the common case, but bad in the special cases. We created the (optional) archive node process to preserve this history for the Mina protocol: the archive node attempts to store every block that Mina sees as valid in a PostgreSQL database.

This is great when things are working, but sometimes things go wrong: nodes can go down, and bugs can creep in that break our integration. The protocol itself is resilient to such scenarios, but our archival infrastructure currently is not.

This is unacceptable for mainnet launch. Full archive data is important for:

  1. Clients to audit the balance transfers on chain — this is important for custodians, exchanges, and professional node operators (via Rosetta or manually)
  2. Hard forks — We take advantage of the archive node to mold our new forked genesis ledger into a usable state

We must have a good story for bootstrapping, maintaining, and recovering archive data for these purposes.

Prior Art

  1. In the past, we deployed a "points-hack" service that dumped GraphQL block JSON to cloud storage.
  2. Matthew created a tool a while back that can replay blocks from a log file.
  3. Luckily, with the help of community members such as @Gareth, we have been able, with some effort, to recover most data on testnets when we noticed that some of our storage had failed.

Outstanding Problems

Proposal

Tackle resiliency with redundancy. Specifically, we should be redundant across two dimensions: (1) horizontal scaling of archive processors and (2) additional sources for recovering data into the database:

  1. Scale up the existing archive node processors in our infrastructure. This amounts to just running more than one archive node processor on more than one node in our cluster. PostgreSQL will handle concurrent duplicate writes idempotently for us (see the sketch after this list). We should also detect missing subchains.
  2. Add support for recovering block data from both GraphQL data and logs.
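
A minimal sketch of the idempotent-write pattern from item (1), assuming a hypothetical `blocks` table keyed by `state_hash` (the real archive schema may differ):

```python
# Sketch of idempotent block writes from concurrent archive processors.
# The `blocks` table, `state_hash` key, and DSN are illustrative
# assumptions, not the real archive schema.
import json

import psycopg2

def write_block(conn, block: dict) -> None:
    """Insert a block; a duplicate write from another processor is a no-op."""
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO blocks (state_hash, block_json)
            VALUES (%s, %s)
            ON CONFLICT (state_hash) DO NOTHING
            """,
            (block["state_hash"], json.dumps(block)),
        )

conn = psycopg2.connect("dbname=archive")  # placeholder DSN
write_block(conn, {"state_hash": "example-hash", "height": 1})
```

The `ON CONFLICT ... DO NOTHING` clause turns a duplicate insert from a second processor into a no-op instead of a unique-constraint error, which is what makes running several processors against one database safe.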

Regarding the additional sources of block data, we have already done work to recover node data from GraphQL: see this PR. We have also built out support for logging block data in a form that is sufficient to recover from: see this PR.

Since both the GraphQL data and the new block logging format are JSON, the simplest and most resilient way for us to store them would be to pipe them into MongoDB (sketched below). This way we don't need to merge the data together or deal with converting from JSON into SQL or any other typed format.
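
A minimal sketch of that ingestion path using pymongo; the database, collection, and field names are illustrative assumptions:

```python
# Sketch of piping raw JSON block data into MongoDB as a backup store.
# Database, collection, and field names here are illustrative assumptions.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
backups = client["archive_backups"]["blocks"]

def store_block_json(block: dict, source: str) -> None:
    """Upsert one raw block document, tagged with where it came from."""
    backups.update_one(
        {"state_hash": block["state_hash"], "source": source},  # e.g. "graphql" or "block_log"
        {"$set": {"block": block}},
        upsert=True,
    )
```

Upserting on `state_hash` plus `source` keeps the two backup streams from clobbering each other, so merging is deferred until recovery time.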

Then, if we reach a point of catastrophic failure of the primary archive node, we can fall back to our backup data sources and recover from them in the following order:

  1. Archive Node
  2. Block logs
  3. GraphQL dump
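
A minimal sketch of that fallback order; each `recover_from_*` helper is a hypothetical stand-in for a real per-source recovery tool:

```python
# Sketch of the fallback order above; each recovery function is a
# hypothetical stand-in for a real per-source recovery tool.
from typing import Callable, List, Tuple

def recover(sources: List[Tuple[str, Callable[[], bool]]]) -> str:
    """Try each backup source in order of fidelity until one succeeds."""
    for name, attempt in sources:
        if attempt():
            return name
    raise RuntimeError("all backup data sources exhausted")

# Ordering mirrors the list above:
# recover([
#     ("archive node", recover_from_archive_replica),
#     ("block logs", recover_from_block_logs),
#     ("GraphQL dump", recover_from_graphql_dump),
# ])
```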

To address the issue of changing the schema alongside a hard fork, we should always decouple schema migrations from hard forks. This allows us to migrate all existing archive databases well before a hard fork (see the sketch below).
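
As one possible shape for that decoupling (a sketch; the `schema_version` table and `migrations/*.sql` layout are assumptions, not existing tooling):

```python
# Sketch of running versioned migrations independently of any fork.
# The `schema_version` table, migrations/*.sql layout, and DSN are
# assumptions for illustration only.
import pathlib

import psycopg2

def apply_pending_migrations(conn, migrations_dir: str = "migrations") -> None:
    """Apply numbered .sql migrations not yet recorded as applied."""
    with conn, conn.cursor() as cur:
        cur.execute(
            "CREATE TABLE IF NOT EXISTS schema_version (version text PRIMARY KEY)"
        )
        cur.execute("SELECT version FROM schema_version")
        applied = {row[0] for row in cur.fetchall()}
    for path in sorted(pathlib.Path(migrations_dir).glob("*.sql")):
        if path.stem in applied:
            continue
        # Each migration commits on its own, well ahead of the fork itself.
        with conn, conn.cursor() as cur:
            cur.execute(path.read_text())
            cur.execute("INSERT INTO schema_version VALUES (%s)", (path.stem,))

apply_pending_migrations(psycopg2.connect("dbname=archive"))  # placeholder DSN
```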

Tasks and Projects

Open Questions

bkase commented 3 years ago

How should we handle migrations to the archive schema?

We mentioned this during our discussion, but we should always migrate before a hard fork. I think it's worth capturing that explicitly. We can share the migration scripts before we release a hard fork as well.

What level of ongoing support & documentation do we need to provide for the block logging format?

I think we should share a sample and explain the purpose of this redundant data, but make clear that the representations here map to our implementation details and are subject to change at any time without notice. For a more backwards-compatible and reliable source, use GraphQL.

More tasks to add:

nholland94 commented 3 years ago

I think this is a great solution to storing our archive backups.

One note: I think we should explicitly deploy separate database instances for each of the resiliency sources. As in, one for the points hack, and one for the block logs. That way, if one goes down for whatever reason, it does not directly compromise the integrity of the other backup system.

garethtdavies commented 3 years ago

I would add to the above that having others running an archive node would obviously help, as this would be a simpler recovery source? The information needed to do this is currently really hard to find.

bkase commented 3 years ago

TODO: Add details around our temporary Google Cloud Storage writing solution and start a discussion around the tradeoffs between keeping that and using MongoDB as specified.

psteckler commented 3 years ago

The block logging format has much more detail than the output of the missing-subchains tool, so there isn't really commonality to exploit for the purposes of getting block information into an archive database.

We'll need separate mechanisms to ingest that information from these sources into an archive db.

/ping @bkase

psteckler commented 3 years ago

I think the phrase "belt-and-suspenders" should appear in this RFC somewhere.

yourbuddyconner commented 3 years ago

Just to weigh in here from the professional operator perspective: it is less than optimal to require two different databases (or, more generally, multiple sources of block data) to provide redundancy for the Archive process. This also adds complexity if the archive and/or JSON block schema is ever mutated down the line and multiple migrations (one for PostgreSQL, one for Mongo) have to be managed.

I really liked what I was reading from Deepthi here, and was hoping to see more development on making the Archive process more resilient to downtime as opposed to adding ways to reconcile the data later when you inevitably have downtime.

Either way, good RFC describing the tradeoffs @figitaki.

p-shahi commented 2 years ago

Most of these items have been addressed in recent tooling.