EmerisHQ / demeris-backend

Monorepo containing all the Demeris backend code and infrastructure definitions.
GNU Affero General Public License v3.0

Investigation: can bulk import run on non-clean db? #728

Closed Pitasi closed 2 years ago

Pitasi commented 2 years ago

We are currently running the bulk import on the assumption that the database is clean: e.g. if we bulk import osmosis we don't expect to see any osmosis row in any of the tables (balances, auth, ...).

We want to investigate IF it would be possible to launch bulk import on a non-clean database, while other nodes are running (i.e. upserting the same rows for that chain).

Scenario 1 (simple)

Let's imagine an address abc that had 123 osmo at block height 12, then received 500 osmo at block height 55.

Previous Cockroach DB state:

| chain_name | address | amount | block_height |
| --- | --- | --- | --- |
| osmosis | abc | 123uosmo | 12 |

We launch a bulk import on a node that was on block height 100.

Expected Cockroach DB state:

| chain_name | address | amount | block_height |
| --- | --- | --- | --- |
| osmosis | abc | 623uosmo | 100 |

Note how we wrote 100 instead of 55 in the block_height column. There's no simple way to retrieve that information during bulk import, so we just use the height of the bulk import. This is fine, since that balance is also valid at block 100.

Scenario 2 (simple)

Same as scenario 1: Let's imagine an address abc that had 123 osmo at block height 12, then received 500 osmo at block height 55.

Previous Cockroach DB state:

| chain_name | address | amount | block_height |
| --- | --- | --- | --- |
| osmosis | abc | 99999uosmo | 5824 |

We launch a bulk import on a node that was on block height 100. In this case our database was more up to date than the bulk import, so we expect this row to stay the same.

Expected Cockroach DB state:

| chain_name | address | amount | block_height |
| --- | --- | --- | --- |
| osmosis | abc | 99999uosmo | 5824 |

Scenario 3 (hard)

Address abc delegated 123uosmo to a validator at block height 12. That delegation is deleted at block height 55 (assume there's no unbonding time, for simplicity).

Previous Cockroach DB state:

| chain_name | address | amount | block_height | delete_height |
| --- | --- | --- | --- | --- |
| osmosis | abc | 123uosmo | 12 | NULL |

We launch a bulk import on a node that was on block height 100.

Expected Cockroach DB state (any of these would work):

1. delete_height is the correct block height

   | chain_name | address | amount | block_height | delete_height |
   | --- | --- | --- | --- | --- |
   | osmosis | abc | 123uosmo | 12 | 55 |

2. delete_height is the bulk import height

   | chain_name | address | amount | block_height | delete_height |
   | --- | --- | --- | --- | --- |
   | osmosis | abc | 123uosmo | 12 | 100 |

3. the row is fully deleted (pruned), i.e. no row for osmosis/abc remains in the table

Pitasi commented 2 years ago

I marked scenarios 1 and 2 as simple because we already have the "upsert" SQL queries that update the rows only if the height column in the database is lower than the height we want to write (note: this version has not been released yet).
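The height-guarded upsert can be sketched as follows. This is a minimal illustration, not the actual demeris-backend query: it uses SQLite's ON CONFLICT syntax (CockroachDB supports the same clause), and the table and column names simply follow the scenarios above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE balances (
        chain_name   TEXT,
        address      TEXT,
        amount       TEXT,
        block_height INTEGER,
        PRIMARY KEY (chain_name, address)
    )
    """
)

def upsert_balance(chain, addr, amount, height):
    # Insert the row; if it already exists, overwrite it only when the
    # incoming height is newer than what is already stored.
    conn.execute(
        """
        INSERT INTO balances (chain_name, address, amount, block_height)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (chain_name, address) DO UPDATE SET
            amount = excluded.amount,
            block_height = excluded.block_height
        WHERE excluded.block_height > balances.block_height
        """,
        (chain, addr, amount, height),
    )

# Scenario 1: db row at height 12, bulk import at height 100 wins.
upsert_balance("osmosis", "abc", "123uosmo", 12)
upsert_balance("osmosis", "abc", "623uosmo", 100)
row1 = conn.execute("SELECT amount, block_height FROM balances").fetchone()
print(row1)  # ('623uosmo', 100)

# Scenario 2: db row at height 5824, bulk import at height 100 is ignored.
upsert_balance("osmosis", "abc", "99999uosmo", 5824)
upsert_balance("osmosis", "abc", "555uosmo", 100)
row2 = conn.execute("SELECT amount, block_height FROM balances").fetchone()
print(row2)  # ('99999uosmo', 5824)
```

The same guard also makes concurrent writers safe: a node that is catching up re-emits traces at older heights, and the WHERE clause simply discards them.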

I marked scenario 3 as hard because the bulk import only "sees" the current state of the Cosmos SDK database (application.db). Therefore, the bulk import doesn't see any delegations at all at block height 100; we would need a way to match the "extra data" we have in the database and delete/soft delete that row.
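One way to reconcile those leftover rows is a set difference: collect the keys the bulk import actually emitted, compare them against the live rows already in the database, and soft delete whatever the snapshot no longer contains (option 2 above, using the bulk import height). A minimal in-memory sketch, with illustrative names and data:

```python
# Rows currently in CockroachDB with delete_height still NULL,
# keyed by (chain_name, address). Values are illustrative.
live_rows = {
    ("osmosis", "abc"): {"amount": "123uosmo", "block_height": 12},
}

# Keys present in the bulk-imported application.db snapshot. The
# delegation of "abc" was deleted on chain, so it is absent here.
imported_keys = set()

import_height = 100

# Any live row the snapshot no longer contains must have been deleted
# on chain; mark it deleted using the bulk import height.
soft_deletes = {
    key: import_height
    for key in live_rows
    if key not in imported_keys
}
print(soft_deletes)
```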

tbruyelle commented 2 years ago

@Pitasi Thank you, your issue helps me a lot to better understand the problems the TL is supposed to solve.

Some questions :

Pitasi commented 2 years ago

We probably are going to need a proper tracelistener lesson and also take the chance to improve our Notion pages 😛

Cosmos SDK writes "traces" to a file, it is supposed to be a debug feature. Each trace represents an operation to the underlying Cosmos SDK database, we discard "read" traces and only keep "write" and "delete" traces.

Keep in mind that Cosmos SDK storage is simply a key-value storage.

Our goal is basically to replicate the Cosmos SDK db into our CockroachDB instance.

That said, let me answer your questions, but we can schedule a meet for tomorrow:

TL listens to ABCI using a websocket, if I'm not wrong, so how do you extract the bulk import?

Forget ABCI, we are at a much higher level here :) (ABCI is about Tendermint Core, the consensus protocol that executes a state machine, but doesn't say how you store data or what operations you can perform on this data)

Traces are operations on db made by Cosmos SDK modules. Bulk import basically reads the database from files and internally generates "fake write traces" so we can reuse the same Tracelistener code.
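The "fake write traces" idea could look roughly like this: iterate the key-value pairs dumped from application.db and wrap each one in a synthetic write trace, so the normal trace-processing path ingests them unchanged. The field names below are assumptions for illustration, not the actual tracelistener trace format.

```python
import json

def fake_write_traces(kv_pairs, block_height):
    """Turn a dump of (key, value) pairs from application.db into
    synthetic "write" traces. Field names here are illustrative."""
    for key, value in kv_pairs:
        yield json.dumps({
            "operation": "write",
            "key": key,
            "value": value,
            "block_height": block_height,
        })

# A one-entry snapshot, reusing the balance from scenario 1.
snapshot = [("balances/abc", "623uosmo")]
traces = list(fake_write_traces(snapshot, block_height=100))
print(traces[0])
```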

What's a node exactly? I thought it was a TL instance dedicated to a blockchain, but I have a doubt after reading your scenarios, since a node can have a different block height than CockroachDB...

For me a node is an instance of a blockchain node. For instance, in production we have 4 osmosis nodes, and each of them has a Tracelistener instance running. Important note: they all share the same database and concurrently compete to update the same rows.

It happens that some of the nodes crash, or stop syncing for a while (think, for example, of a temporary network failure on one of our Kubernetes nodes). After such a node comes back, it starts downloading all the data it missed in the meantime, and emits the same traces its siblings already emitted (hence CockroachDB was already updated).

what does that mean when you say the node is on block XXXX.

I mean we stopped the blockchain node when it was at that height, so its internal database represents the state at height XXXX.

About scenario 3: so if I understood correctly, there's no way to detect a delete in the Cosmos SDK db other than comparing its state with the one we have in CockroachDB? It would be so much easier if the Cosmos SDK db exposed events like Apache Kafka does :).

We can detect "delete" traces, they are a delete event, but we don't want to process historical events like we could in Kafka (we can for example spin up a new blockchain node and let it sync from scratch, this will emit all events again, but it's a painfully slow process since there are millions of blocks and each block emits many events).

What we want is to be able to "jump" to block height XXX without parsing all previous blocks, hope that makes sense.

tbruyelle commented 2 years ago

Thanks a lot @Pitasi , yes I probably need more information, I won't say no to a meet on that topic :)

DeshErBojhaa commented 2 years ago

@Pitasi let's arrange a team wide call/knowledge sharing session on TL and SDK DB.

Pitasi commented 2 years ago

I think we discussed the possibilities and arrived at an algorithm that works, so I opened a ticket to track its implementation: https://app.zenhub.com/workspaces/emeris-back-end-619cd009cc778000112d6259/issues/emerishq/demeris-backend/784.

I'll close this issue, thank you!