Tracelistener is a program that reads the Cosmos SDK store in real time and dumps the result into a relational database, essentially creating a 1:1 copy of the data available in a module's prefix store.
The relational database of choice is CockroachDB, a relational database compatible with the Postgres wire protocol; tracelistener itself is written entirely in Go.
Tracelistener is a vital component of the Emeris backend, since it provides chain state data without us having to query that information from full nodes.
By not querying full nodes, tracelistener reduces node load and lowers the chance of load-related issues, such as nodes failing to receive or parse blocks because of a high query volume.
Given how tightly coupled tracelistener is to a Cosmos SDK node, the two must run together on the same machine.
Refer to the dedicated page.
The Cosmos SDK has a little-known feature called store tracing, which logs every store operation to a file.
The Cosmos SDK store defines four kinds of store operations:
write
delete
read
iterRange
Tracelistener is only concerned with the first two.
Store operations are separated by newlines, and each operation is serialized as JSON.
To reduce disk load on the machine running the node, tracelistener opens a UNIX named pipe (commonly referred to as a FIFO), to which the Cosmos SDK node then writes store tracing lines.
First of all, launch gaia or any other Cosmos SDK based chain node, passing a FIFO as the --trace-store value:
# This example needs to be executed in two separate terminals.
# Terminal 1
mkfifo /tmp/tracelistener.fifo
gaiad start --trace-store /tmp/tracelistener.fifo
# Terminal 2
cat /tmp/tracelistener.fifo
In the first terminal we create a named pipe at /tmp/tracelistener.fifo, and then start gaiad with the --trace-store flag.
gaiad will look like it's stuck on the Tendermint initialization phase: this is normal, since a FIFO blocks writes until there is a reader.
In the second terminal the cat command starts printing JSON store tracing lines, and gaiad will unblock itself and resume execution.
If cat is killed before gaiad, the latter will experience a consensus failure: this is normal, and happens because a program cannot write to a closed pipe.
In a production environment, tracelistener must always be started before the SDK node and stopped after it.
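The reader side of the pipe, played by cat in the example above, could be sketched in Go roughly as follows. This is a minimal illustration and not tracelistener's actual code: the function names readTraceLines, collectLines and ensureFIFO are made up for this example, and syscall.Mkfifo is only available on Unix-like systems. When no FIFO path is passed on the command line, the program runs on a small in-memory sample instead of blocking.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"os"
	"strings"
	"syscall"
)

// readTraceLines calls handle once for every non-empty line read from r.
// It uses bufio.Reader rather than bufio.Scanner so that very long lines
// are not rejected by Scanner's default token size limit.
func readTraceLines(r io.Reader, handle func(line string)) error {
	br := bufio.NewReader(r)
	for {
		line, err := br.ReadString('\n')
		if line = strings.TrimRight(line, "\r\n"); line != "" {
			handle(line)
		}
		if errors.Is(err, io.EOF) {
			return nil // all writers have closed their end of the pipe
		}
		if err != nil {
			return err
		}
	}
}

// collectLines runs readTraceLines on in-memory input (hypothetical helper).
func collectLines(input string) []string {
	var lines []string
	_ = readTraceLines(strings.NewReader(input), func(l string) { lines = append(lines, l) })
	return lines
}

// ensureFIFO creates a named pipe at path unless something already exists there.
func ensureFIFO(path string) error {
	if err := syscall.Mkfifo(path, 0o600); err != nil && !errors.Is(err, syscall.EEXIST) {
		return err
	}
	return nil
}

func main() {
	if len(os.Args) < 2 {
		// No FIFO path given: run on sample input instead of blocking.
		for _, l := range collectLines("{\"operation\":\"write\"}\n{\"operation\":\"delete\"}\n") {
			fmt.Println(l)
		}
		return
	}
	path := os.Args[1]
	if err := ensureFIFO(path); err != nil {
		fmt.Fprintln(os.Stderr, "mkfifo:", err)
		os.Exit(1)
	}
	// Opening a FIFO read-only blocks until a writer (the node) opens it too.
	f, err := os.Open(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, "open:", err)
		os.Exit(1)
	}
	err = readTraceLines(f, func(l string) { fmt.Println(l) })
	f.Close()
	if err != nil {
		fmt.Fprintln(os.Stderr, "read:", err)
		os.Exit(1)
	}
}
```

Run with a path such as /tmp/tracelistener.fifo as the first argument, this behaves like the cat command in terminal 2.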
For each JSON line read, tracelistener unmarshals it into a Go struct and proceeds with the parsing routine. We will refer to this object as a trace operation from now on.
{
  "operation": "write",
  "key": "AWWI/l6u6S5Zhb6vAZgj4emcTZJz",
  "value": "CiAvY29zbW9zLmF...RjAtwEgutgE",
  "metadata": {
    "blockHeight": 4686332,
    "txHash": "D74A356B73A111E4977619EA22F5597F44F49B15CB5177B59846CC70744A0B4B"
  }
}
An incoming trace looks like the example above when read from the Unix pipe.
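Unmarshalling such a line into a Go struct could look like the sketch below. The type name TraceOperation, the helper parseTrace and the method IsStateChange are assumptions made for this example, not necessarily the names used in the tracelistener codebase. The sketch also assumes that key and value are standard base64 strings, which encoding/json decodes into []byte fields on its own. Because the value in the example above is elided, a short placeholder value ("hello" in base64) is used instead.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TraceOperation mirrors one JSON line produced by store tracing.
// encoding/json decodes base64 strings into []byte fields automatically.
type TraceOperation struct {
	Operation string `json:"operation"`
	Key       []byte `json:"key"`
	Value     []byte `json:"value"`
	Metadata  struct {
		BlockHeight uint64 `json:"blockHeight"`
		TxHash      string `json:"txHash"`
	} `json:"metadata"`
}

// IsStateChange reports whether the operation is a write or a delete,
// the only two kinds tracelistener is concerned with.
func (t TraceOperation) IsStateChange() bool {
	return t.Operation == "write" || t.Operation == "delete"
}

// parseTrace decodes a single JSON trace line.
func parseTrace(line []byte) (TraceOperation, error) {
	var op TraceOperation
	err := json.Unmarshal(line, &op)
	return op, err
}

func main() {
	// Placeholder value: "aGVsbG8=" is base64 for "hello".
	line := []byte(`{"operation":"write","key":"AWWI/l6u6S5Zhb6vAZgj4emcTZJz","value":"aGVsbG8=","metadata":{"blockHeight":4686332,"txHash":"D74A356B73A111E4977619EA22F5597F44F49B15CB5177B59846CC70744A0B4B"}}`)

	op, err := parseTrace(line)
	if err != nil {
		fmt.Println("invalid trace line:", err)
		return
	}
	if !op.IsStateChange() {
		return // read and iterRange operations are ignored
	}
	fmt.Printf("%s prefix=0x%02x keyLen=%d height=%d\n",
		op.Operation, op.Key[0], len(op.Key), op.Metadata.BlockHeight)
}
```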
Each processor contains modules, which are entities capable of decoding the Value of a trace operation and producing the INSERT statement to be executed.
Right now there's only one processor, called *gaia*.
To understand where to route each trace operation, processors look at the prefix bytes of each operation's Key.
Each module is responsible for validating a trace operation against a well-defined set of rules, because Key prefixes can be shared among different Cosmos SDK modules. For example, the 0x02 prefix is used by the IBC channels module as well as the supply one, so the IBC channels module must make sure not to write supply rows into its table.
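Prefix routing followed by per-module validation can be illustrated with the sketch below. Everything in it is hypothetical: the module type, the route function, and above all the length checks, which are placeholders standing in for the real validation rules. The point is only that matching the prefix byte is not sufficient, and that a key with a shared prefix is skipped when no module accepts it.

```go
package main

import "fmt"

// module is a hypothetical shape for a tracelistener module.
type module struct {
	name   string
	prefix byte
	// valid holds the module-specific checks applied after the prefix matches.
	valid func(key []byte) bool
}

// owns reports whether key carries the module's prefix and passes its checks.
func (m module) owns(key []byte) bool {
	return len(key) > 0 && key[0] == m.prefix && m.valid(key)
}

// exampleModules returns two modules with placeholder validation rules.
// Real modules would check the actual key layout instead of a length.
func exampleModules() []module {
	return []module{
		{name: "auth", prefix: 0x01, valid: func(k []byte) bool { return len(k) == 21 }},
		{name: "ibc_channels", prefix: 0x02, valid: func(k []byte) bool { return len(k) > 9 }},
	}
}

// route returns the name of the first module that accepts key, or "" when
// no module does and the trace operation should be skipped.
func route(mods []module, key []byte) string {
	for _, m := range mods {
		if m.owns(key) {
			return m.name
		}
	}
	return ""
}

func main() {
	mods := exampleModules()
	long := append([]byte{0x02}, make([]byte, 12)...) // accepted by ibc_channels
	short := []byte{0x02, 0xAA}                       // same prefix, rejected
	fmt.Printf("%q %q\n", route(mods, long), route(mods, short))
}
```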
Once a trace operation has been processed, it is batched and kept on hold until the next block arrives. This means we wait to run database queries until we receive the first trace of the next block. We do this because it's possible to receive multiple traces concerning the same row, and we want to commit only the final state to the database.
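The batching behavior described above can be sketched as a small buffer that keeps the latest update per row and flushes when a trace from a later block shows up. The types rowUpdate and batcher are invented for this illustration, and the sketch assumes that block heights never decrease between consecutive traces.

```go
package main

import "fmt"

// rowUpdate is a hypothetical, already-processed trace operation.
type rowUpdate struct {
	Key    string // identifies the database row
	Value  string
	Height uint64
}

// batcher keeps only the latest update per row for the block in progress.
type batcher struct {
	height  uint64
	pending map[string]rowUpdate
	order   []string // first-seen order of keys, so flushes are deterministic
	flush   func(height uint64, rows []rowUpdate)
}

func newBatcher(flush func(height uint64, rows []rowUpdate)) *batcher {
	return &batcher{pending: map[string]rowUpdate{}, flush: flush}
}

// Add buffers u. If u belongs to a later block than the buffered updates,
// those are flushed first.
func (b *batcher) Add(u rowUpdate) {
	if len(b.pending) > 0 && u.Height > b.height {
		b.Flush()
	}
	b.height = u.Height
	if _, seen := b.pending[u.Key]; !seen {
		b.order = append(b.order, u.Key)
	}
	b.pending[u.Key] = u // later updates to the same row overwrite earlier ones
}

// Flush hands the buffered rows to the flush callback and resets the buffer.
func (b *batcher) Flush() {
	if len(b.pending) == 0 {
		return
	}
	rows := make([]rowUpdate, 0, len(b.order))
	for _, k := range b.order {
		rows = append(rows, b.pending[k])
	}
	b.flush(b.height, rows)
	b.pending = map[string]rowUpdate{}
	b.order = nil
}

func main() {
	b := newBatcher(func(height uint64, rows []rowUpdate) {
		fmt.Printf("block %d: committing %d row(s): %v\n", height, len(rows), rows)
	})
	b.Add(rowUpdate{Key: "acct1", Value: "10", Height: 5})
	b.Add(rowUpdate{Key: "acct1", Value: "20", Height: 5}) // same row, same block
	b.Add(rowUpdate{Key: "acct2", Value: "7", Height: 5})
	b.Add(rowUpdate{Key: "acct1", Value: "30", Height: 6}) // triggers flush of block 5
	b.Flush()                                              // e.g. on shutdown
}
```

In a real pipeline the flush callback would run the database statements for the whole block, ideally inside a single transaction.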
The database schema is automatically migrated each time tracelistener starts, but this behavior will change in the future.
An attempt to represent the tracing mechanism as a domain model could look as above.
IAVL is an abstraction layer on top of a key-value store that allows taking snapshots of the underlying data, which is stored in LevelDB.
It is important to note that the underlying DB only stores state change information: it does not store transaction, block, or execution sequence information.
When we are performing a bulk import we are reading directly from LevelDB, i.e. the latest IAVL snapshot.
The information that we load from a bulk import is a subset of what we receive from incoming traces: LevelDB is missing the transaction, block, and execution sequence information.
The mapping between modules (i.e. IAVL tables) and Tracelistener processors is as follows:
bank
ibc_channels, ibc_clients, ibc_connections
validators, unbonding_delegations
delegations
ibc_denom_traces
auth
Each Cosmos module is internally a state machine that stores its current state in IAVL. A transaction causes side effects, and these update the internal state.
Conceptually, the modules can be split into categories by the “type” of internal state they maintain.
E.g. the bank module.
Here we are simply updating a value, like setting the balance from 10 to 20.
The incoming trace event only contains the new value.
E.g. validator set.
Here we are inserting and deleting entries in a set of entries.
I.e.
INSERT INTO XXX (...)
DELETE FROM XXX WHERE ...
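One possible way to cover both categories with the same code path is to turn every write into an insert-or-update and every delete into a row removal. The sketch below only builds the parameterized statements; statementFor, the column names store_key and store_value, and the table names are all hypothetical, and the approach assumes each table has a unique constraint on its key column.

```go
package main

import "fmt"

// statementFor maps a trace operation kind to a parameterized SQL statement.
// Table and column names are hypothetical placeholders.
func statementFor(table, operation string) (string, error) {
	switch operation {
	case "write":
		// Insert a new row, or overwrite the value when the key already exists.
		return fmt.Sprintf(
			"INSERT INTO %s (store_key, store_value) VALUES ($1, $2) "+
				"ON CONFLICT (store_key) DO UPDATE SET store_value = excluded.store_value",
			table), nil
	case "delete":
		return fmt.Sprintf("DELETE FROM %s WHERE store_key = $1", table), nil
	default:
		return "", fmt.Errorf("unsupported operation %q", operation)
	}
}

func main() {
	for _, op := range []string{"write", "delete", "read"} {
		q, err := statementFor("validators", op)
		if err != nil {
			fmt.Println("skipped:", err)
			continue
		}
		fmt.Println(q)
	}
}
```

For single-value state such as a balance, only the write branch is typically exercised, since the trace carries just the new value; set-like state such as the validator set uses both branches.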