decred / politeia

ISC License

tstore: Freeze trillian trees. #1670

Closed lukebp closed 1 year ago

lukebp commented 1 year ago

This commit adds episodic checks to the tstore backend that freeze any trillian trees for records that have been updated to a status that no longer allows updates, such as censored or archived, and that have a final dcr timestamp appended onto the tree.
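The freeze condition described above can be sketched roughly as follows. This is only an illustration, not the code from this PR: the `recordStatus` values, the `treeInfo` struct, and the `shouldFreeze` helper are hypothetical names, and the real tstore backend would read this state from trillian and the record metadata rather than from an in-memory struct.

```go
package main

import "fmt"

// recordStatus is a hypothetical stand-in for the record status
// tracked by the backend.
type recordStatus int

const (
	statusUnreviewed recordStatus = iota
	statusPublic
	statusCensored
	statusArchived
)

// treeInfo is a hypothetical summary of what the periodic check
// needs to know about a record's tree.
type treeInfo struct {
	Status            recordStatus
	HasFinalTimestamp bool // a final dcr timestamp has been appended
	Frozen            bool // the tree has already been frozen
}

// allowsUpdates reports whether a record with the given status can
// still be modified. Censored and archived are the examples given in
// the description; other statuses are assumed to allow updates.
func allowsUpdates(s recordStatus) bool {
	switch s {
	case statusCensored, statusArchived:
		return false
	}
	return true
}

// shouldFreeze reports whether a tree qualifies for freezing: the
// record can no longer be updated, the final timestamp is on the
// tree, and the tree is not frozen already.
func shouldFreeze(t treeInfo) bool {
	if t.Frozen {
		return false
	}
	return !allowsUpdates(t.Status) && t.HasFinalTimestamp
}

func main() {
	trees := []treeInfo{
		{Status: statusPublic, HasFinalTimestamp: true},
		{Status: statusCensored, HasFinalTimestamp: false},
		{Status: statusArchived, HasFinalTimestamp: true},
	}
	for i, t := range trees {
		fmt.Printf("tree %d: freeze=%v\n", i, shouldFreeze(t))
	}
}
```

Note that a censored record whose final timestamp has not landed yet is skipped, so a later run of the check would need to pick it up.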

The reason we need this requires some background knowledge on the trillian architecture.

The trillian_log_signer polls the MySQL database at a fixed interval, looking for leaves that have been queued up and are waiting to be appended onto a tree. It does this for all trees that have an ACTIVE status.

Trillian was designed to be used for a small number of trees that have infrequent writes, but that can get very large over time. The recommended log_signer_interval, i.e. the polling interval, was 2-3 seconds.

The way we use trillian is quite different from its intended use case, and you see this reflected in the performance of trillian on our servers. We set the log signer interval to 200ms because we require that leaves be appended onto a tree in order for a write to be considered valid. We also use a new tree for each record. This results in a large number of trees that get polled by the log signer every 200ms, which is why the CPUs spin on our servers. Moving the status of trees that can no longer be modified to FROZEN will help reduce this load.

lukebp commented 1 year ago

This PR took a bit longer than expected because it relies on dcr timestamps, and this work overlapped with a malicious ASIC miner draining the testnet ticket pool, which brought the testnet chain to a halt. Even once testnet was brought back up, dcrtime ran into unrelated wallet issues (the wallet address gap limit was being hit) on both mainnet and testnet that took some time to debug and fix.

lukebp commented 1 year ago

This commit message contained an inaccuracy.

"The recommended log_signer_interval, i.e. the polling interval, was 2-3 seconds."

The default interval is actually 100ms. The 2-3 seconds came from an example trillian setup, but is not the recommended parameter.