cosmos / cosmos-sdk

:chains: A Framework for Building High Value Public Blockchains :sparkles:
https://cosmos.network/
Apache License 2.0
6.27k stars 3.63k forks source link

[Feature]: Improve Optimistic Execution concurrency #17584

Open facundomedica opened 1 year ago

facundomedica commented 1 year ago

Summary

Right now the concurrency model in Optimistic Execution is basic and not elegant at all. It uses a mutex to lock any access to some variables and when aborting a process we need to wait until the transaction that is being processed finishes (this could be a problem in chains with txs or being/end blockers that take a long time to execute).

Problem Definition

Baseapp is not prepared for this kind of data access.

Proposed Feature

  1. I propose we first look into separating concerns between FinalizeBlock and BaseApp. This way we make sure FinalizeBlock only uses whatever params are being passed to it without touching baseapp.
  2. Once we get the above sorted out, we can start passing a cached context to the function instead of it using a context stored inside baseapp (finalizeBlockState).

With 1 and 2 completed, we can improve the concurrency model of Optimistic Execution with some of these ideas:

Bonus points if we add context cancellation handling inside all of the functions that perform possibly long-running processes (like iterations and such). This would make OE super efficient as most txs could be aborted mid-way.

alexanderbez commented 1 year ago

Totally in favor of using channels instead of a mutex!

A more advanced goroutine handler that could allow us to start a new one without having to wait the previous running (aborted) goroutine.

I'm a bit dubious on this -- I'm concerned you'll need to wait for the current tx processing goroutine to exit to prevent undefined concurrent access to tx execution primitives (e.g. iterators, CRUD ops on storage, etc...).

As for the first two points, that's gonna be a challenge, but with a runtime rewrite, it should be possible I imagine?

facundomedica commented 1 year ago

I'm a bit dubious on this -- I'm concerned you'll need to wait for the current tx processing goroutine to exit to prevent undefined concurrent access to tx execution primitives (e.g. iterators, CRUD ops on storage, etc...).

The way I'm seeing it right now is that we should be able to pass copies of everything, so there's no sharing. But yeah, not sure if it's possible