perf: Parallelize saveNewNodes' DB writes with figuring out "what to write"

ValarDragon commented 4 months ago

This PR parallelizes the SaveNode DB writes with figuring out what it is we have to write (in "saveNewNodes").

We can improve this in the future to process everything but "Set" in another goroutine, and then keep a buffered queue for "Set" that completes asynchronously.

If this is not useful with the IAVL v2 work, can I just put this in the IAVL v1 line?

This PR as is feels like a pretty straightforward improvement that should give a 7% sync improvement on Osmosis for IAVL v1 today. I don't think there are any tests to add here; I don't see any edge case that's not covered by existing tests.

Benchmark for 2000 blocks on IAVL v1 on Osmosis mainnet for context:

This PR will drop the latency of this from 42 seconds to 24 seconds. However, with subsequent work we should be able to do better:

(No async commit) Parallelize this function so that its latency is the longest of:
- num nodes to hash * time to sha256 / num cores
- DB writing all new nodes

Async commit then removes the DB writing part.

Better parallelism would make it:

- No async commit: max(4 seconds, 18 seconds / num cores)
- Async commit: 18 seconds / num cores

coderabbitai[bot] commented 4 months ago

Walkthrough

The update introduces a significant enhancement to the MutableTree structure, specifically within its saveNewNodes method. By leveraging goroutines, the process of saving nodes now occurs concurrently, leading to improved performance. This approach is complemented by the use of channels for effective communication and error management. Additionally, the update ensures that nodes are properly detached and their keys are recursively assigned, optimizing the process for parallel execution and enhancing overall efficiency and error handling.

Changes

| File(s) | Summary of Changes |
| --- | --- |
| `mutable_tree.go` | Introduced goroutines in `saveNewNodes` for parallel node saving, with channels for communication and error handling. Optimized for parallelization, including recursive key assignment and improved efficiency. |


ValarDragon commented 4 months ago

Note that this code preserves functionality: the recursive loop just builds a list of newNodes, and then we call SaveNode on them one by one, serially. So we still have the serial SaveNode behavior.
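To make the shape of this concrete, here is a minimal, self-contained Go sketch of the pattern described: a recursive traversal assigns keys and hands nodes to a channel, while a single goroutine calls a save function on them serially, in the same order. The `node` type, `collect`, the `save` callback, and the buffer size are all hypothetical stand-ins for illustration; this is not the actual `mutable_tree.go` code.

```go
package main

import (
	"errors"
	"fmt"
)

// errSaveFailed is a hypothetical error used to demonstrate propagation.
var errSaveFailed = errors.New("save failed")

// node is a hypothetical stand-in for an IAVL node.
type node struct {
	key         int
	left, right *node
}

// collect walks the tree post-order, assigns keys, and sends each node to out.
func collect(n *node, next *int, out chan<- *node) {
	if n == nil {
		return
	}
	collect(n.left, next, out)
	collect(n.right, next, out)
	*next++
	n.key = *next
	out <- n
}

// saveNewNodes overlaps the traversal with the writes. Only one goroutine
// calls save, so saves remain serial and ordered.
func saveNewNodes(root *node, save func(*node) error) error {
	nodes := make(chan *node, 64) // buffer size is an arbitrary assumption
	done := make(chan error, 1)

	go func() {
		var firstErr error
		for n := range nodes {
			if firstErr != nil {
				continue // keep draining so the traversal never blocks
			}
			firstErr = save(n)
		}
		done <- firstErr
	}()

	next := 0
	collect(root, &next, nodes)
	close(nodes)
	return <-done
}

func main() {
	root := &node{left: &node{}, right: &node{}}
	var saved []int
	err := saveNewNodes(root, func(n *node) error {
		saved = append(saved, n.key)
		return nil
	})
	fmt.Println(saved, err)

	err = saveNewNodes(root, func(*node) error { return errSaveFailed })
	fmt.Println(err)
}
```

In this sketch the first save error is reported only after the traversal finishes, and any nodes saved before the failure stay written; a real implementation would need to decide how to handle that.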

ValarDragon commented 4 months ago

We've tested that this gives a speedup on IAVL v1 on Osmosis!

kocubinski commented 4 months ago

I love this conceptually: write nodes to disk as the tree is hashed and node keys are generated in parallel.

I guess one drawback is failing partway through tree traversal: those nodes are now possibly orphaned.

cosmos / iavl

perf: Parallelize saveNewNodes' DB writes with figuring out "what to write" #889
