camunda / issues

Improve availability of Zeebe by reducing impact of snapshot integrity checks #728

Open engineering-issue-sync-app[bot] opened 3 months ago

engineering-issue-sync-app[bot] commented 3 months ago

Value Proposition Statement

Improve availability of Zeebe clusters with so-called large state (>2GB) by reducing the impact of snapshot integrity checks, speeding up all snapshot operations while maintaining high data durability.

User Problem

Currently, users with large state see large I/O spikes whenever a snapshot is taken, as well as startup time penalties incurred due to a slow and expensive integrity check of the received snapshot.

By reducing the impact of these checks, while maintaining the durability guarantees they offer, we can improve startup and catch-up time (thus improving availability) and reduce I/O contention, which would improve overall system performance.

User Stories

Implementation Notes

Most of this is already implemented in Zeebe, but we're waiting on upstream patches to RocksDB before continuing. There isn't really anything to do until then, as it would be a lot of effort for us to fork RocksDB, apply C/C++ patches, keep the fork up to date, etc. Forking would only be worth it if we planned on making more patches in the future, which is rather unclear at this point. After using RocksDB for a couple of years, we've only needed a handful of patches, so it's probably fine.

:robot: This issue is automatically synced from: source

garima-camunda commented 2 months ago

@npepinpe a quick question: I noticed that this issue does not have the "support" label, and the issue description does not contain a reference to the support ticket https://jira.camunda.com/browse/SUPPORT-21546. I was wondering if this would cause any problems with the release notification process (for example: https://camunda.slack.com/archives/CHAC0L80M/p1712313090837239).

npepinpe commented 2 months ago

Hm, good question. I'm not sure the issues from this repo are parsed by the release process anyway, let me check.

npepinpe commented 2 months ago

Issues here won't be picked up by the release process, so I'll open an issue on the Zeebe side specifically to track this.

garima-camunda commented 2 months ago

Thanks @npepinpe for checking and creating the Zeebe issue!

renzpatriarca commented 2 weeks ago

Hi @npepinpe, I think the issue in the Zeebe repo is closed now (https://github.com/camunda/camunda/issues/17920), so can this issue also be closed?

npepinpe commented 2 weeks ago

I think from the PDP side this feature is still in the validate phase. I'm not sure what the process is. Since this issue is synchronized from the product hub one, I suppose it will get closed when that one is closed by the PM who's performing validation.