Open Zelldon opened 3 years ago
With https://github.com/zeebe-io/zeebe/pull/6444 we write an empty deployment record on FULLY_DISTRIBUTED
to reduce the duplication of information (and to avoid reaching the limits so easily). Currently this record is not consumed (only its key is used by the applier), which makes this possible. If we see a need for more information, we can revise/change that.
\cc @npepinpe I would like that you're aware of that
I can just say it again: if we want to fix this, we probably want to do it before 1.0 @npepinpe @saig0
We will postpone this for now:
Guess the wrong issue was referenced :smile:
Operate issue https://jira.camunda.com/browse/OPE-1285
With #6921 we removed the duplicated resources from the deployed processes, which were part of the DeploymentRecord. Now the record contains only the metadata of the processes. We still write the resources for the Deployment CREATED follow-up event.
As written here https://github.com/camunda-cloud/zeebe/pull/6921#issuecomment-831092725 it was not simple/easy to remove this, since we rely on the content. We use the resources on redistribution, which we could avoid if we distributed the ProcessRecord instead, for example. But this might also not be ideal, since it makes the distribution a bit more complex and fragile. For example: one process of a deployment might already be distributed while another is not, so you could already create instances for X but not yet for Y, etc.
In order to resolve this issue, we need to find a way to further reduce the written content and the duplicated data.
I think it is up to you @npepinpe whether we want to tackle this and when. I will remove my assignment for now.
Hint: WorkflowRecord was renamed to ProcessRecord
Raised in context of larger DMN deployments done by a customer here https://camunda.slack.com/archives/C037W9NMATG/p1721311530038739
And support case https://jira.camunda.com/browse/SUPPORT-22559
Input from @remcowesterhoud on the Slack thread
The specific problem described in the issue is actually fixed already. It mentions putting the resources in the Deployment.CREATED event, which we no longer do. We put it in the CREATED event for the resource, which we definitely need, since we store it in the state. It's also on the distribute command, which we also need, as the other partitions require the resource to store it. [..] Maybe some intermediary command before distribution could solve it. We could take the resources out of it and just reference them by key. The processor of that command could then fetch the resource and distribute it to the other partitions
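The intermediary-command idea above can be sketched roughly as follows. This is a hypothetical illustration only: the class and method names (`ResourceState`, `DistributeResourceCommand`, `process`) are invented for this sketch and are not Zeebe APIs. The point is simply that the command carries only a key, and the resource bytes are fetched from local state at distribution time instead of being duplicated in the command itself.

```java
import java.util.HashMap;
import java.util.Map;

public class DistributeByKeySketch {

  /** Stands in for the deployment state on the deploying partition. */
  static final class ResourceState {
    private final Map<Long, byte[]> resourcesByKey = new HashMap<>();

    void put(long key, byte[] resource) {
      resourcesByKey.put(key, resource);
    }

    byte[] get(long key) {
      return resourcesByKey.get(key);
    }
  }

  /** The intermediary command: just a key, no resource payload. */
  record DistributeResourceCommand(long resourceKey) {}

  /**
   * Processes the intermediary command: fetches the resource from local
   * state and returns the payload that would then be sent to the other
   * partitions for distribution.
   */
  static byte[] process(ResourceState state, DistributeResourceCommand command) {
    final byte[] resource = state.get(command.resourceKey());
    if (resource == null) {
      throw new IllegalStateException("unknown resource key " + command.resourceKey());
    }
    return resource;
  }

  public static void main(String[] args) {
    final ResourceState state = new ResourceState();
    state.put(42L, "process.bpmn contents".getBytes());

    final byte[] payload = process(state, new DistributeResourceCommand(42L));
    System.out.println("distributing " + payload.length + " bytes for key 42");
  }
}
```

The trade-off is the one already noted in the thread: distribution becomes per-resource rather than per-deployment, so partial distribution states become observable.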
ZPA Triage:
@camunda/zeebe-distributed-platform do you still see a technical limitation to keeping this within the limit?
Well, you are technically sending that response out, so it would prevent us from responding, I suppose :thinking:
Otherwise it's a performance issue, and possibly a memory issue. If you keep the processing unbounded, since it's entirely in memory, you are more likely to run out of memory. I think the real solution is to support chunked execution or some form of continuation.
> you are sending that response out technically
That's not correct. We only send a single event record back in the response, but the max message size limits all records in the record batch.
But a single record may still be > 4MB no?
> But a single record may still be > 4MB no?
Sure, but in the common cases it's not a single record surpassing the 4MB, it's the combination of all follow-up records. That includes the case above:
Same goes for creating a process instance with almost 4MB of variables:
EDIT: as we move to REST APIs, the response size is also no longer a limitation
> EDIT: as we move to REST APIs, the response size is also no longer a limitation
Arguably, right now we still need to buffer them all in memory, so I suspect larger responses could lead to OOM at times if we can't flush them out quickly enough :shrug:
At any rate, we can argue about the responses, but it remains that having unbounded records/batches will have an impact on performance, since right now we're not set up to chunk or split them and instead have to buffer everything in memory. So I would push back on having no limit at all, as I can guarantee someone will run into OOMs. There are some proposed fixes, though maybe there are simpler solutions I haven't considered.
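The "chunked execution" idea mentioned above could look something like the following minimal sketch: instead of buffering one unbounded record batch, follow-up records are split into consecutive batches that each stay under the configured max message size. This is illustrative only and not how Zeebe's batch handling actually works; `chunk` and its signature are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedBatchSketch {

  /**
   * Splits records into consecutive batches whose summed size never exceeds
   * maxBatchSize. A single record larger than the limit is rejected, since
   * it could never be written at all.
   */
  static List<List<byte[]>> chunk(List<byte[]> records, int maxBatchSize) {
    final List<List<byte[]>> batches = new ArrayList<>();
    List<byte[]> current = new ArrayList<>();
    int currentSize = 0;

    for (final byte[] record : records) {
      if (record.length > maxBatchSize) {
        throw new IllegalArgumentException(
            "single record of " + record.length + " bytes exceeds limit " + maxBatchSize);
      }
      // Flush the current batch when adding this record would overflow it.
      if (currentSize + record.length > maxBatchSize) {
        batches.add(current);
        current = new ArrayList<>();
        currentSize = 0;
      }
      current.add(record);
      currentSize += record.length;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }

  public static void main(String[] args) {
    final List<byte[]> records = List.of(new byte[3], new byte[3], new byte[3]);
    // With a 4-byte limit, each 3-byte record must go into its own batch.
    System.out.println(chunk(records, 4).size()); // prints 3
  }
}
```

The hard part in practice is not the splitting itself but atomicity: today all follow-up records of one command are written as a single atomic batch, so any chunking scheme needs some form of continuation between chunks.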
ZPA triage:
Description
With https://github.com/zeebe-io/zeebe/pull/6289 we introduced a new record called WorkflowRecord, which represents the workflow (with resource, checksum, key, etc.) on the stream. We write this workflow event as a follow-up event on the deployment partition when we process the CREATE Deployment command. The issue is that we currently also write the CREATED Deployment as a follow-up event, which also contains all the workflows and resources. This means we can reach the maxMessageSize much more easily now with big deployments. In order to still support bigger deployments, we should find a way to mitigate this, either by not writing the workflows/resources as part of the Deployment event (this might break Operate) or by something else.
This problem will become more prominent if we reduce the max message size.
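A back-of-the-envelope sketch of the arithmetic: when each resource is written both in its own follow-up record and again inside the Deployment CREATED event, the bytes written for one deployment roughly double, so a deployment that fits comfortably under the limit on its own can blow past it. The numbers below are invented for illustration; the 4MB default corresponds to the broker's maxMessageSize setting.

```java
public class DeploymentSizeSketch {

  static final int MAX_MESSAGE_SIZE = 4 * 1024 * 1024; // default 4MB limit

  /** Total bytes when every resource is written twice (process record + deployment event). */
  static long withDuplication(int resourceCount, int resourceSize) {
    final long processRecords = (long) resourceCount * resourceSize;
    final long deploymentEvent = (long) resourceCount * resourceSize;
    return processRecords + deploymentEvent;
  }

  /** Total bytes when the deployment event carries only per-process metadata. */
  static long withoutDuplication(int resourceCount, int resourceSize, int metadataSize) {
    return (long) resourceCount * resourceSize + (long) resourceCount * metadataSize;
  }

  public static void main(String[] args) {
    // Ten 300KB resources: duplicated they exceed 4MB, deduplicated they fit.
    System.out.println(withDuplication(10, 300_000) > MAX_MESSAGE_SIZE);           // true
    System.out.println(withoutDuplication(10, 300_000, 1_000) > MAX_MESSAGE_SIZE); // false
  }
}
```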
Related support cases: https://jira.camunda.com/browse/SUPPORT-22559