camunda / camunda

Process Orchestration Framework
https://camunda.com/platform/

Allow to cancel banned instances #12772

Open Zelldon opened 1 year ago

Zelldon commented 1 year ago

Is your feature request related to a problem? Please describe.

We have now seen process instances get banned quite often, even though we think it should be a rare case. When it happens, it trashes our state, and we currently offer no way to clean that state up.

Banning an instance can happen in multiple cases. One prominent and recent example: we had a lot of element instances and cancelation failed. This caused the process instance to be banned, meaning all 100K element instances are still part of the runtime state.

Describe the solution you'd like

We fixed the cancelation by allowing it to run in batches (#11355). We should consider allowing the cancelation of a banned instance in order to clean up the state. Even if it fails, no harm is done: the command will simply be rejected.
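
To make the proposed behavior concrete, here is a rough toy sketch in Python. It is not Zeebe code: `ToyEngine`, `ProcessInstance`, `BATCH_SIZE`, the intent strings, and the `terminate` hook are all hypothetical names made up for illustration. It assumes that a banned instance rejects every command except `CANCEL`, that cancelation terminates element instances in batches, and that a failed cancelation is rejected without touching the state.

```python
from dataclasses import dataclass, field
from typing import Callable

BATCH_SIZE = 2  # hypothetical batch size, kept tiny for illustration


@dataclass
class ProcessInstance:
    key: int
    element_instances: list = field(default_factory=list)
    banned: bool = False


def _noop(batch: list) -> None:
    """Default terminate hook: pretend every batch terminates cleanly."""


class ToyEngine:
    """Hypothetical stand-in for the engine; none of these names are Zeebe APIs."""

    def __init__(self, terminate: Callable[[list], None] = _noop):
        self.instances = {}
        self.terminate = terminate

    def process(self, intent: str, key: int) -> str:
        instance = self.instances.get(key)
        if instance is None:
            return "REJECTED: not found"
        if intent == "CANCEL":
            # Proposed change: CANCEL is allowed even when the instance is banned.
            return self._cancel(instance)
        if instance.banned:
            # Every other command on a banned instance is still rejected.
            return "REJECTED: banned"
        return "ACCEPTED"

    def _cancel(self, instance: ProcessInstance) -> str:
        # Work on a copy so that a failure leaves the stored state untouched.
        remaining = list(instance.element_instances)
        try:
            while remaining:
                batch, remaining = remaining[:BATCH_SIZE], remaining[BATCH_SIZE:]
                self.terminate(batch)
        except RuntimeError:
            return "REJECTED: cancel failed"
        # The state is cleaned up only after every batch succeeded.
        del self.instances[instance.key]
        return "CANCELED"


engine = ToyEngine()
engine.instances[1] = ProcessInstance(1, list(range(5)), banned=True)
print(engine.process("COMPLETE_ELEMENT", 1))  # REJECTED: banned
print(engine.process("CANCEL", 1))            # CANCELED
```

In this sketch the only difference from today's behavior is the order of the two checks in `process`: the cancel intent is handled before the banned check, so nothing else about banning changes.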

Describe alternatives you've considered

There is no alternative, which is the problem.

Additional context

We are considering replacing banned instances (https://github.com/camunda/zeebe/issues/5121), but this is a long-term goal. We should fix the current situation first.

Zelldon commented 1 year ago

BTW, this would also allow us to avoid the step of mimicking a cancelation in Operate, which is sometimes necessary to make the banning more visible in Operate.

koevskinikola commented 1 year ago

ZPA triage:

Zelldon commented 1 year ago

I'm wondering why PM needs to decide or prioritize this. If engineering sees a need for certain topics (which clearly fall in our space), let's just prioritize them and do it.

Maybe I'm missing something (sorry if so), but I think we should still be able to decide on our own what is necessary to build a great, stable, and reliable product. We can also have this discussion offline.

aleksander-dytko commented 1 year ago

We've prioritized "Replace Zeebe Instance Banning concept with regular incident handling" for 8.4 to look at this holistically.

@camunda/zeebe-process-automation:

CC: @abbasadel

korthout commented 1 year ago

> How would you estimate the effort to complete this?

Hard to say without a breakdown. I don't think anyone has dedicated time to this yet. It is not trivial, but not very large.

> Do we know how much time is spent (roughly) each time to cancel banned instances?

We have no way to cancel banned instances at this time. That is what this feature request wants to change. This means that banned instances always take up space in Zeebe and are visible in Operate, which leads to confused users.

The banned instance can be removed from Operate with a special script, but you'll need to ask them how much time that takes.

> Would this be useful for the complete replacement, or this would be thrown away?

If we can completely eradicate the banned instance concept then this feature would be thrown away. If we can't fully replace instance banning with incidents, then this feature will continue to be useful.

abbasadel commented 12 months ago

Hi @Zelldon,

We had another look, and we now have an agreement with the PM that such technical features will be assessed and prioritized by the engineering team, as long as the effort does not exceed two weeks of work (X-Large).

We will keep this in the backlog for now and revisit it once we have more capacity.