Motivation
Since the refactor to use the exception manager, tasks that were acquired but not processed, because the runningTaskProcessor did not finish executing the current task in the allotted time, were not released in the ArmoniK sense.
The message was put back into the queue, but the task itself remained in the dispatched state (still acquired by the current agent).
This had two implications: another pod could only re-acquire such a task through the message duplication algorithm, which takes more work, and the timeout was treated as an actual agent error, which made the agent unhealthy after a few acquire timeouts.
Description
This PR adds a dedicated catch for the timeout and releases the task in that catch block.
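The pattern can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the code from this PR. All names (`Agent`, `Task`, `hand_over`, `release`) and the `SUBMITTED` status are hypothetical stand-ins. The only assumption is that releasing a task means returning it to a state from which another agent can acquire it directly.

```python
import asyncio
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class TaskStatus(Enum):
    SUBMITTED = "submitted"    # assumed "available for acquisition" state
    DISPATCHED = "dispatched"  # acquired by an agent


@dataclass
class Task:
    task_id: str
    status: TaskStatus = TaskStatus.SUBMITTED
    owner: Optional[str] = None


@dataclass
class Agent:
    name: str
    errors: List[Exception] = field(default_factory=list)

    def acquire(self, task: Task) -> None:
        task.status = TaskStatus.DISPATCHED
        task.owner = self.name

    def release(self, task: Task) -> None:
        # Undo the acquisition so that another agent can take the task.
        task.status = TaskStatus.SUBMITTED
        task.owner = None

    async def hand_over(self, task: Task, processor_free: asyncio.Event,
                        timeout: float) -> bool:
        """Acquire the task, then wait for the processor to become free."""
        self.acquire(task)
        try:
            await asyncio.wait_for(processor_free.wait(), timeout)
            return True
        except asyncio.TimeoutError:
            # Expected situation, not an agent failure: release the task
            # and do not record an error.
            self.release(task)
            return False
        except Exception as exc:
            # Anything else still counts as a real error of the agent.
            self.errors.append(exc)
            raise


async def demo():
    agent = Agent("agent-0")
    task = Task("task-1")
    processor_free = asyncio.Event()  # never set: the processor stays busy
    handed_over = await agent.hand_over(task, processor_free, timeout=0.01)
    return agent, task, handed_over
```

In this sketch the timeout branch is kept separate from the generic error branch, so a timeout neither leaves the task dispatched nor counts toward the agent's error tally.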
Testing
A new test has been added to ensure that the pollster does not produce any error when the timeout occurs, and that the task is actually released properly.
Impact
This should help with long-running tasks and avoid agent restarts.
It should also improve orchestration performance on long-running tasks.
Additional Information
NA
Checklist
[X] My code adheres to the coding and style guidelines of the project.
[X] I have performed a self-review of my code.
[ ] I have commented my code, particularly in hard-to-understand areas.
[ ] I have made corresponding changes to the documentation.
[X] I have thoroughly tested my modifications and added tests when necessary.
[ ] Tests pass locally and in the CI.
[ ] I have assessed the performance impact of my modifications.