Closed Madhura-08 closed 2 years ago
@Madhura-08, the following rules are followed in Hare commit messages.
CORTX-33787: [v0.9.0][2.0.0-880] Kafka errors UNKNOWN_TOPIC_OR_PART during build deployment
- re-raise the exception in order to properly propagate the script return code
to caller
Signed-off-by: Madhura Mande <madhura.mande@seagate.com>
Please consider the following commit message format:
CORTX-33787: ha deployment fails due to kafka errors
<Describe the problem, e.g. unknown topic exception not handled>
Solution:
<Describe the solution, mainly how solution fixes the problem>
Signed-off-by: Madhura Mande <madhura.mande@seagate.com>
…uring build deployment
Problem: HA mini provisioning fails because it cannot connect to Kafka. This happens because the Kafka and HA pods are deployed simultaneously. HA needs to retry the connection, but that does not happen. As a result, the Kafka topics, Consul keys, and other keys are not created, so the HA pod is running but not functional.
Solution: To retry the connection, the init container must be restarted, because mini provisioning runs as part of the init container. For the init container to restart, a proper failure code must be returned to the caller. To achieve this, the exception is re-raised to the caller, which already handles returning the error code.
Signed-off-by: Madhura Mande <madhura.mande@seagate.com>
Problem Statement
https://jts.seagate.com/browse/CORTX-33787
Design
HA and the third-party Kafka pods are now deployed simultaneously. HA connects to Kafka during its init stage (mini provisioning) to create topics. When HA tries to connect, Kafka is running but not yet ready to serve requests. HA therefore fails at the mini provisioning stage and does not create the Consul keys. To retry the Kafka connection, the init container must be restarted. An init container is normally executed only once and is restarted only if it fails, and that failure is propagated through its return code. On the HA side, the exception was not re-raised, so the return code was always 0 and the init container was never restarted. The fix is to re-raise the exception and handle the return code properly.
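The re-raise pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual HA code: `KafkaNotReadyError`, `create_topics`, `mini_provision`, and `main` are hypothetical names, and the `kafka_ready` flag stands in for a real Kafka connection attempt.

```python
import sys


class KafkaNotReadyError(Exception):
    """Hypothetical stand-in for a Kafka failure such as UNKNOWN_TOPIC_OR_PART."""


def create_topics(kafka_ready: bool) -> None:
    # Assumption: simulates topic creation; fails while Kafka is not ready.
    if not kafka_ready:
        raise KafkaNotReadyError("UNKNOWN_TOPIC_OR_PART")


def mini_provision(kafka_ready: bool) -> None:
    try:
        create_topics(kafka_ready)
    except Exception as err:
        print(f"mini provisioning failed: {err}", file=sys.stderr)
        # Re-raise instead of swallowing the error, so the caller sees the failure.
        raise


def main(kafka_ready: bool) -> int:
    # The caller converts an unhandled failure into a non-zero return code.
    try:
        mini_provision(kafka_ready)
    except Exception:
        return 1
    return 0


# A script entry point would typically end with:
#     sys.exit(main(kafka_ready=...))
# so that the process exit status reflects the failure.
```

If `mini_provision` logged the error without the bare `raise`, `main` would return 0 even when topic creation failed, which matches the behavior described above where the init container was never restarted.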
Coding
Testing
Review Checklist
Documentation
Checklist for Author