What this PR changes/adds
Cloud deployments are failing during the health check steps. From what we have observed so far, the cause is the Azure Container Instance taking longer to start up, or being restarted initially, which makes the curl health check call fail.
curl is supposed to retry in such a scenario, but it does not retry when the connection is reset by peer (more in the linked issue). To overcome this, two things are added in this PR:
Updated the Azure Container Instance liveness probe to give the container more time before it is ready for a health check.
The curl request is now made with the --retry-all-errors flag, which retries on more error codes and so covers the above-mentioned situation where the connection is reset by peer. This flag requires curl version 7.71.0 or later, which is not available on ubuntu-latest runners (20.04 version), so we have added a shell script to upgrade curl for the jobs where it is needed.
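As a rough illustration of the second point, the sketch below checks whether the installed curl is new enough for --retry-all-errors and then runs a retrying health check. The function names, the retry count and the delay are assumptions made for this example, not the values used in the workflow.

```shell
#!/usr/bin/env bash
# Minimal sketch; function names and retry settings are illustrative assumptions.

# Succeeds when version $1 is greater than or equal to version $2.
# Relies on GNU sort's version ordering (-V), available on Ubuntu runners.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Prints the installed curl version, e.g. "7.68.0".
curl_version() {
  curl --version | head -n1 | awk '{print $2}'
}

# Calls the health endpoint given as $1, retrying on all errors
# (including "connection reset by peer") when curl supports it.
health_check() {
  local url="$1"
  if version_ge "$(curl_version)" "7.71.0"; then
    curl --fail --silent --show-error \
      --retry 10 --retry-delay 5 --retry-all-errors "$url"
  else
    echo "curl $(curl_version) is older than 7.71.0: --retry-all-errors is unavailable" >&2
    return 1
  fi
}
```

A job step would then call something like `health_check "$HEALTH_URL"`, where `HEALTH_URL` is a placeholder for the deployed health endpoint.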
Further notes
We did not find a package for curl >= 7.71.0 that can be directly downloaded and used on Ubuntu, hence the installation from source.
We tried the ubuntu-22.04 runner, which has a newer curl version; this runner version is currently in beta on GitHub. While using this runner, we twice observed the terraform and curl checks getting stuck in an indefinite loop. We are not sure what was causing that, but it could be a runner issue as it is in beta. So, to avoid confusion, we decided not to use it and to update curl manually on the 20.04 runner.
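For readers unfamiliar with a source install, it could look roughly like the sketch below. This is not the script added in this PR: the function names, the install prefix, the package list and the configure options are assumptions chosen for illustration, and the version is passed in by the caller.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the PR's script: prefix, packages and
# configure options are illustrative assumptions.

# Prints the download URL of the release tarball for curl version $1.
curl_tarball_url() {
  printf 'https://curl.se/download/curl-%s.tar.gz\n' "$1"
}

# Downloads, builds and installs curl version $1 under /usr/local.
install_curl_from_source() {
  local version="$1"
  local workdir
  workdir="$(mktemp -d)" || return 1

  # Compiler toolchain and OpenSSL headers, needed for HTTPS support.
  sudo apt-get update || return 1
  sudo apt-get install -y build-essential libssl-dev || return 1

  # The runner's stock curl is sufficient to fetch the tarball.
  curl --fail --silent --show-error --location \
    --output "$workdir/curl.tar.gz" "$(curl_tarball_url "$version")" || return 1
  tar -xzf "$workdir/curl.tar.gz" -C "$workdir" || return 1

  # Build in a subshell so the caller's working directory is unchanged.
  (
    cd "$workdir/curl-$version" &&
      ./configure --with-openssl --prefix=/usr/local &&
      make -j"$(nproc)" &&
      sudo make install
  ) || return 1

  # Refresh the linker cache so the new libcurl is picked up.
  sudo ldconfig
}
```

A job would call, for example, `install_curl_from_source 7.83.1` (the version number is only an example) and then run `curl --version` to confirm that the new binary is the one found first on the PATH.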
Linked Issue(s)
Links #215
Checklist
[ ] added appropriate tests?
[ ] performed checkstyle check locally?
[ ] added/updated copyright headers?
[ ] documented public classes/methods?
[x] added/updated relevant documentation?
[ ] added relevant details to the changelog? (skip with label no-changelog)