Description
When the log-cache pod goes down, the gRPC endpoint that the Cloud Controller (CC) uses goes down with it; the default c-ares DNS resolver does not appear to recover once DNS resolves successfully again. Switching to the native DNS resolver instead results in proper exponential backoff, so the connection eventually recovers.
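For context, one documented way to select the resolver in gRPC core (which backs CC's gRPC client) is the `GRPC_DNS_RESOLVER` environment variable; whether this PR uses that exact mechanism depends on how CC is configured. A minimal Python sketch of the behavior, with a placeholder endpoint address (the real log-cache address will differ):

```python
import os

# gRPC core consults GRPC_DNS_RESOLVER when it initializes; "native" selects
# the OS resolver, "ares" (the default) selects c-ares. Set it before the
# first channel is created.
os.environ["GRPC_DNS_RESOLVER"] = "native"

import grpc

# Placeholder address for illustration only.
channel = grpc.insecure_channel("log-cache.cf.svc.cluster.local:8080")
```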
Motivation and Context
This fixes issues where we become unable to restart applications after the log-cache role goes down; see #1547 for details.
How Has This Been Tested?
Started minikube.
Pushed an app (to ensure things were working).
Scaled log-cache down to 0 instances (see the sketch after these steps).
Restarted the app (saw errors about Statsd being unavailable; this is expected).
Scaled log-cache back up to 1 instance.
Waited a bit.
Restarted the app again; this time it worked.
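The scaling in the steps above was done by hand; as a rough programmatic sketch of the same cycle using the Kubernetes Python client, assuming (hypothetically) that log-cache runs as a StatefulSet named `log-cache` in a `cf` namespace:

```python
from kubernetes import client, config

config.load_kube_config()  # uses the local (e.g. minikube) kubeconfig
apps = client.AppsV1Api()

def scale_log_cache(replicas: int) -> None:
    # Assumed resource name and namespace; adjust to the actual deployment.
    apps.patch_namespaced_stateful_set_scale(
        name="log-cache",
        namespace="cf",
        body={"spec": {"replicas": replicas}},
    )

scale_log_cache(0)  # take log-cache down
# restart the app here and observe the expected Statsd-unavailable errors
scale_log_cache(1)  # bring log-cache back, then wait and restart again
```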
Types of changes
[x] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
Checklist:
[ ] My code has security implications.
[x] My code follows the code style of this project.
[ ] My change requires a change to the documentation.