When we tried to deploy our project with a gem upgrade that included upgrading health_check from v3.0.0 to v3.1.0, our new instances failed the health check with the message "Cache is returning garbage. ". As a result, our deploy failed.
Digging into why, it seems that e3f5f1428100ecd02951d5c4a110cec8a9dcf134 changed the value written to the health check cache. Before, the health check:
writes "ok"
checks that read value is "ok"
Now, it:
writes "ok #{Time.now.to_i}"
checks that read value matches /^ok (\d+)$/
The problem is that these two formats are incompatible, yet the old and new code both use the same cache key. During a rolling deploy, if a load balancer is constantly hitting the health check, instances running the old code and instances running the new code overwrite each other's value. In our case, machines with the old health_check code were writing "ok", and the new health_check code treated that value as "garbage".
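To make the conflict concrete, here is a minimal simulation of the two formats sharing one key. It is only a sketch: a plain Hash stands in for the shared cache, and the method names (`old_write`, `new_write`, `new_value_valid?`) are hypothetical, not the gem's actual code. Only the key, the written values, and the regex come from the description above.

```ruby
# The cache key both versions share, as named in this report.
CACHE_KEY = "__health_check_cache_test__"

# Old behaviour (v3.0.0 as described above): write the literal "ok".
def old_write(cache)
  cache[CACHE_KEY] = "ok"
end

# New behaviour (v3.1.0 as described above): write "ok <unix timestamp>".
def new_write(cache)
  cache[CACHE_KEY] = "ok #{Time.now.to_i}"
end

# New behaviour: the value read back must match /^ok (\d+)$/.
def new_value_valid?(value)
  value.to_s.match?(/^ok (\d+)$/)
end

# A plain Hash stands in for the cache shared by all instances.
shared_cache = {}

new_write(shared_cache)   # an upgraded instance writes its value
old_write(shared_cache)   # an instance still on the old code overwrites it
puts new_value_valid?(shared_cache[CACHE_KEY])   # false: reported as garbage

new_write(shared_cache)   # without interference the new check passes
puts new_value_valid?(shared_cache[CACHE_KEY])   # true
```

The failure depends only on the interleaving of writes and reads across instances, which is why it shows up during a rolling deploy and not when every instance runs the same version.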
We fixed this simply by temporarily disabling the cache check, deploying the new gem version, then re-enabling the cache check in the next deploy. But I think if the new code just used a different cache key (something other than __health_check_cache_test__), this would be unnecessary. Alternatively, the key could be made instance-dependent somehow (although the implementation might vary by application).
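As a rough sketch of the two suggestions, the key could carry a format version, or be derived from something instance-specific. Everything here is hypothetical: `FORMAT_VERSION`, `versioned_key`, and `instance_key` are illustrative names, and using hostname plus process ID is just one assumption about what "instance-dependent" could mean for a given application.

```ruby
require "socket"

# Hypothetical: bump this whenever the format of the stored value changes,
# so old and new code never read each other's values.
FORMAT_VERSION = 2

# Option 1: a key that differs from the one used by older versions.
def versioned_key
  "__health_check_cache_test_v#{FORMAT_VERSION}__"
end

# Option 2: a key unique to each instance (hostname and process ID are
# assumptions; an application might prefer another identifier).
def instance_key(hostname = Socket.gethostname, pid = Process.pid)
  "__health_check_cache_test__:#{hostname}:#{pid}"
end

puts versioned_key
puts instance_key("web-1", 4242)
```

With either approach, instances running the old code keep writing to `__health_check_cache_test__` while upgraded instances read and write a separate key, so a rolling deploy would not need the cache check disabled.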