cloudfoundry-attic / consul-release

This is a BOSH release for consul.
Apache License 2.0

Consul results are not cached on windows #83

Closed jvshahid closed 6 years ago

jvshahid commented 6 years ago

Context

Historically, we disabled the Windows Dnscache service because it cached negative results (i.e. failures to resolve a name), which caused long outages if Windows queried the consul agent while it was down (e.g. while the consul agent was restarting). We later realized that the consul agent isn't very stable and sometimes fails to resolve names (which is something the Diego team is experiencing). @mhoran mentioned that the current solution is to enable the Dnscache service, set MaxNegativeCacheTtl to 0 to prevent caching of failures, and rely on the Dnscache service to cache entries so that we don't depend on the consul agent being very reliable.

I am not sure whether the caching of .service.cf.internal entries has been verified or whether this is a recent regression. I am inclined to say it is the former, since the consul docs indicate that the default TTL is set to 0 to prevent caching. You can also check the TTL of the record returned from consul by running:

nslookup.exe -q=mx bbs.service.cf.internal

Expected behavior

When I run

ping bbs.service.cf.internal

I expect the output of ipconfig /displaydns to include a record for bbs.service.cf.internal, indicating that the entry is cached.

Actual behavior

I do not see any .cf.internal entries in the DNS cache.
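
The expected-versus-actual check above can be scripted. The following is a minimal sketch, not part of the release: it assumes an English-locale Windows where ipconfig /displaydns prints each cached entry on a line starting with "Record Name", and the helper names (cached_names, is_cached, read_dns_cache) are hypothetical.

```python
import re
import subprocess

# Assumption: English-locale output, where each cached entry has a line such as
#   "Record Name . . . . . : bbs.service.cf.internal"
RECORD_NAME = re.compile(r"^\s*Record Name[ .]*:\s*(\S+)\s*$", re.MULTILINE)


def cached_names(displaydns_output):
    """Return the set of record names found in `ipconfig /displaydns` text."""
    return {name.lower().rstrip(".") for name in RECORD_NAME.findall(displaydns_output)}


def is_cached(displaydns_output, name):
    """True if `name` appears as a record name in the given cache dump."""
    return name.lower().rstrip(".") in cached_names(displaydns_output)


def read_dns_cache():
    """Run `ipconfig /displaydns` and return its text (Windows only)."""
    result = subprocess.run(
        ["ipconfig", "/displaydns"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a Windows cell, after running ping bbs.service.cf.internal, calling is_cached(read_dns_cache(), "bbs.service.cf.internal") would be expected to return True if the Dnscache service cached the entry; with the behavior reported here it would return False.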

cf-gitbot commented 6 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/153368219

The labels on this github issue will be updated when the story is started.

jvshahid commented 6 years ago

/cc @sunjayBhatia @aminjam

evanfarrar commented 6 years ago

https://github.com/cloudfoundry-incubator/consul-release/releases/tag/v191 contains a property for this. I could consider that it closes your issue, but I also want your feedback on what you find to be a sane default value for that property (i.e. what makes WATS pass reliably), so I will leave it open for now.

genevieve commented 6 years ago

Related thread: https://cloudfoundry.slack.com/archives/C02FM2BPE/p1513711925000203?thread_ts=1513227292.000131&cid=C02FM2BPE

genevieve commented 6 years ago

Made a PR for the windows-cell ops-file ^; the value can be changed based on WATS needs.