cloudfoundry-community / consul-boshrelease

BOSH release for consul

Error starting agent: agent: timeout starting DNS servers #49

Open bellaned opened 6 years ago

bellaned commented 6 years ago

Hello,

I've been trying to deploy consul-boshrelease in my local BOSH environment. However, when I issue the command bosh vms I see the following result:

Task 135. Done

Deployment 'consul'

Instance                                     Process State  AZ  IPs          VM CID                                  VM Type  Active
consul/7432a2b8-cd63-4ad2-a7f2-8dd328c54b5a  running        z3  10.244.0.32  c-c8e17a7c-707c-408a-64ea-4fe548ab2846  default  true
consul/bd8fcd0f-65df-46a2-b9e4-f96b4c695a42  failing        z1  10.244.0.25  c-3f9f1e0c-c040-41eb-4c21-86c4a2049ff7  default  true
consul/c5511d3c-b8b1-4f14-b842-c9dc0b1868e0  running        z2  10.244.0.31  c-bb41ce30-36d0-48aa-425d-b8fae8f9ed98  default  true

When I log in to the failing machine, I see the following repeating entries for the consul job:

==> WARNING: Expect Mode enabled, expecting 3 servers
==> Starting Consul agent...
==> Error starting agent: agent: timeout starting DNS servers
==> WARNING: Expect Mode enabled, expecting 3 servers
==> Starting Consul agent...
==> Error starting agent: agent: timeout starting DNS servers
==> WARNING: Expect Mode enabled, expecting 3 servers
==> Starting Consul agent...
==> Error starting agent: agent: timeout starting DNS servers

I was not able to figure out what the problem is, so I searched for the error message in the consul project and I found the message here:

https://github.com/hashicorp/consul/blob/e305443db4ba8295510faf2402e584650efeb3f8/agent/agent.go#L478

    // wait for servers to be up
    timeout := time.After(time.Second)
    for range a.config.DNSAddrs {
        select {
        case addr := <-notif:
            a.logger.Printf("[INFO] agent: Started DNS server %s (%s)", addr.String(), addr.Network())
            continue
        case <-timeout:
            return fmt.Errorf("agent: timeout starting DNS servers")
        }
    }

Unfortunately, I couldn't figure out the root cause of the issue. Could you please guide me and suggest what's the probable root cause? I guess it is a configuration issue or perhaps environmental issue but at this point I don't have enough information to narrow it down. That's why I need your help.

Thank you in advance for your kind cooperation!

Regards, Beloslava

ivandavidov commented 6 years ago

@drnic - do you have any suggestions on this matter?

scottillogical commented 5 years ago

I'm 99% sure this is because you have BOSH DNS enabled. The "timeout starting DNS servers" error appears when you deploy this release alongside BOSH DNS.

This bosh release binds to 0.0.0.0 on port 53, which conflicts with BOSH DNS

https://github.com/cloudfoundry-incubator/ (which cf-deployment uses) binds to the external IP of the node instead of 0.0.0.0 so it is compatible
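
The kind of conflict described above can be reproduced without Consul or BOSH DNS. The sketch below is an illustration under assumptions, not the code of either project. `wildcardBindConflict` is a hypothetical helper, an OS-assigned UDP port on 127.0.0.1 stands in for port 53 (which normally needs elevated privileges), and the first socket plays the role of the DNS server already running on the VM. On Linux, the second bind to 0.0.0.0 on the same port is expected to fail with an "address already in use" error.

```go
package main

import (
	"fmt"
	"net"
)

// wildcardBindConflict holds a UDP port on 127.0.0.1, then tries to bind the
// same port on 0.0.0.0. It returns the error from the second bind, or nil if
// the second bind unexpectedly succeeds.
func wildcardBindConflict() error {
	// Stand-in for the DNS server that already owns the port on one address.
	first, err := net.ListenPacket("udp4", "127.0.0.1:0")
	if err != nil {
		return fmt.Errorf("setup failed: %w", err)
	}
	defer first.Close()
	port := first.LocalAddr().(*net.UDPAddr).Port

	// Stand-in for a second server that insists on the wildcard address.
	second, err := net.ListenPacket("udp4", fmt.Sprintf("0.0.0.0:%d", port))
	if err != nil {
		return err
	}
	second.Close()
	return nil
}

func main() {
	fmt.Println("wildcard bind result:", wildcardBindConflict())
}
```

Binding the second server to one specific address that nothing else uses avoids the clash, which is the approach the branch linked below takes.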

I had to make the following changes to make this BOSH release compatible with BOSH DNS: https://github.com/cloudfoundry-community/consul-boshrelease/compare/master...scottschulthess:use-dynamic-ip-on-23?expand=1 Caveat: this is a fork based on release 23 of this release, not the latest, because I had some compatibility issues that I didn't fix (we are replacing consul with BOSH DNS for the CF infra use case). Apologies for the PR being rough. I didn't intend to make it mergeable since we are moving to BOSH DNS, but it should not be very difficult to do so.

bonzofenix commented 5 years ago

@scottillogical Thanks for your help on this issue; we are experiencing this problem too. Is there any reason why this PR was not submitted or merged? https://github.com/cloudfoundry-community/consul-boshrelease/compare/master...scottillogical:use-dynamic-ip-on-23?expand=1

scottillogical commented 5 years ago

@bonzofenix we don't use consul anymore. If you're interested, feel free to base a PR off my branch or the ideas in it. I'm not sure whether it's ready to merge as-is or needs further work.