Closed jhovell closed 8 years ago
Why are you trying to connect to 8300? The HTTP api is on 8500 by default if I remember correctly.
@spencergibb I am not specifying 8300 anywhere in my code. I do notice that the service "consul" resolves to :8300 which is something that the consul server nodes broadcast. I am guessing that is how 8300 is being resolved in Consul API. Is this incorrect or can I change this in my consul configuration?
Here is an example server node
The agent nodes register consul-agent-8500 on tcp/8500.
Also I am using Registrator so it is possible that is somehow responsible for these service names that are confusing Consul API.
I'm by no means a Consul expert so a bit lost as to where to start tracking the problem...
Yes, 8500 is default port for HTTP
@jhovell
Try to run
curl http://172.31.2.106:8500/v1/catalog/nodes
@vgv
Hide real JSON (vgv)
Thank you.
Your consul agent uses default HTTP port 8500 and you have to use it too. Please, show your code where you construct ConsulClient
@vgv I'm using the following constructor with this library:
It is initialized with the values
new ConsulConfigurationSource("172.31.3.121:8500,172.31.2.111:8500,172.31.3.216:8500","/my/root/path");
Also in case it is helpful, this is a problem I only see occasionally. Most Consul clusters I have booted work fine, but occasionally it gets into this state and my only workaround is to destroy the cluster and start again.
Here is what my Consul Agent nodes look like:
Hmm, looks a bit unusual for me. Why do you start a few consul agents on this one (172.31.3.121) server?
@vgv yeah I think this is a function of Registrator which - as best I can tell - will register every port found open on the docker container as a "service" and it post-pends port numbers to make the names unique. There is only 1 dockerized Consul agent running on 172.31.3.121, but it has all of those ports open.
I am basing my implementation on this blog post in case it is helpful for context (which includes using Registrator):
https://aws.amazon.com/blogs/compute/service-discovery-via-consul-with-amazon-ecs/
@vgv I think I see the problem. I don't know if it is an issue with archaius-consul or your library or my setup. It looks like archaius-consul tries to look up the service "consul".
Looking at Consul docs, this looks like this is correct behavior to resolve to port 8300
https://www.consul.io/docs/agent/dns.html (toward the middle-bottom dig @127.0.0.1 -p 8600 consul.service.consul SRV
)
So maybe this is a problem with the archaius-consul library that it is supposed to look up not "consul" but "consul-agent-8500"
1) why does this only happen sometimes? - my guess is that the "if" line 53 is sometimes not triggered 2) what is wrong with my consul setup such that "consul" isn't the 8500 agent nodes, but is instead the 8300 server nodes.?
I will keep investigating. Not sure if this is a problem with my Registrator setup or something else.
For reference my dig (I redirect port 8600 to 53 on my consul container):
[ec2-user@ip-172-31-3-121 ~]$ dig @127.0.0.1 -p 53 consul.service.consul. SRV
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.37.rc1.45.amzn1 <<>> @127.0.0.1 -p 53 consul.service.consul. SRV
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36319
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 5
;; QUESTION SECTION:
;consul.service.consul. IN SRV
;; ANSWER SECTION:
consul.service.consul. 0 IN SRV 1 1 8300 i-0fab9fd7.node.usw2.consul.
consul.service.consul. 0 IN SRV 1 1 8300 i-699d84ae.node.usw2.consul.
consul.service.consul. 0 IN SRV 1 1 8300 i-0cab9fd4.node.usw2.consul.
;; ADDITIONAL SECTION:
i-0fab9fd7.node.usw2.consul. 0 IN A 172.31.2.106
i-699d84ae.node.usw2.consul. 0 IN A 172.31.3.65
i-0cab9fd4.node.usw2.consul. 0 IN A 172.31.2.105
i-0dab9fd5.node.usw2.consul. 0 IN A 172.31.2.104
i-6a9d84ad.node.usw2.consul. 0 IN A 172.31.3.64
;; Query time: 2 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed May 4 20:07:11 2016
;; MSG SIZE rcvd: 458
@vgv becoming more convinced this is a problem isolated to https://github.com/researchgate/archaius-consul/ ... looks like setClient() method is being re-called inside an exception block and a new client is being created on port 8300. I think what triggers this is if a node doesn't happen to be available but was still cached in DNS, such as during a rolling update of hardware. I'll take this up with the archaius-consul folks... thanks for helping me debug!
I am trying to diagnose a problem that I don't know if it is a bug in Consul API (1.1.7) or the way I am running Consul (v 0.6). I get the following stack repeatedly in my logs. I'm using the Consul API in conjunction with a Netflix Archaius wrapper but looking at the code I'm not sure it is a bug there. Seems to be an issue with the way Consul API is trying to create a catalog client.
Followed by 2 Apache http error messages:
A lot of Googling shows similar errors with connection reset on port 8300 saying that the client should be trying to connect to 8301 instead. Is this some problem in the way I have Consul configured or is this a bug in the Consul client?
Thanks!