Closed amitgilad3 closed 8 years ago
Sorry you're having problems. You've not given me much info to help you. 1) Which version? 2) If version 0.4.0, which backend? 3) Have you looked at the rabbitmq logs? Please provide the startup logs for both nodes.
Hi gmr,
1. rabbit version is 3.5.4
2. rabbitmq-autocluster 0.4.0 with consul backend
3. the first rabbit is registered and works fine, but the second rabbit shows me the following error:
=INFO REPORT==== 9-Aug-2015::08:58:22 === autocluster: Node appears to be the first in the cluster
=ERROR REPORT==== 9-Aug-2015::08:56:37 === autocluster: HTTP Response (500) CheckID does not have associated TTL
=ERROR REPORT==== 9-Aug-2015::08:56:37 === autocluster: Error updating Consul health check: "500"
rabbitmq.config (same for every node):
[{autocluster, [
  {consul_host, "localhost"},
  {consul_port, 8500},
  {consul_service, "rabbitmq-reportService"},
  {cluster_name, "reportService"}
]}].
What version of Consul are you using? That error would suggest the API is not compatible.
consul version 0.5.2
I think the problem is with the consul_service value. Can you try it with just rabbitmq, or remove it from the config? It will add a tag with the value of cluster_name to make sure it's not mixed with others.
Hi gmr,
i just left this: [{autocluster, [ {consul_host, "localhost"}, {consul_port, 8500} ]} ].
and only the first one to arrive works, all other nodes get the same error
=ERROR REPORT==== 9-Aug-2015::20:33:16 === autocluster: HTTP Response (500) CheckID does not have associated TTL
=ERROR REPORT==== 9-Aug-2015::20:33:16 === autocluster: Error updating Consul health check: "500"
Anything in the consul logs? I'm running this release in production and I do not have any such issues. With that as your only config, you do not need any, fwiw. It seems like the health check being submitted for the node is not correct. Only other thing I can think to check would be erlang version, maybe something wrong with assumptions about the httpc/inets library.
first of all i would just like to say thanks for the quick replies and also that i think that this plugin is awesome. i am sure that we will find the cause of the issue.
consul log: 2015/08/09 20:47:31 [ERR] http: Request /v1/agent/check/pass/service:rabbitmq, error: CheckID does not have associated TTL
the erlang version i am using is : 18.0
please let me know what version you are using
I'm using R18 as well. I'll add some debugging shortly after I do some yard work, but if you get the chance, can you set consul_service_ttl to 60, or CONSUL_SERVICE_TTL to 60 as an environment variable? It'd also be useful to see the response from curl http://localhost:8500//v1/health/service/rabbitmq
Hi gmr, i set consul_service_ttl to 60 and i did curl http://localhost:8500//v1/health/service/rabbitmq the result that i got was for the first rabbitmq that reached consul.
the issue is that if i create a cluster with 3 rabbits only the first one to reach consul gets registered and the other 2 get the error:
=ERROR REPORT==== 9-Aug-2015::20:33:16 ===
autocluster: HTTP Response (500) CheckID does not have associated TTL
=ERROR REPORT==== 9-Aug-2015::20:33:16 ===
autocluster: Error updating Consul health check: "500"
maybe i need to add some configuration to the rabbitmq??(i just install the rpm and add rabbitmq.config)
here is the response from the curl request:
[
  {
    "Node": {
      "Node": "rabbitmqreportservice-172.31.17.237",
      "Address": "172.31.17.237"
    },
    "Service": {
      "ID": "rabbitmq",
      "Service": "rabbitmq",
      "Tags": null,
      "Address": "",
      "Port": 5672
    },
    "Checks": [
      {
        "Node": "rabbitmqreportservice-172.31.17.237",
        "CheckID": "service:rabbitmq",
        "Name": "Service 'rabbitmq' check",
        "Status": "passing",
        "Notes": "RabbitMQ Auto-Cluster Plugin TTL Check",
        "Output": "",
        "ServiceID": "rabbitmq",
        "ServiceName": "rabbitmq"
      },
      {
        "Node": "rabbitmqreportservice-172.31.17.237",
        "CheckID": "serfHealth",
        "Name": "Serf Health Status",
        "Status": "passing",
        "Notes": "",
        "Output": "Agent alive and reachable",
        "ServiceID": "",
        "ServiceName": ""
      }
    ]
  }
]
enjoy the yard work :)
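For anyone reading a response like the one above, here is a minimal Python sketch of how the list of healthy nodes could be pulled out of it. It is not the plugin's code (the plugin is written in Erlang); the function name passing_nodes is made up for illustration, and the sample is a trimmed copy of the JSON posted in this thread.

```python
import json


def passing_nodes(health_json: str) -> list[str]:
    """Return the Consul node names whose checks all report "passing".

    Hypothetical helper: it only mirrors the shape of the
    /v1/health/service/<name> response shown in this thread.
    """
    nodes = []
    for entry in json.loads(health_json):
        checks = entry.get("Checks") or []
        # A node counts as healthy only if it has checks and none of them fail.
        if checks and all(c.get("Status") == "passing" for c in checks):
            nodes.append(entry["Node"]["Node"])
    return nodes


# Trimmed copy of the response posted above.
sample = json.dumps([{
    "Node": {"Node": "rabbitmqreportservice-172.31.17.237",
             "Address": "172.31.17.237"},
    "Service": {"ID": "rabbitmq", "Service": "rabbitmq", "Tags": None,
                "Address": "", "Port": 5672},
    "Checks": [
        {"CheckID": "service:rabbitmq", "Status": "passing"},
        {"CheckID": "serfHealth", "Status": "passing"},
    ],
}])

print(passing_nodes(sample))
```

With three RabbitMQ nodes registered correctly, one would expect three entries in that list rather than one.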
I've just released 0.4.1 that adds configurable logging. Add autocluster to your rabbitmq.config like so:
[{rabbit, [
{log_levels, [{autocluster, debug}, {connection, info}]}
]}].
And you should get debug logging information from the plugin about what it's submitting and what the replies are, that should get us closer.
This is what I see when I talk to consul locally BTW:
curl -v "http://localhost:8500/v1/health/service/rabbitmq" | json_pp
* Connection #0 to host localhost left intact
[
   {
      "Checks" : [
         {
            "CheckID" : "service:rabbitmq",
            "ServiceID" : "rabbitmq",
            "ServiceName" : "rabbitmq",
            "Output" : "",
            "Status" : "passing",
            "Notes" : "RabbitMQ Auto-Cluster Plugin TTL Check",
            "Node" : "gmr-home.local",
            "Name" : "Service 'rabbitmq' check"
         },
         {
            "CheckID" : "serfHealth",
            "Name" : "Serf Health Status",
            "Output" : "Agent alive and reachable",
            "ServiceName" : "",
            "ServiceID" : "",
            "Status" : "passing",
            "Notes" : "",
            "Node" : "gmr-home.local"
         }
      ],
      "Service" : {
         "Tags" : null,
         "Address" : "",
         "ID" : "rabbitmq",
         "Service" : "rabbitmq",
         "Port" : 5672
      },
      "Node" : {
         "Node" : "gmr-home.local",
         "Address" : "192.168.2.2"
      }
   }
]
And here's my log output:
=INFO REPORT==== 9-Aug-2015::18:51:36 ===
autocluster: GET http://localhost:8500/v1/health/service/rabbitmq?passing
=INFO REPORT==== 9-Aug-2015::18:51:36 ===
autocluster: Response: [{ok,{{"HTTP/1.1",200,"OK"},
[{"date","Sun, 09 Aug 2015 22:51:36 GMT"},
{"content-length","2"},
{"content-type","application/json"},
{"x-consul-index","28"},
{"x-consul-knownleader","true"},
{"x-consul-lastcontact","0"}],
"[]"}}]
=INFO REPORT==== 9-Aug-2015::18:51:36 ===
autocluster: Registering node with consul
=INFO REPORT==== 9-Aug-2015::18:51:36 ===
autocluster: POST http://localhost:8500/v1/agent/service/register ["{\"ID\":\"rabbitmq\",\"Name\":\"rabbitmq\",\"Port\":5672,\"Check\":{\"Notes\":\"RabbitMQ Auto-Cluster Plugin TTL Check\",\"TTL\":\"30s\"}}"]
=INFO REPORT==== 9-Aug-2015::18:51:36 ===
autocluster: Response: [{ok,{{"HTTP/1.1",200,"OK"},
[{"date","Sun, 09 Aug 2015 22:51:36 GMT"},
{"content-length","0"},
{"content-type","text/plain; charset=utf-8"}],
[]}}]
=INFO REPORT==== 9-Aug-2015::18:51:36 ===
autocluster: Registered node
=INFO REPORT==== 9-Aug-2015::18:51:36 ===
autocluster: Node is only node in the cluster
=INFO REPORT==== 9-Aug-2015::18:51:36 ===
autocluster: Node appears to be the first in the cluster
...
=INFO REPORT==== 9-Aug-2015::18:51:36 ===
autocluster: Starting Consul Health Check TTL Timer
=INFO REPORT==== 9-Aug-2015::18:51:36 ===
Server startup complete; 1 plugins started.
* autocluster
=INFO REPORT==== 9-Aug-2015::18:51:51 ===
autocluster: GET http://localhost:8500/v1/agent/check/pass/service%3Arabbitmq
=INFO REPORT==== 9-Aug-2015::18:51:51 ===
autocluster: Response: [{ok,{{"HTTP/1.1",200,"OK"},
[{"date","Sun, 09 Aug 2015 22:51:51 GMT"},
{"content-length","0"},
{"content-type","text/plain; charset=utf-8"}],
[]}}]
=INFO REPORT==== 9-Aug-2015::18:52:06 ===
autocluster: GET http://localhost:8500/v1/agent/check/pass/service%3Arabbitmq
=INFO REPORT==== 9-Aug-2015::18:52:06 ===
autocluster: Response: [{ok,{{"HTTP/1.1",200,"OK"},
[{"date","Sun, 09 Aug 2015 22:52:06 GMT"},
{"content-length","0"},
{"content-type","text/plain; charset=utf-8"}],
[]}}]
...
Oh and it'd be more useful to see the output of that Consul request on the failing nodes (if that wasn't on one of the failing nodes).
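To make the two requests in that log easier to follow, here is a rough Python sketch that rebuilds the registration body and the TTL "pass" URL exactly as they appear in the debug output. The function names and default arguments are assumptions for illustration only; the real plugin is Erlang and may build these differently.

```python
import json
from urllib.parse import quote


def register_payload(service: str = "rabbitmq", port: int = 5672,
                     ttl_seconds: int = 30) -> str:
    """Body POSTed to /v1/agent/service/register, as seen in the log."""
    return json.dumps({
        "ID": service,
        "Name": service,
        "Port": port,
        "Check": {
            "Notes": "RabbitMQ Auto-Cluster Plugin TTL Check",
            "TTL": f"{ttl_seconds}s",
        },
    })


def check_pass_url(service: str = "rabbitmq", host: str = "localhost",
                   port: int = 8500) -> str:
    """URL requested periodically to keep the TTL check passing.

    The check ID is "service:<service id>", percent-encoded as in the log.
    """
    check_id = quote(f"service:{service}", safe="")
    return f"http://{host}:{port}/v1/agent/check/pass/{check_id}"


print(register_payload())
print(check_pass_url())
```

Note the check is registered with a TTL, and the log shows a pass request roughly every 15 seconds against a 30s TTL. The "CheckID does not have associated TTL" error comes back from Consul when the check ID being passed does not map to a TTL check on that agent.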
Hi gmr,
the output i showed you was from the failed node.
i attached 2 files :
just download the files from the links
thanks :)
So it looks like it's something to do with your node names. Both nodes are named rabbit@rabbitmqreportservice-172 with regard to how RabbitMQ/Erlang deals with node names (with short names, everything after the first dot of the hostname is dropped).
There is also another node registered in Consul, rabbit@rabbitbd-2i0lixhie4s2seq, which is returning health check info that is not related to autocluster, and the plugin is trying to cluster with it.
I'd make sure that each node has a reasonable FQDN, and it might be worth trying to set the RABBITMQ_USE_LONGNAME environment variable to true. Also, I'd remove all dead nodes from Consul. If rabbit@rabbitbd-2i0lixhie4s2seq is a valid node from another cluster, then I'd set the cluster name to something so it does not get returned in the results.
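The name collision can be reproduced with a few lines of Python. This is only an illustration of the short-name behaviour described above, not how RabbitMQ or the plugin computes names internally; short_node_name is a hypothetical helper, and the two hostnames are the ones that appear in this thread.

```python
def short_node_name(hostname: str, prefix: str = "rabbit") -> str:
    """Mimic a short node name: keep only the part before the first dot."""
    return f"{prefix}@{hostname.split('.', 1)[0]}"


hosts = [
    "rabbitmqreportservice-172.31.17.237",
    "rabbitmqreportservice-172.31.19.56",
]
names = [short_node_name(h) for h in hosts]

for host, name in zip(hosts, names):
    print(f"{host} -> {name}")

# Two different hosts, one node name: they cannot form a cluster like this.
print("collision:", len(set(names)) < len(names))
```

Because the hostnames embed a dotted IP address, everything that distinguishes the two machines sits after the first dot and is lost.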
Hi gmr,
i did what you said and now they are showing in consul .
but i am not sure that they are actually clustered.
if i create a user or queue on one node it is not created on the second node.
what should i do??
how can i verify that the rabbitmq is truly clustered?
If you're not installing the management UI plugin, you might want to activate that so you can access the web interface.
You can also use rabbitmqctl on one of the nodes to get the cluster status: rabbitmqctl cluster_status
i just checked it and this is the result:
Cluster status of node 'rabbitmqreportservice172-31-19-56@rabbitmqreportservice-172.31.19.56' ...
[{nodes,[{disc,['rabbitmqreportservice172-31-19-56@rabbitmqreportservice-172.31.19.56']}]},
 {running_nodes,['rabbitmqreportservice172-31-19-56@rabbitmqreportservice-172.31.19.56']},
 {cluster_name,<<"rabbitmqreportservice172-31-19-56@rabbitmqreportservice-172">>},
 {partitions,[]}]
as you can see , only one node is registered
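If you want to check this from a script rather than by eye, a small Python sketch along these lines could count the running nodes in the rabbitmqctl cluster_status text. It assumes the Erlang-term output format shown in this thread (newer RabbitMQ releases print something different), and running_nodes is a hypothetical helper, not part of any RabbitMQ tooling.

```python
import re


def running_nodes(cluster_status: str) -> list[str]:
    """Extract node names from the {running_nodes,[...]} term."""
    match = re.search(r"\{running_nodes,\[(.*?)\]\}", cluster_status, re.DOTALL)
    if not match:
        return []
    # Node names are either quoted atoms ('a@b.c') or bare atoms (rabbit@host).
    atoms = re.findall(r"'([^']+)'|([A-Za-z0-9_@.\-]+)", match.group(1))
    return [quoted or bare for quoted, bare in atoms]


node = "rabbitmqreportservice172-31-19-56@rabbitmqreportservice-172.31.19.56"
status = (
    f"[{{nodes,[{{disc,['{node}']}}]}},\n"
    f" {{running_nodes,['{node}']}},\n"
    " {partitions,[]}]"
)

nodes = running_nodes(status)
print(nodes)
print("clustered:", len(nodes) > 1)
```

A single entry in running_nodes, as here, means the node is running on its own.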
Your node names won't work with the plugin: you can't change the rabbit@ bit in the current version. What setting did you use to get it that way?
@gmr I'm pretty sure he changed the NODENAME variable.
I'm having a similar issue, but I have done nothing to set hostnames when using the alpine-rabbitmq-autocluster Docker image (i.e. they are auto-set to values like rabbit@e9bd0b21c5af and rabbit@edae08d9e0bc). When I first start a pair of such containers, they each register as their own cluster. If I restart them, they log this error:
=INFO REPORT==== 15-Feb-2016::21:33:56 ===
autocluster: Registering node with consul
=ERROR REPORT==== 15-Feb-2016::21:34:01 ===
autocluster: Can not communicate with cluster nodes: [rabbit@node1]
This is odd since nothing is configured as rabbit@node1. If there's a known solution to this kind of problem, here's a serverfault question about this.
EDIT The above was using a single Consul server on a separate machine from the two RMQ machines. I've tried this again using a Consul Container running on the same machines running each of the RMQ instances to act as a Consul Client. The RMQ instances will start and register with their co-hosted Consul Client. Both Consul Clients are connected to the same Consul Server. When starting one of the RMQ instances after enough time has elapsed for the 1st RMQ instance to fully register with Consul, we see this:
docker logs rmq2 | grep autocluster
autocluster: Registering node with consul
autocluster: Can not communicate with cluster nodes: [rabbit@192]
autocluster: Starting Consul Health Check TTL Timer
It looks like Consul is registering each RMQ instance using its IP address for the hostname, and because there's a . in it, it thinks it's an FQDN. If I set RABBITMQ_USE_LONGNAME to true, RMQ fails to boot with this output.
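One way to catch this before a node tries to join is to flag addresses that are IP literals rather than hostnames, since those are the ones that end up as names like rabbit@192. The Python sketch below is only a diagnostic idea under that assumption; is_ip_literal is a hypothetical helper and nothing like it is known to exist in the plugin.

```python
import ipaddress


def is_ip_literal(host: str) -> bool:
    """True if host parses as an IPv4 or IPv6 address rather than a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


for host in ["192.168.2.2", "dev1b.domain.net", "e9bd0b21c5af"]:
    kind = "IP literal" if is_ip_literal(host) else "hostname"
    print(f"{host}: {kind}")
```

An address reported as an IP literal has dots but is not an FQDN, so treating the part before the first dot as the host yields a meaningless node name.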
+1, same issues as @hamx0r
Hi gmr.
I'm also having a similar issue where the nodes aren't clustering. My hostnames are FQDN and longname is true. I am on virtualbox and ports are open.
I am using the etcd backend though and not consul. Etcd is being populated fine.
Version: 0.4.1
Backend: etcd
RabbitMQ 3.5.6
docker --version
Docker version 1.10.0, build 590d5108
etcd --version
etcd Version: 2.1.1
erlang.cookie: same
Docker container: gavinmroy/alpine-rabbitmq-autocluster
etcdctl ls /rabbitmq/default
/rabbitmq/default/dev1a.domain.net
/rabbitmq/default/dev1b.domain.net
docker run --name rabbitmqcluster -d -h dev1b.domain.net -e RABBITMQ_USE_LONGNAME=true -e AUTOCLUSTER_TYPE=etcd -e ETCD_SCHEME=http -e ETCD_HOST=192.168.10.205 -e ETCD_PORT=2379 -e ETCD_PREFIX=rabbitmq -e ETCD_TTL=30 -p 4369:4369 -p 5672:5672 -p 15672:15672 -p 25672:25672 gavinmroy/alpine-rabbitmq-autocluster
rabbitmq.config
[
  {rabbit, [
    {loopback_users, []},
    {cluster_partition_handling, autoheal},
    {delegate_count, 64},
    {fhc_read_buffering, false},
    {fhc_write_buffering, false},
    {heartbeat, 60},
    {queue_index_embed_msgs_below, 0},
    {queue_index_max_journal_entries, 8192},
    {log_levels, [
      {autocluster, debug},
      {connection, debug},
      {channel, warning},
      {federation, info},
      {mirroring, info}
    ]},
    {vm_memory_high_watermark, 0.8}
  ]},
  {rabbitmq_management, [{rates_mode, basic}]},
  {autocluster, [
    {backend, "etcd"},
    {etcd_host, "192.168.10.205"},
    {etcd_port, 2379},
    {etcd_scheme, "http"},
    {etcd_prefix, "rabbitmq"},
    {etcd_ttl, 30}
  ]}
].
Regards Deryk
Closing the loop here, changing nodename will not be supported until 0.5.0.
2016/11/29 04:24:16 Unexpected response code: 500 (CheckID does not have associated TTL)
Hi, I managed to install rabbitmq and add this plugin. When I run everything, it is registered with consul but my rabbitmq is not clustered.
Example: If I add a user on one machine it is not added on other machines.