Closed: rrag closed this issue 2 years ago
Interesting. One thing you can try is to skip setting NODENAME altogether and instead just set the name in ERL_FLAGS:
-e ERL_FLAGS="-setcookie brumbrum -name couchdb" \
This will cause the Erlang VM to try to determine the FQDN of the container when it starts up and use that for the nodename. If it can't determine the FQDN the VM should crash on startup. This is how we start CouchDB when it's installed in Kubernetes using the couchdb-helm chart, and it definitely does use FQDNs correctly there with this same container image.
Yes, it does crash when I use just -name couchdb:
$ docker run \
--rm \
--name prodb_couch \
--dns=10.139.200.202 \
--mount type=bind,source=${HOME}/prodb/data/couchdb,target=/opt/couchdb/data \
--mount type=bind,source=${HOME}/prodb/config/couchdb/local.d,target=/opt/couchdb/etc/local.d \
-e ERL_FLAGS="-setcookie brumbrum -name couchdb" \
-p 5984:5984 \
-p 4369:4369 \
-p 9100:9100 \
apache/couchdb:3.2.1
2022-01-13 16:21:37 Can't set long node name!
Please check your configuration
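For context, a minimal sketch of why this fails: with plain -name couchdb the VM qualifies the name with the machine's hostname, and Erlang only accepts the result as a long name if the host part is fully qualified (contains a dot). The helper below is an illustration of that rule, not CouchDB or Erlang source, and the bare container hostname is a made-up example:

```python
# Illustration of Erlang's long-name rule (not actual VM source): with
# "-name couchdb" the VM qualifies the name with the machine's hostname,
# and startup fails with "Can't set long node name!" unless the host part
# is fully qualified, i.e. contains at least one dot.

def long_name_ok(node_name, default_host):
    """Would node_name be accepted as an Erlang long node name?"""
    host = node_name.split("@", 1)[1] if "@" in node_name else default_host
    return "." in host

# Inside a container whose hostname is a bare ID, "-name couchdb" fails:
print(long_name_ok("couchdb", "0afe31d2c21b"))         # False
# An explicit FQDN (or a resolvable container FQDN) works:
print(long_name_ok("couchdb@api01.prod.blr1.gc", ""))  # True
```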
To debug this I also did the following: I ran the erlang docker container on the 3 nodes
docker run -it \
--dns=10.139.200.202 \
-p 4369:4369 \
-p 9100:9100 \
--rm erlang /bin/sh
Then on node 1
erl -name bus@api01.prod.blr1.gc -setcookie 'brumbrum' -kernel inet_dist_listen_min 9100 -kernel inet_dist_listen_max 9100
on node 2
erl -name car@api02.prod.blr1.gc -setcookie 'brumbrum' -kernel inet_dist_listen_min 9100 -kernel inet_dist_listen_max 9100
on node 3
erl -name van@api03.prod.blr1.gc -setcookie 'brumbrum' -kernel inet_dist_listen_min 9100 -kernel inet_dist_listen_max 9100
and from node 1
# net_kernel:connect_node('car@api02.prod.blr1.gc').
true
so connectivity using erl works
Would it be helpful if I record a video of these steps?
When I tried to start the docker container with
docker run -it \
--rm \
--name prodb_couch \
--mount type=bind,source=${HOME}/prodb/data/couchdb,target=/opt/couchdb/data \
--mount type=bind,source=${HOME}/prodb/config/couchdb/local.d,target=/opt/couchdb/etc/local.d \
-e ERL_FLAGS="-setcookie brumbrum -name ts@10.139.200.203" \
-p 5984:5984 \
-p 4369:4369 \
-p 9100:9100 \
apache/couchdb:3.2.1
Notice -name ts@ipaddress instead of -name couchdb@ipaddress. Even then the cluster creation fails with
{"error":"setup_error","reason":"Cluster setup timed out waiting for nodes to connect"}
but when I use -name couchdb@ipaddress the cluster is created successfully.
So here is what actually works:
couchdb@ipaddress
These names below lead to "Cluster setup timed out waiting for nodes to connect":
notcouchdb@ipaddress
couchdb@f.q.d.n
notcouchdb@f.q.d.n
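A plausible reading of this pattern, sketched below: the setup wizard derives the remote Erlang node name from a name part that defaults to couchdb, so only nodes actually started as couchdb@... can be contacted unless an explicit name is supplied. The default here is inferred from the observed behaviour, not quoted from CouchDB source:

```python
# Sketch of how the cluster-setup endpoint appears to derive the remote
# node name; DEFAULT_NAME is inferred from the behaviour above, not taken
# from CouchDB source.
DEFAULT_NAME = "couchdb"

def remote_node(host, name=None):
    """Erlang node name the setup wizard would try to contact at host."""
    return "%s@%s" % (name or DEFAULT_NAME, host)

print(remote_node("api02.prod.blr1.gc"))
# -> couchdb@api02.prod.blr1.gc: matches only a node started as couchdb@...
print(remote_node("api02.prod.blr1.gc", "notcouchdb"))
# -> notcouchdb@api02.prod.blr1.gc: what a renamed node actually registers as
```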
Here are the contents of local.d/user.ini:
-------
[admins]
admin = -pbkdf2-REDACTED
couch_user = -pbkdf2-REDACTED
[couchdb]
uuid = e697b5ff329cea4b410b4ee62980fc6d
[chttpd_auth]
secret = supersecret
OK, I have made some more progress and found a mistake in my steps. I am able to set up the cluster with FQDNs, but the node name always has to be couchdb@...; with anything other than that, the cluster setup will not finish.
here are the steps for anyone else facing this problem
I have 3 droplets in DigitalOcean. Here are their IP addresses and their FQDNs on a private DNS server I have configured:
COUCH_NODE1=10.139.200.203 api01.prodb.blr1.gc
COUCH_NODE2=10.139.200.208 api02.prodb.blr1.gc
COUCH_NODE3=10.139.200.209 api03.prodb.blr1.gc
Now I run this docker command
# on node 1
docker run \
--rm \
--dns=10.139.200.202 \
--name prodb_couch \
--mount type=bind,source=${HOME}/prodb/data/couchdb,target=/opt/couchdb/data \
--mount type=bind,source=${HOME}/prodb/config/couchdb/local.d,target=/opt/couchdb/etc/local.d \
-e ERL_FLAGS="-setcookie brumbrum -name couchdb@api01.prod.blr1.gc" \
-p 5984:5984 \
-p 4369:4369 \
-p 9100:9100 \
apache/couchdb:3.2.1
# on node 2
docker run \
--rm \
--dns=10.139.200.202 \
--name prodb_couch \
--mount type=bind,source=${HOME}/prodb/data/couchdb,target=/opt/couchdb/data \
--mount type=bind,source=${HOME}/prodb/config/couchdb/local.d,target=/opt/couchdb/etc/local.d \
-e ERL_FLAGS="-setcookie brumbrum -name couchdb@api02.prod.blr1.gc" \
-p 5984:5984 \
-p 4369:4369 \
-p 9100:9100 \
apache/couchdb:3.2.1
# on node 3
docker run \
--rm \
--dns=10.139.200.202 \
--name prodb_couch \
--mount type=bind,source=${HOME}/prodb/data/couchdb,target=/opt/couchdb/data \
--mount type=bind,source=${HOME}/prodb/config/couchdb/local.d,target=/opt/couchdb/etc/local.d \
-e ERL_FLAGS="-setcookie brumbrum -name couchdb@api03.prod.blr1.gc" \
-p 5984:5984 \
-p 4369:4369 \
-p 9100:9100 \
apache/couchdb:3.2.1
The --dns=10.139.200.202 flag is important; resolving the FQDNs against the private DNS server does not work without it.
My local.d folder has only one file, user.ini:
-------
[admins]
admin = -pbkdf2-REDACTED
couch_user = -pbkdf2-REDACTED
[couchdb]
uuid = e697b5ff329cea4b410b4ee62980fc6d
[chttpd_auth]
secret = supersecret
Now once the 3 nodes have started I run these commands
curl -X POST -H "Content-Type: application/json" \
http://admin:admin_password@api01.prodb.blr1.gc:5984/_cluster_setup \
-d '{"action": "enable_cluster", "bind_address":"0.0.0.0", "username": "admin", "password":"admin_password", "node_count":"3"}'
{"error":"bad_request","reason":"Cluster is already enabled"}
^ This always returns an error that the cluster is already enabled. For the longest time I was not sure what to do; then I just ignored this step and proceeded to the next.
curl -X POST -H "Content-Type: application/json" \
http://admin:admin_password@api01.prodb.blr1.gc:5984/_cluster_setup \
-d '{"action": "enable_cluster", "bind_address":"0.0.0.0", "username": "admin", "password":"admin_password", "port": 5984, "node_count": "3", "remote_node": "api02.prod.blr1.gc", "remote_current_user": "admin", "remote_current_password": "admin_password" }'
{"ok":true}
curl -X POST -H "Content-Type: application/json" \
http://admin:admin_password@api01.prodb.blr1.gc:5984/_cluster_setup \
-d '{"action": "add_node", "host":"api02.prod.blr1.gc", "port": 5984, "username": "admin", "password":"admin_password"}'
{"ok":true}
curl -X POST -H "Content-Type: application/json" \
http://admin:admin_password@api01.prodb.blr1.gc:5984/_cluster_setup \
-d '{"action": "enable_cluster", "bind_address":"0.0.0.0", "username": "admin", "password":"admin_password", "port": 5984, "node_count": "3", "remote_node": "api03.prod.blr1.gc", "remote_current_user": "admin", "remote_current_password": "admin_password" }'
{"ok":true}
curl -X POST -H "Content-Type: application/json" \
http://admin:admin_password@api01.prodb.blr1.gc:5984/_cluster_setup \
-d '{"action": "add_node", "host":"api03.prod.blr1.gc", "port": 5984, "username": "admin", "password":"admin_password"}'
{"ok":true}
sleep 4
curl -X POST -H "Content-Type: application/json" \
http://admin:admin_password@api01.prodb.blr1.gc:5984/_cluster_setup \
-d '{"action": "finish_cluster"}'
{"ok":true}
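The curl sequence above can also be scripted. The sketch below only assembles the JSON bodies in the same order; the hosts and placeholder credentials are the ones used in this thread, and an HTTP client would still have to POST each body to /_cluster_setup on node 1:

```python
import json

# Build the _cluster_setup request bodies in the order used above:
# for each remote node, enable_cluster then add_node; finally finish_cluster.
ADMIN, PASSWORD = "admin", "admin_password"   # placeholders from this thread
REMOTES = ["api02.prod.blr1.gc", "api03.prod.blr1.gc"]

def setup_payloads():
    steps = []
    for host in REMOTES:
        # Tell the coordinator to enable clustering on the remote node ...
        steps.append({"action": "enable_cluster", "bind_address": "0.0.0.0",
                      "username": ADMIN, "password": PASSWORD, "port": 5984,
                      "node_count": "3", "remote_node": host,
                      "remote_current_user": ADMIN,
                      "remote_current_password": PASSWORD})
        # ... then join it to the cluster.
        steps.append({"action": "add_node", "host": host, "port": 5984,
                      "username": ADMIN, "password": PASSWORD})
    steps.append({"action": "finish_cluster"})
    return [json.dumps(s) for s in steps]

payloads = setup_payloads()
print(len(payloads))  # 5 bodies: (enable + add) per remote, then finish
```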
The problem I had was that in the /_cluster_setup calls I was always passing the remote IP in remote_node and host. I changed that to pass the FQDN in remote_node and host instead. Now, finally:
curl http://admin:startup2017@api02.prodb.blr1.gc:5984/_membership | jq
{
"all_nodes": [
"couchdb@api01.prod.blr1.gc",
"couchdb@api02.prod.blr1.gc",
"couchdb@api03.prod.blr1.gc"
],
"cluster_nodes": [
"couchdb@api01.prod.blr1.gc",
"couchdb@api02.prod.blr1.gc",
"couchdb@api03.prod.blr1.gc"
]
}
I still do not know how to avoid the couchdb@ prefix, because if I change it to anything else I get this error during the last step of _cluster_setup:
{"error":"setup_error","reason":"Cluster setup timed out waiting for nodes to connect"}
Could anyone suggest what I am missing there?
With help from the nice folks in the CouchDB Slack channel I got this resolved. The solution is to add a "name" key (e.g. "name": "notcouchdb") to the "add_node" body, for example:
curl -X POST -H "Content-Type: application/json" \
http://admin:admin_password@api01.prodb.blr1.gc:5984/_cluster_setup \
-d '{"action": "add_node", "host":"api02.prod.blr1.gc", "name": "couch01", "port": 5984, "username": "admin", "password":"admin_password"}'
{"ok":true}
curl -X POST -H "Content-Type: application/json" \
http://admin:admin_password@api01.prodb.blr1.gc:5984/_cluster_setup \
-d '{"action": "add_node", "host":"api03.prod.blr1.gc", "name": "couch01", "port": 5984, "username": "admin", "password":"admin_password"}'
{"ok":true}
Once this additional "name" key is added, the cluster gets created properly.
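As a sketch, the fix boils down to one extra key in the add_node body. The host and the node-name prefix couch01 are the values from the commands above, and the couchdb default is inferred from the behaviour reported in this thread:

```python
import json

# add_node body including the explicit Erlang node-name prefix. Without
# "name", cluster setup assumes the remote node is couchdb@<host> (that
# default is inferred from the behaviour in this thread).
def add_node_body(host, name="couchdb", user="admin", password="admin_password"):
    return json.dumps({"action": "add_node", "host": host, "name": name,
                       "port": 5984, "username": user, "password": password})

body = add_node_body("api02.prod.blr1.gc", name="couch01")
print(body)
```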
Thanks for following up and posting the final resolution!
Expected Behavior
When DNS is properly configured for the docker container using --dns=x.x.x.x, the cluster setup should succeed just as it does when using IP addresses. Even when running docker with the --dns attribute, the NODENAME does not work when a FQDN is present.
Current Behavior
Using an IP address in the NODENAME leads to a successful cluster setup, but when using a FQDN the cluster setup times out.
Possible Solution
I have ssh'ed into the docker container and I am able to nslookup the DNS names. As mentioned in the documentation, I have a properly configured bind and there are no tricks there.
Steps to Reproduce (for bugs)
When using this, the cluster setup completes. But when using this, I am unable to complete the cluster setup and I get the error.
Context
I have a private DNS server set up and all my instances are configured to use it as the nameserver. It works well with all apps, and even within the docker container I can resolve these names correctly.
Your Environment