apache / couchdb

Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
https://couchdb.apache.org/
Apache License 2.0

couchdb 2.0 cluster setup failing #733

Closed (jvabob closed 7 years ago)

jvabob commented 7 years ago

I have two CouchDB 2.0 systems set up: couchdb-2-1 and couchdb-2-2.

The firewall is configured to allow access between the systems on ports 5984, 5986, 4369, and 9100-9200.

I can telnet between the boxes on 5984, 5986, and 4369.

I can connect the two servers with Erlang, as described in the cluster setup firewall guide, and ping between them.

However, when I add couchdb-2-2 as a node on couchdb-2-1, I never see couchdb-2-1 as a node on couchdb-2-2.

I have set the name in vm.args like this:

-name couchdb@couchdb-2-2.qa.aws.net

sys.config looks like this:

[
    {lager, [
        {error_logger_hwm, 1000},
        {error_logger_redirect, true},
        {handlers, [
            {lager_console_backend, [debug, {
                lager_default_formatter,
                [
                    date, " ", time,
                    " [", severity, "] ",
                    node, " ", pid, " ",
                    message,
                    "\n"
                ]
            }]}
        ]},
        {inet_dist_listen_min, 9100},
        {inet_dist_listen_max, 9200}
    ]}
].

I am seeing these entries repeating in couchdb.stderr on couchdb-2-1:

[notice] 2017-08-01T14:16:42.335487Z couchdb@couchdb-2-1.qa.aws.net <0.389.0> -------- chttpd_auth_cache changes listener died {nocatch,{error,timeout}} at fabric_view_changes:send_changes/6(line:192) <= fabric_view_changes:keep_sending_changes/8(line:82) <= fabric_view_changes:go/5(line:43)
[error] 2017-08-01T14:16:42.335400Z couchdb@couchdb-2-1.qa.aws.net emulator -------- Error in process <0.28182.0> on node 'couchdb@couchdb-2-1.qa.aws.net' with exit value:
{{nocatch,{error,timeout}},[{fabric_view_changes,send_changes,6,[{file,"src/fabric_view_changes.erl"},{line,192}]},{fabric_view_changes,keep_sending_changes,8,[{file,"src/fabric_view_changes.erl"},{line,82}]},{fabric_view_changes,go,5,[{file,"src/fabric_view_changes.erl"},{line,43}]}]}

[notice] 2017-08-01T14:16:47.336458Z couchdb@couchdb-2-1.qa.aws.net <0.29324.0> -------- Failed to ensure auth ddoc _users/_design/_auth exists for reason: read_failure

This seems related, but I don't know where to go from here.
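
A note on the sys.config quoted in this comment: inet_dist_listen_min and inet_dist_listen_max are parameters of Erlang's kernel application, so when they sit inside the lager tuple they are not expected to restrict the distribution port range to 9100-9200. The sketch below shows the same file with those two settings moved into a kernel section. It assumes nothing else in the file changes, and the thread does not confirm that this placement is the cause of the failure reported here.

```erlang
[
    {lager, [
        {error_logger_hwm, 1000},
        {error_logger_redirect, true},
        {handlers, [
            {lager_console_backend, [debug, {
                lager_default_formatter,
                [
                    date, " ", time,
                    " [", severity, "] ",
                    node, " ", pid, " ",
                    message,
                    "\n"
                ]
            }]}
        ]}
    ]},
    %% The distribution port range is read by the kernel application,
    %% so it lives in its own section rather than under lager.
    {kernel, [
        {inet_dist_listen_min, 9100},
        {inet_dist_listen_max, 9200}
    ]}
].
```

With this layout, both nodes would need the same change and a restart before the firewall rule for 9100-9200 matches the ports Erlang actually listens on.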

wohali commented 7 years ago

Can you confirm that DNS works correctly on both nodes? For instance, on couchdb-2-1 are you able to ping couchdb-2-2.qa.aws.net and vice-versa?

Can you provide the specific commands you are using to connect the nodes? Are you using the /_cluster_setup endpoint with curl?

jvabob commented 7 years ago

Yes, DNS works fine, and I can telnet from one system to the other on ports 5984, 5986, and 4369 using the DNS names.

jvabob commented 7 years ago

I used the cluster setup wizard on couchdb-2-1 to add couchdb-2-2 as a node.

jvabob commented 7 years ago

I used the fully qualified domain name both when I added the node and in the -name attribute in the vm.args file.

wohali commented 7 years ago

Please share your full etc/ directory contents from both boxes (with any [admins] sections removed!) for us to look at this further.

janl commented 7 years ago

Please reopen with the requested info.