comtihon / mongodb-erlang

MongoDB driver for Erlang
Apache License 2.0

Issue connecting to ReplicaSet #226

Closed ivan-perl closed 3 years ago

ivan-perl commented 4 years ago

Hi,

I'm trying to connect to a replica set, but I'm getting different errors and the connection ultimately fails.

Case number 1: connecting to the replica set inside the K8s cluster. Code to connect:

{ok, Connection} =
    mongoc:connect({rs, <<"rs0">>, ["server1:27017", "server2:27017", "server3:27017"]},
                   [],
                   [{database, <<"dbName">>},
                    {login, <<"userName">>},
                    {password, <<"password">>}]).

As a result, I see exceptions in the logs pointing to a badmatch error in the code:

** Reason for termination ==
** {{badmatch,[]},
    [{mc_topology,parse_ismaster,4,
                  [{file,"/usr/src/app/server_core/_build/default/lib/mongodb/src/mongoc/mc_topology.erl"},
                   {line,204}]},
     {mc_topology,handle_cast,2,
                  [{file,"/usr/src/app/server_core/_build/default/lib/mongodb/src/mongoc/mc_topology.erl"},
                   {line,145}]},
     {gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,637}]},
     {gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,711}]},
     {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}

The full error message is huge, so I can provide other parts of it if needed.

Case number 2: connecting to the MongoDB service from outside the K8s cluster.

In this case I'm port-forwarding to localhost and trying to connect to the forwarded port like this:

{ok, Connection} = mongoc:connect({unknown, ["localhost:27018"]},
                                  Conf,
                                  [{database, MongoDBName},
                                   {login, MongoDBUserName},
                                   {password, MongoDBPassword}]).

I also tried

{ok, Connection} = mongoc:connect({ single, "localhost:27018" }
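
(Presumably the full call had the same shape as the previous one; this is a sketch with the same Conf and worker options, not the exact line I ran:)

%% sketch only: same Conf and worker options as in the 'unknown' example above
{ok, Connection} = mongoc:connect({single, "localhost:27018"},
                                  Conf,
                                  [{database, MongoDBName},
                                   {login, MongoDBUserName},
                                   {password, MongoDBPassword}]).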

The result is the following:

=ERROR REPORT==== 22-May-2020::16:37:22 ===
** Generic server <0.112.0> terminating
** Last message in was {'EXIT',<0.111.0>,killed}
** When Server state == {state,rsPrimary,"localhost",27018,<0.110.0>,
                               <0.111.0>,
                               [{name,m_pool_name},
                                {register,storage_mongodb_connection_pool},
                                {pool_size,1},
                                {max_overflow,1},
                                {connectTimeoutMS,10000},
                                {socketTimeoutMS,10000},
                                {serverSelectionTimeoutMS,30000},
                                {waitQueueTimeoutMS,10000},
                                {heartbeatFrequencyMS,10000},
                                {minHeartbeatFrequencyMS,1000},
                                {rp_mode,primary}],
                               [{database,<<"DBName">>},
                                {login,<<"DBLogin">>},
                                {password,<<"DBPassword">>}],
                               undefined,10000,10000,1000,<0.139.0>,1}
** Reason for termination ==
** killed

After the first error shown above, there is an endless stream of similar messages that differ only in the first line:

** When Server state == {state,undefined,"localhost",27018,<0.110.0>,

In both cases I can connect to Mongo using the CLI client and Compass, and they work fine, which is why I know the login/password/dbName/hosts/ports are correct.

MongoDB Server version is 4.0.15

I tried with a locally installed single-node 4.0.18 server and everything worked well. The issues occur only with the cluster deployment.

Please advise.

Thank you in advance,

Regards, Ivan

ivan-perl commented 4 years ago

Update: after browsing other issues I came across this one: https://github.com/comtihon/mongodb-erlang/issues/160

In our case I also passed hostnames in the connection settings. After replacing the hostnames with IP addresses, the application stopped crashing at startup (when connect is actually called). Here is what my host list looks like now:

{ok, Connection} = mongoc:connect({ rs, <<"rs0">>, ["1.1.1.1:27017", "2.2.2.2:27017", "3.3.3.3:27017"] },

Everything looks fine until I try to write anything to Mongo; after the first attempt I'm flooded with errors like this:

=SUPERVISOR REPORT==== 23-May-2020::01:24:31.782669 ===
    supervisor: {<0.503.0>,poolboy_sup}
    errorContext: child_terminated
    reason: {<<"Can't pass authentification">>,
             [{mc_auth_logic,scram_sha_1_auth,5,
                             [{file,"/usr/src/app/server_core/_build/default/lib/mongodb/src/connection/mc_auth_logic.erl"},
                              {line,55}]},
              {mc_worker,auth_if_credentials,5,
                         [{file,"/usr/src/app/server_core/_build/default/lib/mongodb/src/connection/mc_worker.erl"},
                          {line,219}]},
              {mc_worker,init,1,
                         [{file,"/usr/src/app/server_core/_build/default/lib/mongodb/src/connection/mc_worker.erl"},
                          {line,54}]},
              {proc_lib,init_p_do_apply,3,
                        [{file,"proc_lib.erl"},{line,249}]}]}
    offender: [{pid,<0.1105.0>},
               {id,mc_worker},
               {mfargs,{mc_worker,start_link,undefined}},
               {restart_type,temporary},
               {shutdown,5000},
               {child_type,worker}]
=CRASH REPORT==== 23-May-2020::01:24:31.785930 ===
  crasher:
    initial call: mc_worker:init/1
    pid: <0.1106.0>
    registered_name: []
    exception error: <<"Can't pass authentification">>
      in function  mc_auth_logic:scram_sha_1_auth/5 (/usr/src/app/server_core/_build/default/lib/mongodb/src/connection/mc_auth_logic.erl, line 55)
      in call from mc_worker:auth_if_credentials/5 (/usr/src/app/server_core/_build/default/lib/mongodb/src/connection/mc_worker.erl, line 219)
      in call from mc_worker:init/1 (/usr/src/app/server_core/_build/default/lib/mongodb/src/connection/mc_worker.erl, line 54)
    ancestors: [<0.503.0>,<0.502.0>,mc_pool_sup,<0.128.0>]
    message_queue_len: 0
    messages: []
    links: [<0.503.0>,#Port<0.612>,<0.502.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 1598
    stack_size: 27
    reductions: 9711
  neighbours:

Among the applications I start before MongoDB and my own are ssl and pbkdf2; the complete list is:

-define(APPS, [asn1, crypto, public_key, ssl, ranch, cowlib, cowboy, bson, poolboy, inets, mongodb, webserver]).
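
(For reference, a minimal sketch of how these are brought up, assuming a start_apps/0 helper; the exact startup code is not the interesting part:)

%% sketch only: start every application from ?APPS in order
start_apps() ->
    [ok = application:ensure_started(App) || App <- ?APPS],
    ok.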

ivan-perl commented 4 years ago

It looks like this MongoDB client currently supports only SCRAM-SHA-1, while the server I'm running against wants SCRAM-SHA-256. Is there any plan to support this authentication mechanism?
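
(For anyone hitting the same error, a quick diagnostic sketch to see which SASL mechanisms the server actually offers for a given user; it connects without credentials so no auth is attempted, and the host, database, and user name below are placeholders:)

%% diagnostic sketch: ask the server which SASL mechanisms it advertises
%% for the user "dbName.userName"; connecting without credentials skips auth
{ok, C} = mc_worker_api:connect([{host, "1.1.1.1"}, {port, 27017},
                                 {database, <<"dbName">>}]),
Reply = mc_worker_api:command(C,
          {isMaster, 1, saslSupportedMechs, <<"dbName.userName">>}),
io:format("~p~n", [Reply]).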

comtihon commented 4 years ago

Sorry, I'm currently working on another open-source project. For this one I only review and merge PRs.

henryj commented 4 years ago

I got the same replica-set connection issue today. It seems that we need to use the same names reported by rs.status() when setting up the connection here, no matter whether they are IPs or hostnames. As long as they match, it works.
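
(In other words, the seed list passed to mongoc:connect/3 has to be written exactly as the member names appear in rs.status(); a sketch with placeholder member names and the same worker options as earlier in the thread:)

%% sketch only: placeholder member names, copied verbatim from rs.status()
Seeds = ["rs0-0.mongo:27017", "rs0-1.mongo:27017", "rs0-2.mongo:27017"],
{ok, Topology} = mongoc:connect({rs, <<"rs0">>, Seeds},
                                [],
                                [{database, <<"dbName">>},
                                 {login, <<"userName">>},
                                 {password, <<"password">>}]).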

dmsnell commented 3 years ago

It seems that we need to use the same names reported by rs.status() when setting up the connection here, no matter whether they are IPs or hostnames. As long as they match, it works.

I encountered this as well, and when I made sure to match the names from rs.status() it didn't so much fix the problem as shift the errors: the mongo pool crashes every few seconds with the parsing error and then restarts. Functionally I don't think it affects the code relying on the library, but it spews out a lot of garbage crash reports and probably leads to unnecessary DB reconnects.

dmsnell commented 3 years ago

I'm seeing a couple of issues and having trouble fully understanding what's going on.

  1. I'm connecting with a hostname that gets changed when the server is pinged in mc_monitor. The call to mc_worker_api:command(Connection, {isMaster, 1}) returns the name from rs.status(), which differs from what I used to connect. I don't remember the name changing and think it might have been reassigned somehow. I'm running this inside a docker-compose environment, and originally Mongo was set up with the container_name as the name of the replica-set member; now it's set to the Docker container id.
  2. The crash happens when attempting to get the server information out of the Topology's ets table. I believe it crashes because, by the time parse_ismaster is called, that ets table has disappeared along with the parent topology process, which has already died.

I'm very confused by this and am trying to trace what leads up to the failure. @henryj, have you had any further insight into the issue?

dmsnell commented 3 years ago

After more investigation I find that it crashes if the replica-set name is a string instead of a binary. It also crashes if the hosts and ports differ in any way from what is reported by rs.status() (as @henryj mentioned).

The mechanism for the crash is the missing entry in the ETS table. I think this happens because the lookup uses the binary version of the hostname and port while the entry was originally stored as text, but I haven't been able to confirm this in practice. Once the match [Saved] = ets:select(…) fails, the process crashes all the way up to the pool and everything reconnects.

What I think might be happening is that mc_monitor:check/2 connects successfully to the server as configured, but then gets back a response containing a different version of the host/port string, which puts the IsMaster argument (the response from Mongo) out of sync with the mc_topology state. This happens without any external trigger, which is why I think it's hard for me to pin down. The next time it checks the connection, the state and the ets tables disagree and things crash.

For a resolution, maybe there's some way to force everything to be a binary. Beyond that, there's probably a need to reference the servers in the ets tables by something other than their connect string. If we successfully connect to host:port, then we should be able to keep using that even if the server self-identifies differently.
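
(To make the failure mode concrete, a defensive variant of the assertive single-row match would look roughly like this; this is a sketch, not the actual mc_topology code, and Tab, MatchSpec, IsMaster, and update_server/2 are stand-ins for the real names:)

%% sketch only, not the real mc_topology code: tolerate a missing row
%% instead of letting the badmatch crash the topology process
case ets:select(Tab, MatchSpec) of
    [Saved] -> update_server(Saved, IsMaster);   %% hypothetical update path
    []      -> ignore  %% server self-identified under a different host string
end.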