emqx / mongodb-erlang

MongoDB driver for Erlang
Apache License 2.0

fix: fix deadloop when connecting to single server with cluster config #25

Closed mononym closed 2 years ago

mononym commented 2 years ago

A deadloop occurs when an invalid connection config (the wrong mongo_type and/or an incorrect set name) is used to connect to a server/cluster. The mc_topology worker spawns the mc_server and mc_monitor processes, which end up calling back into the mc_topology gen_server. When the config does not match what the connected server/cluster returns, that call kills the mc_server process.

However, on receiving the 'EXIT' message from the monitored mc_server process, the mc_topology worker instantly restarts it, and the whole cycle starts over. This is a very tight loop that will cause severe problems if not detected right away, and it is currently not simple to tell that a misconfiguration is the cause.

The only way to detect this cause is to run the same check while the mc_topology worker is starting up, which is what this code does. That stops the connection call to mongo_api or mongoc from ever succeeding and returns a meaningful error indicating that the connection will never succeed as configured.

With this code change, a bad configuration returns an error that looks something like this: {configured_mongo_type_mismatch,replicaSetNoPrimary,standalone}

Here the second element of the tuple is the type defined in the connection configuration and the last element is the type the Mongo server is actually configured as. A setName mismatch returns the following error: {configured_mongo_set_name_mismatch, ConfigurationSetName, ServerSetName}

The checks added in the mc_topology_logics module should match all of the checks in the update_topology_state function, higher up in the same file, that cause an exit.
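As a rough illustration of the kind of startup check described above, here is a minimal sketch. It is not the code from this PR: the module name, the validate/4 function, the {error, ...} wrapper, and the compatibility table (for example that sharded pairs with mongos and replica set types pair with rsPrimary/rsSecondary) are all assumptions made for the example. Only the two mismatch tuple shapes come from the description.

```erlang
-module(topology_check_sketch).
-export([validate/4]).

%% Hypothetical helper, not part of the driver's API.
%% ConfType / ConfSet come from the connection config;
%% SrvType / SrvSet are what the connected server reports.
-spec validate(atom(), binary() | undefined, atom(), binary() | undefined) ->
    ok | {error, tuple()}.
validate(ConfType, ConfSet, SrvType, SrvSet) ->
    case type_matches(ConfType, SrvType) of
        false ->
            {error, {configured_mongo_type_mismatch, ConfType, SrvType}};
        true when ConfSet =/= undefined, ConfSet =/= SrvSet ->
            {error, {configured_mongo_set_name_mismatch, ConfSet, SrvSet}};
        true ->
            ok
    end.

%% Assumed compatibility table; the real checks in
%% mc_topology_logics may differ in detail.
type_matches(unknown, _SrvType) -> true;
type_matches(sharded, mongos) -> true;
type_matches(replicaSetNoPrimary, SrvType) -> is_rs_member(SrvType);
type_matches(replicaSetWithPrimary, SrvType) -> is_rs_member(SrvType);
type_matches(_ConfType, _SrvType) -> false.

is_rs_member(SrvType) -> lists:member(SrvType, [rsPrimary, rsSecondary]).
```

Because the check is a pure function over the configured and reported values, it can run once during startup and fail the connection attempt before any restart loop begins.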

zmstone commented 2 years ago

need an appup.src file.

mononym commented 2 years ago

The logging put into place now gives errors like this:

=ERROR REPORT==== 8-Mar-2022::14:11:19.259530 ===
Configured mongo client topology does not match actual mongo install topology. Configured: replicaSetNoPrimary; Actual: standalone

=ERROR REPORT==== 8-Mar-2022::14:11:19.331212 ===
Configured mongo client topology does not match actual mongo install topology. Configured: sharded; Actual: standalone

mononym commented 2 years ago

The mc_topology server now sends itself a message with a 1-second delay to init itself again. If we do hit the deadloop because the server config changes in an incompatible way, we will at least be caught in a slow loop rather than a tight one.
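The delayed re-init can be pictured with a small standalone gen_server. This is only a sketch of the general pattern using erlang:send_after/3, not the actual mc_topology code: the module name, the init_servers message, the get_server call, and the dummy worker process are all hypothetical.

```erlang
-module(delayed_init_sketch).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Delay before re-initialising after the worker dies (1 second).
-define(REINIT_DELAY_MS, 1000).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    %% Trap exits so a dying worker arrives as an 'EXIT' message.
    process_flag(trap_exit, true),
    self() ! init_servers,
    {ok, #{server => undefined}}.

%% Hypothetical introspection call, used only to observe the state.
handle_call(get_server, _From, State = #{server := Server}) ->
    {reply, Server, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(init_servers, State) ->
    %% Stand-in for starting the real server process.
    Pid = spawn_link(fun() -> receive stop -> ok end end),
    {noreply, State#{server := Pid}};
handle_info({'EXIT', Pid, _Reason}, State = #{server := Pid}) ->
    %% Do not restart immediately; schedule a re-init instead,
    %% which turns a tight restart loop into a slow one.
    erlang:send_after(?REINIT_DELAY_MS, self(), init_servers),
    {noreply, State#{server := undefined}};
handle_info(_Other, State) ->
    {noreply, State}.
```

With this shape, a worker that is killed on every start is restarted at most about once per second, which leaves room for the error reports to be noticed.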