Closed chrzaszcz closed 1 month ago
elasticsearch_and_cassandra_26 / elasticsearch_and_cassandra_mnesia / 509759e917f4c152f12e50a6b42488c7a51bf0fc Reports root/ big OK: 437 / Failed: 0 / User-skipped: 43 / Auto-skipped: 0
small_tests_25 / small_tests / 509759e917f4c152f12e50a6b42488c7a51bf0fc Reports root / small
small_tests_26 / small_tests / 509759e917f4c152f12e50a6b42488c7a51bf0fc Reports root / small
small_tests_26_arm64 / small_tests / 509759e917f4c152f12e50a6b42488c7a51bf0fc Reports root / small
ldap_mnesia_25 / ldap_mnesia / 509759e917f4c152f12e50a6b42488c7a51bf0fc Reports root/ big OK: 2284 / Failed: 0 / User-skipped: 895 / Auto-skipped: 0
dynamic_domains_mysql_redis_26 / mysql_redis / 509759e917f4c152f12e50a6b42488c7a51bf0fc Reports root/ big OK: 4506 / Failed: 0 / User-skipped: 144 / Auto-skipped: 0
ldap_mnesia_26 / ldap_mnesia / 509759e917f4c152f12e50a6b42488c7a51bf0fc Reports root/ big OK: 2284 / Failed: 0 / User-skipped: 895 / Auto-skipped: 0
dynamic_domains_pgsql_mnesia_26 / pgsql_mnesia / 509759e917f4c152f12e50a6b42488c7a51bf0fc Reports root/ big OK: 4539 / Failed: 0 / User-skipped: 111 / Auto-skipped: 0
internal_mnesia_26 / internal_mnesia / 509759e917f4c152f12e50a6b42488c7a51bf0fc Reports root/ big OK: 2424 / Failed: 0 / User-skipped: 755 / Auto-skipped: 0
dynamic_domains_mssql_mnesia_26 / odbc_mssql_mnesia / 509759e917f4c152f12e50a6b42488c7a51bf0fc Reports root/ big OK: 4535 / Failed: 1 / User-skipped: 114 / Auto-skipped: 0
pgsql_mnesia_25 / pgsql_mnesia / 509759e917f4c152f12e50a6b42488c7a51bf0fc Reports root/ big OK: 4928 / Failed: 0 / User-skipped: 118 / Auto-skipped: 0
dynamic_domains_pgsql_mnesia_25 / pgsql_mnesia / 509759e917f4c152f12e50a6b42488c7a51bf0fc Reports root/ big OK: 4539 / Failed: 0 / User-skipped: 111 / Auto-skipped: 0
pgsql_mnesia_26 / pgsql_mnesia / 509759e917f4c152f12e50a6b42488c7a51bf0fc Reports root/ big OK: 4928 / Failed: 0 / User-skipped: 118 / Auto-skipped: 0
mysql_redis_26 / mysql_redis / 509759e917f4c152f12e50a6b42488c7a51bf0fc Reports root/ big OK: 4907 / Failed: 0 / User-skipped: 139 / Auto-skipped: 0
mssql_mnesia_26 / odbc_mssql_mnesia / 509759e917f4c152f12e50a6b42488c7a51bf0fc Reports root/ big OK: 4925 / Failed: 0 / User-skipped: 121 / Auto-skipped: 0
pgsql_cets_26 / pgsql_cets / 509759e917f4c152f12e50a6b42488c7a51bf0fc Reports root/ big OK: 4454 / Failed: 0 / User-skipped: 178 / Auto-skipped: 0
dynamic_domains_mssql_mnesia_26 / odbc_mssql_mnesia / 509759e917f4c152f12e50a6b42488c7a51bf0fc Reports root/ big OK: 4536 / Failed: 0 / User-skipped: 114 / Auto-skipped: 0
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 84.43%. Comparing base (e284af4) to head (509759e).
The `dist_blocker` feature was introduced to protect a disconnected node from reconnecting before the other nodes have finished the cleanup, to avoid issues with inconsistent CETS tables. The solution was to change the cookie value on `nodeup` (sic!) for the particular node until the cleanup for that node is finished.

There are the following issues with that solution:

1. `nodedown` is received, and cleanup is started. However, when the node comes back online, `nodedown` would be received, followed by `nodeup`. The problem is that `dist_blocker` is preventing the node from connecting, which might delay reconnection. In MongooseHelm tests this was causing a 15-second delay before reconnection, but it could be more in some configurations. There is even potential for a deadlock, but I couldn't trigger it in tests.
2. When `dist_blocker` kicks in, multiple error messages appear on several nodes, informing about the blocked connections. These unwanted and surprising logs could confuse the user.

Because of these concerns, `dist_blocker` is disabled in this PR. It is enough to disable it in MIM, because CETS does not enable it by default.

The MongooseHelm tests have shown that after https://github.com/esl/MongooseIM/pull/4250 it is difficult to reproduce the issue: it occurred only once in about 50 tests, which try to quickly restart MIM to trigger the issue. Keep in mind that with `dist_blocker` it has also failed at least twice on CI.

We can prevent the issues in different ways, and we can do so in separate PRs.
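
To make the reconnection delay easier to picture, here is a toy Python model of a blocker that refuses connections from a node until cleanup for that node has finished. It is only a sketch and not CETS or MongooseIM code: `ToyBlocker`, `reconnect_delay`, the node name `mim2`, the cleanup duration and the retry interval are all hypothetical. The retry interval of 15 was picked only so the numbers resemble the delay seen in the MongooseHelm tests; the real cause of that delay may differ.

```python
class ToyBlocker:
    """Toy stand-in for a dist_blocker-like guard (hypothetical, not the CETS API)."""

    def __init__(self, enabled):
        self.enabled = enabled
        self.cleanup_done_at = {}   # node -> time at which cleanup finishes
        self.blocked_attempts = []  # (time, node) pairs; stands in for error logs

    def on_nodedown(self, node, now, cleanup_duration):
        # Cleanup for the node starts when nodedown is observed.
        self.cleanup_done_at[node] = now + cleanup_duration

    def try_connect(self, node, now):
        done_at = self.cleanup_done_at.get(node)
        if self.enabled and done_at is not None and now < done_at:
            # Connection refused while cleanup is still running.
            self.blocked_attempts.append((now, node))
            return False
        self.cleanup_done_at.pop(node, None)
        return True


def reconnect_delay(blocker, node, first_attempt, retry_interval, max_attempts=100):
    """Return how long after first_attempt the node manages to connect."""
    now = first_attempt
    for _ in range(max_attempts):
        if blocker.try_connect(node, now):
            return now - first_attempt
        now += retry_interval  # assumed fixed retry interval
    raise RuntimeError("node never reconnected")


# nodedown is observed only when the node is already back (time 0),
# so cleanup and the first connection attempt start at the same moment.
with_blocker = ToyBlocker(enabled=True)
with_blocker.on_nodedown("mim2", now=0, cleanup_duration=1)
delay_with_blocker = reconnect_delay(with_blocker, "mim2", first_attempt=0, retry_interval=15)

without_blocker = ToyBlocker(enabled=False)
without_blocker.on_nodedown("mim2", now=0, cleanup_duration=1)
delay_without_blocker = reconnect_delay(without_blocker, "mim2", first_attempt=0, retry_interval=15)
```

In this model a cleanup that takes a single time unit still costs a full retry interval when the blocker is enabled, and each refused attempt is recorded, which mirrors the extra log messages described above. With the blocker disabled the node connects on the first attempt.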