eredis_client:maybe_reconnect logs are too verbose

ghost commented 2 years ago

Mainly visual/convenience issue: if the eredis_client processes are calling maybe_reconnect repeatedly, it creates a lot of "ERROR REPORT" messages in the erlang logs. By default each client process tries to connect once every 100ms, and if eredis_cluster is used as the supervisor tree for the clients then there are 10 clients per redis server.

A common trivial scenario is when a redis server is intentionally removed (scaling down cluster). In this case reconnections will fail with connection reset, while the original connection will have been severed with etimedout or connection reset (in the eredis_cluster usage scenario, eventually the slot maps are refreshed and the clients towards the deleted server are terminated). It works fine, but a lot of spam is generated in the logs.

I wonder if it could be improved; some ideas to consider:

only log the report & reason on first reconnect attempt (ie. the original reason for the reconnect, not the reason from the last attempt each time)?
if the repeated (ie. not initial) reconnect attempts are logged, maybe the level there should be debug or info, not error
only log the report & reason if the Reason is different than last time (common case is where the same reason is repeated many times, like when the redis server leaves and is no longer listening)

ghost commented 2 years ago

Small addendum: it would also be cool if the error logged more info about the process:

    error_logger:error_msg("eredis: Re-establishing connection to ~p:~p due to ~p",
                           [Host, Port, Reason]),

Here, once again considering the cluster scenario, a lot of client processes will "spam" this at once, so it cannot be made out which is which. I guess the self() pid could be added to the message to make that clear.

zuiderkwast commented 2 years ago

Good idea. Let's include this in 1.5.2, which I guess we'll need to release soon due to #58.

ghost commented 2 years ago

Up to you, for now it seems the 1.5.2 is not super urgent, as we will be able to internally cherry pick 794816baeb131ebf4b02cf0992fa35e976794e4f considering its scope

zuiderkwast commented 2 years ago

Sure you may cherry-pick, but eredis is widely used by many Erlang applications around the world. It's not polite to introduce a bug like that and not release a patch version IMO.

ghost commented 2 years ago

Maybe it's a better idea to only include the bugfix in the 1.5.2 version, then release 1.5.3 or 1.6.0 with logging improvements. As we see there's some risk with every refactor to cause breakage.

zuiderkwast commented 2 years ago

When we do this, let's replace error_logger with logger. It was added in OTP 21, so all our supported versions have it.

ghost commented 2 years ago

As discussed, preferable solution would be the ability to provide a logger module's name (atom) and for eredis to use that. Our application has its own logger module that mostly replicates the standard logger's "API" in terms of available function calls.

An additional improvement would be making the logs less spammy in general somehow (not sure what's the best approach there, but once every 100ms is a lot with 10 connections per redis master), though our logger does have some spam protection where it will not print the same log too many times in succession.

Nordix / eredis

eredis_client:maybe_reconnect logs are too verbose #61