Open saleyn opened 3 years ago
I wonder if it is really necessary to use funs in the scenarios where you use send_and_run
? Maye it's possible to have just linear "happy path" and then signal errors with throw
? Or I'm missing smth?
It just becomes a bit difficult to reason about the order in which code lines are executed looking at the functions using send_and_run
The main issue is that when errors are thrown, the error log has a ton of details pertaining to the crashed gen_server, whereas in the majority of cases, especially when the server closes the socket after ELHO or similar scenarios, we don't really need to put the crash details in the log. Generally, I think the current implementation needs a major revision to clean up and simplify that logic, but this is merely a patch to make the existing code more usable.
@saleyn the thing is that throw
from gen_server
's callback is kind of equivalent to just returning from the gen_server:
so
handle_cast(..., State) ->
...
{noreply, NewState}.
is equivalent of
handle_cast(..., State) ->
...
throw({noreply, NewState}),
....
(same works for handle_call
/handle_info
and I think even init
).
Also, if you throw({stop, normal, State})
it would stop the process, but normal
exit reason (so, no error logs, no signal propagation).
When I implemented send/2
function making it throw {stop, {send_error, ...}, St}
I might actually just change it to throw {stop, normal, St}
, but decided to better let user-defined handle_error
decide (handle_error
callback could return {stop, normal, St}
instead of returning ok
or {stop, Error, St}
), but I made it more explicit (by generating error log) by-default.
User can define
-behaviour(gen_smtp_server_session).
handle_error(send_error, Err, St) ->
logger:debug("socket send error ~p", [Err]),
{stop, normal, St};
handle_error(_, _, St) ->
{ok, St}.
to suppress error_logger reports when gen_smtp_server_session:send/2
fails. It's probably should be better documented though...
If my memory serves me, a while back when I was implementing an SMTP server based on gen_smtp, the issue was that there were many places were gen_tcp/ssl's send/2 function would return an error, and this would result either in a failed match ok = gen_tcp:send(...)
or erlang:error/1
someplace, which obviously is not same thing as an erlang:throw/1
and would leak from the gen_server's exception handler, crashing the process.
I recall seeing this behavior on ELHO and QUIT messages, e.g. when the server was trying to send QUIT, by which time a client already closed the socket. Hence, I had to implement some more refined handing of those edge cases to eliminate useless log reports of process crashes.
@saleyn
a while back when I was implementing an SMTP server based on gen_smtp
depends on how long ago, because I made 2 PRs where error handling have been noticeably improved relatively recently: #211 adds handle_error
user-defined callback and #226 is exactly about handling send
errors; both PRs were merged in mid-2020
depends on how long ago, because I made 2 PRs where error handling have been noticeably improved relatively recently: #211 adds
handle_error
user-defined callback and #226 is exactly about handlingsend
errors; both PRs were merged in mid-2020
I believe it was more or less at the end of 2020 using the HEAD of master branch.
Oh, that's strange. Looking at the code, it seems the only place where we are not using gen_smtp_server_session:send/2
to send server's replies is init
callback (becaue return tuple format there is slightly different I think). And all the send/2
errors whould be possible to suppress by returning {stop, normal, State}
from handle_error
callback of gen_smtp_server_session
.
I would rally appreciate if you could provide some small example where we could reproduce it!
I've had this patched code running in production for the good part of this year, and haven't seen these issues. So it would be hard to reproduce without rolling back to the unpatched version, which is not something I could do any time soon. So you can table this PR for now, I guess. If I have more updates on this, I'll let you know.
@saleyn I also have gen_smtp master server running in production with the following customized handle_error/3
callback:
handle_error(Kind, [], State) when Kind == tcp_closed; Kind == ssl_closed ->
%% It's normal; no need to log/metric
{ok, State};
handle_error(out_of_order, <<"DATA">>, #state{orig_to = To, inbox = undefined} = State) when
is_binary(To)
->
%% RCPT TO and DATA are sent in pipeline, but rcpt failed due to recipient address not found
{ok, State};
handle_error(Kind, Details, #state{peer = Ip} = State) ->
tm_metrics:count_inc([smtp, error, total], 1, #{labels => [Kind]}),
?LOG_WARNING(
#{
tag => "Session error",
kind => Kind,
details => Details,
state => State
}
),
case Kind of
ssl_handshake_error ->
tm_smtp_tls_blacklist:report(Ip),
%% Force stop
{stop, normal, State};
ssl_error ->
tm_smtp_tls_blacklist:report(Ip),
% will stop anyway
{ok, State};
send_error when Details =:= closed ->
%% seems lots of servers close the connection prematurely (without sending QUIT)
{stop, normal, State};
_ ->
{ok, State}
end.
and I indeed suppress send_error
by returning normal
as stop reason. But I did it after I made some analysis of the traffic I receive; I'm not sure we should suppress them by-default, because it may make debugging more difficult. Maybe we should add more tips to the smtp_server_example
docstring suggesting which errors are a good candidate for being suppressed after initial analysis?
It certainly would be helpful to include this in the example server, and mention your comments from this thread there.
Add functionality to terminate gen_server gracefully in case of SSL handshake errors or premature closing of socket by an SMTP client.