gen-smtp / gen_smtp

The extensible Erlang SMTP client and server library.
Other
684 stars 265 forks source link

`QUIT` command crashes server #239

Open stephanos opened 3 years ago

stephanos commented 3 years ago

gen_smtp is working great for us! One thing pops up occasionally, though. For some reason, the QUIT command crashes the server. Our set up has postfix relay emails to SMTP, so, of course, it could come from there, too? I just wanted to bring this up and see if this rings any bells.

[info] unhandled command: Q

[info] unhandled command: UIT

[info] Terminating Session: {{:badmatch, {:error, :einval}}, [{:gen_smtp_server_session, :setopts, 2, [file: 'gen_smtp/src/gen_smtp_server_session.erl', line: 993]}, {:gen_smtp_server_session, :handle_info, 2, [file: 'gen_smtp/src/gen_smtp_server_session.erl', line: 283]}, {:gen_server, :try_dispatch, 4, [file: 'gen_server.erl', line: 637]}, {:gen_server, :handle_msg, 6, [file: 'gen_server.erl', line: 711]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 249]}]}

[error] GenServer #PID<0.21474.5> terminating
** (MatchError) no match of right hand side value: {:error, :einval}
    (gen_smtp 1.0.1) gen_smtp/src/gen_smtp_server_session.erl:993: :gen_smtp_server_session.setopts/2
    (gen_smtp 1.0.1) gen_smtp/src/gen_smtp_server_session.erl:283: :gen_smtp_server_session.handle_info/2
    (stdlib 3.12.1) gen_server.erl:637: :gen_server.try_dispatch/4
    (stdlib 3.12.1) gen_server.erl:711: :gen_server.handle_msg/6
    (stdlib 3.12.1) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message: {:tcp, #Port<0.453127>, "UIT\r\n"}

(note that the first 3 logs are from our app)

seriyps commented 3 years ago

Well, seems it's not 100% related to the broken "quit" command, but rather to the fact that we used to not account for the possibility that socket will be suddenly closed. It was partially solved in #211 and especially by https://github.com/gen-smtp/gen_smtp/pull/226/files#r515038937, but, at least the last one is not in hex.pm yet

seriyps commented 3 years ago

But why QUIT is split is interesting. I have a theory... We normally use {packet, line} option so each SMTP command is delivered separately, but when we are reading the body of DATA (so, the email payload) we temporary switch the socket to {packet, raw} https://github.com/gen-smtp/gen_smtp/blob/54be999c18c24137584166d8ef7a7fc578fafba4/src/gen_smtp_server_session.erl#L269

and parse the packets manually untill we get \r\n.\r\n. And if the packet from gen_tcp arrived as \r\n,\r\nQ then we would send this Q back to ourself https://github.com/gen-smtp/gen_smtp/blob/54be999c18c24137584166d8ef7a7fc578fafba4/src/gen_smtp_server_session.erl#L235

which is a quite dirty hack, since we don't have internal accumulator and handle each chunk of data as a complete message.

I have an old stale branch where I tried to address this issue, but never finished it https://github.com/gen-smtp/gen_smtp/commit/789a0719f7d9a9a6ed798dcd80f56d6e5edde6a8 because I got some suspicious results from my benchmarks (decreased performance)

seriyps commented 3 years ago

Realized I'm not actually solving this problem in my branch:

https://github.com/gen-smtp/gen_smtp/commit/789a0719f7d9a9a6ed798dcd80f56d6e5edde6a8#diff-29fce4d0fc07289a91141e488d851aa04fee754665ecde349c0d9c0da839d9b8R254-R255

but at least the problem is acknowleged there

stephanos commented 3 years ago

I'm not well-versed in reading Erlang, unfortunately. But that makes sense to me. Right now I'm seeing this occur about once per 10K emails.

Could you help me understand the impact this has right now? Since the session is crashing, I'm assuming gen_smtp will simply start a new one? If that's the case, I could live with that glitch.

seriyps commented 3 years ago

Since it crashes on QUIT and when your peer server have already closed the connection, it should be safe, unless you have some processes linked to SMTP server session.

gen_smtp always starts a new process for each session (connection), so, this process would die with normal reason anyway. With this bug it dies a couple milliseconds earlier and spams error logs

stephanos commented 3 years ago

thanks for confirming!

I'll leave it up to you to decide whether you want to leave this issue open or not :)