KumoCorp / kumomta

The first Open-Source high-performance MTA developed from the ground-up for high-volume email sending environments.
https://kumomta.com
Apache License 2.0

Improve response message when a connection fails to respond within a timeout. #196

Open tommairs opened 2 months ago

tommairs commented 2 months ago

While recently testing a deployment, I noticed that connections to a port that appeared open, but never returned an EHLO response, did not seem to record any error message. In this case, we were trying to use STARTTLS on port 465, which was using SMTPS (implicit TLS), so KumoMTA never received the response it was expecting and the connection behaved like a tarpit. The timeout appeared to expire, but I could not find a descriptive error message logged anywhere.

Ideally, a connection error should be logged showing that the remote server did not respond to the EHLO.

I will continue to investigate.

wez commented 2 months ago

When connecting to an SMTPS port, the remote host will not volunteer any data; it is waiting for the client to initiate the TLS handshake. This is at odds with SMTP, in which the client connects and waits for the remote host to send the banner. So a misconfigured client will essentially "deadlock" its conversation with an SMTPS host until either party times out.
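To make the deadlock concrete, here is a minimal Python sketch. It is not KumoMTA code: a local "silent" server stands in for an SMTPS listener that waits for the client's TLS handshake, and a plain SMTP-style client waits for a 220 banner with a timeout. The function names and the error wording are hypothetical; the point is that the client can report *which* step stalled instead of failing silently.

```python
import socket
import threading


def silent_server(listener: socket.socket, release: threading.Event) -> None:
    """Accept one connection and send nothing, mimicking an SMTPS (implicit TLS)
    listener that waits for the client to start the TLS handshake."""
    conn, _ = listener.accept()
    with conn:
        release.wait(5)  # hold the connection open without ever writing


def read_banner(host: str, port: int, banner_timeout: float) -> str:
    """Plain SMTP client behaviour: connect, then wait for the server's 220 banner."""
    with socket.create_connection((host, port), timeout=banner_timeout) as sock:
        sock.settimeout(banner_timeout)
        try:
            data = sock.recv(512)
        except socket.timeout:
            # Both sides are waiting on each other: report it descriptively.
            return (f"no SMTP banner from {host}:{port} within {banner_timeout}s "
                    "(is this an implicit-TLS/SMTPS port?)")
        return data.decode("ascii", "replace").strip()


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

release = threading.Event()
server = threading.Thread(target=silent_server, args=(listener, release), daemon=True)
server.start()

result = read_banner("127.0.0.1", port, banner_timeout=0.5)
release.set()
server.join()
listener.close()
print(result)
```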

In kumomta, we wait for the connect_timeout to expire before attempting to connect to the next host in the connection plan. Only once the connection plan has been exhausted (or we have successfully connected somewhere) is the set of connection failures logged.
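The flow described above (try each host in the connection plan in turn, accumulate failures, and log them once at the end) can be sketched as follows. This is a hypothetical Python model of the described behaviour, not KumoMTA's implementation; `walk_connection_plan`, `PlanResult`, and `fake_attempt` are illustrative names, and `fake_attempt` stands in for a real connection attempt so the example runs without a network.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PlanResult:
    connected_to: Optional[str]
    failures: List[str] = field(default_factory=list)


def walk_connection_plan(hosts: List[str], attempt: Callable[[str], None]) -> PlanResult:
    """Try each host in order; attempt() raises on failure (e.g. a timeout).
    Failures are only collected here; the caller logs them once at the end."""
    failures: List[str] = []
    for host in hosts:
        try:
            attempt(host)
        except Exception as exc:  # connect timeout, refused connection, etc.
            failures.append(f"{host}: {exc}")
            continue
        return PlanResult(connected_to=host, failures=failures)
    return PlanResult(connected_to=None, failures=failures)


def fake_attempt(host: str) -> None:
    # Hypothetical stand-in: only the third MX accepts the connection.
    if host != "mx3.example.com":
        raise TimeoutError("connect_timeout expired")


result = walk_connection_plan(
    ["mx1.example.com", "mx2.example.com", "mx3.example.com"], fake_attempt
)

# Failures are emitted in one batch, after the plan has finished.
for failure in result.failures:
    print("connection failure:", failure)
print("connected to:", result.connected_to)
```

The trade-off raised next in the thread is visible here: logging inside the loop would surface each stalled host immediately, at the cost of one log record per attempt.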

We could consider adding a new connection failure log event for this case, but it will result in increased log volume and IO pressure.

The outbound tracing work will also help diagnose this situation without impacting logging.

wez commented 1 month ago

In main, connect_timeout has now been split into connect_timeout (for establishing the raw connection) and banner_timeout (for reading the initial 220 banner).

While it doesn't directly address the introspection side of this issue, it does allow, e.g., setting connect_timeout to something fairly short while keeping the banner timeout at a more RFC-appropriate value.
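The benefit of separate timeouts can be illustrated with a small Python sketch: a short timeout for the TCP connect, then a longer one for the 220 banner, each producing its own error message. This is not KumoMTA's configuration or code; `open_smtp` and `SmtpConnectError` are hypothetical names, and the local server that delays its banner is a test stand-in.

```python
import socket
import threading
import time


class SmtpConnectError(Exception):
    """Hypothetical error type that records which phase failed."""


def open_smtp(host: str, port: int, connect_timeout: float, banner_timeout: float):
    """Connect with connect_timeout, then read the 220 greeting with banner_timeout."""
    try:
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except OSError as exc:  # includes socket.timeout during connect
        raise SmtpConnectError(
            f"connect to {host}:{port} failed within {connect_timeout}s: {exc}") from exc
    sock.settimeout(banner_timeout)  # switch to the (longer) banner timeout
    try:
        banner = sock.recv(512)
    except socket.timeout as exc:
        sock.close()
        raise SmtpConnectError(
            f"connected to {host}:{port} but no 220 banner within {banner_timeout}s") from exc
    if not banner.startswith(b"220"):
        sock.close()
        raise SmtpConnectError(f"unexpected greeting from {host}:{port}: {banner!r}")
    return sock, banner.decode("ascii", "replace").strip()


def slow_banner_server(listener: socket.socket) -> None:
    conn, _ = listener.accept()
    with conn:
        time.sleep(0.3)  # slower than connect_timeout, well within banner_timeout
        conn.sendall(b"220 mx.example.test ESMTP\r\n")


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
server = threading.Thread(target=slow_banner_server, args=(listener,), daemon=True)
server.start()

sock, banner = open_smtp("127.0.0.1", port, connect_timeout=0.2, banner_timeout=2.0)
sock.close()
server.join()
listener.close()
print(banner)
```

With a single shared 0.2s timeout the banner read above would fail; splitting the two lets a fast-failing connect coexist with a patient greeting wait.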

This won't help when a misconfiguration causes us to send to an SMTPS destination.

However, main also includes kcli trace-smtp-client, which can provide insight into the connection attempts being made for a given client session.