aio-libs / aiosmtpd

A reimplementation of the Python stdlib smtpd.py based on asyncio.
https://aiosmtpd.aio-libs.org
Apache License 2.0
326 stars 96 forks source link

Entire python process stalls when receiving multiple large inbound emails #293

Open davidmcnabnz opened 3 years ago

davidmcnabnz commented 3 years ago

I'd rate this as a critical bug, unless a workaround has already been found (didn't see anything in my reading of the docs),

My asyncio process instantiates a Controller, then starts it. In the separate thread started by the Controller object, aiosmtpd opens an inbound port and listens for sessions. So far so good, all pretty normal.

But at times when multiple external clients are simultaneously injecting large-ish (>5MB) messages, the entire python process -- all event loops, all threads -- grinds to a complete halt.

My tests have been to get an external machine to simultaneously inject 10 messages, of 20MB each, into my SMTPD port. I provide my Controller object with a null stub handle_DATA() hook, which simply returns "250 Accepted".

Every time I run this test, the entire process freezes for between 10.5 and 18 seconds, until the injections are complete.

I'm guessing this is a GIL issue. My "main" thread and asyncio event loop freezes, along with everything running within that process.

I'm considering a radical option of spawning a separate aiosmtpd-based subprocess, and having its handle_DATA() hook throw the received envelopes into a pipeline, where they get pickled then base64'ed and passed via subproc stdout to the parent process.

I truly hope there's a better way. Can anyone please advise a more satisfactory remedy or workaround?

waynew commented 3 years ago

That definitely sounds like a gnarly bug (unless it's just a tech limitation) and one that sounds like it should be easy to replicate. I would think that it would be possible to shrink the amount of data that's read on every cycle. For instance, if you read 10K at a time, it should be possible to yield to another coroutine. I'm not sure if there's a way to tweak the max reader count or if that matters.

All that is theories without actually looking at the code in question, and it having been more than a few days 🙃

davidmcnabnz commented 3 years ago

Believe me, this issue is easy to replicate. Just create an SMTP client which builds a large email full of Lorem Ipsum or similar, and have it generate at least 10 messages of at least 10MB each, then send them simultaneously to the machine/port where the aiosmtpd server is running.

You can easily verify the freeze-up by adding to whatever thread and event loop instantiated and started the Controller, a task which repeatedly calls await asyncio.wait(0.1) then print()s a line to stdout. When the emails are being injected, you can notice this repetitive print() task pauses for several seconds.

Anyway, I've overcome the issue by moving my aiosmtpd Controller and handler object into a separate subprocess where it can bogart the GIL to its heart's content, and not disrupt the parent process.

This implementation is based on having my parent process launch the aiosmtpd server via multiprocessing.Process(...).start(). Then, creating inbound and outbound multiprocessing.Queue objects to facilitate fast data exchange to/from the parent process.

With these inter-process queues in place, my server's handle_DATA() hook passes each incoming message up to the parent process. The parent then spawns an asyncio task to handle the message and return an SMTP-compliant DATA reply string. This reply is then passed back down through the inter-process Queue to the SMTP server process, where it is picked up by the handle_DATA(), where it is then finally passed back to the injecting remote client.

I've now got this working to proof of concept level, and tests with dozens of huge emails show no event loop disruptions in the parent process. After the weekend, tidy it up, integrate it and push it into production. This is a huge relief, because I rely heavily on aiosmtpd in my work, and server issues (many caused partly or fully by aiosmtpd) have often caused me to be woken up all hours of the night.

I suggest the best resolution of this ticket might be to update the documentation to disclose this process-freezing behaviour, and recommend a similar approach of sequestering aiosmtpd-based SMTP servers off into a separate (sub)process.

davidmcnabnz commented 3 years ago

Pressure tests have confirmed that moving aiosmtpd and its Controller object into a sub-process (as opposed to just a separate thread) has reduced the main process' freeze-ups by at least 90-95%.

I'll do some more profiling today to find out whether the remaining freeze-ups are caused by the multiprocessing.Queue traffic, or unrelated causes.

As a bottom line, I'd suggest there might be merit in changing aiosmtpd's default usage pattern from "separate thread" to "separate process". I know this is a strong statement, but I believe that in its present "separate thread" model, aiosmtpd is unsafe in a production environment.

I think that remedying the process-blocking behaviour may be non-trivial since the underlying problem lies in the asyncio.StreamReader class which aiosmtpd relies upon. aiosmtpd correctly awaits lines from the stream reader, which itself correctly awaits fragments from the inbound buffer. With this, there should be no blockage. So it appears something deeper within Python's asyncio internals could be the problem.

pepoluan commented 1 year ago

In my point of view, if users of aiosmtpd want to implement multiprocessing, they can do that on their own. We do not provide a complete package that supports multiprocessing, becase therein lies some even harder to trace problems.

Rather, instead of implementing multiprocessing in aiosmtpd, we can add some "tips" in documentation warning of possible blockage, and recommend if aiosmtpd is going to be used in production where oversized emails are being received, the implementor should run aiosmptd within a separate process.

davidmcnabnz commented 1 year ago

As it happens, moving all SMTP listening to a separate process has been a very effective remedy. Nonetheless, the situation does warrant some warnings in the documentation and method/module docstrings.

Apart from that, aiosmtpd is performing well in high-load production.

pepoluan commented 1 year ago

Good points. I'll see what I can do with documentation/docstring while preparing 1.4.3 for packaging