davidmcnabnz opened 3 years ago
That definitely sounds like a gnarly bug (unless it's just a tech limitation) and one that sounds like it should be easy to replicate. I would think that it would be possible to shrink the amount of data that's read on every cycle. For instance, if you read 10K at a time, it should be possible to yield to another coroutine. I'm not sure if there's a way to tweak the max reader count or if that matters.
All that is theories without actually looking at the code in question, and it having been more than a few days 🙃
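Concretely, the cooperative chunked-read idea might look something like this against a plain `asyncio.StreamReader` (illustrative names and chunk size only, not aiosmtpd's actual code):

```python
import asyncio

CHUNK = 10 * 1024  # read roughly 10K per iteration

async def read_body(reader: asyncio.StreamReader) -> bytes:
    """Accumulate a payload in small chunks, yielding between reads."""
    buf = bytearray()
    while True:
        chunk = await reader.read(CHUNK)
        if not chunk:  # EOF
            break
        buf.extend(chunk)
        # Explicitly give other coroutines a turn, even if the socket
        # always has data ready and read() never actually suspends.
        await asyncio.sleep(0)
    return bytes(buf)
```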
Believe me, this issue is easy to replicate. Just create an SMTP client which builds a large email full of Lorem Ipsum or similar, have it generate at least 10 messages of at least 10MB each, and then send them simultaneously to the machine/port where the aiosmtpd server is running.
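For instance, a rough injection script along those lines (host, port, and addresses here are assumptions) might be:

```python
import smtplib
import threading
from email.message import EmailMessage

HOST, PORT = "127.0.0.1", 8025  # assumed address of the aiosmtpd server

def make_message() -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = "stress@example.com"
    msg["To"] = "sink@example.com"
    msg["Subject"] = "lorem ipsum stress test"
    # Roughly 11MB of filler text per message.
    msg.set_content("Lorem ipsum dolor sit amet. " * 400_000)
    return msg

def inject() -> None:
    with smtplib.SMTP(HOST, PORT) as client:
        client.send_message(make_message())

# Fire 10 concurrent injections at the server.
threads = [threading.Thread(target=inject) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```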
You can easily verify the freeze-up by adding, to whatever thread and event loop instantiated and started the Controller, a task which repeatedly calls await asyncio.sleep(0.1) and then print()s a line to stdout. When the emails are being injected, you can see this repetitive print() task pause for several seconds.
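A minimal sketch of such a watchdog task, scheduled on the same loop that started the Controller:

```python
import asyncio
import time

async def heartbeat() -> None:
    """Print at ~10Hz; any long gap means the whole loop was blocked."""
    while True:
        start = time.monotonic()
        await asyncio.sleep(0.1)
        lag = time.monotonic() - start - 0.1
        print(f"tick (loop lag: {lag:.3f}s)")

# In the thread/loop that created and started the Controller:
# asyncio.get_event_loop().create_task(heartbeat())
```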
Anyway, I've overcome the issue by moving my aiosmtpd Controller and handler object into a separate subprocess, where it can bogart the GIL to its heart's content without disrupting the parent process.
This implementation is based on having my parent process launch the aiosmtpd server via multiprocessing.Process(...).start(), then creating inbound and outbound multiprocessing.Queue objects to facilitate fast data exchange to/from the parent process.
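The parent side of that arrangement might look roughly like this (run_smtpd is a hypothetical child entry point, sketched in the next snippet):

```python
import multiprocessing as mp

# run_smtpd is the hypothetical child-process entry point shown below.

def start_smtpd_subprocess():
    inbound = mp.Queue()   # messages flowing up from the SMTP subprocess
    outbound = mp.Queue()  # DATA replies flowing back down to it
    proc = mp.Process(target=run_smtpd, args=(inbound, outbound), daemon=True)
    proc.start()
    return proc, inbound, outbound
```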
With these inter-process queues in place, my server's handle_DATA() hook passes each incoming message up to the parent process. The parent then spawns an asyncio task to handle the message and return an SMTP-compliant DATA reply string. This reply is passed back down through the inter-process Queue to the SMTP server process, where it is picked up by handle_DATA() and finally returned to the injecting remote client.
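A child-side sketch of that bridging handler follows; the queue message format and all names are assumptions for illustration, and in real code the blocking Queue.get() would be wrapped in an executor so it doesn't stall the child's event loop:

```python
import multiprocessing as mp
import threading

from aiosmtpd.controller import Controller

class QueueBridgeHandler:
    """Relays each DATA payload to the parent and waits for its verdict."""

    def __init__(self, inbound, outbound):
        self.inbound = inbound    # multiprocessing.Queue, messages up
        self.outbound = outbound  # multiprocessing.Queue, replies down

    async def handle_DATA(self, server, session, envelope):
        # Hand the raw message up to the parent process.
        self.inbound.put((envelope.mail_from, envelope.rcpt_tos,
                          envelope.content))
        # Wait for the parent's SMTP-compliant reply string. NOTE:
        # Queue.get() blocks; a production version would use
        # loop.run_in_executor() to keep the child's loop responsive.
        return self.outbound.get()

def run_smtpd(inbound, outbound):
    controller = Controller(QueueBridgeHandler(inbound, outbound),
                            hostname="127.0.0.1", port=8025)
    controller.start()
    threading.Event().wait()  # keep the child process alive
```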
I've now got this working to proof-of-concept level, and tests with dozens of huge emails show no event loop disruptions in the parent process. After the weekend, I'll tidy it up, integrate it, and push it into production. This is a huge relief, because I rely heavily on aiosmtpd in my work, and server issues (many caused partly or fully by aiosmtpd) have often had me woken up at all hours of the night.
I suggest the best resolution of this ticket might be to update the documentation to disclose this process-freezing behaviour, and to recommend a similar approach of sequestering aiosmtpd-based SMTP servers off into a separate (sub)process.
Pressure tests have confirmed that moving aiosmtpd and its Controller object into a sub-process (as opposed to just a separate thread) has reduced the main process' freeze-ups by at least 90-95%. I'll do some more profiling today to find out whether the remaining freeze-ups are caused by the multiprocessing.Queue traffic, or by unrelated causes.
As a bottom line, I'd suggest there might be merit in changing aiosmtpd's default usage pattern from "separate thread" to "separate process". I know this is a strong statement, but I believe that in its present "separate thread" model, aiosmtpd is unsafe in a production environment.
I think that remedying the process-blocking behaviour may be non-trivial, since the underlying problem lies in the asyncio.StreamReader class which aiosmtpd relies upon. aiosmtpd correctly awaits lines from the stream reader, which itself correctly awaits fragments from the inbound buffer. With this, there should be no blockage, so it appears something deeper within Python's asyncio internals could be the problem.
From my point of view, if users of aiosmtpd want to implement multiprocessing, they can do that on their own. We do not provide a complete package that supports multiprocessing, because therein lie some even harder-to-trace problems.
Rather than implementing multiprocessing in aiosmtpd, we can add some "tips" to the documentation warning of possible blockage, and recommend that, if aiosmtpd is going to be used in production where oversized emails are being received, the implementor run aiosmtpd within a separate process.
As it happens, moving all SMTP listening to a separate process has been a very effective remedy. Nonetheless, the situation does warrant some warnings in the documentation and method/module docstrings.
Apart from that, aiosmtpd is performing well in high-load production.
Good points. I'll see what I can do with the documentation/docstrings while preparing 1.4.3 for packaging.
I'd rate this as a critical bug, unless a workaround has already been found (I didn't see anything in my reading of the docs).
My asyncio process instantiates a Controller, then starts it. In the separate thread started by the Controller object, aiosmtpd opens an inbound port and listens for sessions. So far so good, all pretty normal.

But at times when multiple external clients are simultaneously injecting large-ish (>5MB) messages, the entire Python process -- all event loops, all threads -- grinds to a complete halt.
My tests have been to get an external machine to simultaneously inject 10 messages, of 20MB each, into my SMTPD port. I provide my Controller object with a null stub handle_DATA() hook, which simply returns "250 Accepted".

Every time I run this test, the entire process freezes for between 10.5 and 18 seconds, until the injections are complete.
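That null stub handler is about as minimal as a handler gets; roughly (port is an assumption):

```python
from aiosmtpd.controller import Controller

class NullHandler:
    """Accept and discard everything; no processing at all."""

    async def handle_DATA(self, server, session, envelope):
        return "250 Accepted"

controller = Controller(NullHandler(), hostname="0.0.0.0", port=8025)
controller.start()
```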
I'm guessing this is a GIL issue. My "main" thread and asyncio event loop freeze, along with everything running within that process.

I'm considering a radical option of spawning a separate aiosmtpd-based subprocess, and having its handle_DATA() hook throw the received envelopes into a pipeline, where they get pickled, then base64'd, and passed via the subprocess's stdout to the parent process.

I truly hope there's a better way. Can anyone please advise a more satisfactory remedy or workaround?