bbottema / simple-java-mail

Simple API, Complex Emails (Jakarta Mail smtp wrapper)
http://www.simplejavamail.org
Apache License 2.0
1.22k stars 265 forks source link

Send multiple emails per session connection #198

Closed petarov closed 4 years ago

petarov commented 5 years ago

Hi! This is more of a question rather than an issue.

I'm working on an implementation that has to send many emails using one and the same SMTP server. I saw that async sending is already supported which is pretty cool! However, I was looking for a way to send many emails synchronously over an already established TCP connection.

Browsing the MailSender code I stumbled on this part. I think this opens a new connection for each email, right? Does it make sense to improve this, so that it reuses the transport/session to send several messages synchronously?

bbottema commented 5 years ago

Hi @petarov, it could be an improvement, but I'm unaware of any performance issues with creating new connections for emails. Do you have experience with sending emails in bulk such that reusing a connection shows better results? Or is there another reason you wish to reuse a connection?

petarov commented 5 years ago

Thanks for the feedback @bbottema! I'd like to reuse the connection so that that same underlying tcp socket gets used for sending bulk (hundreds to thousands of emails) of data. I'm generally trying to squeeze as much performance as possible for a product I'm working on.

That being said, I don't have any data to compare the performance at the moment, but I'll try to prepare some and come back to you with numbers for a better overview.

bbottema commented 5 years ago

Thanks, it would be very helpful to have some more data. Generally speaking as a rule of thumb I'm very cautious of premature optimization, since that makes the code more complex often without really needing it. But if a real case can be made I'm all for it.

petarov commented 5 years ago

Ok, here are a bunch of, let's call them naΓ―ve, tests using SimpleJavaMail and just plain JavaMail with Transport.sendMessage(). I've used FakeSMTP as an SMTP server to perform the tests. Test email size = 15Kb.

It can be observed in the SMTP protocol handshake that the JavaMail+sendMessage variant does not close the connection, but rather keeps sending emails after EHLO was initially sent. The results from my tests are below. The client code's running on a macOS machine and the SMTP server on a remote Linux server.

The elapsed time results as mean values are irrelevant, but the difference does seem to be around 2x in favour of JavaMail+sendMessage.

Method Emails Count Elapsed Time
SimpleJavaMail 100 40,71 seconds
JavaMail+sendMessage 100 21,11 seconds
SimpleJavaMail 1000 404.71 seconds (6m 45s)
JavaMail+sendMessage 1000 198.70 seconds (3m 18s)
SimpleJavaMail 10,000 3921,03 seconds (1h 5m 22s)
JavaMail+sendMessage 10,000 1954,81 seconds (32m 35s)

Of course, I might have missed something, so here's the repo, if you'd like to take a look. (Warning! Code's messy.)

And finally, I actually think that 2x is not that much of an issue for my use case (~500 max), so this issue could be closed. I can live with SimpleJavaMail working as it is at the moment. πŸ˜‰

bbottema commented 5 years ago

Can you please add a third approach of JavaMail sendMessage without reusing the connection? I'm curious to know how much overhead Simple Java Mail adds.

petarov commented 5 years ago

Sure, so I tested 5 variants.

Variant 1 - SimpleJavaMail. (Default Mailer, no threading parameters.) Variant 2 - SimpleJavaMail with 10 threads. Async send. Variant 3 - Transport.send()

Transport.send(msg);

Variant 4 - Transport+getInstance()+connect()+sendMessage().

        try (Transport t = session.getTransport()) {
            t.connect();
            t.sendMessage(msg, msg.getAllRecipients());
        } finally {
            // do nothing
        }

Variant 5 - transportInstance.sendMessage(). Same as JavaMail+sendMessage in my tests above. Transport instance is created once and it's used for each email being sent.

        if (!transport.isConnected())
            transport.connect();
        transport.sendMessage(msg, msg.getAllRecipients());

Results

# Method Emails Count Elapsed Time
1 SimpleJavaMail 100 38,64 seconds
2 SimpleJavaMail+10 Threads 100 0,14 seconds
3 Transport.send() 100 38,16 seconds
4 Transport+getInstance()+connect()+sendMessage() 100 38,79 seconds
5 transportInstance.sendMessage() 100 19,22 seconds

No doubt, SimpleJavaMail + threads is the fastest variant, if threading is desired. 😎

Last edit, maybe it's worth adding that Variant 5 should be used with caution in multi-threaded scenarios as stated in this stackoverflow comment.

bbottema commented 5 years ago

Excellent work, thank you!

In conclusion, Simple Java Mail performance is pretty good in parallel mode, but can be improved even further roughly increasing performance by a factor of two when reusing Transport connections from a pool.

Interesting though that establishing a connection to the smtp server takes about the same time as actually sending an email to it!

I'll leave this open after all as a low-priority enhancement. Thanks again!

bbottema commented 5 years ago

Hi @petarov, I've been re-implementing how threads are handled for #207. Any change you could run your tests on branch 207-configurable-threadpoolexecutor to see if the performance didn't degrade somewhere?

petarov commented 5 years ago

Hi @bbottema, Sounds great! I'll try to do that this weekend and let you know.

petarov commented 5 years ago

Ok, I had a few problems adjusting the dependencies, but here we go. I did several runs per email count test and I believe there are no deviations from what the tests showed in version 5.1.

# Method Emails Count Elapsed Time
1 SimpleJavaMail RUN 1 100 38,64 seconds
2 SimpleJavaMail RUN 2 100 39,37 seconds
3 SimpleJavaMail RUN 3 100 40,71 seconds
4 SimpleJavaMail+10 Threads RUN 1 100 0,25 seconds
5 SimpleJavaMail+10 Threads RUN 2 100 0,30 seconds
6 SimpleJavaMail+10 Threads RUN 2 100 0,26 seconds
bbottema commented 5 years ago

Excellent, thank you so much. Yeah, the dependencies changed a little bit,sorry about that ':)

bbottema commented 5 years ago

I've been looking into reusing Transport connections, but I'm not yet happy with my options. Basically what we need is a generic object pool for reusing creation-expensive objects. There are a bunch of those, but I rather not resort to a 3rd party library for this.

Got any ideas?

/edit: Maybe I'll define a new module for batch processing. In that case, going by matureness, community, complexity and code quality, Stormpot would be the most likely candidate.

bbottema commented 5 years ago

One snag is that for each email, there's a chance the session should be mutated (to configure bounceToEmail). A session is generally reusable, but not in that case.

Maybe I need to find a way to set this property when actually sending it, so that it doesn't end up in the session instance at all...

/edit: hmm, it seems SMTPMessage a subclass of MimeMessage can be used to configure the "mail.smtp.from" property as well.

petarov commented 5 years ago

Good question @bbottema. Could you elaborate on reusing the Transport connections part? Are they really a good candidate for pooling? I see this more as a single continuous batch operation, so if you mean a new module with implementation dedicated only to a sending emails in a single-threaded fashion, that sounds to me pretty cool already.

As for an object pooling solution, I'm mostly using Google Guava's Cache in my projects, although from the synopsis on Stormpot's performance it does look somewhat more advanced than Guava to me. As far as I could glance in BlazePool's implementation, there're no synchrnoized code blocks on read/write (EDIT: ok, just noticed it uses BlockingQueue). Guava's Cache is backed by a ConcurrentMap, so that might be an overkill for single-threaded batch processing.

Strangely enough, I think the QueuePool got deleted from Stormpot, but I couldn't find out why. πŸ™‚

One snag is that for each email, there's a change the session should be mutated (to configure bounceToEmail). A session is generally reusable, but not in that case.

Good point. If possible, I'd personally just discard that feature at first, until someone explicitly say they need it.

bbottema commented 5 years ago

Actually, I found a way around that limitation, by using an SMTPMessage instead of a MimeMessage, which is a subclass of MimeMessage and overrides bounceTo (envelope-from) of the Session (so we can just leave that empty now).

I'm reluctant to rely on a large general-purpose library just for its caching support, so that's why I didn't go with Guava. I'm now on my way with a first implementation though using Stormpot and will start testing soon!

From Stormpot's documentation, there should still be a QueuePool implementation next to BlazePool. Anyway, I have to stick with 2.4.1, because 3.0.0 will require Java 8+, which I'm not ready to have as a requirement for Simple Java Mail.

I see this more as a single continuous batch operation, so if you mean a new module with implementation dedicated only to a sending emails in a single-threaded fashion, that sounds to me pretty cool already.

Precisely. And if you keep the time-to-live after last-access-timestamp to a few seconds, that will achieve the same effect as a continuous batch (but it will be completely configurable). Stormpot, like Guava, is very easy to configure this way.

I do intend to allow for multiple Transport connections though, because that can improve performance significantly if your mailserver can handle it (Transport connections by themselves are blocking).

petarov commented 5 years ago

I do intend to allow for multiple Transport connections though (hence Stormpot), because that can improve performance significantly if your mailserver can handle it

Alright, that sounds pretty good.

Anyway, I have to stick with 2.4.1, because 3.0.0 will require Java 8+, which I'm not ready to have as a requirement for Simple Java Mail.

As someone that has to support commercial Java 1.6 software, I can understand that. However, I'd encourage you to not delay the Java 8 push, if you can help it. Cheers 🍺

bbottema commented 5 years ago

Closed in favor of #214.

bbottema commented 4 years ago

Hi @petarov, I finished the batch module, which now supports everything you can think of.

Without the Batch Module, Simple Java Mail behaves as it did before, with a single Transport connection being created for each email. With the Batch Module included however, it defaults to 4 Transport connections that can be reused and even pre-started (kept warmed up). You can even configure a cluster of servers each with 4 Transport connections or any other number.

You can even define multiple clusters if you plan on using different routes for different types of email (like internal mail and external mail).

It's in the develop branch, so you might wanna give it a spin. I would love to be able to mention some mind blowing statistics in the 6.0.0 release!! The updated website with documentation is now simply statically compile HTML files, here.

bbottema commented 4 years ago

Hey @petarov, advanced and improved batch processing is released in 6.0.0-rc1! Check out the renewed website...

petarov commented 4 years ago

Hi @bbottema, that's great to hear! I hope to get some free time on the weekend to try it out. Hopefully I can also do some tests and upgrade to 6.0.0 till the end of the month. πŸ˜‰

petarov commented 4 years ago

Hi @bbottema

I finally got around to do some tests. It looks pretty good and I'll be integrating version 6.0.1 in my project next week.

Test Parameters

Results

# Method Emails Count Average Elapsed Time
1 SimpleJavaMail with 10 threads 100 4,47 seconds
2 SimpleJavaMail with Connection Pool* 100 16,6 seconds
3 transportInstance.sendMessage() 100 19,22 seconds

* Connection Pool Parameters

I can only say, well done! πŸ‘

bbottema commented 4 years ago

Is that 0,12 seconds correct?

Also, I'm curious if using both multiple threads as well as a connection pool makes any difference.

petarov commented 4 years ago

Hmm, now that you've mentioned it, I've noticed that the threaded version did not send any mails to the SMTP server. There's this exception thrown in AsyncResponse.onException().

java.lang.IllegalStateException: Cluster contains no pools to draw from
        at org.bbottema.clusteredobjectpool.core.ResourceClusters.cycleToNextPool(ResourceClusters.java:182)
        at org.bbottema.clusteredobjectpool.core.ResourceClusters.claimResourceFromCluster(ResourceClusters.java:122)
        at org.simplejavamail.internal.batchsupport.BatchSupport.acquireTransport(BatchSupport.java:112)
        at org.simplejavamail.mailer.internal.util.TransportRunner.runOnSessionTransport(TransportRunner.java:71)
        at org.simplejavamail.mailer.internal.util.TransportRunner.sendMessage(TransportRunner.java:47)
        at org.simplejavamail.mailer.internal.SendMailClosure.executeClosure(SendMailClosure.java:78)
        at org.simplejavamail.mailer.internal.AbstractProxyServerSyncingClosure.run(AbstractProxyServerSyncingClosure.java:56)
        at org.simplejavamail.internal.batchsupport.AsyncOperationHelper$1.run(AsyncOperationHelper.java:74)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)

Maybe it's a misunderstanding on my side with the configuration, although I've checked the docs.

        MailerRegularBuilderImpl builder = MailerBuilder
                .withSMTPServer(smtpHost, smtpPort)
                .withTransportStrategy(TransportStrategy.SMTP)
                .withSessionTimeout(10 * 1000)
                .clearEmailAddressCriteria()
                .withDebugLogging(true)
                .withThreadPoolSize(10);

Am I missing something?

bbottema commented 4 years ago

Unless this is a bug, this can only occur if the pool was shut down and then reused. On shut down, the connection pools are discarded, but the cluster key remains. If you then try to reuse the cluster key, this exception is thrown.

Is it possible, you are shutting down the connection pool on this line, before the threads are done sending? That would explain the 0,12 seconds ;)

petarov commented 4 years ago

Yeah, that was a bummer. The test did not wait on the AsyncResponse futures to complete. It seems my older tests were also compromised by this. I totally overlooked it.

It looks much more realistic now at 4,47 sec. I've updated the table above. πŸ˜‰

Still, I'm unsure when/how to call mailer.shutdownConnectionPool(). If I invoke mailer.shutdownConnectionPool().get() when all messages get sent, it just seems to block indefinitely. Any advice on that?

bbottema commented 4 years ago

Seeing your code, I can imagine two methods for shutting down:

Regarding your question on how to use the current one: if your threads are done, it should definitely not block indefinitely. That would be a bug.

One possibility is that, while sending an email in a thread, if it fails then the claimed Transport is not released back into the pool. In this scenario the pool blocks indefinitely because it waits for all resources to free up. Did you verify all mails sent without issue?

Though that might also be considered a bug if a transport is not released if the email was met with an exception (although releasing it back to the pool might mean the pool is now poisoned with a broken resource). Hmm, food for thought.

petarov commented 4 years ago

Did you verify all mails sent without issue?

Yep, they are all being sent successfully. I did a simple 10 emails tests. All received and no exception was reported by AsyncResponse.onException(). I could see all 10 at the SMTP server side as well.

Interestingly enough, if I just create a new mailer and invoke mailer.shutdownConnectionPool().get() without sending any Emails, it also blocks indefinitely. πŸ€”

bbottema commented 4 years ago

Yep it's a bug. The combined Futures of all pools being shutdown are aggregated into a FutureTask which itself is never run (the pools do shutdown, but the future task doesn't perform the wait closure) so it never completes and hence .get() blocks indefinitely. Whoops.

bbottema commented 4 years ago

Actually were are a bunch of issues where executor services were not shut down properly, the aggregate pools shutdown future task not executing (and implemented wrong).

I've fixed all these issues under #246.

bbottema commented 4 years ago

6.0.2 released, all shutting down issues should be gone now!

bbottema commented 4 years ago

@petarov With that out of the way, could you please add a test with both multiple threads and a connection pool? The best would be to run a warmup cycle first and then perform all tests at once so they are guaranteed to run under same parameters.

Also if you would post the system details (cpu, memory etc.) of both the client and server system that would be a great help. This would give a more complete frame of reference and I'll include it on the website as a rough reference performance (awesome!). If you're ok with having this data published that is.

petarov commented 4 years ago

Ok, it all looks good in 6.0.2 so far.

Here's the input you requested, but I'm not sure if this is something to put on the website. I expect the results will definitely very when using a real SMTP server. It's your call. πŸ˜‰

Test Parameters

Results

Test Emails Count # 1 # 2 # 3 Average Elapsed Time
ThreadPoolSize=10 + CPool MaxSize=4 100 4.41 4.56 4.71 4.56 sec.
ThreadPoolSize=10 + CPool MaxSize=1 100 14.86 14.61 14.58 14.68 sec.
ThreadPoolSize=0 + CPool MaxSize=4 100 16.64 16.50 16.91 16.68 sec.
JavaMail Transport 100 19.60 18.82 19.93 19.45 sec.
bbottema commented 4 years ago

Thanks a lot for your effort! Maybe I'm misunderstanding though, because I still miss a test with both multiple threads and a connection pool (as a combo).

Additionally, I don't suppose it's easy for you to set up another server (or two) to run as a cluster? I wonder if we can get those 100 emails down to a single second 😏

petarov commented 4 years ago

Just did an update of my comment above. TBH, I'm a bit unsure of the exact specifics of how the threads and connection pools are working together, but the performance does seem to improve as I increase the number of threads and connection pool max size. So, it is more or less what I would have expected.

A cluster run would be a challenge. Can't promise anything, but perhaps if time permits.

bbottema commented 4 years ago

Yeah, don't feel pressured at all, you did so much already.

bbottema commented 4 years ago

So we ran into this bug, which I think makes all your tests invalid? (crap)