datawookie / emayili

An R package for sending email messages.
https://datawookie.github.io/emayili/

Sending emails becomes very slow after some iterations in a loop. #140

Closed gueyenono closed 2 years ago

gueyenono commented 2 years ago

I would like to start by thanking you for a great and easy-to-use package.

I am currently in a situation similar to the one described in Issue #112. I created a loop (really, I'm using purrr::map()) to send close to 1300 emails with 3 attachments each (2 attachments are the same and 1 attachment is specific to the recipient, hence the loop). At first, everything ran very fast, but after some iterations of the loop (I am not sure how many), it started to slow down significantly.

In Issue #112 you suggested the use of verbose = FALSE in the smtp() function and the original poster confirmed that the solution helped a lot. I am definitely going to apply it; however, I was wondering whether, for this number of emails, keeping the connection open with server(..., reuse = TRUE) was causing the slowdown. Although I don't know the inner workings of how the package interacts with the email system, I was thinking that keeping the connection open with that much traffic (~ 1300 emails) could be an issue that closing and reopening the connection could solve.

Thank you.

datawookie commented 2 years ago

Hi @gueyenono!

I'm not sure that reusing a connection is going to solve your issue. I did a quick experiment, sending 300 messages (each with a decent sized attachment) and timing how long it took to deliver to the SMTP server. Below are the results for two cases, either reusing an existing connection or creating a new connection for each message.

[image: delivery times when reusing a connection versus creating a new connection for each message]

It looks like reusing a connection is consistently faster. This stands to reason since the alternative requires creating a new connection each time, which incurs some latency.

Here are the summary statistics for both cases.

[image: summary statistics for both cases]

You can see that on average reusing a connection is faster.
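A comparison like the one described above could be set up roughly as follows. This is only a sketch, not the code used for the experiment: `time_sends()` is a hypothetical helper, and `send` stands for whatever function delivers one message (for example a wrapper around the object returned by `server()`).

```r
# Hypothetical helper: record the elapsed time (in seconds) of each send.
# `send` is any function that takes one message and delivers it.
time_sends <- function(messages, send) {
  vapply(messages, function(msg) {
    start <- proc.time()[["elapsed"]]
    send(msg)
    proc.time()[["elapsed"]] - start
  }, numeric(1), USE.NAMES = FALSE)
}

# Possible usage with emayili (host and credentials are placeholders):
#   smtp_reuse <- emayili::server(host = "smtp.example.com", ..., reuse = TRUE)
#   smtp_fresh <- emayili::server(host = "smtp.example.com", ..., reuse = FALSE)
#   summary(time_sends(msgs, function(m) smtp_reuse(m, verbose = FALSE)))
#   summary(time_sends(msgs, function(m) smtp_fresh(m, verbose = FALSE)))
```

Calling `summary()` on each vector of timings gives the kind of per-case statistics referred to above.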

You definitely don't want to generate verbose output. This will slow things down a lot and I generally only use this for debugging.

I suspect that your SMTP server might be throttling the frequency with which you are able to send messages. See, for example, rate limits on Exchange Server. Or you are running into a daily limit. While setting up this analysis I burned through my daily limit on two of my email accounts. The symptoms were that I was no longer able to send emails from those accounts at all...

Do you need to send these emails rapidly? If not then I suggest that you build a small delay into the loop for sending out the messages. This might mean that it takes a little longer to send them all out but you will likely not get throttled by the SMTP server.
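A minimal sketch of such a delay, assuming the messages have already been built and that `send` is a function that delivers one message. The name `send_with_delay()` and its arguments are made up for illustration; they are not part of emayili.

```r
# Hypothetical helper: send messages one at a time, pausing between sends.
send_with_delay <- function(messages, send, delay = 5) {
  results <- vector("list", length(messages))
  for (i in seq_along(messages)) {
    results[[i]] <- send(messages[[i]])
    # Pause between messages, but not after the last one.
    if (i < length(messages)) Sys.sleep(delay)
  }
  invisible(results)
}

# Possible usage with emayili (host and credentials are placeholders):
#   smtp <- emayili::server(host = "smtp.example.com", ...)
#   send_with_delay(msgs, function(m) smtp(m, verbose = FALSE), delay = 5)
```

The delay is a plain `Sys.sleep()` call, so the total run time grows by roughly `delay` seconds per message.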

Hope this helps!

Best regards, Andrew.

gueyenono commented 2 years ago

@datawookie Thank you for this. Setting verbose = FALSE definitely solved my issue. I know from experience that R's console does not handle excessive text very well.

Thank you once again for a wonderful package (and other great packages).

datawookie commented 2 years ago

@gueyenono I know that the RStudio console can become very slow with large volumes of output. Might be different if you run R on the command line. Either way, avoid the verbose output if you don't need it.

gueyenono commented 2 years ago

I will have to try it on the command line. But honestly, I set verbose to TRUE simply because I wanted a way to know that "something was happening". My workaround was to add R code that writes to a log file at every iteration of the loop. This enabled me to keep watch over the process.
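The logging workaround might look something like the sketch below. The function name `log_progress()`, the log format, and the file name are assumptions for illustration, not the code actually used.

```r
# Hypothetical helper: append one timestamped progress line to a log file.
log_progress <- function(path, i, n, status) {
  line <- sprintf(
    "%s [%d/%d] %s",
    format(Sys.time(), "%Y-%m-%d %H:%M:%S"), i, n, status
  )
  cat(line, "\n", sep = "", file = path, append = TRUE)
}

# Possible usage inside the sending loop (`smtp` and `msgs` are placeholders):
#   for (i in seq_along(msgs)) {
#     smtp(msgs[[i]], verbose = FALSE)
#     log_progress("send.log", i, length(msgs), "sent")
#   }
```

The file can then be watched from another terminal (for example with `tail -f send.log`) without printing anything to the R console.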

Thank you very much once again.

gueyenono commented 2 years ago

@datawookie I have an unrelated question. I do quite a bit of web scraping on a regular basis and one good practice is to add a delay between consecutive HTTP requests so that the website does not lock you out. The same applies to email servers. As mentioned earlier, I sent about 1300 emails one after the other and I put a 5-second delay between consecutive emails. Did I just arbitrarily lengthen my processing time or was it a good idea?

Thank you.

datawookie commented 2 years ago

Hi @gueyenono, I believe that this is the right move. Yes, your process will run slower. But you can be more confident that it will run to completion since you are less likely to get throttled (or shut out!) by the server. I'd say that, unless this is a time-critical process (which is unlikely to be the case for sending out emails), this is definitely the correct thing to do. You're really just being respectful of the server and of the other people who want to send out emails too. You could probably prune that 5-second delay down to 3 or 2 seconds. But start with 5 seconds and just be sure that the job is running to completion reliably. Then incrementally drop the delay. I hope this helps, Andrew.

gueyenono commented 2 years ago

@datawookie I appreciate the answer. Thank you very much.