Open SystemParadox opened 2 years ago
can you provide a few more details? specifically:
This is what our email manager object looks like:
import type { SMTPConnectionOptions } from 'emailjs';
import { Message, SMTPClient } from 'emailjs';
import PQueue from 'p-queue';
export type AddressList = {
from: string;
to: string[];
};
/** Internal configuration used by emailjs to send mail, see https://www.npmjs.com/package/emailjs for all options. */
export type EmailConnectionConfig = Partial<SMTPConnectionOptions>;
export default class Emailer {
/** stop sending mail after this many items are in the queue */
private readonly mailQueueMaxSize = 512;
/** The minimum amount of time between sending each email. */
private readonly globalMailThrottle = 750;
// throttle mail sending using a promise queue
private readonly mailQueue = new PQueue({ concurrency: 1, interval: this.globalMailThrottle, intervalCap: 1 });
private client: SMTPClient;
constructor(options: EmailConnectionConfig) {
this.client = new SMTPClient(options);
}
public async send(addresses: AddressList, subject: string, text: string) {
if (!addresses) {
console.warn(`Not sending email. Address list looks falsey.`);
return;
}
const from = addresses.from;
if (!from) {
console.warn(`Not sending email. No from address is configured.`);
return;
}
if (addresses.to.length === 0) {
console.warn(`Not sending email. No addresses are configured to send.`);
return;
}
const to = addresses.to.join(', ');
if (this.mailQueue.size >= this.mailQueueMaxSize) {
console.error(
`Dropping outgoing email. There are currently too many items in the email queue. (Queue size= ${this.mailQueue.size}, maximum=${this.mailQueueMaxSize})`
);
return;
}
await new Promise<void>(resolve => {
const mailTask = async () => {
try {
console.log(`Sending email to ${to}...`);
await this.client.sendAsync(
new Message({
from,
to,
subject,
text,
'content-type': text.startsWith('<!DOCTYPE html') ? 'text/html; charset="UTF-8"' : undefined,
})
);
console.log(`Sending email to ${to} [SENT]`);
console.log(`There are`, this.mailQueue.size, 'items left in the mail queue');
resolve();
} catch (err) {
console.error('Error sending email!', `(Subject: "${subject}")`);
console.error(err);
console.log('Putting this email to the back of the send queue...');
void this.mailQueue.add(mailTask);
}
};
void this.mailQueue.add(mailTask);
});
}
}
Stack trace for the two errors:
12:32:59 [ERROR] Error: timedout while connecting to smtp server
at Function.create (/app/node_modules/emailjs/rollup/email.cjs:1077:21)
at Socket.timedout (/app/node_modules/emailjs/rollup/email.cjs:1114:47)
at Object.onceWrapper (node:events:639:28)
at Socket.emit (node:events:520:28)
at Socket._events.default.emit (/app/lib/server/patchEmitter.js:14:18)
at Socket._onTimeout (node:net:501:8)
at listOnTimeout (node:internal/timers:559:17)
at processTimers (node:internal/timers:502:7)
12:32:59 [ERROR] Error: bad response on command 'RCPT': -relay not permitted. Please authenticate using your AAISP mailbox, or send
at Function.create (/app/node_modules/emailjs/rollup/email.cjs:1077:21)
at response (/app/node_modules/emailjs/rollup/email.cjs:1426:48)
at caller (/app/node_modules/emailjs/rollup/email.cjs:1192:9)
at TLSSocket.<anonymous> (/app/node_modules/emailjs/rollup/email.cjs:1381:21)
at Object.onceWrapper (node:events:640:26)
at TLSSocket.emit (node:events:520:28)
at TLSSocket._events.default.emit (/app/lib/server/patchEmitter.js:14:18)
at notify (/app/node_modules/emailjs/rollup/email.cjs:1105:24)
at TLSSocket.watch (/app/node_modules/emailjs/rollup/email.cjs:1119:17)
at TLSSocket.emit (node:events:520:28)
Note that these errors happen several times as described above and our retry system works fine. However, on the last time it logged the Sending email to
message, but it never logged the [SENT]
message nor any error. sendAsync
simply never resolved.
Apparently the SMTP server is accessed through a VPN tunnel - there is no authentication so it relies on having a known static IP through the tunnel. It has been suggested when the flakey mobile connection goes down the relay not permitted
errors occur because it tries to access the server directly over the internet using a dynamic IP instead of using the tunnel. My suspicion is that it's this IP change that causes emailjs to get confused and never resolve/reject the promise.
thanks for the details. they help a lot π
My suspicion is that it's this IP change that causes emailjs to get confused and never resolve/reject the promise.
our connections use node's built-in socket classes, so if that were the case i doubt we could do much. we're kind of at the mercy of the stack in that arena. at any rate, it's still hitting timeout, so the internal state should reset properly. it's very odd to me that it isn't, and suggests to me the internal state is getting corrupted. except...
Note that these errors happen several times as described above and our retry system works fine. However, on the last time it logged the
Sending email to
message, but it never logged the[SENT]
message nor any error.sendAsync
simply never resolved.
...if i understand this correctly, it means that emails are still being sent even after a timeout, and it's only the final request that gets snagged (please correct me if i'm wrong). what's more, you have your queue's concurrency set to 1, so you shouldn't be having a race condition between multiple invocations of sendAsync
. this all strikes me as deeply odd, and i'm not sure where to start digging.
one test i would suggest is creating a new SMTPClient
on every request. obviously that wouldn't fly as an actual solution, but it might tell us something.
Correct, you'll see that if sendAsync
rejects we push back onto the queue so it'll just keep retrying. This all seems to work fine for a while until sendAsync
suddenly doesn't resolve/reject for some reason and so we are left hanging.
We dug around in the emailjs source for a while and it's clear that _senddone
was never called. We are suspicious of the _sendsmtp
callback wrapper which tries to send an smtp RSET
before calling _senddone
. It's hard to follow the trail but we were not able to satisfy ourselves of exactly what happens if there are errors or the connection is closed at this point.
We have enabled the logging in emailjs now so we'll see if that sheds any more light on the situation. However, there doesn't appear to be any logging around the timeout and error handling paths - it would be really helpful if this could be added so we can see what's happening!
Thanks.
Correct, you'll see that if
sendAsync
rejects we push back onto the queue so it'll just keep retrying. This all seems to work fine for a while untilsendAsync
suddenly doesn't resolve/reject for some reason and so we are left hanging.
very odd behavior.
We dug around in the emailjs source for a while and it's clear that
_senddone
was never called. We are suspicious of the_sendsmtp
callback wrapper which tries to send an smtpRSET
before calling_senddone
. It's hard to follow the trail but we were not able to satisfy ourselves of exactly what happens if there are errors or the connection is closed at this point.
the RSET
call is old behavior that predates my involvement with the library, so i'm very wary of altering it π
it does, however, sound like there could be an error in that code path being swallowed, so i think adding some error handling & such would be fruitful. which ties in with...
We have enabled the logging in emailjs now so we'll see if that sheds any more light on the situation. However, there doesn't appear to be any logging around the timeout and error handling paths - it would be really helpful if this could be added so we can see what's happening!
i agree; the instrumentation is very bare bones; i'll see what i can do. do you need this on the 3.x
line, or can we move to 4.x
? the 4.x
line does not have a CJS bundle, but the code is otherwise virtually identical.
Thanks.
thanks for the report!
We haven't managed to migrate over to ESM yet so we'll need this on 3.x
.
Thanks for the help :)
i've managed to repro something that looks like the error you're describing using our test suite. to reproduce, checkout the v3.6.0
tag and apply this diff (it also repros on 4.x
with a slightly different diff). running the test suite should consistently fail on client allows message with only 'cc' recipient header
; skipping the test (via test.skip
) should move the failure to client allows message with only 'bcc' recipient header
, and so forth.
the error is triggered specifically by the client's configuration:
const client = new SMTPClient({ port, tls: true });
const server = new SMTPServer({ secure: true });
can you confirm that your configuration isn't similar?
That looks very similar to our configuration!
We're only using { host: 'x', port: 587, tls: true}
- there is no username/password.
cool, i think we have a target now π i don't know what the fix would be, unfortunately, so it may take some time to figure out. in the meantime, using ssl
/user
/password
as a workaround may be necessary (if possible).
i've been digging through the code on a separate branch & i think this might be a configuration issue. specifically, i set up connection tests (see can connect with tls
and can connect with tls and ssl
) and they both run fine... until i force the server to be in secure
mode, at which point can connect with tls
blows up in the familiar way. this, combined with the Please authenticate using your AAISP mailbox
error message, makes me think you're supposed to be connecting with both tls
and ssl
enabled. the snag then would be if you don't have credentials but the server sends you a "too many unauthenticated requests"-type error.
Thanks for looking into this. Unfortunately I think we may have drifted away from the real issue, which is that the promise sometimes never resolves/rejects - it just gets left hanging. This should never happen, no matter what the client or server configuration is. It should either be resolved succesfully, rejected with an error, or rejected due to timeout.
Thanks for looking into this. Unfortunately I think we may have drifted away from the real issue, which is that the promise sometimes never resolves/rejects - it just gets left hanging. This should never happen, no matter what the client or server configuration is. It should either be resolved succesfully, rejected with an error, or rejected due to timeout.
correct, that's still an issue, but i think it may be a sympton of the configuration issue. i'm still looking into why the promise is stuck pending.
i managed to create a simplified version of your queue code at https://github.com/eleith/emailjs/blob/%C3%B8/test/queue.ts to test for this exact scenario and... it works as it should, rejecting after five tries with a properly closed-out client. the timeouts even came from the client. i'm at a bit of a loss as to how to continue.
The latest update is that this ran fine for just over a week, including handling periods of bad connections and errors as above, but eventually something triggered it again (during a period of poor connectivity). So it seems that it only happens when there are connectivity issues, and even then it's pretty intermittent - most of the time it handles it fine.
With emailjs logging enabled these are the last messages we received from emailjs before it just went silent on us:
mail FROM:<[REDACTED]>
rset
If we're ever going to get a test to reproduce this I think you're going to need some sort of test socket that randomly drops packets, hangs or disconnects. Then have a test that retries continuosly forever and just leave it running until it hangs.
I don't think you need any of our queuing system at all, it's not really relevant.
The latest update is that this ran fine for just over a week, including handling periods of bad connections and errors as above, but eventually something triggered it again (during a period of poor connectivity). So it seems that it only happens when there are connectivity issues, and even then it's pretty intermittent - most of the time it handles it fine.
the best kind of bug!
If we're ever going to get a test to reproduce this I think you're going to need some sort of test socket that randomly drops packets, hangs or disconnects. Then have a test that retries continuosly forever and just leave it running until it hangs.
i'll have to do some research on how to set that up. given the randomness, though, i'm worrried the issue is at the os level. i think looking into logging would be a good next step now to try and clear up what exactly is happening when you get your hang.
We're connecting to an ISP mail server over a flakey mobile connection. We've implemented a retry system but it relies on emailjs resolving.
We recently got the following sequence of error responses:
On the next retry it seems that emailjs never resolved the promise so our retry queue is now blocked and no email is being sent. This was 4 days ago.
client.sending
istrue
client.ready
istrue
client.smtp._state
is2
(CONNECTED
)I have checked
netstat
and there is definitely no active connection.