anymail / django-anymail

Django email backends and webhooks for Amazon SES, Brevo (Sendinblue), MailerSend, Mailgun, Mailjet, Postmark, Postal, Resend, SendGrid, SparkPost, Unisender Go and more
https://anymail.dev
BSD 3-Clause "New" or "Revised" License

Spurious signals received from SES, or bug? #200

Closed: zehawki closed this issue 4 years ago

zehawki commented 4 years ago

Hiya, I'm seeing a very strange case after enabling anymail. Since we started using anymail, we've sent thousands of mails. For one particular recipient domain, as soon as the mail is sent, it immediately shows that the mail has been opened and ALL the links have been clicked. This is simply not possible; it's not human behavior. Please see some screenshots below:

[Screenshot: anymail 1]

[Screenshot: anymail 2]

It seems to be happening only with this domain, and for the majority of email IDs in this domain. Would you perhaps have some clue as to what's going on? Has anyone ever reported anything like this with SES or anymail?

The only thing I can think of is perhaps there is a malware service at the recipient which is opening each link to check, and hence triggering SES signals.

Note that after this initial set of signals, the behavior looks more human. The email is never opened, or opened once or twice, and at best 1 or 2 links are clicked. This is how it's supposed to be for any email.

zehawki commented 4 years ago

> it immediately shows that the mail has been opened and ALL the links have been clicked

The latter part is incorrect. Most of the links have been "clicked", not all. And some of the links have been "clicked" multiple times.

> The only thing I can think of is perhaps there is a malware service at the recipient which is opening each link to check, and hence triggering SES signals.

But that wouldn't explain why some links are opened more than once. An automated malware checker has no reason to do this.

medmunds commented 4 years ago

So anything's possible, but I think it's extremely unlikely this is an Anymail issue, or even an SES problem. Given that, you might get more helpful answers by posting your question in a more general forum, like StackOverflow. (Also, btw, you can search what Anymail users have reported right here on the GitHub issues page—just clear the "is:open" filter to see past issues. To find out what SES users have reported, I usually head to AWS's community forums. I don't work for AWS, and my own production experience with SES is limited to a low-volume personal project that doesn't use tracking.)

In general, open and click tracking is never 100% reliable, with any ESP. Image proxies (e.g., Gmail image proxy), malware and policy compliance scanners, spam filters, link preview generators, mail clients with fancy offline capabilities, and who knows what else might interfere with your tracking in automated ways, causing false negatives and false positives. (Some privacy conscious users deliberately try to pollute ISP tracking by generating a high volume of fake http requests; I wouldn't be surprised to find an email service doing something similar with tracking links.)

I think your theory about a malware scanner is a pretty good guess, but I don't know why it might follow some links and not others. If the destination domain is a large corporation or ISP, your messages might be getting spread across multiple scanner instances. Or the scanner might have bugs.

If you're using cc or bcc in your email, that could be making things worse, because the same tracker is used for all recipients. SES specifically recommends using tracking only for single-recipient messages.

Maybe take a look at the user_agent on the spurious requests to see if it reveals any clues about what is following the links.

Finally, if you're able to contact someone in IT at the problem domain, I'd just ask what they have in their mail pipeline that might cause this.

zehawki commented 4 years ago

Thanks for the very detailed info. This is very useful.

> If you're using cc or bcc in your email, that could be making things worse, because the same tracker is used for all recipients. SES specifically recommends using tracking only for single-recipient messages.

Nope, strictly no CC/BCC.

> Maybe take a look at the user_agent on the spurious requests to see if it reveals any clues about what is following the links.

That's a great suggestion. It's worth a try, though user agents have started looking quite similar these days.

I also see that SES sends the IP address in the JSON event, but Anymail doesn't supply that in the event param, so I'll parse the raw esp_event to extract it and see what I find.
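As a rough illustration of that parsing step, here is a minimal sketch. It assumes `esp_event` is the parsed SES event-publishing JSON as a dict, with an `eventType` field and an `open` or `click` sub-object carrying `ipAddress`, `userAgent`, `timestamp` and (for clicks) `link`. The helper name `extract_ses_client_info` is made up for this example and is not part of Anymail.

```python
from typing import Any, Dict, Optional


def extract_ses_client_info(esp_event: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Pull client details out of a raw SES open/click event dict.

    Assumes the SES event-publishing layout, e.g.
    {"eventType": "Click", "click": {"ipAddress": ..., "userAgent": ..., ...}}.
    Returns None for any other event type.
    """
    event_type = str(esp_event.get("eventType", "")).lower()
    if event_type not in ("open", "click"):
        return None
    # The details live under a key named after the event type ("open" or "click").
    detail = esp_event.get(event_type) or {}
    return {
        "event_type": event_type,
        "ip_address": detail.get("ipAddress"),
        "user_agent": detail.get("userAgent"),
        "link": detail.get("link"),  # SES only includes this for clicks
        "timestamp": detail.get("timestamp"),
    }
```

In an Anymail tracking signal receiver, something like `extract_ses_client_info(event.esp_event)` could then be logged next to the normalized event fields, assuming the raw event has the shape described above.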

medmunds commented 4 years ago

Yeah, based on a quick search you're probably running into an anti-spam filter checking a sampling of links to see if they end up on a spammy site.

If you can't live with these false opens and clicks, you might be able to filter them from your data. Any cluster of clicks within a short time after delivery is probably false. If you want to get more accurate, several articles suggest adding an invisible link as bait for the spam filter; if you get a click on that, you could exclude all other clicks from the same time period and IP address. (Though it sounds like at least some spam filters only check a random subset of links, which seems to match what you're seeing.)
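The bait-link idea could be sketched roughly as follows. This is only an illustration under assumptions: each click is a dict with hypothetical keys `link`, `ip_address` and `timestamp` (a `datetime`), the bait URL is whatever invisible link you add to the message, and the 5-minute window is an arbitrary starting point.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List

Click = Dict[str, Any]  # hypothetical shape: {"link": str, "ip_address": str, "timestamp": datetime}


def drop_bait_tainted_clicks(
    clicks: List[Click],
    bait_url: str,
    window: timedelta = timedelta(minutes=5),
) -> List[Click]:
    """Return clicks that are neither bait hits nor 'near' a bait hit.

    A click is considered tainted when it comes from the same IP address as a
    bait-link click and falls within `window` of that bait click.
    """
    bait_hits = [c for c in clicks if c["link"] == bait_url]

    def tainted(click: Click) -> bool:
        return any(
            click["ip_address"] == bait["ip_address"]
            and abs(click["timestamp"] - bait["timestamp"]) <= window
            for bait in bait_hits
        )

    return [c for c in clicks if c["link"] != bait_url and not tainted(c)]
```

Matching on IP address only helps when the scanner reuses one address. If it spreads requests across many addresses, matching on the time window alone may be the more useful variant.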

A few articles suggest spam filters are more likely to check links when they're already suspicious about the email. If you haven't yet set up a custom FROM domain and the appropriate SPF and DMARC records, doing that can make your mail seem a lot less sketchy (and will definitely help with deliverability to the major ISPs).

zehawki commented 4 years ago

> Any cluster of clicks within a short time after delivery is probably false.

Yup, I devised such a scheme yesterday and put it in place for this set: I'm using a simple rule to discard any clicks that come before an open. But making it broad-based and foolproof seems to be a bit tough since there are many conditions:

  1. Sometimes this cluster happens within 60s of the mail going out, sometimes even as much as 5 mins later.
  2. This particular email had maybe 7 links in it, so it's easy to see the cluster visually. Most of our mails have only 1 link, a CTA, in which case it would be particularly difficult to figure out.
  3. Since many clients block the tracking pixel (including my own, Outlook), I'd also need to handle the case where there is no open signal, but then there are 1 or more clicks.
  4. Then, possibly because the mail has been forwarded to another recipient in the same domain, there is a cluster, followed by human behavior, then another cluster.
  5. ... more variants that I haven't discovered yet
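To make the first few conditions concrete, here is a minimal sketch of one possible heuristic, not a foolproof filter. It flags a click as suspect if it arrives within a grace window after a reference time (for example the send or delivery timestamp) or before the first recorded open, and it falls back to the window alone when no open was ever recorded. The function name, the dict keys and the 5-minute default are all assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional, Tuple

Click = Dict[str, Any]  # hypothetical shape: {"link": str, "timestamp": datetime}


def split_suspect_clicks(
    reference_time: datetime,
    clicks: List[Click],
    first_open_at: Optional[datetime] = None,
    grace: timedelta = timedelta(minutes=5),
) -> Tuple[List[Click], List[Click]]:
    """Split clicks into (kept, suspect) lists using two simple rules."""
    kept: List[Click] = []
    suspect: List[Click] = []
    for click in clicks:
        ts = click["timestamp"]
        # Rule 1: anything inside the grace window is treated as a scanner hit.
        in_window = ts - reference_time <= grace
        # Rule 2: a click before the first open is suspect, but only if an
        # open was recorded at all (tracking pixels are often blocked).
        before_open = first_open_at is not None and ts < first_open_at
        (suspect if in_window or before_open else kept).append(click)
    return kept, suspect
```

This does nothing for the forwarded-mail case in item 4, where a second cluster appears long after the first; catching that would need burst detection rather than a fixed window.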

> adding an invisible link as bait for the spam filter

Ha! Yes, I thought exactly of this yesterday, sort of a reverse honeypot. Oh the irony ;-)

medmunds commented 4 years ago

> Sometimes this cluster happens within 60s of the mail going out, sometimes even as much as 5 mins later

Which event are you using as "mail going out"? I only see "queued" in your earlier screenshot, which is when SES accepts the outgoing message from you. Does it make a difference to look at "delivered," which is when SES successfully hands off the message to the receiving domain?

> making it broad-based and foolproof seems to be a bit tough

If highly accurate email click tracking is essential to your business, then you're basically entering the arms race to differentiate human clicks from bots. And as far as I know, none of the transactional ESPs give that any consideration in their simple click tracking.

It might be sufficient to implement your own click instrumentation on the destination pages (e.g., use JS to post a click event after a short delay, assuming this spam filter doesn't run JS and wait around on the page). Or you might move email click tracking to whatever site analytics you use (e.g., Google Analytics' utm params), if your analytics has some protection against click spam. Of course, if the clicks have any sort of external value, you'll end up needing a full range of adtech-style click fraud defenses (and email spam filters will be the least of your problems).
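For the analytics route, tagging each outgoing link with utm parameters might look like the sketch below. It uses only the Python standard library; the function name and the default `source` and `medium` values are placeholders, and whether your analytics tool filters bot traffic is a separate question.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_utm_params(url: str, campaign: str, source: str = "newsletter", medium: str = "email") -> str:
    """Return `url` with utm_source/utm_medium/utm_campaign appended.

    Existing query parameters and the fragment are preserved; any utm_*
    values already present for these three keys are replaced.
    """
    utm = [("utm_source", source), ("utm_medium", medium), ("utm_campaign", campaign)]
    utm_keys = {key for key, _ in utm}
    parts = urlsplit(url)
    existing = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in utm_keys
    ]
    return urlunsplit(parts._replace(query=urlencode(existing + utm)))
```

For example, `add_utm_params("https://example.com/page?x=1", "spring")` yields a URL whose query string is `x=1&utm_source=newsletter&utm_medium=email&utm_campaign=spring`.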

Otherwise, it might be simpler to just live with the false clicks. Maybe report click rates as a range? As you've noted, open tracking is unreliable, so you're probably already interpreting open rates with some skepticism.
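Reporting a range could be as simple as the sketch below: the upper bound counts every recipient with a recorded click, and the lower bound drops those whose clicks all look automated. The function and parameter names are hypothetical, and how you decide which clickers are "suspect" is left open.

```python
from typing import Tuple


def click_rate_range(delivered: int, all_clickers: int, suspect_clickers: int) -> Tuple[float, float]:
    """Return (lower, upper) click-rate bounds as fractions of delivered mail.

    lower: recipients with at least one click not flagged as automated.
    upper: every recipient with any recorded click.
    """
    if delivered <= 0:
        raise ValueError("delivered must be positive")
    if not 0 <= suspect_clickers <= all_clickers <= delivered:
        raise ValueError("expected 0 <= suspect_clickers <= all_clickers <= delivered")
    lower = (all_clickers - suspect_clickers) / delivered
    upper = all_clickers / delivered
    return lower, upper
```

With 1000 delivered messages, 120 recipients who clicked and 70 of those flagged as suspect, this gives a range of 5% to 12%.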

zehawki commented 4 years ago

Another campaign went out today, and I saw this same behavior with some email IDs. So here are some findings:

  1. Mail recipient # 1 - 6 out of 6 links were clicked and the IP addresses are all different - a range of addresses in South Korea, all belonging to MS. All had the same UA - Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36, so there's prolly nothing to conclude from UA string.

  2. Mail recipient # 2 - 6 out of 6 links were clicked and the IP addresses are all different - a range of addresses in the US, all belonging to AWS. All had the same UA - Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36, so there's prolly nothing to conclude from UA string. In addition, there was an open event from Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko Firefox/11.0 (via ggpht.com GoogleImageProxy), before the clicks. That took me to this reading: https://www.gmass.co/blog/false-opens-in-gmail/
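Since the clicks in both cases came from cloud-provider address space rather than from one fixed IP, grouping by network owner may be more telling than comparing individual addresses. The sketch below labels an address against a table of CIDR ranges using the standard `ipaddress` module. The ranges shown are reserved documentation networks used as placeholders, not real Microsoft or AWS ranges; you would need to fill the table from the providers' own published lists.

```python
import ipaddress
from collections import Counter
from typing import Dict, Iterable, List

# Placeholder ranges (reserved documentation networks), NOT real provider ranges.
KNOWN_NETWORKS: Dict[str, List[str]] = {
    "example-cloud-a": ["192.0.2.0/24"],
    "example-cloud-b": ["198.51.100.0/24", "2001:db8::/32"],
}


def label_ip(ip: str, networks: Dict[str, List[str]] = KNOWN_NETWORKS) -> str:
    """Return the label of the first network containing `ip`, else 'unknown'."""
    addr = ipaddress.ip_address(ip)
    for label, cidrs in networks.items():
        for cidr in cidrs:
            net = ipaddress.ip_network(cidr)
            # Compare only like with like (IPv4 vs IPv6).
            if addr.version == net.version and addr in net:
                return label
    return "unknown"


def count_clicks_by_owner(ips: Iterable[str]) -> Counter:
    """Tally click source addresses by network label."""
    return Counter(label_ip(ip) for ip in ips)
```

A recipient whose early clicks all map to cloud-provider labels, each from a different address, fits the scanner pattern described in the findings above.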

And so on and so forth down a rabbit hole. You are quite right with everything you mentioned earlier... this is a bit of a crap shoot anyway, and for sure not a bug etc.

Thank you so much for your time and thoughtful comments back and forth.