2called-chaos / bouncefetch

Bouncefetch is a Ruby CLI application which searches an IMAP account for bounce mails, categorizes them and maintains a list of failed recipients which you can export to do whatever you want (unsubscribe, require reconfirmation, etc.).
MIT License

How to write rules? #1

Open ChristianBeer opened 8 years ago

ChristianBeer commented 8 years ago

Your tool seems to be the thing I need. Unfortunately it ignores all the bounces I need it to parse. How can I write new rules if it just won't let me inspect what it sees?

I thought this would be a no-brainer for the tool:

Message from yahoo.de.
Unable to deliver message to the following address(es).

<username@yahoo.de>:
This user doesn't have a yahoo.de account (username@yahoo.de) [0]

--- Original message follows.

The original message is over 5K. Message truncated.

But this just gets ignored. I tried other mails, but they also get ignored. I really want to write a rule for this. For debugging purposes it would be nice to disable all ignore rules and inspect every mail to adjust the rules.

$ ./bouncefetch -d
[00:00:00.000 INFO]     Loading config and rules... DONE
[00:00:00.002 INFO]     Connecting to IMAP server... DONE
[00:00:00.452 INFO]     Selecting 1/1 INBOX... OK
.%X%X%X%
[00:00:02.274 INFO]     All finished!
[00:00:02.274 INFO]
[00:00:02.274 INFO]     4 mails checked
[00:00:02.275 INFO]     4 deleted mails
[00:00:02.275 INFO]     0 handled soft bounces
[00:00:02.275 INFO]     0 handled hard bounces
[00:00:02.275 INFO]     4 ignored mails
[00:00:02.275 INFO]     0 unidentifyable bounces
[00:00:02.275 INFO]     0 no crosscheck matched
[00:00:02.275 INFO]     0 unhandled mails

This tool in general looks very promising for what I need.

2called-chaos commented 8 years ago

I guess the problem here is that the bounces are being identified but no reference to the client can be found. At the moment we rely only on either the custom header or the X-Failed-Recipients header, which unfortunately is nowhere near being "the standard" (as with bounces in general).

Although it is pretty straightforward to parse out the email address in this example, in most cases it isn't. Do you have ideas on how to handle that better? I thought about allowing rules to parse out the recipient address, but that would make rules a lot more cumbersome to define.

Since we use the custom header approach (and we found that it works very well; almost all bounces can be identified this way), we didn't spend much energy trying to parse the addresses out of the mail body. We might have, but you almost always need a rule-specific regex since the bounces are so different.

When I created this tool I had only a small number of samples (hence the small ruleset), but I now have over 70k bounces from our system alone that I have yet to go through. I also have a big list of new rules to sort out, since some are very specific to us. Maybe I'll get some good ideas while doing that :)

Cheers

PS: If you want, I can add an option to also inspect matched bounces for which no candidate (client reference) was found.

ChristianBeer commented 8 years ago

The problem is that I have already received a lot of bounces that I want to sort. None of them have a custom header, and I suspect only a minority have the X-Failed-Recipients header. So I'm looking for a tool that can parse the message body. My first try with a Perl module looked promising, but it fails to extract addresses from bounces with multiple addresses in the body.

From what I can tell from the small sample I used for testing, they all seem to follow a similar style. The majority of them are generated by my own mail server.

The structure seems to be:

<user@email.com>: reasontext

<user2@email.com>: reasontext

Here the reasontext is the part to parse (if such a structure is detected) to get the reason. The reasontext can start with or contain newlines, and there is at least one mail server that omits the <>: characters around the address.
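A minimal Ruby sketch of this block detection, assuming each block starts at the beginning of a line with `<address>:` and that the reason runs until the next block or until a Yahoo-style `--- Original message follows.` marker. `extract_recipient_blocks`, `BLOCK_START` and `ORIGINAL_MARKER` are hypothetical names, not part of bouncefetch, and the sketch does not handle the server that omits the `<>:` characters.

```ruby
# Hypothetical helper (not part of bouncefetch): splits a bounce body into
# [address, reason] pairs for blocks shaped like "<user@example.com>: reasontext".
BLOCK_START = /^<([^<>@\s]+@[^<>\s]+)>:[ \t]*/

# Assumed end-of-report marker (Yahoo style); anything after it is the original mail.
ORIGINAL_MARKER = /^--- Original message follows\./

def extract_recipient_blocks(body)
  text = body.split(ORIGINAL_MARKER, 2).first.to_s

  # Collect every block header. ^ only matches at line starts, so an address
  # mentioned inside a reason is not mistaken for the start of a new block.
  starts = []
  pos = 0
  while (m = BLOCK_START.match(text, pos))
    starts << m
    pos = m.end(0)
  end

  # The reason is everything between one block header and the next,
  # with newlines and indentation collapsed into single spaces.
  starts.each_with_index.map do |m, i|
    stop = i + 1 < starts.size ? starts[i + 1].begin(0) : text.length
    [m[1], text[m.end(0)...stop].gsub(/\s+/, " ").strip]
  end
end
```

Each reason string could then be matched by reason-only rules, which fits the incremental rule-writing workflow described here.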

Can you add body-parsing support so that bouncefetch can detect such blocks? I would then use my mails to incrementally create rules for the reasontext part.

2called-chaos commented 8 years ago

Are you thinking about separate rules for the body-parsing or using existing ones and give them the ability to provide an email address?

Since most rules are currently just substring tests, I have no idea what the best way to define them would be. I guess in most cases the reason text is somehow combined with the email address?

I've seen bounces like the ones you described, but I've also seen things like rule(/user <(.+)> unknown/i), where the reason and the target email are in the same sentence. Either I end up with redundancy in the rules or the definition becomes more cumbersome (e.g. a block syntax with detect and extract directives or something like that).
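One way to picture the detect/extract idea is a rule that pairs a detect pattern with an optional extract pattern. This is a hypothetical format sketched in plain Ruby (`BodyRule`, `RULES`, `classify` and the `:hard` type are illustrative names), not bouncefetch's actual rule syntax: when the reason and address share a sentence, the detect regex's capture group doubles as the extractor; otherwise a separate extract regex finds the address.

```ruby
# Hypothetical rule format, NOT bouncefetch's rule API: a detect regex decides
# whether the text describes a known failure; an optional extract regex whose
# first capture group is the recipient address.
BodyRule = Struct.new(:type, :detect_re, :extract_re) do
  # Returns [type, address] when the rule matches, nil otherwise.
  # The address is nil if the extract pattern finds nothing.
  def apply(text)
    return nil unless detect_re.match?(text)
    m = (extract_re || detect_re).match(text)
    [type, m && m[1]]
  end
end

RULES = [
  # Reason and address in one sentence: the detect regex also extracts.
  BodyRule.new(:hard, /user <(.+)> unknown/i, nil),
  # Reason and address apart: detect on the reason, extract the "(address)".
  BodyRule.new(:hard, /doesn't have a \S+ account/i, /\(([^()\s]+@[^()\s]+)\)/),
].freeze

# First matching rule wins; nil means no rule recognised the text.
def classify(text)
  RULES.each do |rule|
    result = rule.apply(text)
    return result if result
  end
  nil
end
```

The trade-off is the one described above: rules without an extract pattern stay as short as today's, while the others pay for an extra regex.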

ChristianBeer commented 8 years ago

I was thinking of separate rules for body parsing that are triggered by the subject line. It is really frustrating: all the tools I find are either very old and/or don't recognize multiple email addresses per bounce message.

The reasontext may contain the email address (though it is usually already on the line directly above), but I also get output from the remote MTA like this:

<jedi@domain.org>: host mx.junkemailfilter.com[184.105.182.187] said:
    550-REJECTED - spamtext [S=9 - BadHeloNS Slow FNONS BadFromNS DOB-FROM] -
    550-X=pascal H=my.mailserver.com [12.34.56.78]
    550-SN=[apache@my.mailserver.com] T=[jedi@domain.org]
    550-FR=[noreply@mydomain.com] S=[Subjectline
    still Subjectline] (in reply to end of DATA command)
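A reply like this could be unfolded before any reason rule sees it. The sketch below is a hypothetical helper (`parse_remote_reply` is not part of bouncefetch), assuming it receives the raw reason text after `<address>: ` with line breaks intact, and that continuation lines carry the multi-line SMTP reply prefix `NNN-` (or `NNN ` on a final line). It returns the remote host, the first reply code, and the message without the repeated prefixes.

```ruby
# Hypothetical helper: unfold a Postfix-style "host X[ip] said:" reason.
# Assumes the raw reason text, line breaks intact.
def parse_remote_reply(reason)
  head, *rest = reason.lines.map(&:strip)
  head = head.to_s
  host = head[/\bhost (\S+?)\[/, 1]

  # Anything after "said:" on the first line belongs to the reply as well.
  first = head.split("said:", 2)[1].to_s.strip
  reply_lines = ([first] + rest).reject(&:empty?)

  # Strip "550-" / "550 " prefixes and remember the first reply code seen.
  code = nil
  message = reply_lines.map do |line|
    m = line.match(/\A(\d{3})[ -]/)
    next line unless m
    code ||= m[1].to_i
    m.post_match
  end

  { host: host, code: code, message: message.join(" ") }
end
```

A reason rule can then match on a single clean line (for example "REJECTED - spamtext ...") instead of on text interleaved with `550-` prefixes.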

This is the usual Yahoo reply (all of this is in one bounce mail):

Message from yahoo.com.
Unable to deliver message to the following address(es).

<user1@yahoo.com>:
Sorry your message to user1@yahoo.com cannot be delivered. This account has been disabled or discontinued [#102].

<user2@yahoo.com>:
This user doesn't have a yahoo.com account (user2@yahoo.com) [0]

--- Original message follows.

2called-chaos commented 8 years ago

First of all: yeah, it's frustrating :) Bounces are not my favorite thing to handle due to their vast variety.

Is there anything special about how you send the mails? It seems strange to me to get "combined" bounces; I have never seen them so far. But we also don't use CC/BCC, so maybe that's why?

When you say "triggered by the subject line", do you literally mean the mail subject? I haven't looked for correlations between subject and reason format, but I haven't seen that much variation in subjects anyway. Most commonly it's one of these:

I also have to cope with the fact that bouncefetch wasn't designed for "combined bounces", i.e. multiple addresses in one bounce. Now I understand what you mean by tools not being able to handle multiple addresses (my experience was that tools just regex'd any mail address and usually ended up with several junk ones). The custom header wouldn't solve this either, at least not while bouncefetch expects only one case per bounce.

I'll fiddle a bit and see what I can come up with :) I'll keep you updated. And if you have any more ideas: please let me know ;)

And have a nice weekend!

ChristianBeer commented 8 years ago

I have two use cases. The first is sending mail via BCC in batches of 20 (phpBB). The other is sending single mails using TO. I haven't looked at the bounces from the second case yet; they should be easier to manage since each bounce should contain only one address.

My idea was to check for the special header and, if it is missing, check whether the subject is a known delivery status, and then look for the blocks of email + reasontext in the body.
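The lookup order described here could look roughly like the following sketch. Headers are passed as a plain hash rather than a mail-library object; `CUSTOM_HEADER` is a hypothetical name for whatever custom header your mailer sets, the `X-Failed-Recipients` step mirrors the header bouncefetch already checks, and the subject patterns are common examples, not a complete list. `find_recipients` is an illustrative name, not a bouncefetch function.

```ruby
# Hypothetical header name; replace with whatever custom header your mailer sets.
CUSTOM_HEADER = "x-example-client-ref"

# Common bounce subjects (illustrative, not exhaustive).
KNOWN_BOUNCE_SUBJECTS = [
  /undelivered mail returned to sender/i,
  /mail delivery failed/i,
  /delivery status notification \(failure\)/i,
  /failure notice/i,
].freeze

# "<user@example.com>:" at the start of a line.
BODY_BLOCK_ADDRESS = /^<([^<>@\s]+@[^<>\s]+)>:/

# Returns [source, values]: where the recipient information came from and
# the client references or addresses found there.
def find_recipients(headers, subject, body)
  # Header names are case-insensitive, so normalise them first.
  h = headers.transform_keys { |k| k.to_s.downcase }

  if (ref = h[CUSTOM_HEADER])
    return [:custom_header, [ref.strip]]
  end
  if (failed = h["x-failed-recipients"])
    return [:x_failed_recipients, failed.split(",").map(&:strip)]
  end

  # Only fall back to body parsing when the subject looks like a bounce.
  return [:not_a_bounce, []] unless KNOWN_BOUNCE_SUBJECTS.any? { |re| re.match?(subject.to_s) }

  [:body_blocks, body.scan(BODY_BLOCK_ADDRESS).flatten.uniq]
end
```

Returning the source alongside the values keeps it visible which path identified a bounce, which helps when tuning the subject list.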

I found another tool that does scan the body of the bounce (PHPMailer-BMH). There are quite a lot of forks on GitHub, but it seems to do the job, at least for single bounces. Most of the "combined bounces" come from my own mail server when it can't find a mail server to deliver to. These also contain multiple addresses and reasons per bounce.