RealRaven2000 / FiltaQuilla

Adds many new mail filter actions to Thunderbird
http://quickfilters.quickfolders.org/filtaquilla.html
GNU General Public License v3.0

Can't filter Regex on `Received` headers. `Received` headers are not accessible from a search term. #181

Open thepixy opened 2 years ago

thepixy commented 2 years ago

Thunderbird 102.3.2

Example Message Header:

Received: from mail.cutenouon.link ([45.88.138.48]:40894) by az1-ss33.a2hosting.com with esmtp (Exim 4.95) (envelope-from copper_zen_socks-randy=pixyland.org@cutenouon.link) id 1olD6O-0000Tr-38 for randy@pixyland.org; Wed, 19 Oct 2022 10:43:37 -0700

Desired behavior: Act on any IP in the range of 45.88.138.0 - 45.88.138.255

Attempted filters:

Header Regex Match: "matches" Received:45.88.138.([1-9]?\d|[12]\d\d)

Header Regex Match: "matches" Received:/45.88.138.([1-9]?\d|[12]\d\d)/

Header Regex Match: "matches" Received:^45.88.138.([1-9]?\d|[12]\d\d)$

Result: The regex strings were tested and worked on various regex test sites, but none of the above filter variations find matches in the IP address in the example header I posted with filtaquilla (Version is 3.6). Note: The test site you advised in the install notes (https://regex101.com/) does not have the "javascript" option you recommended, so maybe I'm missing something.

NOTE: Being able to filter based on IP ranges in the "Received" headers of my emails is the only thing I'd hoped to gain from FiltaQuilla. I promise an immediate donation if it is possible to do it. Thanks for any help.

fade2gray commented 2 years ago

You are not escaping the periods between the octets.

/45.88.138.([1-9]?\d|[12]\d\d)/ should be /45\.88\.138\.([1-9]?\d|[12]\d\d)/

fade2gray commented 2 years ago

This works in regex101 45\.88\.138\.([01]?\d\d?|2[0-4]\d|25[0-5])\b https://regex101.com/r/ZsGKvs/1
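The difference between the escaped and unescaped patterns can be checked in plain JavaScript, outside Thunderbird. The snippet below is only a sketch: the `received` string is an abridged copy of the header posted above, and the second test string is a made-up value chosen to show what an unescaped `.` lets through. It says nothing about how FiltaQuilla itself evaluates the filter.

```javascript
// Abridged copy of the Received header value quoted in this issue.
const received =
  "from mail.cutenouon.link ([45.88.138.48]:40894) by az1-ss33.a2hosting.com " +
  "with esmtp (Exim 4.95) id 1olD6O-0000Tr-38; Wed, 19 Oct 2022 10:43:37 -0700";

// Unescaped dots: "." matches any character, so this is looser than intended.
const loose = /45.88.138.([1-9]?\d|[12]\d\d)/;

// Escaped dots: "\." matches a literal period only.
const strict = /45\.88\.138\.([01]?\d\d?|2[0-4]\d|25[0-5])\b/;

// Anchored with ^ and $: the whole value would have to be just the IP.
const anchored = /^45\.88\.138\.([1-9]?\d|[12]\d\d)$/;

// Made-up string with no periods at all, to expose the loose pattern.
const notAnIp = "45188x138-7";

console.log(loose.test(received));    // true
console.log(strict.test(received));   // true
console.log(anchored.test(received)); // false: the value starts with "from"
console.log(loose.test(notAnIp));     // true: each "." matched a non-period
console.log(strict.test(notAnIp));    // false
```

Both the loose and the strict pattern find the address in the sample header, which is consistent with the unescaped version "working" on a test site. The anchored variant cannot match a full Received value, whatever tool evaluates it.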

thepixy commented 2 years ago

Thanks. But my version worked in regex101 too, and your variation didn't work any better in FiltaQuilla for me. I tried yours as:

Received:/45\88\138([1-9]?/d|[12]\d\d)/

and also with the escapes...

Received:/45.88.138.([1-9]?\d|[12]\d\d)/

also tried...

Received:45.88.138.([01]?\d\d?|2[0-4]\d|25[0-5])\b

The filter is never triggered. :-(

RealRaven2000 commented 2 years ago

> Result: The regex strings were tested and worked on various regex test sites, but none of the above filter variations find matches in the IP address in the example header I posted with filtaquilla (Version is 3.6). Note: The test site you advised in the install notes (https://regex101.com/) does not have the "javascript" option you recommended, so maybe I'm missing something.

Select "ECMAScript" (which is the same as JavaScript). Is the IP address always included in the Received header?

RealRaven2000 commented 2 years ago

Testing that stuff now - it seems a lot of the headers do not exist as string properties anymore:

[image]

RealRaven2000 commented 2 years ago

It seems there is a lot of stuff missing in Thunderbird 102 when the database message header is passed to the match method. Here is what is available with an email:

[...aMsgHdr.propertyEnumerator];
Array(28) [ "flags", "sender", "recipients", "bccList", "subject", "message-id", "dateReceived", "date", "X-GM-MSGID", "X-GM-THRID", … ]
0: "flags"
1: "sender"
2: "recipients"
3: "bccList"
4: "subject"
5: "message-id"
6: "dateReceived"
7: "date"
8: "X-GM-MSGID"
9: "X-GM-THRID"
10: "X-GM-LABELS"
11: "junkscore"
12: "junkscoreorigin"
13: "junkpercent"
14: "recipient_names"
15: "gloda-dirty"
16: "gloda-id"
17: "sender_name"
18: "offlineMsgSize"
19: "msgOffset"
20: "storeToken"
21: "priority"
22: "size"
23: "keywords"
24: "threadParent"
25: "msgThreadId"
26: "ProtoThreadFlags"
27: "label"
length: 28

thepixy commented 2 years ago

@RealRaven: 1) OK, thanks. From now on I'll use the ECMAScript variation. Retesting with that, both of these variations work on regex101:

45.88.138.([1-9]?\d|[12]\d\d)

and...

45.88.138.([01]?\d\d?|2[0-4]\d|25[0-5])\b

So using the second regex string above, I assume that all I need to do in FiltaQuilla is specify...

HeaderMatchRegex --- matches --- Received:/45.88.138.([01]?\d\d?|2[0-4]\d|25[0-5])\b/

It doesn't work, and I tried it with and without the "/" delimiters.

2) Yes... the IP address is always available in the "Received" header. I see them by selecting View>Headers>All. The example I gave was from an actual email. This is why I use them to detect SPAM.

3) Regarding the "database": even without using FiltaQuilla, I am able (as in the example shown) to create a filter selecting "Received" as the item to search, "contains" as the condition, and 45.88.138. as the value. That would always capture all emails with any IP in the range of 45.88.138.0 through 45.88.138.255, as in the Received header in my example:

Received: from mail.cutenouon.link ([45.88.138.48]:40894) by az1-ss33.a2hosting.com with esmtp (Exim 4.95) (envelope-from copper_zen_socks-randy=pixyland.org@cutenouon.link) id 1olD6O-0000Tr-38 for randy@pixyland.org; Wed, 19 Oct 2022 10:43:37 -0700

So to me this proves "Received" is one of the header items in the emails I get, when viewed using Thunderbird. As I explained, if all I ever needed was to filter all emails with the same first three octets, I wouldn't need to use regex in my filters. But that isn't always the case, so I was hoping FiltaQuilla would solve that problem.
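To illustrate why a plain "contains" prefix is not always enough, here is a rough sketch of a numeric range check in plain JavaScript. The helper names `ipv4ToNumber` and `inRange` are invented for this example, and the narrower range 45.88.138.64 - 45.88.138.127 is hypothetical. It is not something a Thunderbird filter condition can run; it only shows the logic that a range regex has to emulate.

```javascript
// Convert a dotted-quad IPv4 string to a number; returns null if malformed.
function ipv4ToNumber(ip) {
  const octets = ip.split(".");
  if (octets.length !== 4) return null;
  let value = 0;
  for (const octet of octets) {
    if (!/^\d{1,3}$/.test(octet) || Number(octet) > 255) return null;
    value = value * 256 + Number(octet);
  }
  return value;
}

// True when ip lies between first and last, both ends included.
function inRange(ip, first, last) {
  const n = ipv4ToNumber(ip);
  const low = ipv4ToNumber(first);
  const high = ipv4ToNumber(last);
  if (n === null || low === null || high === null) return false;
  return n >= low && n <= high;
}

// The full last-octet range from this issue: same as "contains 45.88.138."
console.log(inRange("45.88.138.48", "45.88.138.0", "45.88.138.255")); // true

// A hypothetical sub-range that no simple prefix match can express.
console.log(inRange("45.88.138.48", "45.88.138.64", "45.88.138.127"));  // false
console.log(inRange("45.88.138.100", "45.88.138.64", "45.88.138.127")); // true
```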

RealRaven2000 commented 2 years ago

> @RealRaven: 2) Yes... the IP address is always available in the "Received" header. I see them by selecting View>Headers>All. The example I gave was from an actual email. This is why I use them to detect SPAM.

The field "Received" is not exposed in the data (the enumerable properties of nsIMsgHdr). My add-on SmartTemplates has an async method that streams the message in order to access all headers (see the function getHeadersAsync() there), but asynchronous methods cannot be called by the synchronous match() function that Thunderbird's search engine uses in the underlying C++ layer. That is the part add-ons cannot change, because it is compiled into Thunderbird's binary code.
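As a rough illustration of that first point, the check below asks whether a header name is among the properties the message header object exposes. It runs outside Thunderbird: `mockMsgHdr` is a stand-in that models only an iterable `propertyEnumerator`, filled with an abridged copy of the list posted earlier, and `listProperties` and `hasProperty` are names made up for this sketch.

```javascript
// Stand-in for the nsIMsgHdr object that Thunderbird passes to match().
// Only an iterable propertyEnumerator is modelled here (an assumption made
// so that the snippet can run outside Thunderbird).
const mockMsgHdr = {
  propertyEnumerator: [
    "flags", "sender", "recipients", "bccList", "subject",
    "message-id", "dateReceived", "date", "keywords", "label",
  ],
};

// Same expression as used in the console above: spread the enumerator.
function listProperties(msgHdr) {
  return [...msgHdr.propertyEnumerator];
}

// Case-insensitive lookup of a header name among the exposed properties.
function hasProperty(msgHdr, name) {
  const wanted = name.toLowerCase();
  return listProperties(msgHdr).some((p) => p.toLowerCase() === wanted);
}

console.log(hasProperty(mockMsgHdr, "subject"));  // true
console.log(hasProperty(mockMsgHdr, "Received")); // false
```

Note that "dateReceived" is present but is a different property; there is no entry that carries the text of the Received lines.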

Mozilla would have to rewrite their whole filtering mechanism to work asynchronously; until then the "received" header is sadly out of reach during filtering.

RealRaven2000 commented 2 years ago

@thepixy see my (edited) comment above. I will have to ask John Bieling on Monday whether he can think of a different way. If we want to call an asynchronous function and wait for the result we can use the await keyword - but this is only allowed in a code block that is in an asynchronous method itself, and Thunderbird doesn't read / expect filter.match() to be asynchronous, so this would break the whole flow of filtering.
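A small sketch of that problem in plain JavaScript follows. `readHeadersAsync` is a hypothetical stand-in (not the real SmartTemplates getHeadersAsync(), whose signature is not shown in this thread), and `matchSync` / `matchAsync` are invented names that only mimic the shape of a filter match function.

```javascript
// Hypothetical async header reader; it pretends the raw headers had to be
// streamed from the message store before they become available.
async function readHeadersAsync(message) {
  return message.rawHeaders;
}

// What a synchronous caller ends up with: it cannot wait for the result.
function matchSync(message, regex) {
  const headers = readHeadersAsync(message); // a pending Promise, not a string
  // Using `await` here would be a syntax error in a non-async function.
  // test() converts the Promise to the string "[object Promise]", so the
  // regex never sees the header text.
  return regex.test(headers);
}

// The same logic works once the function is async, but then it returns a
// Promise, which a caller expecting a plain boolean cannot use.
async function matchAsync(message, regex) {
  const headers = await readHeadersAsync(message);
  return regex.test(headers);
}

const message = {
  rawHeaders: "Received: from mail.cutenouon.link ([45.88.138.48]:40894)",
};
const ipRange = /45\.88\.138\.\d{1,3}\b/;

console.log(matchSync(message, ipRange)); // false, although the IP is there
console.log(matchAsync(message, ipRange) instanceof Promise); // true
matchAsync(message, ipRange).then((matched) => console.log(matched)); // true
```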

thepixy commented 2 years ago

Well, thanks for looking. I'm not quite sure I understand how or why some parts of a message would be received asynchronously as opposed to synchronously. I can mention some things that may or may not be helpful.

1) As I have a few websites that all use the CPANEL interface, I am able to create email filters there too, and regex is allowed. There too, "Received" is not one of the choices to filter on, but I can choose "any header" as the search option and "matches regex" as the search criterion. I have long lists of filters there that do work, and a typical one looks like this...

*from [^[][69.94.(1(2[8-9]|[3-5][0-9])).([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))**

The above is based on my noting that the IPs always seem to start with 'from', and a bracket '[' symbol somewhere down the string, such as this one from you in one of my 'received' categories...

from out-27.smtp.github.com ([192.30.252.210]:35123) by az1-ss33.a2hosting.com with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95) (envelope-from noreply@github.com) id 1olXpy-0003rP-MP for business@pixyland.org; Thu, 20 Oct 2022 08:52:03 -0700

Now, my regex string has nothing to do with YOUR message header! I just clipped one from my CPANEL filters (and it's probably obvious I'm not very good at finding easier regex shortcuts). I usually start with one of those online IP range regex builder websites for the part of the string that captures the IP range, and then make additions until I get consistent, reliable results. But I think the takeaway is that when I look at a listing of a "forwarded message" (with headers) in Thunderbird, "from" is always the topmost header, and "from" always precedes every instance of what Thunderbird labels "Received". If I remember correctly, my intent in my CPANEL filters was to start with "from" and basically look through everything until I found a matching IP address surrounded by "[" brackets.
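The "start at from, scan to the bracketed IP" idea can be sketched in plain JavaScript as below. This assumes the raw header text is already available as one string with one Received value per line, which is exactly what a Thunderbird filter does not get; real headers may also be folded across several lines. The function name `bracketedIPv4` is made up for this example, and the two sample lines are abridged from headers quoted in this thread.

```javascript
// Collect IPv4 addresses that appear in [brackets] after the word "from",
// without crossing a line break or an earlier "[".
function bracketedIPv4(headerText) {
  const pattern = /\bfrom\b[^\[\n]*\[(\d{1,3}(?:\.\d{1,3}){3})\]/g;
  return [...headerText.matchAll(pattern)].map((m) => m[1]);
}

// Two abridged Received values, one per line.
const rawReceived = [
  "from out-27.smtp.github.com ([192.30.252.210]:35123) by az1-ss33.a2hosting.com " +
    "with esmtps (Exim 4.95) (envelope-from noreply@github.com) id 1olXpy-0003rP-MP",
  "from mail.cutenouon.link ([45.88.138.48]:40894) by az1-ss33.a2hosting.com " +
    "with esmtp (Exim 4.95) id 1olD6O-0000Tr-38",
].join("\n");

console.log(bracketedIPv4(rawReceived));
// [ '192.30.252.210', '45.88.138.48' ]
```

The "from" inside "envelope-from" does not produce a false hit here because no "[" follows it on the same line.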

Now, I will say that in FiltaQuilla I have tried using "from:" as the "header regex match" delimiter, since FiltaQuilla doesn't offer a "match any header" option. And since FiltaQuilla (I guess) requires a "non-regex" portion to tell it what header to look for, it's a little unclear whether I could use a similar approach. If so, I have not yet found a winning combination. Maybe you might?

Why Thunderbird lets me filter using this apparently internally generated "Received" tag is beyond my understanding, and as I've observed, "Received" is not an option in my CPANEL. But hopefully my example of a successful CPANEL filter (the one including the "from") might help you folks find a way to incorporate this very valuable filtering option.

I will mention that being able to filter out ranges of IP addresses is a major help in blocking SPAM, and better than anything else I've tried. It's just inconvenient to have to do it through my CPANEL, and most people don't have CPANEL access to their ISP-provided email address. But the reason it's so valuable is this: take the "45.88.138.48" I mentioned in the beginning. When I get a SPAM email like this, I'll usually go to a convenient "whois" site like MYIP.MS. There I plug in the IP, and discover that the range from 45.88.138.0 - 45.88.138.255 is somewhere in Ukraine. Well, I know I'm not interested in anything from that region, so blocking the whole range blocks lots of SPAM.

Thanks for any help.

RealRaven2000 commented 2 years ago

OK, so the problem is not the regex. The problem is that "Received" doesn't exist in the message database header, which is the only piece of information passed on to the filter search method. But maybe there is a way of retrieving that information with a synchronous function; I will ask John on Monday.