RealRaven2000 / FiltaQuilla

Adds many new mail filter actions to Thunderbird
http://quickfilters.quickfolders.org/filtaquilla.html
GNU General Public License v3.0

Body RegExp Match -Filter doesn't work #269

Open V-H opened 4 months ago

V-H commented 4 months ago

TB 115.12.0, FiltaQuilla 4.1

Processing this filter: e-Mail (fuke-VH)_2024-07-19_15-14.json brings this error:

      FiltaQuilla 10:23:7.273 [1359286 ms] filtaquilla-util.js:222:13
      Uncaught InternalError: allocation size overflow
          bodyMimeMatch chrome://filtaquilla/content/filtaquilla-util.js:596
          match chrome://filtaquilla/content/filtaquilla.js:1688
          runSelectedFilters chrome://messenger/content/FilterListDialog.js:758
          oncommand chrome://messenger/content/FilterListDialog.xhtml:1
      filtaquilla-util.js:596:43

in mails like RegEx-Filter_BeispielMailX.eml.txt

RealRaven2000 commented 4 months ago

I am not sure if I have bandwidth this weekend. I am just preparing for a holiday: I am live streaming on Sunday and leaving on Monday, so I may need a reminder after I return on the 31st!

RealRaven2000 commented 4 months ago

Since you are still on Tb 115, can you check if rolling back to FiltaQuilla 4.0 resolves the issue?

shoneg commented 3 months ago

Hi, I had the same issue. Rollback to FiltaQuilla 4.0 works for now.

V-H commented 3 months ago

I also rolled back to 4.0; it works for now.

RealRaven2000 commented 3 months ago

I tested with the wrong filter first; the JSON file name misled me into testing a different one. So the filter name is "Fritz: Datenrate (63 kbit Empfangen, 23 kbit Senden)". Testing with that one next.

Note to self: I should suggest a better default file name when saving a single filter.

RealRaven2000 commented 3 months ago

I think I have to remove the "To" condition? It's redacted in your example mail, isn't it?

RealRaven2000 commented 3 months ago

I cannot reproduce the errors, even though matching doesn't work (the extracted body part, which is encoded as quoted-printable, seems to be truncated and probably needs some processing before being parsed by the regex). If you are using debug mode, can you also activate the debug switch extensions.filtaquilla.debug.mimeBody? Find the setting and toggle it to true.

It should give us additional info during body parsing.

RealRaven2000 commented 3 months ago

Part of the message was cut off, so I added code to download it completely. I also added some code for processing the "quoted printable" format that the provider was using in your example email:

filtaquilla-4.2pre3.zip

The main problem is that it is not easily possible to access the plain text portion of the raw data (which is all I can access during filtering). Thunderbird itself uses C++ methods to do the parsing but these are not accessible to an Add-on.
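
For readers unfamiliar with the "quoted printable" processing mentioned above, here is a minimal sketch of what such a decoding step could look like. It is not FiltaQuilla's actual code: the function name `decodeQuotedPrintable` is hypothetical, and the sketch assumes the part's charset is UTF-8, which a real message may not use.

```javascript
// Hypothetical helper, not taken from FiltaQuilla.
// Assumption: the decoded bytes are UTF-8 text.
function decodeQuotedPrintable(text) {
  // A "=" at the very end of a line is a soft line break: drop it and the newline.
  const unfolded = text.replace(/=\r?\n/g, "");
  const bytes = [];
  for (let i = 0; i < unfolded.length; i++) {
    const hex = unfolded.slice(i + 1, i + 3);
    if (unfolded[i] === "=" && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      // "=XX" encodes a single byte as two hex digits.
      bytes.push(parseInt(hex, 16));
      i += 2;
    } else {
      // Everything else is a literal (ASCII) character.
      bytes.push(unfolded.charCodeAt(i) & 0xff);
    }
  }
  return new TextDecoder("utf-8").decode(new Uint8Array(bytes));
}

console.log(decodeQuotedPrintable("Gr=C3=BC=\r\n=C3=9Fe"));
```

Running a regex against the decoded text rather than the raw part means a pattern is not defeated by soft line breaks or `=XX` escapes in the middle of a word.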

TonyGravagno commented 3 months ago

v4.2pre3 still has same issue, just moved to a new line. 😁
Personally I'm happy to wait until whenever you can get to it.
Thanks for trying!

RealRaven2000 commented 1 month ago

Just released 4.2 (published 15/10/2024). If the problem still persists in this version, we will hopefully come up with a fix in 4.2.1.

V-H commented 4 weeks ago

Updated to 4.2 under TB 115.16.0esr, but the problem still persists, with the same behaviour: the RegEx filter doesn't work, and when running the filters TB locked up for about one minute. Going back to 4.0.

Galantha commented 1 week ago

I think the overflow is caused by this line: https://github.com/RealRaven2000/FiltaQuilla/blob/a3554926c0f23908f2e17f67fec8135e4209b0ec/content/filtaquilla-util.js#L624

RealRaven2000 commented 1 week ago

I think that code snippet comes more or less directly from this MDN article; it's been a while since I wrote it:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec

The function was designed to count the matched instances. I guess exec starts searching after the last occurrence, but I think this only works if the global flag is set. So we need to check for the global flag first:

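      // exec() only advances reg.lastIndex when the global flag is set;
      // without it, every call returns the same first match and the loop never ends.
      if (reg.global) {
        while ((results = reg.exec(msgBody)) !== null) {
          txtResults += `Match[${count}]: ${results[0]}\n`;
          count++;
        }
      }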

I will have to test the code first and then commit a patch.
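
As a standalone illustration of the guard discussed above, the following sketch collects matches safely for both global and non-global expressions. It is an assumption-laden example rather than the planned patch: the helper name `collectMatches` is hypothetical, and the extra step for zero-length matches is an addition not mentioned in the thread (a global regex that matches the empty string also leaves `lastIndex` unchanged, so it would loop forever too).

```javascript
// Hypothetical helper, not the actual FiltaQuilla patch.
function collectMatches(reg, msgBody) {
  // Without the global flag, exec() always restarts at index 0,
  // so at most one match can be reported.
  if (!reg.global) {
    const single = reg.exec(msgBody);
    return single === null ? [] : [single[0]];
  }
  reg.lastIndex = 0; // start from the beginning in case the regex was used before
  const found = [];
  let results;
  while ((results = reg.exec(msgBody)) !== null) {
    found.push(results[0]);
    // A zero-length match does not move lastIndex; step forward manually.
    if (results[0].length === 0) {
      reg.lastIndex++;
    }
  }
  return found;
}

console.log(collectMatches(/\d+ kbit/g, "63 kbit Empfangen, 23 kbit Senden"));
console.log(collectMatches(/\d+ kbit/, "63 kbit Empfangen, 23 kbit Senden"));
```

With a non-global regex, the unguarded `while` loop keeps appending the same match to a string until the engine gives up, which is consistent with the "allocation size overflow" reported at the top of this issue.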

Galantha commented 1 week ago

In the HTML version of the function, it was triggering the overflow crash.

RealRaven2000 commented 1 week ago

For Thunderbird 128, I added my own MIME parsing routine; it is still a work in progress. You can examine the retrieved parts in the JS error console with the debug switch extensions.filtaquilla.debug.mimeBody = true

At the moment I am filtering out parts with contentType "attachment" and all "image/*" and "text/vcard". I am also considering removing "multipart/related"; I don't think there is anything of value for the regex search there.
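
The part filtering described above could be sketched roughly as follows. This is only an illustration: the function name `isSearchablePart` and the shape of the part object (`contentType` plus an optional `disposition`) are assumptions, not FiltaQuilla's real data structures, and the sketch accepts "attachment" in either field because the thread does not say where that value is stored.

```javascript
// Hypothetical part shape: { contentType: string, disposition?: string }
function isSearchablePart(part) {
  // Strip parameters such as "; charset=utf-8" and normalise case.
  const type = (part.contentType || "").split(";")[0].trim().toLowerCase();
  const disposition = (part.disposition || "").split(";")[0].trim().toLowerCase();

  if (type === "attachment" || disposition === "attachment") return false;
  if (type.startsWith("image/")) return false;
  if (type === "text/vcard") return false;
  return true;
}

const parts = [
  { contentType: "text/plain; charset=utf-8" },
  { contentType: "image/png" },
  { contentType: "text/vcard" },
  { contentType: "application/pdf", disposition: "attachment; filename=a.pdf" },
];
console.log(parts.filter(isSearchablePart).map((p) => p.contentType));
```

Skipping binary and attachment parts before running the regex keeps the searched text small, which matters for the lock-ups reported earlier in this thread.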

filtaquilla-4.2.1pre33.zip


To try out this version, download the zip file and then drag it into Thunderbird Add-ons Manager (without extracting)

RealRaven2000 commented 1 week ago

Improved boundary detection between the parts, and also splitting for OpenPGP/MIME messages:

filtaquilla-4.2.1pre39.zip


To try out this version, download the zip file and then drag it into Thunderbird Add-ons Manager (without extracting)

RealRaven2000 commented 1 week ago

I finally got hold of John at the Add-on developer meeting. I scrapped my own MIME parser and crafted a MIME emitter that works with the built-in MIME parser in Tb128. Hopefully it is also backwards compatible with 115 (I must test it). This version completely omits attachment parts and parses both text/plain and text/html parts, without any special processing. More features to come, see #313. This one should at least address the performance / hanging problems for now:

filtaquilla-4.2.1pre44.zip


To try out this version, download the zip file and then drag it into Thunderbird Add-ons Manager (without extracting)

V-H commented 1 week ago

I have tested 4.2.1pre44 and it is now running as usual.

However, the filter does not work. I went back to 4.0 and it works without any problems.

At the moment I have no free time for debugging...

RealRaven2000 commented 1 week ago

That's OK. If you can, just post the regular expression, unless you are paranoid about sharing it. (There are people who think that regular expressions are some sort of secret recipe that could be harvested by spammers. I don't like talking with those, as they tend to waste my time.) If your regex contains privacy-related stuff, fair enough.

I can easily debug a filter and an email for you, but you would have to forward me the "eml" file. It's really impossible for me to generate a ton of test data here.