Galantha opened this pull request 1 week ago.
That's a big change set, too much to put into a single merge. I suggest I cherry-pick the smaller changes (such as missing semicolons) manually and merge them into the current branch; then we can do a fresh compare. Can you explain the differences between the regex flavors you added?
One thing I find problematic at the moment is that for every flavor we are adding new menu items to the search dropdown, which gets very unwieldy. I would rather have a single regex body command and then add the flavors as options somehow. This may mean modifying the (as always, single) regex parameter so it can transport additional switches. Maybe something like this:
body regex - matches - /regular expression/(options)
Examples:
body regex - matches - /regular expression/(html)
body regex - matches - /regular expression/(lines)
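A rough sketch of how the single regex parameter could carry extra switches in the proposed /regular expression/(options) form. The function name parseRegexWithOptions and the returned { source, options } shape are hypothetical; only the syntax and the option names html and lines come from the examples above.

```javascript
// Hypothetical parser for the proposed "/regular expression/(options)" syntax.
// Not existing FiltaQuilla code; option names are taken from the examples.
function parseRegexWithOptions(searchValue) {
  // Pattern between the first and the last slash, then an optional
  // comma-separated option list in parentheses at the very end.
  const m = /^\/(.*)\/(?:\(([^)]*)\))?$/s.exec(searchValue.trim());
  if (!m) {
    return null; // not written in slash syntax
  }
  const options = m[2]
    ? m[2].split(",").map((o) => o.trim()).filter((o) => o.length > 0)
    : [];
  return { source: m[1], options };
}

// parseRegexWithOptions("/regular expression/(html)")
//   -> { source: "regular expression", options: ["html"] }
```

A real version would still have to decide how the standard flags (i, m, s) and these options sit together, for example /expr/i(html).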
I could potentially imagine an additional settings control like this, to add these options and build a regex more easily:
Also, I believe the new regex match functions are all very similar. I will put the common code into a merged util function so we only have the MIME body iterator code once, in case it needs fixing later.
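One possible shape for that merged util function, assuming each MIME part has already been decoded into a { contentType, body } object. That part shape, the htmlOnly option and the name bodyPartsMatch are assumptions for illustration; the real iterator in filtaquilla-util.js may look quite different. Each regex flavor would then only differ in the regex and options it passes in.

```javascript
// Hypothetical merged helper: the MIME part loop lives in one place and every
// regex flavor reuses it. The { contentType, body } part shape and the
// htmlOnly option are assumptions, not FiltaQuilla's actual data structures.
function bodyPartsMatch(parts, regex, { htmlOnly = false } = {}) {
  for (const part of parts) {
    // Optionally skip everything that is not an HTML part.
    if (htmlOnly && !/^text\/html\b/i.test(part.contentType)) {
      continue;
    }
    // test() advances lastIndex on g/y regexes, so reset before each part.
    regex.lastIndex = 0;
    if (regex.test(part.body)) {
      return true;
    }
  }
  return false;
}
```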
Did the first round of changes (mainly formatting, semicolons). I would like to deal with the recursion exception within the function that generates it. Will look at your other PR first.
On this topic specifically: https://github.com/RealRaven2000/FiltaQuilla/issues/269#issuecomment-2454620541
I feel that code brings limited value to the Thunderbird user. If it causes errors, there is little reason to keep it.
With the overflow error: my understanding is that the regular expression's .exec() will return the same array every time. Feel free to experiment here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec
If we look at the code:
if (reg.global) {
  while ((results = reg.exec(msgBody)) !== null) {
    txtResults += `Match[${count}]: ${results[0]}\n`;
    count++;
  }
}
To me it appears nothing increments the while loop.
reg.exec is creating an array from msgBody, and nothing alters msgBody.
It is an assignment, results = reg.exec(msgBody), not a boolean;
the test itself asks whether the value that reg.exec(msgBody) assigned to results is !== null.
{ some code happens }
In the code that happens, nothing affects what reg.exec will assign next.
Infinite loop: txtResults continues to grow until an overflow error occurs.
Like I said, this is the first time I have done something like this :). You're welcome to do whatever you would like with it.
To me it appears nothing increments the while loop.
reg.exec is creating an array off of msgBody. Nothing alters MsgBody.
But the object reg's state changes. According to the documentation:
JavaScript RegExp objects are stateful when they have the global or sticky flags set (e.g. /foo/g or /foo/y). They store a lastIndex from the previous match. Using this internally, exec() can be used to iterate over multiple matches in a string of text (with capture groups), as opposed to getting just the matching strings with String.prototype.match().
And:
Finding successive matches: If your regular expression uses the g flag, you can use the exec() method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property (test() will also advance the lastIndex property). Note that the lastIndex property will not be reset when searching a different string, it will start its search at its existing lastIndex.
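A small demonstration of the lastIndex behaviour described in the quotes above, runnable in any JavaScript console. Note how the last call misses the "o" in "xo", because the search starts at the leftover lastIndex of 2.

```javascript
// Demonstrates that a global regex remembers lastIndex between exec() calls.
const reg = /o/g;
const first = reg.exec("foo");  // "o" at index 1, lastIndex becomes 2
const second = reg.exec("foo"); // "o" at index 2, lastIndex becomes 3
const third = reg.exec("foo");  // null: no more matches, lastIndex resets to 0

// lastIndex is not reset when the string changes:
reg.exec("foo");                // "o" at index 1, lastIndex becomes 2
const other = reg.exec("xo");   // search starts at index 2 of "xo" -> null
```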
Just tested it: I made a new regular expression that appends /gi, and it lists multiple results. It also explains why it only crashes sometimes.
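One case where such a loop really never ends, even with the g flag, is a pattern that can match the empty string (for example /a*/g): an empty match leaves lastIndex where it was, so exec() finds the same empty match forever. Below is a sketch of a guarded version of the loop quoted earlier; the function name listMatches and the maxMatches cap are illustrative additions, not existing code.

```javascript
// Guarded version of the exec() loop (function name and cap are illustrative).
function listMatches(reg, msgBody, maxMatches = 1000) {
  // A non-global regex returns the same first match on every call,
  // so execute it only once.
  if (!reg.global) {
    const results = reg.exec(msgBody);
    return results ? `Match[0]: ${results[0]}\n` : "";
  }
  reg.lastIndex = 0; // start from the beginning regardless of earlier calls
  let txtResults = "";
  let count = 0;
  let results;
  while (count < maxMatches && (results = reg.exec(msgBody)) !== null) {
    txtResults += `Match[${count}]: ${results[0]}\n`;
    count++;
    if (results[0] === "") {
      // Zero-length match: lastIndex did not move, so step past it manually.
      // (With the u or v flag, a surrogate-pair-aware step would be more precise.)
      reg.lastIndex++;
    }
  }
  return txtResults;
}
```

The cap is a second safety net: even with a well-behaved pattern, a huge body with thousands of matches would otherwise build a very large txtResults string.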
Regular expression flags:
I like this page for a list of JavaScript flags: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions
You already had case-insensitive: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/ignoreCase
I added multiline to my experimental branch: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/multiline
and I would suggest perhaps adding single-line (dotAll) as well: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/dotAll
Why did I add multiline? I am used to working with multiline. It greatly increases performance by limiting the search area to one line at a time. If this is running on the GUI thread, as I suspect it is, a slow search might cause Thunderbird to pause, so this may or may not be important.
Edit: without the single-line flag, I am not sure it is possible to get past \n, but I could be confused. Notice that in the original function the \n is stripped out.
I specifically did not want to do that with the HTML version of the function.
I did not want to do this: https://github.com/RealRaven2000/FiltaQuilla/blob/a3554926c0f23908f2e17f67fec8135e4209b0ec/content/filtaquilla-util.js#L595
So: either the s flag or the m flag.
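To make the difference between the two flags concrete, here is a quick comparison on a two-line string. The m flag only changes what ^ and $ anchor to; it is the s flag that lets . cross a line break. [\s\S] is the usual flag-free alternative.

```javascript
// Quick comparison of the m and s flags on a two-line string.
const text = "first line\nsecond line";

// Without s, "." does not match "\n", so this cannot span both lines.
const noDotAll = /first.*second/.test(text);        // false
// With s (dotAll), "." also matches line terminators.
const dotAll = /first.*second/s.test(text);         // true

// Without m, "^" only anchors at the start of the whole string.
const noMulti = /^second/.test(text);               // false
// With m, "^" and "$" also anchor at each line break.
const multi = /^second/m.test(text);                // true

// [\s\S] matches any character, including "\n", even without the s flag.
const classTrick = /first[\s\S]*second/.test(text); // true
```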
One thing I find problematic at the moment is that for every flavor we are adding new menu items to the search dropdown, which gets very unwieldy. I would rather have a single regex body command and then add the flavors as options somehow. This may mean modifying the (as always, single) regex parameter so it can transport additional switches. Maybe something like this:
I agree.
I love your idea.
I have no idea how to do it.
If you're adding a gear like that after the expression, perhaps let the user pick from the following flags: i, m, s, and u or v (but not both, or perhaps just u), depending on how difficult that is to do.
Also, I believe the new regex match functions are all very similar. I will put the common code into a merged util function so we only have the MIME body iterator code once, in case it needs fixing later.
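If the gear panel ends up offering those flags as checkboxes, turning the choices into a RegExp could be as simple as the sketch below. The choice names (ignoreCase, multiline, dotAll, unicode) and the helpers buildFlags and buildRegex are hypothetical. Only u is offered, which sidesteps the u/v conflict: combining both flags makes the RegExp constructor throw a SyntaxError.

```javascript
// Hypothetical mapping from settings-panel checkboxes to RegExp flags.
function buildFlags({ ignoreCase = false, multiline = false, dotAll = false, unicode = false } = {}) {
  let flags = "";
  if (ignoreCase) flags += "i"; // case-insensitive
  if (multiline) flags += "m";  // ^ and $ match at line breaks
  if (dotAll) flags += "s";     // . also matches line terminators
  if (unicode) flags += "u";    // only u is offered; u and v together are invalid
  return flags;
}

// Compiles the user's pattern; throws a SyntaxError if the pattern is invalid.
function buildRegex(source, choices) {
  return new RegExp(source, buildFlags(choices));
}
```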
Feel free to do as you feel best :)
Regular expression flags,
I like this page for a list of javascript flags: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions
you already had case insensitive: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/ignoreCase
According to:
this should be already supported - you can add flags in the regex in the filter editor by using the / syntax, so there is no need for a separate function for multiline:
/myexpression/gm
... I just tested it:
Do we need to do anything with the body, e.g. put back line breaks in this case? Haven't really looked at that code in a while (the original is from Kent, I only added the BodyPart stuff)
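For reference, a sketch of turning the /myexpression/gm form into a RegExp with the standard constructor. The function name regexFromSearchValue and the fallback for text without slashes are assumptions for illustration, not necessarily how FiltaQuilla handles it.

```javascript
// Sketch: parse "/pattern/flags" as typed into the filter editor.
function regexFromSearchValue(searchValue) {
  // Pattern between the first and last slash, followed only by letters.
  const m = /^\/(.*)\/([a-z]*)$/s.exec(searchValue);
  if (!m) {
    // Assumed fallback: treat the whole value as a pattern without flags.
    return new RegExp(searchValue);
  }
  // Throws a SyntaxError for unknown or repeated flags (e.g. "/x/zz"),
  // so a caller would likely wrap this in try/catch.
  return new RegExp(m[1], m[2]);
}

// regexFromSearchValue("/myexpression/gm").flags -> "gm"
```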
(lots of text deleted)
Sorry, it picked up a post I was writing that I did not mean to post!
Regular expression flags, I like this page for a list of javascript flags: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions you already had case insensitive: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/ignoreCase
According to:
this should be already supported - you can add flags in the regex in the filter editor by using the / syntax, so there is no need for a separate function for multiline:
/myexpression/gm
... I just tested it:
Do we need to do anything with the body, e.g. put back line breaks in this case? Haven't really looked at that code in a while (the original is from Kent, I only added the BodyPart stuff)
I tried this, and was not able to get it to work. One moment.
So, in the FiltaQuilla extension, it does not work for me. There is some code that looks like it checks for that: https://github.com/Galantha/GalExpermentFiltaQuilla/blob/f3b76cc6854d59a4e37173214ad456d46f17a627/content/filtaquilla.js#L2500
But I tested it, and it did not work for me.
I was specifically having issues with this, so I added ${searchValue} and ${searchFlags} logging.
I did not want to do the extra work of adding the multiline flag check box.
The flags I put into FiltaQuilla were not being carried through correctly.
I also tried the (?:flag) format, with no luck either, if I remember right.
https://github.com/RealRaven2000/FiltaQuilla/pull/312#issuecomment-2455194599
Sorry about this post; I did not realize the top of it was a half-baked post I had previously started writing. I deleted that part.
I had only intended to quote you and post the bottom line.
Not sure what is happening, did you try the version I posted over at https://github.com/RealRaven2000/FiltaQuilla/issues/302#issuecomment-2455015949
For testing the "reflow" code which removes reflowing line breaks, I am currently trying to force isQuotedPrintable() to return true, but it's almost impossible with the Thunderbird composer to force it to encode stuff in ASCII; instead it always converts anything I insert to UTF-8, even if I try to insert it as HTML source. I tried some newsgroup postings from Eternal September, but they also appear to use UTF-8.
So I guess removing line breaks is more of an exception; lots of real line breaks will always be in the body. These are removed in HTML messages in line 606:
if (p.includes("<html")) {
  // remove html the dirty way
  p = p.replace(/(<style[\w\W]+style>)/g, '').replace(/<[^>]+>/g, '').replace(/(\r\n|\r|\n){2,}/g, "").replace(/(\t){2,}/g, "");
}
So maybe the /m flag isn't really needed, at least with HTML parts. I am trying to make this a little more resilient for mails written with other mail clients right now...
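To see what the quoted stripping code does to line breaks, here it is wrapped in a function. The name stripHtmlDirty is hypothetical; the replace chain is copied unchanged. Single line breaks survive, while runs of two or more are deleted outright rather than collapsed, so the text on either side gets glued together.

```javascript
// The "dirty" HTML stripping quoted above, wrapped so its effect can be tested.
function stripHtmlDirty(p) {
  if (p.includes("<html")) {
    p = p
      .replace(/(<style[\w\W]+style>)/g, "")  // drop <style>...</style> blocks
      .replace(/<[^>]+>/g, "")                // drop all remaining tags
      .replace(/(\r\n|\r|\n){2,}/g, "")       // delete runs of 2+ line breaks
      .replace(/(\t){2,}/g, "");              // delete runs of 2+ tabs
  }
  return p;
}

// stripHtmlDirty("<html><p>a</p>\n<p>b</p></html>")   -> "a\nb"
// stripHtmlDirty("<html><p>a</p>\n\n<p>b</p></html>") -> "ab"
```

One side effect worth knowing: because of backtracking, a single \r\n also satisfies (\r\n|\r|\n){2,} as \r followed by \n, so in CRLF bodies even single line breaks disappear.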
Instead of doing that nastiness, I feel a better solution would be to pass the /s flag: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/dotAll perhaps even silently, and document somewhere that this is being done.
I feel this is better, simpler, and less likely to cause issues than mangling the input.
Not sure what is happening, did you try the version I posted over at #302 (comment)
I have not. It may be several days before I load the test platform, re-attach the debugger, and look at this project closely again.
You may want to consider silently passing the /s flag to every regular expression in the extension.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/dotAll
Without it, my understanding is that "." cannot match line breaks, so expressions cannot reach across lines. This might leave people a bit confused as to why their expressions mysteriously fail to match.
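If the /s flag were added silently, it would be a one-liner with the standard RegExp constructor. The helper name withDotAll is hypothetical; whether to do this at all is the open question above.

```javascript
// Hypothetical helper: return a copy of the regex with the s (dotAll) flag added.
function withDotAll(regex) {
  if (regex.dotAll) {
    return regex; // already has the s flag
  }
  // The RegExp constructor accepts an existing regex's source plus new flags.
  return new RegExp(regex.source, regex.flags + "s");
}

// withDotAll(/a.b/i).test("A\nB") -> true
```

If this were adopted, it would be worth documenting, since it changes what "." means for anyone relying on it stopping at line ends.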
For testing the "reflow" code which removes reflowing line breaks, I am currently trying to force isQuotedPrintable() to return true, but it's almost impossible with the Thunderbird composer to force it to encode stuff in ASCII; instead it always converts anything I insert to UTF-8, even if I try to insert it as HTML source. I tried some newsgroup postings from Eternal September, but they also appear to use UTF-8.
So I guess removing line breaks is more of an exception; lots of real line breaks will always be in the body. These are removed in HTML messages in line 606:
if (p.includes("<html")) {
  // remove html the dirty way
  p = p.replace(/(<style[\w\W]+style>)/g, '').replace(/<[^>]+>/g, '').replace(/(\r\n|\r|\n){2,}/g, "").replace(/(\t){2,}/g, "");
}
So maybe the /m flag isn't really needed, at least with HTML parts. I am trying to make this a little more resilient for mails written with other mail clients right now...
That is pretty brutal with the composer.
Might resort to sadness to achieve ASCII. https://stackoverflow.com/questions/94037/convert-character-to-ascii-code-in-javascript
let sadString = String.fromCharCode(115) + String.fromCharCode(97) + String.fromCharCode(100);
Or maybe? (new RegExp(/quoted-printable/)).test(contentType)
Works for me in the tester:
let contentType = "quoted-printable";
console.log(new RegExp(/quoted-printable/).test(contentType)); // true
contentType = "asdf";
console.log(new RegExp(/quoted-printable/).test(contentType)); // false
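A regex literal is already a RegExp object, so the new RegExp(...) wrapper is not needed. Also, the transfer encoding is normally given in the Content-Transfer-Encoding header, whose values are case-insensitive per RFC 2045, so an i flag seems prudent. A sketch, using the hypothetical name looksQuotedPrintable (not FiltaQuilla's isQuotedPrintable(), whose signature is not shown here):

```javascript
// Hypothetical check on a Content-Transfer-Encoding header value.
// The i flag is there because the encoding token is case-insensitive (RFC 2045).
function looksQuotedPrintable(transferEncoding) {
  return /^\s*quoted-printable\s*$/i.test(transferEncoding ?? "");
}

// looksQuotedPrintable("Quoted-Printable") -> true
// looksQuotedPrintable("base64")           -> false
```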
Instead of doing that nastiness, I feel a better solution would be to pass the /s flag: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/dotAll perhaps even silently, and document somewhere that this is being done.
I feel this is better, simpler, and less likely to cause issues than mangling the input.
The input we get from Thunderbird 128 is super mangled anyway; I am looking into breaking it up into MIME parts. In 115 we had MimeParser.extractMimeMsg(..),
but they took it away and now give us only a nasty raw message, which I need to reparse and rebuild.
Just a heads up, there is a new version at:
https://github.com/RealRaven2000/FiltaQuilla/issues/313#issuecomment-2468935487
ready to test. You can add multiline switches and force the regex to only read the HTML part if necessary:
use the wheel to open the panel, enter the desired settings and click Accept.
Thank you!
I will get to this when I can!
which will hopefully be later today, but I am a bit overloaded at the moment, sorry!
Galantha: ESR225+GalExp-AddHTMLRegExpBodyFilter
- add body regex HTML filter
- add regex multiline flag
- bug fixes in body regex filter
- altered logging in body regex filter
- syntax fixing all over the place, mainly semicolons (the cardinal sin: fixing that which is not broken)
- add GUI check box for new filter
- add GUI check box for multiline flag
- wrapped two of the four .exec calls in try { } catch blocks
Notes: I was unable to get any GUI settings to persist. Might this have something to do with loading the add-on temporarily?
Thoughts: this is the first time I have done anything like this.
Let me know your thoughts?