apache / incubator-ponymail-foal

Apache Pony Mail Foal (Next Generation Suite)
https://ponymail.apache.org
Apache License 2.0
28 stars 14 forks source link

Faster code to clean up quotes #243

Closed raboof closed 1 year ago

raboof commented 2 years ago

Right now the * in the regex to clean up quotes makes it rather slow for large quoted texts. With this change it is slightly less 'precise' but much faster, which might be a reasonable trade-off in this case?

You can reproduce at https://regexr.com/ by adding 50 or so newlines to the example input.

The effect in Pony Mail is that such emails cannot be opened at all, but produce some javascript error.

Humbedooh commented 2 years ago

Hi Arnout, First off, thank you for this patch - volunteers are the life blood of this project.

Could you provide a link to an example of this "bad behavior" for verification? I'd love to take a look at the implications here before I evaluate the change properly.

raboof commented 2 years ago

There is an email that I think you should have access to after logging in at https://lists.apache.org/thread/qghm1qkqpp1t29bzqbf37qh0rvfhrtol (the top one, from sept 30th), that triggers what I'm fairly confident is the same issue.

Humbedooh commented 1 year ago

Sorry for the long wait here! I've looked through the code, and I think - in order to actually achieve what we meant to do - we might wish to change it to:

quote = quote.replace(/\n>[>\s]*$/g, "\n");

In layman's terms:

The difference in my example is the insistence on the first > encountered being at the beginning of a line, so as to not turn "" into "<foo" if it's on one of the last lines.

FWIW, I did confirm that \s is perfectly capable of covering both spaces, tabs, newlines and carriage returns, which simplifies things.

raboof commented 1 year ago

Sorry for the long wait here!

No worries :)

quote = quote.replace(/\n>[>\s]*$/g, "\n");

sounds good to me, but I was reluctant to make such a bigger change without some kind of testsuite ;). Do you want me to amend the PR?

Humbedooh commented 1 year ago

Yes, that would be perfect, thanks.

raboof commented 1 year ago

OK, changed quote = quote.replace(/[>\s\r\n]+$/g, ""); into quote = quote.replace(/\n>[>\s]*$/g, "\n"); and force-pushed (but not retested ;) )

sebbASF commented 1 year ago

Note that this also matches '> > ' etc. This may not be a problem, but should be documented.

raboof commented 1 year ago

It indeed does, and it can't hurt documenting, but that is the case both before and after the change in this PR, right?

sebbASF commented 1 year ago

AFAICT previously it only matched trailing spaces, but it now also matches embedded spaces.

raboof commented 1 year ago

Ah, you're right of course (as long as they're at the end of the string). I think that's desired/improved behavior, but indeed different. Do you think we should document it in the code or would this PR thread be sufficient? :)

sebbASF commented 1 year ago

Code and/or program docn is where such things should be documented.

Open source releases do not contain copies of PRs.

Humbedooh commented 1 year ago

Might be high time for a CHANGELOG file