PATCH available: ignore list regexp/glob match body, nick, user, host + ctcp version response censing - enhancement

MCMrARM / revolution-irc

A modern Android IRC client. #revolutionirc on Libera.chat

GNU General Public License v3.0

553 stars 63 forks source link

PATCH available: ignore list regexp/glob match body, nick, user, host + ctcp version response censing - enhancement #295

Open pickeduname opened 3 years ago

pickeduname commented 3 years ago

UPDATED -- this patch is retired due to some fuzz, see clean patch -v4 lower in this thread.

kurahaupo commented 3 years ago

@pickeduname I've created pull request #296 on your behalf.

The commit uses email address reply+AABQFBOB7ABFJCZ2ZEYB4VF6R7QF3EVBNHHDHKJGJI@reply.github.com which is the reply-to address for this issue, however it would be preferable to use a real email address instead - feel free to drop me a line and I'll amend it.

kurahaupo commented 3 years ago

@pickeduname I can't memo or privmsg you on freenode as your nicks aren't registered and you seem to be offline.

There's a #revolutionirc channel on freenode if you're not already using it.

kurahaupo commented 3 years ago

@pickeduname I've redone the patch to omit some obviously inadvertent changes

in app/build.gradle
- changed quote marks on line 11 (versionName)
in app/src/main/res/layout/activity_edit_ignore_entry.xml
- swap lines 73,74
- swap lines 76,77 (but keeping the /> closing tag at the appropriate place)
- swap lines 132,133

(I've had to force-push several changes because some of the XML files are invalid POSIX text files - they're missing the newline at the end of the last line - but they're also invalid Windows text files - having bare LF instead of CR+NL. My editor therefore saves the file in POSIX format with the terminal newline, but git then counts as another changed line, which I then had to patch to undo. I'm tempted to submit a separate patch to fix all the XML files to proper POSIX text format.)

pickeduname commented 3 years ago

I get this message from AS when the project is rebuilt. The APK builds fine though, unsigned in debug mode, of course:

Unsupported Modules Detected: Compilation is not supported for following modules: app, build, revolution-irc. Unfortunately you can't have non-Gradle Java modules and Android-Gradle modules in one project.

I don't know what causes it but I think you might be able to edit the gradle build file and fix it. If you find the problem, do tell what it was, please. Note: My AS version is at 4.0.x current is 4.1.x -- this may be related, or not.

kurahaupo commented 3 years ago

@pickeduname

I will amend the patch to conform to the coding style for this project, in particular:

include a space after a keyword that requires a following open parenthesis: catch (, for (, if ( & while (;
in md files, write code options as separate terms rather than all in one string
strip trailing whitespace from all lines.

I suggest that I also repeat my previous amendment to make nullRegexes private, since it's only called from the neighbouring member function. Alternatively, it could eliminated by structuring the code thus:

try {
    nickRegex = nick != null ? SimpleWildcardPattern.rCopy(nick) : null;
    userRegex = user != null ? SimpleWildcardPattern.rCopy(user) : null;
    hostRegex = host != null ? SimpleWildcardPattern.rCopy(host) : null;
    mesgRegex = mesg != null ? SimpleWildcardPattern.rCopy(mesg) : null;
} catch (PatternSyntaxException e) {
    nickRegex = userRegex = hostRegex = mesgRegex = null;
    throw e;
}

Do you have a preference?

Stepping back a bit, I am a bit concerned that anyone who's already using glob's will find that they're broken by this change, since regexes have a different syntax.

Checking the additions to README.md, I would classify "high power consumption" as a known issue: it could be mitigated by being smarter about compiling many matching rules into a single regex. Poorly written regexes are likely to be a far worse power drain. Having multiple match rules simply scales with the number of rules, but (for example) a regex with nested repeat groups that are poorly bounded, such as (.*t.*)*h, will exhibit quadratic (or worse) time over message length.

This begs the question of whether we should stick with globs, which have a more restrictive syntax that can guarantee to run in linear time.

kurahaupo commented 3 years ago

I remain concerned about the impact on existing users of changing from Glob to Regex; perhaps that could be mitigated by allowing both:

regex when enclosed in a pair of / or ?;
glob otherwise

I'm working up a patch that moves rCopy to a private static function that sits beside updateRegexes and makes the decision as above.

I'm also separating out the states for "not yet compiled", "compiled", and "compilation failed", away from whether each of the strings and/or regexes is null. This will stop it trying (and failing) to compile the Regexes on every message.

When a rule is created or deserialized, it starts in the 'not yet compiled' state. Calling updateRegexes moves it to either 'compiled' or 'compilation failed'. When saving from the UI it goes back to the 'not yet compiled' state.

It's a bit clunky because all of the members are public, so you have to remember to reset to "not compiled" when you change any of the strings, but otherwise I think the design is sound.

I don't have a .pro file, merely eyeballing the code and using grep to confirm my suspicions.

kurahaupo commented 3 years ago

I'm afraid my frustrations are creeping in and I'm refactoring as I go, rather than just making minimal changes.

I'm not qualified to comment on the Unsupported Modules Detected problem. (Remember also that I'm not an admin on this project.)

I need to sleep before committing more code, and right now my Java compiler isn't working at all, so will push my changes after my machine has done an update and I've tested my changes in the morning.

Really regex compilation should be done immediately after after loading and any changes, and not any other time. Then it's just a matter of tracking whether the compilation previously succeeded. I'm almost there with my current patch so I'll try and include that too, as it will significantly simplify the logic. Maybe I should just make the strings & regexes private, and create getters and setters to access them? Then we could be certain that the compilation is done at the right time. (Or, just have string setters that also compile the regexes, but leave them public so that they can be read easily.)

pickeduname commented 3 years ago

@kurahaupo I agree to the idea of permitting "marked" regexps using // or ?? bracketing. I also thought about this being the best way, instead of a global switch, but you seem to be scrumming ahead :) The global re/glob switch would also have brought in the problem of what to do with existing rules if the user flicks the switch. All rules deleted? All invalidated? So, there will be no global switch.

More to the point: I propose /re/ for re's, ?glob? or glob (either '?'s or no markers).
// represents the empty string match, and ?? the same using glob.

In words:
.if. pattern[0] is '/' .and. pattern[end] is '/' .then. pattern is regexp
.elseif. pattern[0] is '?' .and. pattern[end] is '?' .then. pattern is glob
.else. pattern is glob .endif.

Re: power drain and failed compilationon re's: Only re's can fail in compilation, it is not possible to create a bad re using a glob pattern. The current way to raise a toast is adequate, deleting the user's tedious entry on error is not okay. There is no point in opening some special ui screen to handle error edits. What could be done, is to set a dirty flag at runtime which causes detected-bad re's on a 1st compilation attempt at runtime to be ignored thereafter and not recompiled again and again on every message. Also, the toast could be raised once (tbd) when loading settings from persistent storage and compiling the re's for the 1st time. I will do this later when implementing the above logic with ?? //. There is one problem left: old saved settings with pattern /pattern/ will be wrongly considered re's and compiled, which may fail, or not (!). The latter case merits an entry in README.md to warn users.

Edit: all the above are implemented in patch -v4 , less the warning in the README.md

kurahaupo commented 3 years ago

I really need to sleep - I'll reply in detail in about 12 hours

kurahaupo commented 3 years ago

I've just had a night's sleep and almost finished rebuilding my machine so that I have a working JDK. I'll push my changes once I've confirmed an error-free build.

These were comments that I started to write yesterday but left off when I needed to sleep. I haven't had a chance to review your latest patch yet, so please don't read them as a critique of that. However from your description it sounds like both of our patches have headed in similar directions. (I had hoped to avoid duplication of effort but c'est la vie.)

In particular I think that regex compilation should be done immediately after setting the corresponding string; that is, upon initialization, after after de-serializing, and after any changes, and not any other time. Then it's just a matter of tracking whether the compilation previously succeeded.

Our one point of difference is this: I'm not really happy about using ?...? to denote globs. I suggested it for an alternative for regex precisely because:

An alternative is helpful to make things clearer when you want to include a / in the pattern; and
Question marks in particular are commonly used elsewhere for this purpose: vim, sed, perl, and other tools will recognize ?...? as a regex in the same way that they recognize /.../, except that you can then put / inside the regex. I would not want to deviate from users' expectations on the second point.

Some alternatives come to mind:

Use another delimiter for globs, such as $...$
Use a suffix after the closing /, so that
- /.../r would be a regex and
- /.../g would be a glob.
Have a scheme prefix, like in URLs:
- glob:pattern
- regex:pattern
Prefix a regex with something that logically should never occur at the start of a glob, such as **
Heureustics:
- Things that cannot occur at the start of a regex, such as * or ?, should be taken as meaning that it's a glob.
- Things that don't normally appear in a glob, such as starting with ^ or ending with $, should be taken as meaning that it's a regex.

Allowing a delimiter in both cases allows one to write patterns that start or end with whitespace, while still allowing the string to be "trimmed" upon input.

Having said all that, I don't really see the point of making the glob case any more complicated. If the user really needs a glob pattern that starts and ends with / or ?, perhaps they should just write a regex instead?

kurahaupo commented 3 years ago

Footnote: When I'm editing code I tend to refactor fairly strongly, rather than just making minimal changes (Seeing a long chain of member selectors and array-indexings gratuitously repeated just makes me see red: give that sucker a name once and then use the name. Otherwise it's really slow and tedious to read and check that it really refers to the same thing, and that there isn't, for example, a different array index in the middle somewhere. As a side benefit, I have no trouble keeping most of my code lines shorter that 80 chars, which also makes the code easier to read confidently.)

kurahaupo commented 3 years ago

If I've "stolen your thunder" or otherwise detracted from your efforts then I do apologize. When I said "c'est la vie", I meant that since your contributions obviously take precedence, I just have to accept that mine may be discarded. I'm quite willing to step away if that's what you'd prefer.

I started this by creating a PR based on your first patch file, with the intention that your work would be accepted even if you did not want to publicize yourself as the author.

I also did some basic code review to minimize the obstacles to the PR being accepted. Does that make it an unjustified "nitpick"? (Not for me to say, but I hope not.)

As a user of this program, I also offer some feedback on the proposal from a IU perspective.

It was not my intention that this should become a prolonged process. I don't want to deal with a stream of patch files from anyone, and I certainly do not wish to become the maintainer of any long-running fork. On the contrary, I intend to delete my fork as soon as this PR has been dealt with.

I'm about to run out of time to spend on this, and I don't want our time to have been wasted, so I will push commits for each of your patches, and each of with my commits, into separate branches in my fork, for you to inspect. I will push the commit for your v2 patch into the PR as-is (without any editing by me). I will also make you a maintainer for my fork, so you can pick and choose which other commits to rebase or force-push into the PR; if I understand correctly doing so should not compromise your identity.

I might or might not have time for further engagement after that.

PS: Whilst I do get itchy fingers to refactor existing code, I'm always careful to put that into separate commits from those concerning functionality changes.

Despite my inclination to nitpick existing code, my effort here was exactly the opposite: to make the PR as tight as possible. You noted issues with lines swapped etc, which I found and undid, and that tightened the change set, which was good.

The comments on line-endings in the XML files were in not any way intended as critiquing your "style". Rather I was trying to minimize the diff, and that entailed undoing some of my inadvertent changes. Mea culpa.

I squashed "your" commits into one in an effort to make you "look good", but was that misguided? Would you have preferred me to leave these as separate "change" and "tidy" commits on your behalf?

I apologise that I have not been able to provide a .pro file for you as I do not have one. My comments on coding style were simply based on observation of the existing code, with some statistical analysis to avoid it being merely a matter of my personal biases & preferences.

Again, I leave it over to you to decide how you'd like me to proceed, within the limits of the time I have available.

MCMrARM commented 3 years ago

Since this project uses GitHub we use pull requests since it allows a much easier way to review and comment on code, and shout out to @kurahaupo for making a pull request #296 for this issue. I have submitted a code review in that pull request; I'm not willing to talk about a patch file. It's fine for PRs to be squashed/etc., this is not an issue in general, in the end the whole PR will be squashed anyways when merging.

Toast message is okay, althrough not perfect; an error should focus on the text field probably and use the material themed error label below the field; however due to the extra complexity this requires not doing this is a reasonable compromise.

I have never seen the "Unsupported Modules Detected" error before before so I can not help with it.

Indeed there is no formatting guidelines right now, but in general try to be consistent with the formatting around. That is, stuff like something( "argument") are unacceptable. Since there have been next to no pull requests in the past, there wasn't a need to standarize the code style. I'm happy to accept a pull request for this.

I have not referred to everything posted since this is a really long thread.

I have limited amount of time to work on stuff and I no longer use Android or this application on my primary device, but I'm more than happy to review and accept pull requests for this project. The codebase is sadly legacy and has a lot of code smell, and if I was still actively using this app I'd probably start slowly rewriting it, but I do not have the time for that.

pickeduname commented 3 years ago

@MCMrARM thanks for joining the discussion. I have read your posting carefully, and we will work as you said, together with @kurahaupo, who helped me out with the git side of things, while I concentrate on functionality. I have also limited time for this project, but I consider certain features, like regexp working ignore lists and the ability to limit information sent out on CTCP VERSION responses, to be essential for online acivity with Revolution IRC. I wrote the patches primarily for myself, and am sharing them. Will continue to do so, but there will be no major rewrite from me. Noted about formatting, I will comply. I was hoping for others to try out the patch too, did not happen so far, that I know of.

The discussion will continue in this thread, but, as I said, not for long anymore. Most points were touched and implemented so far, eventually it will be wrapped up and then become a proper pull/commit to be accepted to the project or not.

Comments are welcome in continuation of this thread. Preferrably about code functionality :)

pickeduname commented 3 years ago

Technical aside:

I have traced back at least some possible unwanted XML file edits, like line swaps, to AS 's gui ui editor's selection mechanism being very twitchy imo. Selecting a ui element graphically with the mouse, to access properties, frequently does more than just selecting the element, and may change the geometry without warning of any kind.
For this reason, reading any patch / diff generated from XML files if AS was used to edit properties etc., and editing out manually whatever AS changed, is essential to obtain clean patches.

MCMrARM commented 3 years ago

Technical aside:

I have traced back at least some possible unwanted XML file edits, like line swaps, to AS 's gui ui editor's selection mechanism being very twitchy imo. Selecting a ui element graphically with the mouse, to access properties, frequently does more than just selecting the element, and may change the geometry without warning of any kind.

Usually, I use the code editor and the "GUI" only for preview when making UIs.

For this reason, reading any patch / diff generated from XML files if AS was used to edit properties etc., and editing out manually whatever AS changed, is essential to obtain clean patches.

kurahaupo cleaned the XMLs; the code quality issues are still present though and need to be resolved before a merge

pickeduname commented 3 years ago

PATCH AVAILABLE v4 cumulative patch - to numeric 18 string "0.5.4f"
Adds:

faulty regexp flagging in Ignore List using red leading \@bad: text
faulty regexp flagging on server open / connect, also on Ignore Entry save, using toast.
?glob? / /regexp/ matching defaults to glob in ignore rules, old saved rules should work.
match ignore rules against message text.
Restrict CTCP VERSION sends a fairly anonymous version response when selected.
verified in a debug build APK on at least one device (Oreo 8.1)

This is the final patch in this series, any further edits will be to move to git commits from diff and to fix bugs and beautify ?! the code.
Known almost-bugs: strings plural mechanism was not used in one location.

Apply patch with zcat $patchname | patch -p1 in top project directory. The patch was manually reviewed for noise and nuisances and is known clean.

io.mrarm.irc_12_to_18-v4.diff.gz