Charcoal-SE / SmokeDetector

Headless chatbot that detects spam and posts links to it in chatrooms for quick deletion.
https://metasmoke.erwaysoftware.com
Apache License 2.0

stop watching "Delete this post" if followed by (if|when) #13100

Closed: micsthepick closed this 1 month ago

micsthepick commented 1 month ago

Is your feature request related to a problem? Please describe.

I was reviewing the puzzling-related queue (filtered by site) and encountered a post to the effect of "I'll delete this post when I have better permissions to post a comment". I figure this is fairly common.

Describe the solution you'd like

change the filter from:

delete[\W_]*+this[\W_]*+post(?!(?:[^<]|<(?!\/?code>))*+<\/code>)

to:

delete[\W_]*+this[\W_]*+post(?![\W_]*+if|[\W_]*+when|(?:[^<]|<(?!\/?code>))*+<\/code>)
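A quick way to see what the extra alternatives change is to run both patterns over a few sample strings. The sketch below is only illustrative: the sample texts (other than the quoted post) are made up, the case-insensitive flag is an assumption about how the watch is compiled, and the possessive quantifiers (`*+`) are written as plain greedy ones so it runs on Python versions older than 3.11. As far as I can tell that does not change which strings match here, only how much backtracking is possible.

```python
import re

# Guard that skips matches located inside a <code>...</code> block.
# Greedy stand-in for the possessive version quoted in the thread.
CODE_GUARD = r"(?:[^<]|<(?!\/?code>))*<\/code>"

# Assumption: case-insensitive matching; the real compile flags are not shown here.
ORIGINAL = re.compile(
    r"delete[\W_]*this[\W_]*post(?!" + CODE_GUARD + ")", re.IGNORECASE
)
PROPOSED = re.compile(
    r"delete[\W_]*this[\W_]*post(?![\W_]*if|[\W_]*when|" + CODE_GUARD + ")",
    re.IGNORECASE,
)

samples = [
    "Please delete this post, it is spam.",
    "I'll delete this post when I have better permissions to post a comment",
    "I will delete this post if nobody objects.",
    "<code>delete this post</code>",
]

# One (original_hit, proposed_hit) pair per sample.
results = [(bool(ORIGINAL.search(s)), bool(PROPOSED.search(s))) for s in samples]

for (old_hit, new_hit), text in zip(results, samples):
    print(f"original={old_hit!s:5} proposed={new_hit!s:5} {text}")
```

Only the second and third samples change: the original pattern flags them, the proposed one does not. The first sample is still caught by both, and the `<code>` sample is skipped by both.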

Describe alternatives you've considered

We could remove the watch entirely, as it seems to be only around 40% accurate, but that's probably why it's a PB rather than anything else. I'm open to comments on other modifications.

Additional context

Original pattern: TP 63, FP 91 (time optimized: 26.5 s; non-optimized: 58.0 s)
'Improved' pattern: TP 63, FP 82 (time optimized: 20.4 to 47.4 s; non-optimized: 62.8 s)

It does not seem to have a huge impact on parsing time, and it cuts down the total FPs in search by almost 10%.
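For reference, the "around 40% accurate" and "almost 10%" figures can be reproduced from the counts above. This is a minimal sketch that assumes accuracy here means TP / (TP + FP); the function name is just for illustration.

```python
def precision(tp: int, fp: int) -> float:
    """Share of hits that were true positives (assumed meaning of 'accurate')."""
    return tp / (tp + fp)

# Counts quoted in this request.
original = {"tp": 63, "fp": 91}
improved = {"tp": 63, "fp": 82}

original_precision = precision(**original)
improved_precision = precision(**improved)

# Relative drop in false positives between the two patterns.
fp_reduction = (original["fp"] - improved["fp"]) / original["fp"]

print(f"original precision: {original_precision:.1%}")  # 40.9%
print(f"improved precision: {improved_precision:.1%}")  # 43.4%
print(f"FP reduction:       {fp_reduction:.1%}")        # 9.9%
```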

codygray commented 1 month ago

I can confirm the numbers in the original request; this does, indeed, reduce the number of FPs without affecting the number of TPs.


Original:

delete[\W_]*+this[\W_]*+post(?!(?:[^<]|<(?!\/?code>))*+<\/code>)

MS search

76 TP, 92 FP, 26 NAA (at the time of this writing)


New:

delete[\W_]*+this[\W_]*+post(?![\W_]*+if|[\W_]*+when|(?:[^<]|<(?!\/?code>))*+<\/code>)

MS search

76 TP, 83 FP, 24 NAA (at the time of this writing)


Seems like an obviously good change to make. I've gone ahead and committed it.

But, cc @ThatRyanPerson, since the original watch was yours, in case you have any comments/objections.

micsthepick commented 1 month ago

Just FYI:

delete[\W_]*+this[\W_]*+post(?![\W_]*+if|[\W_]*+when|[\W_]*+after|(?:[^<]|<(?!\/?code>))*+<\/code>)

75 TP, 78 FP, 23 NAA (also at the time of writing)
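If more follow-up words get tried, it may be easier to generate the lookahead from a list than to edit the pattern by hand each time. The helper below is hypothetical (`build_watch` is not part of SmokeDetector), uses greedy quantifiers in place of the possessive ones so it runs on older Python versions, and assumes case-insensitive matching. The sample sentence is made up.

```python
import re

# Guard that skips matches located inside a <code>...</code> block.
CODE_GUARD = r"(?:[^<]|<(?!\/?code>))*<\/code>"

def build_watch(excluded_words):
    """Hypothetical helper: build the watch, excluding the given follow-up words."""
    exclusions = "".join(
        r"[\W_]*" + re.escape(word) + "|" for word in excluded_words
    )
    return re.compile(
        r"delete[\W_]*this[\W_]*post(?!" + exclusions + CODE_GUARD + ")",
        re.IGNORECASE,  # assumption: the real compile flags are not shown here
    )

committed = build_watch(["if", "when"])
with_after = build_watch(["if", "when", "after"])

text = "I will delete this post after the question is answered."
committed_hit = bool(committed.search(text))
with_after_hit = bool(with_after.search(text))

print(committed_hit)   # True
print(with_after_hit)  # False
```

Each added word trades some FPs for a risk of losing TPs. The numbers above show one fewer TP (75 instead of 76) with `after` included, so any new word is worth checking against the search results first.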