Closed tripleee closed 6 years ago
For future reference, the relevant code is https://github.com/Charcoal-SE/metasmoke/blob/master/app/controllers/github_controller.rb#L65.
Other things which work poorly are \W
which is used a lot in blacklist requests now, lookaheads / lookbehinds (no support for these in MySQL) and the fact that we should probably search title and body at the same time. Should there be more than one link? Or should the search be extended to have a "blacklist request support" mode where both title and body are searched, and some regex extensions are hacked in? (We could replace \W
with [^A-Za-z0-9_]
, for example.)
I also note that Smoke Detector (somewhat whimsically IMHO) replaces a space with \s
before adding something to the blacklist. I don't think that's the way to go, really.
Could you expand on that last comment @tripleee? IIRC we had a discussion about that a long time ago. Why isn't it good to generalize whitespace matching?
I think it violates the rule of least surprise. Either we support full regex and accept exactly the original input, or we have our own formalism (escaping all dots by default would seem more useful than the space to \s
replacement, for example; but I think no replacing makes more sense).
I'm also thinking that if we remap stuff in the first place, we should remap in the opposite direction when we link to Metasmoke. For example, we should remap \W
to something like [^a-z0-9_]
in the searches we generate in PRs from users who don't have code admin privileges (... I think we call them).
Bottom line, if we do this at all, we should have a disciplined rationale rather than random organically grown syntactic sugar crust.
(Sorry for repeating myself; the mobile Github app I'm trying out is otherwise nice, but doesn't show me earlier messages in the thread when composing a new comment. Also, no editing of the comment I just submitted.)
Note that the regex substitutions which I introduced in gitmanager a while ago were to maintain compatiability with MS's basic regex capabilities, etc. The way the blacklist command works has changed a lot since then so it could do with updating.
Actually works now
For example, https://github.com/Charcoal-SE/SmokeDetector/pull/430#issuecomment-270069822 indicates a problem with word boundaries. The test should probably add something like
(^|[^A-Za-z0-9_])
before the phrase and($|[^A-Za-z0-9_])
after it.