codemirror / codemirror5

In-browser code editor (version 5, legacy)
http://codemirror.net/5/
MIT License

Negative lookahead in regex doesn't work #6233

Open d8888 opened 4 years ago

d8888 commented 4 years ago

Hi,

When using "simple mode" with a custom regex grammar, "negative lookahead" in regular expressions seems to have no effect.

For example, with the rule /(?<![a-zA-Z0-9])OP/, I expect the following string to match: Zerg is OP, please nerf.

The following string should not match: NOP, Zerg is fine, learn to play

But in fact, both strings are matched in simple mode in CodeMirror; the negative lookahead (?<![a-zA-Z0-9]) has no effect.

Apart from modifying the regular expression rule, I found no way to make negative lookahead work. Is there an elegant alternative, or can this be corrected? Thank you.

Browser: the latest version of Chrome (confirmed that negative lookahead can be used)

CodeMirror version: 5.49.2

d8888 commented 4 years ago

I found the suspected cause, which may be related to how simple mode works.

addon\mode\simple.js

      for (var i = 0; i < curState.length; i++) {
        var rule = curState[i];
        var matches = (!rule.data.sol || stream.sol()) && stream.match(rule.regex);
        if (matches) {
            .... 
        }
      }
      stream.next();
      return null;

lib\codemirror.js

  StringStream.prototype.match = function (pattern, consume, caseInsensitive) {
    if (typeof pattern == "string") {
      ....
    } else {
      var match = this.string.slice(this.pos).match(pattern);
      if (match && match.index > 0) { return null }
      if (match && consume !== false) { this.pos += match[0].length; }
      return match
    }
  };

In simple mode, the stream advances character by character when no rule matches, and stream.match cannot "see" the part of the string before stream.pos when matching against a regular expression.

Take matching /(?<![a-zA-Z0-9])OP/ against 'NNNNNNOP learn to play' as an example. When stream.pos = 6, the string "seen" by stream.match is "OP learn to play". The leading substring "NNNNNN" is invisible to the negative lookahead, resulting in a match that should not happen.
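The effect described above can be reproduced outside CodeMirror. The following is a minimal sketch: `matchAtSlice` is a hypothetical helper that only imitates the regex branch of `StringStream.prototype.match` quoted above, and it assumes a JavaScript engine that supports lookbehind.

```javascript
// Hypothetical helper (not CodeMirror API): run the regex on the remainder
// of the line, the way the quoted StringStream.prototype.match does.
function matchAtSlice(line, pos, regex) {
  var match = line.slice(pos).match(regex);
  if (match && match.index > 0) return null; // only accept a match at pos
  return match;
}

var rule = /(?<![a-zA-Z0-9])OP/;
var line = "NNNNNNOP learn to play";

// On the full string, the lookbehind sees the "N" before "OP": no match.
var fullMatch = line.match(rule);

// On the slice starting at position 6, the preceding "NNNNNN" is gone,
// so the lookbehind has nothing to reject and "OP" matches.
var sliceMatch = matchAtSlice(line, 6, rule);

console.log(fullMatch, sliceMatch && sliceMatch[0]);
```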

The only solution I found is to add an extra rule that filters out "false positives" before they happen, for example:

[
    {"regex": /[^\s]+OP/, "token": "normal"},
    {"regex": /OP/, "token": "the_special_keyword"},
]
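To check how these two rules interact, here is a rough simulation of the simple-mode loop quoted earlier. It is only a sketch: `tokenize` is a hypothetical stand-in for the real tokenizer, and it ignores states, `sol`, and everything else simple mode does.

```javascript
// Hypothetical stand-in for the simple-mode loop: try each rule at the
// current position, and skip one character when no rule matches.
function tokenize(line, rules) {
  var tokens = [];
  var pos = 0;
  while (pos < line.length) {
    var matched = false;
    for (var i = 0; i < rules.length; i++) {
      var m = line.slice(pos).match(rules[i].regex);
      if (m && m.index === 0 && m[0].length > 0) {
        tokens.push({ text: m[0], token: rules[i].token });
        pos += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) pos++; // corresponds to stream.next(); return null
  }
  return tokens;
}

var rules = [
  { regex: /[^\s]+OP/, token: "normal" },
  { regex: /OP/, token: "the_special_keyword" }
];

// "NNNNNNOP" is consumed by the first rule, so "OP" is never highlighted.
var blocked = tokenize("NNNNNNOP learn to play", rules);
// The standalone "OP" is only reached by the second rule.
var allowed = tokenize("Zerg is OP, please nerf.", rules);

console.log(blocked, allowed);
```

Note that `[^\s]+` is broader than the `[a-zA-Z0-9]` class in the lookbehind, so input such as `(OP` would be consumed by the first rule here, while the lookbehind version would have matched `OP`.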

Will this behavior be changed in the future? Or will it be documented in the simple mode documentation?

Thank you

marijnh commented 4 years ago

That's called lookbehind, and indeed, it won't work in many places in CodeMirror, because we're working around the lack of support for the sticky flag in some of the browsers that must be able to run CodeMirror by executing the regular expression on a substring of the input. But even fewer browsers actually support lookbehind right now, so it might not be practical to use it unless you're targeting only Chromium-based browsers.

This is very unlikely to get fixed in the 5.x version.
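For comparison, this is roughly what matching with the sticky flag looks like. It is a sketch only, not what CodeMirror 5 does: `matchSticky` is a hypothetical helper, and it assumes an engine that supports both the `y` flag and lookbehind.

```javascript
// Hypothetical helper (not CodeMirror API): match at an exact position in
// the full line, so lookbehind can still see the text before that position.
function matchSticky(line, pos, regex) {
  var flags = regex.flags.replace(/[gy]/g, "") + "y";
  var sticky = new RegExp(regex.source, flags);
  sticky.lastIndex = pos; // a sticky regex only matches starting here
  return sticky.exec(line);
}

var rule = /(?<![a-zA-Z0-9])OP/;

// The "N" at position 5 is visible to the lookbehind: no match.
var blocked = matchSticky("NNNNNNOP learn to play", 6, rule);

// "OP" at position 8 is preceded by a space: match.
var allowed = matchSticky("Zerg is OP, please nerf.", 8, rule);

console.log(blocked, allowed && allowed[0]);
```

Because the regex runs on the whole line instead of a substring, no context is lost, which is the difference from the substring approach described above.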