Look-behind issue with case-insensitive match

vpetrovykh commented 6 years ago

I've run into a weird look-behind error in Atom 1.22.0 while trying to create a grammar file. After tinkering for a bit, I've reduced the offending grammar to a fairly minimal CSON file:

name: "FooGrammar"
scopeName: "source.foo"
fileTypes: [
  "foo"
]
uuid: "708acdf0-3389-41cd-80f5-44b654eee848"
patterns: [
  {
    include: "#test"
  }
]
repository:
  test:
    begin: "(?i)(?<=aff)z"
    end: "end"
    contentName: "meta.foo"

This produces the following error:

Uncaught Error: invalid pattern in look-behind /usr/lib64/atom/app.asar/node_modules/first-mate/lib/scanner.js:31 
    at Scanner.module.exports.Scanner.createScanner (/usr/lib64/atom/app.asar/node_modules/first-mate/lib/scanner.js:31)
    at Scanner.module.exports.Scanner.getScanner (/usr/lib64/atom/app.asar/node_modules/first-mate/lib/scanner.js:37)
    at Scanner.module.exports.Scanner.findNextMatch (/usr/lib64/atom/app.asar/node_modules/first-mate/lib/scanner.js:56)
    at Rule.module.exports.Rule.findNextMatch (/usr/lib64/atom/app.asar/node_modules/first-mate/lib/rule.js:98)
    at Rule.module.exports.Rule.getNextTags (/usr/lib64/atom/app.asar/node_modules/first-mate/lib/rule.js:154)
    at Grammar.module.exports.Grammar.tokenizeLine (/usr/lib64/atom/app.asar/node_modules/first-mate/lib/grammar.js:152)
    at TokenizedBuffer.buildTokenizedLineForRowWithText (/usr/lib64/atom/app.asar/src/tokenized-buffer.js:506)
    at TokenizedBuffer.buildTokenizedLineForRow (/usr/lib64/atom/app.asar/src/tokenized-buffer.js:501)
    at TokenizedBuffer.tokenizeNextChunk (/usr/lib64/atom/app.asar/src/tokenized-buffer.js:389)
    at _.defer (/usr/lib64/atom/app.asar/src/tokenized-buffer.js:373)
    at /usr/lib64/atom/app.asar/node_modules/underscore/underscore.js:666

As best I can tell, the issue is caused by having ff or fi appear in the look-behind, but only when the pattern is also case-insensitive. Here are some variations that produce the same error for me:

begin: "(?i)(?<=afi)z"

begin: "(?i)(?<=fi|wq)z"

It is possible that this is because ff and fi both have single-character ligature forms (ﬀ U+FB00 and ﬁ U+FB01). The error occurs regardless of whether the file being highlighted actually contains text matching the pattern.
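If the ligature explanation is right (it is the reporter's hypothesis, not something confirmed in this thread), one possible workaround is to keep (?i) out of the look-behind and spell out the case variants by hand. The sketch below uses a hypothetical helper, `caseClasses`, to build such a pattern. It only checks the result with JavaScript's built-in RegExp, so whether Oniguruma accepts the rewritten look-behind is an untested assumption.

```javascript
// Hypothetical helper: expand a literal into per-character case classes,
// e.g. "aff" -> "[aA][fF][fF]". Assumption (not verified against Oniguruma):
// without (?i), no multi-character case folds such as "ff" <-> U+FB00 apply,
// so the look-behind should stay fixed-length.
function caseClasses(literal) {
  return Array.from(literal, (ch) => {
    const lower = ch.toLowerCase();
    const upper = ch.toUpperCase();
    // Letters with simple one-to-one case mappings get a two-member class.
    if (lower !== upper && lower.length === 1 && upper.length === 1) {
      return `[${lower}${upper}]`;
    }
    // Everything else is kept literally, escaping regex metacharacters.
    return ch.replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
  }).join("");
}

const lookBehind = `(?<=${caseClasses("aff")})z`;
console.log(lookBehind); // (?<=[aA][fF][fF])z

// Checked with JavaScript's own RegExp engine, not Oniguruma.
const re = new RegExp(lookBehind);
console.log(re.test("AFfz")); // true
console.log(re.test("abfz")); // false
```

In the grammar this would correspond to something like begin: "(?<=[aA][fF][fF])z". If the rest of the pattern still needs case-insensitive matching, Oniguruma's scoped option group (?i:...) could be used to limit it to the part outside the look-behind.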

Ingramz commented 6 years ago

I tried the grammar provided, but couldn't reproduce the error (Atom 1.22.1, macOS 10.13.1).

Try the following from devtools console:

onig = require('oniguruma')
new onig.OnigRegExp('(?i)(?<=afi)z').searchSync('aFiz')

Let me know if it matches correctly or returns the same error as above.

Edit: Another useful test that is more closely related to the error message source:

onig = require('oniguruma')
new onig.OnigScanner(['(?i)(?<=fi|wq)z']).findNextMatchSync('afiz', 0)

vpetrovykh commented 6 years ago

Running from devtools console:

onig = require('oniguruma')
new onig.OnigScanner(['(?i)(?<=fi|wq)z']).findNextMatchSync('afiz', 0)

produced:

VM1868:1 Uncaught Error: invalid pattern in look-behind
    at <anonymous>:1:30

Oddly enough, I only get the error from the Atom devtools console. I tried putting those two lines into a separate JS file and running it with node by itself, but that didn't produce any errors.

vpetrovykh commented 6 years ago

I may have a related issue. It looks like a bunch of my look-behind rules stopped working (in the Atom version of the https://github.com/MagicStack/MagicPython grammar). Do you by any chance know if there have been any recent (in the past few months) changes in how first-mate uses the oniguruma scanner? Specifically, I'm having issues with expressions like (?<!\\)\n (a newline not preceded by a "\"). In a CSON language spec it's typically used like this: end: "(\\1)|((?<!\\\\)\\n)". As far as I recall, this used to work in the past. I'm trying to determine whether this is specific to Atom and first-mate or to oniguruma.
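The end rule above involves two layers of escaping, which is easy to get wrong when debugging look-behinds. A minimal sketch in JavaScript (not Oniguruma, and not first-mate's line-by-line tokenizer) showing what pattern text the regex engine receives and what the (?<!\\)\n part is meant to match:

```javascript
// The `end` value from the comment above, as a JavaScript string literal.
// CSON double-quoted strings use JavaScript-style backslash escapes, so each
// "\\" in the grammar file becomes a single "\" in the pattern text.
const csonEnd = "(\\1)|((?<!\\\\)\\n)";
console.log(csonEnd); // (\1)|((?<!\\)\n)

// Only the second alternative is compiled here: in TextMate-style grammars,
// \1 in `end` refers to a capture from `begin`, which this sketch omits.
const newlineNotEscaped = new RegExp("(?<!\\\\)\\n");

// JavaScript's RegExp engine, not Oniguruma, so this shows the intended
// meaning of the pattern rather than how first-mate runs it.
console.log(newlineNotEscaped.test("foo\n"));   // true: bare newline
console.log(newlineNotEscaped.test("foo\\\n")); // false: newline follows "\"
```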

Ingramz commented 6 years ago

@vpetrovykh oniguruma has not changed; however, we recently fixed a few bugs related to newlines in first-mate (#100).

If you are saying that the error only occurs in Atom and not when using node, then there might be something wrong with the way Atom is packaged (assuming you are using Linux). I tried installing the atom.io .deb package in an Ubuntu VM and couldn't reproduce the issue there either, which somewhat supports the idea that it might be an issue with the distribution/package you are using.

vpetrovykh commented 6 years ago

OK, I'll try testing this in Atom on a different Linux machine than my current one and see if I get different results. That should help narrow down the factors that affect this issue, and hopefully lead me to either a fix for my grammar or a better example for the issue.

atom / first-mate

Look-behind issue with case-insensitive match #105