ColinEberhardt / assemblyscript-regex

A regex engine for AssemblyScript
MIT License
86 stars 12 forks

Assertions within alternations are not supported #40

Open Y-- opened 3 years ago

Y-- commented 3 years ago

Hello!

Here is the test to reproduce the issue:

it("non-capturing groups should not capture with expression", () => {
  const re = new RegExp("(?:^|\\s|-)\\S", "g");
  const input = "hello, great-world";

  let match = exec(re, input);
  expect(match.matches.length).toBe(1);
  expect(match.matches[0]).toBe("h");

  match = exec(re, input);
  expect(match.matches.length).toBe(1);
  expect(match.matches[0]).toBe(" g"); // this fails and returns the second letter (and each consecutive exec will return all the letters)

  match = exec(re, input);
  expect(match.matches.length).toBe(1);
  expect(match.matches[0]).toBe("-w");
});

While in JavaScript:

const re = new RegExp("(?:^|\\s|-)\\S", "g");
const input = "hello, great-world";

> re.exec(input);
[ 'h', index: 0, input: 'hello, great-world', groups: undefined ]
> re.exec(input);
[ ' g', index: 6, input: 'hello, great-world', groups: undefined ]
> re.exec(input);
[ '-w', index: 12, input: 'hello, great-world', groups: undefined ]
> re.exec(input);
null

Or on one line:

'hello, great-world'.replace(/(?:^|\s|-)\S/g, x => x.toUpperCase())
> 'Hello, Great-World'

Happy to give it a shot, if you have any hint where to start looking it would probably save me a lot of time :-)

Thanks!

ColinEberhardt commented 3 years ago

Thanks for raising this. The problem is that the library currently doesn't support start-of-string (^) or end-of-string ($) assertions within alternations.

See the spec test here: https://github.com/ColinEberhardt/assemblyscript-regex/blob/main/assembly/__spec_tests__/generated.spec.ts#L2205

The current implementation assumes that these occur at the start or end of the regex string: https://github.com/ColinEberhardt/assemblyscript-regex/blob/main/assembly/regexp.ts#L126-L132
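To make the expected behaviour concrete, here is a minimal hand-written sketch of how (?:^|\s|-)\S should match when ^ is treated as one alternative among others: a zero-width check that succeeds only at index 0. This is not the library's code; matchAt and findAll are hypothetical names used purely for illustration, and the pattern is hard-coded rather than compiled.

```typescript
// Illustration only: a hard-coded matcher for (?:^|\s|-)\S.
// matchAt and findAll are hypothetical helpers, not part of assemblyscript-regex.
function matchAt(input: string, pos: number): string | null {
  // Each alternative returns the number of characters it consumed, or -1 on failure.
  const alternatives: Array<(i: number) => number> = [
    (i) => (i === 0 ? 0 : -1),                       // ^  : zero-width, only at index 0
    (i) => (/\s/.test(input.charAt(i)) ? 1 : -1),    // \s : one whitespace character
    (i) => (input.charAt(i) === "-" ? 1 : -1),       // -  : a literal hyphen
  ];
  for (const alt of alternatives) {
    const consumed = alt(pos);
    if (consumed < 0) continue;
    const next = pos + consumed;
    // The group must be followed by one non-whitespace character (\S).
    if (/\S/.test(input.charAt(next))) {
      return input.slice(pos, next + 1);
    }
  }
  return null;
}

// Mimics repeated exec calls with the "g" flag: resume after each match.
function findAll(input: string): string[] {
  const out: string[] = [];
  let pos = 0;
  while (pos <= input.length) {
    const m = matchAt(input, pos);
    if (m !== null) {
      out.push(m);
      pos += m.length;
    } else {
      pos += 1;
    }
  }
  return out;
}

console.log(findAll("hello, great-world")); // ["h", " g", "-w"]
```

The point of the sketch is that the ^ alternative has to be evaluated against the current position on every attempt, rather than being handled once as a property of the whole pattern.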

> Happy to give it a shot, if you have any hint where to start looking it would probably save me a lot of time :-)

That would be great! The assertion logic is applied in this function: https://github.com/ColinEberhardt/assemblyscript-regex/blob/main/assembly/regexp.ts#L153-L222

As an aside, it would probably be useful for this library to throw an error if you use a regex that has features that are not supported yet.
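One way such a check could look, as a rough sketch only: scan the pattern string and throw if an unescaped ^ or $ appears anywhere other than the very start or very end. The function name assertAnchorsSupported and the error messages are assumptions made for this example; the library does not expose anything like this today, and a real check would more likely live in the parser.

```typescript
// Hypothetical guard, not part of assemblyscript-regex: rejects patterns that
// use ^ or $ anywhere other than the first or last position of the pattern.
function assertAnchorsSupported(pattern: string): void {
  let inClass = false; // true while scanning inside a [...] character class
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern.charAt(i);
    if (ch === "\\") {
      i++; // skip the escaped character, e.g. \$ or \^
      continue;
    }
    if (inClass) {
      // Inside a class, ^ and $ are not assertions.
      if (ch === "]") inClass = false;
      continue;
    }
    if (ch === "[") {
      inClass = true;
      continue;
    }
    if (ch === "^" && i !== 0) {
      throw new Error("Unsupported: '^' assertion at position " + i.toString());
    }
    if (ch === "$" && i !== pattern.length - 1) {
      throw new Error("Unsupported: '$' assertion at position " + i.toString());
    }
  }
}

assertAnchorsSupported("^abc$"); // passes: anchors at the start and end only

try {
  assertAnchorsSupported("(?:^|\\s|-)\\S"); // the pattern from this issue
} catch (e) {
  console.log((e as Error).message);
}
```

Failing loudly like this would have turned the confusing results in the test above into an immediate, explicit error.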

Y-- commented 3 years ago

Awesome, thanks a lot for the feedback! I'll try to give it a shot ASAP, though I'm not sure exactly when (hopefully in the next few months). And definitely +1 to throwing if not supported :-)