babel / proposals

✍️ Tracking the status of Babel's implementation of TC39 proposals (may be out of date)
https://github.com/tc39/proposals

Regex named capture groups #35

Closed nicolo-ribaudo closed 5 years ago

nicolo-ribaudo commented 6 years ago

Named capture groups are being merged into the main spec (https://github.com/tc39/ecma262/pull/1027). We need a plugin for that :slightly_smiling_face:

There is already https://www.npmjs.com/package/babel-plugin-transform-modern-regexp, but it does a lot of things. I think we should have a plugin which only transforms named groups (like we have for every other regexp feature).

It would consist of two parts:

1) The pattern transpiler, e.g. `(?<name>.)\k<name>` -> `(.)\1`. We have two options for this: regexp-tree and regexpu-core. I would prefer using regexpu-core, because:

2) The runtime wrapper. This is needed to add the `groups` property to the result of .match/.exec/.replace. I think the implementation which gives the best browser support requires overwriting RegExp#exec and String#replace. Another option which doesn't require modifying builtins is to use a class like this:

    class BabelRegExp extends RegExp {
      exec() { /* overwrite */ }
      [Symbol.replace]() { /* overwrite */ }
    }
This would be self-contained, but it wouldn't work in IE <= 10 or other browsers without `__proto__`.
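
To make the subclass idea concrete, here is a minimal sketch of what such a wrapper might look like. It is not the plugin's actual implementation: the third constructor argument (`groupMap`, a name -> index map produced by the pattern transpiler) and the `buildGroups` helper are hypothetical, and only string replacements are handled.

```javascript
// Hypothetical helper: builds the `groups` object from a match array and a
// name -> index map such as { year: 1, month: 2 }.
function buildGroups(match, groupMap) {
  const groups = Object.create(null);
  for (const name of Object.keys(groupMap)) {
    groups[name] = match[groupMap[name]];
  }
  return groups;
}

class BabelRegExp extends RegExp {
  // `source` is the already-transpiled pattern (numbered groups only).
  // `groupMap` is an assumed extra argument, not part of the RegExp API.
  constructor(source, flags, groupMap) {
    super(source, flags);
    this._groupMap = groupMap || {};
  }

  exec(str) {
    const match = super.exec(str);
    if (match !== null) match.groups = buildGroups(match, this._groupMap);
    return match;
  }

  [Symbol.replace](str, replacement) {
    if (typeof replacement === "string") {
      const map = this._groupMap;
      // Rewrite $<name> to $N. Escapes such as $$<name> are not handled here.
      replacement = replacement.replace(/\$<([^>]+)>/g, (whole, name) =>
        Object.prototype.hasOwnProperty.call(map, name) ? "$" + map[name] : whole
      );
    }
    // Function replacements are passed through unchanged in this sketch.
    return super[Symbol.replace](str, replacement);
  }
}

const dateRe = new BabelRegExp("(\\d{4})-(\\d{2})", "", { year: 1, month: 2 });
const dateMatch = dateRe.exec("2018-03");
console.log(dateMatch.groups.year);                         // 2018
console.log("2018-03".replace(dateRe, "$<month>/$<year>")); // 03/2018
```

Because `String#replace` dispatches to `[Symbol.replace]` on the regex, nothing on the builtins needs to be patched, which is the self-contained property mentioned above.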

@babel/babel Thoughts? (especially about regexp-tree vs regexpu-core)

cc @mathiasbynens

Daniel15 commented 6 years ago

For what it's worth, regexp-tree is written by @DmitrySoshnikov who works at Facebook, and he's one of the smartest people I know when it comes to parsers and regular expressions. 👍

After reading the GitHub READMEs for regexp-tree and regexpu-core, it sounds like regexp-tree could be more flexible? regexpu-core seems to be designed for a single purpose.

> non standard features (like comments in patterns)

Is that non-standard? Nearly all implementations support it. regular-expressions.info says that it's just some very old or simplified implementations that don't support it: https://www.regular-expressions.info/freespacing.html

> Of the flavors discussed in this tutorial, only XML Schema and the POSIX and GNU flavors don't support it. Plain JavaScript doesn't either, but XRegExp does.

I wouldn't call it "non-standard", I'd call JavaScript's regex the non-standard one. It's lacking standard features that every other language has 😛

michaelficarra commented 6 years ago

@Daniel15 For the purposes here, the only standard we care about is ECMAScript. And regexp-tree certainly goes above and beyond what that standard requires.

Daniel15 commented 6 years ago

Won't that make it easier to add new features in the future, though? It seems more future proof to use a library that supports more advanced regex features, particularly since it's likely that there'll be more requests to improve JS regex support such that it's on par with other languages. JS regexes have been lagging far behind for a very long time, so I'm happy to see these improvements :)

DmitrySoshnikov commented 6 years ago

Hey guys,

Just an FYI, if you nevertheless prefer going with regexp-tree (which is fully based on ECMAScript) -- it is possible to apply just this one transform for named capture groups.

This is handled by the compat-transpiler module, which may have a whitelist of transforms to apply.

In particular, for the use-case:

    const regexpTree = require('regexp-tree');

    // Using new syntax.
    const originalRe = '/(?<all>x)\\k<all>/';

    // For legacy engines.
    const compatTranspiledRe = regexpTree
      .compatTranspile(originalRe, ['namedCapturingGroups'])
      .toRegExp();

    console.log(compatTranspiledRe); // /(x)\1/

The transform also returns the names of the captured groups, which can then be passed to the runtime module you mention. I use a similar approach with a custom exec method (see this unit test that accesses the groups property).
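
As a rough illustration of a transform that returns both the rewritten pattern and the group names, here is a hand-rolled sketch. It is not how regexp-tree or regexpu-core work internally; `transpileNamedGroups` and the shape of its return value are assumptions made for this example, and edge cases (such as a `\k<name>` immediately followed by a digit) are not handled.

```javascript
// Hypothetical, simplified transform: rewrites (?<name>...) to (...) and
// \k<name> to \N, and returns the name -> index map alongside the new source.
function transpileNamedGroups(source) {
  const groupMap = {};
  const tokens = []; // output pieces; backreferences are resolved at the end
  let groupCount = 0;
  let inClass = false;

  for (let i = 0; i < source.length; i++) {
    const rest = source.slice(i);
    let m;
    if (!inClass && (m = /^\\k<([^>]+)>/.exec(rest))) {
      tokens.push({ backref: m[1] }); // may refer to a group defined later
      i += m[0].length - 1;
    } else if (source[i] === "\\") {
      tokens.push(source.slice(i, i + 2)); // copy any other escape verbatim
      i++;
    } else if (inClass) {
      if (source[i] === "]") inClass = false;
      tokens.push(source[i]);
    } else if (source[i] === "[") {
      inClass = true;
      tokens.push("[");
    } else if ((m = /^\(\?<([^=!>][^>]*)>/.exec(rest))) {
      // Named group; (?<= and (?<! are lookbehinds and are excluded above.
      groupMap[m[1]] = ++groupCount;
      tokens.push("(");
      i += m[0].length - 1;
    } else {
      // Plain "(" opens a numbered capture; "(?:", "(?=", etc. do not.
      if (source[i] === "(" && source[i + 1] !== "?") groupCount++;
      tokens.push(source[i]);
    }
  }

  const out = tokens.map((t) => {
    if (typeof t === "string") return t;
    if (!(t.backref in groupMap)) throw new SyntaxError("Unknown group: " + t.backref);
    return "\\" + groupMap[t.backref];
  }).join("");

  return { source: out, groupMap };
}

const transpiled = transpileNamedGroups("(?<all>x)\\k<all>");
console.log(transpiled.source);   // (x)\1
console.log(transpiled.groupMap); // { all: 1 }
```

A runtime wrapper could then take `transpiled.groupMap` and use it to populate `groups` on each match result.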

ljharb commented 6 years ago

@Daniel15 it's not more future-proof if the features that land in the spec itself end up differing; it's quite critical that the default for a feature like this be "nothing beyond what's in the spec".

DmitrySoshnikov commented 6 years ago

@ljharb, right, with the whitelist parameter I mentioned, you may granularly pick only needed things from the spec (and nothing beyond).

But it's completely up to you which tool you choose, of course -- I built regexp-tree to be "based on ECMAScript" (including its parsing grammar, etc.) + cooler features if one needs them, but they are not enforced, and you may restrict it purely to the ES spec.

ljharb commented 6 years ago

@DmitrySoshnikov oh sure, i'm talking about for babel :-) your tool can choose whatever defaults it likes!

DmitrySoshnikov commented 6 years ago

@ljharb, exactly for Babel ;) all the features are from ECMAScript, and can granularly be picked one by one. Everything that goes on top of it (x flag, comments, multiline, etc.) can be disabled.

mathiasbynens commented 6 years ago

> After reading the GitHub READMEs for regexp-tree and regexpu-core, it sounds like regexp-tree could be more flexible? regexpu-core seems to be designed for a single purpose.

IMHO it makes more sense to compare regjsparser with regexp-tree. regjsparser is the parser that regexpu uses — regexp-tree wasn’t available at the time regexpu was created.

regjsparser doesn’t yet know about named capture groups (or lookbehind assertions), so we’d have to teach it before regexpu can handle those.

nicolo-ribaudo commented 6 years ago

Thank you all for your comments.

> Everything that goes on top of it (x flag, comments, multiline, etc.) can be disabled.

I think I'll use regexp-tree then, since it already supports named groups. If needed, we can always switch to regexpu-core in the future.

mathiasbynens commented 6 years ago

Thanks to @nicolo-ribaudo’s work in https://github.com/mathiasbynens/regexpu-core/pull/14, regexpu-core now supports named groups as well.

brneto commented 6 years ago

Is Babel going to support this feature? I looked through the Babel packages and haven't seen this plugin there.

r4j4h commented 6 years ago

@brneto Great question, I honestly don't know but am interested myself. According to https://github.com/babel/babel/pull/7105 the docs haven't been updated yet and some unit tests need to be fixed, so hopefully soon!

DmitrySoshnikov commented 6 years ago

There is a similar issue in the custom https://github.com/DmitrySoshnikov/babel-plugin-transform-modern-regexp/issues/3 repo, though as @nicolo-ribaudo mentioned there, he'll try to pick up the original PR in Babel. @nicolo-ribaudo, are you still on track for the PR?

damianobarbati commented 6 years ago

+1!

nicolo-ribaudo commented 5 years ago

https://github.com/babel/babel/pull/9345 :tada: