Open notaphplover opened 1 year ago
This applies to every language implementation. But I don't see how this can be fixed without potentially breaking everything for everyone.
Other than Ecmascript non-compliance, do you have a concrete example of how this prevents you from doing something?
Hi @mpkorstanje, yeah, sometimes both root and non root capturing groups are convenient to have.
As an example, consider a list of names whose delimiter is the comma (,
) char. The last elements would be separated by using the and
keyword. A possible regex would be:
((\w+)(?:, (\w+))*(?: and (\w+))?)
Consider a root group is added due to the fact the group 0 is not passed to the transformer.
Consider this link to have a playground. Including nested capturing groups leads to a different implementation that not having them. Nested capturing groups, in this example, allow us to know the last two elements separated by the and
delimiter if present. I am aware parser implementations can be proposed to correctly parse all the elements, but they would have to do additional checks to determine wheter or not there're two last elements separated by the keyword and
. More sophisticated examples can be proposed, but I find this one simple enough to illustrate my issue.
Other than Ecmascript non-compliance, do you have a concrete example of how this prevents you from doing something?
Well, I would hazard to say all regex specs shares this basic nested capturing group feature, not only Ecmascript one.
But I don't see how this can be fixed without potentially breaking everything for everyone.
Breaking or fixing? I can't find a place in the docs in which it's said this is the expected behavior. If this is the expected behavior I would prefer to close this issue in favor of a new one with a proposal to change it, so everyone knows this is not a bug, but a feature instead
Thanks for the explanation. That is a good usecase indeed.
The fix would be a breaking change for anyone who currently has declared a regex with nested capture groups but only utilizes the top level groups programmatically. And even though not explicitly documented, it is a fairly observable part of the API for any user so we'd run into Hyrum's Law.
This would need some accommodation to allow the different cucumber implementations to introduce this as a non-breaking change. Something along the lines of Argument.getValueFromCaptureGroups
. But I haven't checked the exact usage in the Cucumber implementations.
Hey @mpkorstanje, thank you so much for giving feedback.
The fix would be a breaking change for anyone who currently has declared a regex with nested capture groups but only utilizes the top level groups programmatically. And even though not explicitly documented, it is a fairly observable part of the API for any user so we'd run into Hyrum's Law.
I understand your concerns. I cannot think in a non breaking fix simple enough that would be better than releasing a major version so, would fixing this whenever you guys consider it's time for a major version make sense?
Not so lightly. While we could make a breaking change here and increase the major version, it would also be a breaking change for the different Cucumber implementations that use this library. They're each on their own independent life cycle -- coordinating this would be rather difficult. The alternative would be maintaining multiple versions of cucumber-expressions
which isn't ideal either.
But that's going to happen with every bug / feature you have, right? Do you have any process to accomplish this? How do you deliver a new feature / fix a bug in this project?
For this specific problem, typically by introducing a second code path as a non-breaking change. And then deprecating the bugged code path, eventually removing it once all the Cucumber implementations have gone through their life cycle.
Then, what about having a RecursiveGroup extends Group
class that is able to recursive look for capture groups and adding options on the user entrypoint (I don't know which one/s would be) for either using RecursiveGroup
or Group
? This way we could fix this that way and deprecate those options in favor to always use RecursiveGroup
π What did you see?
When creating a custom param definition, the transformer funcion does not received nested capturing groups.
β What did you expect to see?
As javascript developer, I expect regex capturing groups to conform Regex Ecamscript standard and capture nested groups
π¦ Which tool/library version are you using?
https://www.npmjs.com/package/@cucumber/cucumber-expressions/v/16.1.2
π¬ How could we reproduce it?
Create a new npm package with your prefered typescript setup (the bug is also recreated with plain javascript).
In the package.json, install
@cucumber/cucumber@9.2.0
dependencyDefine a custom parameter with nested captured groups:
Then, write any step definition and feature involving this custom definition
Then, create a minimal config testing the feature processing the custom parameter and the step definitions and have a look at the console. Instead of
['foo', 'foo']
,['foo', undefined]
is printed instead.π Any additional context?
It seems the capturing groups are internally represented as a tree. Cucumber relies on
TreeRegexp
to create aGroupBuilder
for the purpose of capturing regex groups. These groups are represented as a tree and only the root ones are passed to the custom definition transformer, ignoring any children groups.Having a look at the
Argument
class:group.values
are passed to the transformer:As you can see granchildren groups are never included. I would expect nested groups to be included as Ecmascript Regex spec states
This text was originally generated from a template, then edited by hand. You can modify the template here.