Open kieran-ryan opened 1 month ago
@mpkorstanje, wondering if you by any chance have any guidance on this one - either as an intermediate or as a long-term solution?
In essence - at least in the Python implementation - due to an invalid regular expression check, Cucumber Expressions containing optionals are being incorrectly treated as Regular Expressions. Thus, they are being considered 'undefined'.
With https://github.com/cucumber/vscode/issues/125 in mind, this looks like a pretty tricky problem. Cucumber and regular expressions have considerable overlap. Consider:
Hello(.+)\?
Hello( world)?
Hello world?
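To make the overlap concrete, here is a minimal sketch using Python's standard re module. It only shows how each of the three strings behaves when read as a regular expression; the sample step texts are made up for illustration.

```python
import re

patterns = [r"Hello(.+)\?", r"Hello( world)?", r"Hello world?"]

# All three compile as regular expressions, so compiling alone cannot tell
# a regular expression apart from a Cucumber Expression.
compiled = [re.compile(p) for p in patterns]

# Read as a regex, "(.+)" is a capture group and "\?" is a literal question mark.
first = compiled[0].fullmatch("Hello world?")

# Read as a regex, "( world)?" is an optional capture group.
second_short = compiled[1].fullmatch("Hello")
second_long = compiled[1].fullmatch("Hello world")

# Read as a regex, "d?" makes only the final "d" optional.
third = compiled[2].fullmatch("Hello worl")

print(repr(first.group(1)))
print(repr(second_short.group(1)), repr(second_long.group(1)))
print(third is not None)
```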
For Java, cucumber-expressions uses a simple heuristic to determine what we're dealing with. This is implemented in the ExpressionFactory. (Note: The ^ and $ characters aren't anything special, they're the regex start of input and end of input markers).
I do not see a Python equivalent of the ExpressionFactory, so unfortunately users of the cucumber-expressions library have to decide what is a regex and what is a cucumber expression. And the language service would have to duplicate that logic. So I do think it would be a good idea to implement an ExpressionFactory for Python to get rid of at least some ambiguity by providing a canonical solution.
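A Python factory along these lines might look like the sketch below. It assumes the heuristic is purely anchor-based, as the note about start-of-input and end-of-input markers suggests; the names ExpressionKind and guess_expression_kind are hypothetical and are not part of the cucumber-expressions library.

```python
from enum import Enum


class ExpressionKind(Enum):
    REGULAR = "regular expression"
    CUCUMBER = "cucumber expression"


def guess_expression_kind(expression: str) -> ExpressionKind:
    """Guess the kind of a step pattern from its anchors (assumed heuristic)."""
    # Assumption: a leading "^" or a trailing "$" marks a regular expression.
    if expression.startswith("^") or expression.endswith("$"):
        return ExpressionKind.REGULAR
    # Everything else, including patterns with "(...)", is treated as a
    # Cucumber Expression.
    return ExpressionKind.CUCUMBER


print(guess_expression_kind("^Hello( world)?$"))
print(guess_expression_kind("Hello( world)?"))
```

Under this assumption, authors opt in to regular expressions by anchoring them, and an unanchored pattern containing brackets stays a Cucumber Expression.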
But that won't solve the problem. Where Cucumber JVM only supports regular and cucumber expressions, Behave currently only supports regular and parse expressions, while PyTestBDD-NG supports regular-, parse-, and cucumber-expressions, in addition to a heuristic.
So what kind of expression an expression is, depends entirely on the context. Is that something we can access from within vscode or the language service? If not, it might have to become a configuration flag.
This is great!
> So what kind of expression an expression is, depends entirely on the context. Is that something we can access from within vscode or the language service? If not, it might have to become a configuration flag.
We could extract this information, though how it's configured varies quite a bit based on the framework and may be a challenge to maintain. The configuration option sounds like a great shout: minimal implementation and easily replicable across languages and frameworks.
Will look into the ExpressionFactory to gain an understanding and think about this further. Thanks a million!
👓 What did you see?
✅ What did you expect to see?
📦 Which tool/library version are you using?
🔬 How could we reproduce it?
Steps to reproduce the behavior:
Open Visual Studio Code
Install the official Cucumber extension
Create a feature file inside the features directory containing
Create a step definition inside the features/steps directory containing
Observe the step in the feature file is highlighted as 'undefined'
📚 Any additional context?
The Language Service Python implementation prioritises Regular Expressions (checks first) over Cucumber Expressions.
One criterion for determining whether a pattern is a Regular Expression is whether it contains brackets (), checked through specialCharsMatch.
https://github.com/cucumber/language-service/blob/6a35176c10828812e6fc86b0a5ab3dc774852906/src/language/pythonLanguage.ts#L135-L142
As a result, any Cucumber Expression containing an optional will be treated as a Regular Expression and the optional will instead be considered a capture group.
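The effect can be reproduced in isolation. The sketch below is a simplified Python stand-in for the check described above, not the language service's actual TypeScript code; looks_like_regex is a hypothetical helper and the step text is made up.

```python
import re


def looks_like_regex(pattern: str) -> bool:
    """Hypothetical stand-in: treat any pattern containing brackets as a regex."""
    return "(" in pattern or ")" in pattern


# A Cucumber Expression where "(s)" is meant as optional text.
expression = "I have a cucumber(s)"

# The bracket check classifies it as a Regular Expression.
classified_as_regex = looks_like_regex(expression)

# Read as a regex, "(s)" is a capture group, so the "s" becomes mandatory.
matches_plural = re.fullmatch(expression, "I have a cucumbers") is not None
matches_singular = re.fullmatch(expression, "I have a cucumber") is not None

print(classified_as_regex, matches_plural, matches_singular)
```

If the step text in the feature file is "I have a cucumber", the regex reading fails to match, which would be consistent with the step being reported as 'undefined'.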
A challenge is that in some languages a Regular Expression can be denoted by special prefix and suffix characters, whereas in Python, strings are similar in either case. See Java implementation:
https://github.com/cucumber/language-service/blob/6a35176c10828812e6fc86b0a5ab3dc774852906/src/language/javaLanguage.ts#L20-L24
Brackets usage in Regular Expressions with Python
The official Python documentation on regular expressions outlines the use of brackets as follows:
(...)
Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below. To match the literals '(' or ')', use \( or \), or enclose them inside a character class: [(], [)].
(?...)
This is an extension notation (a '?' following a '(' is not meaningful otherwise). The first character after the '?' determines what the meaning and further syntax of the construct is. Extensions usually do not create a new group; (?P<name>...) is the only exception to this rule. Following are the currently supported extensions.
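These documented behaviours can be checked with the standard re module. The patterns and inputs below are illustrative only.

```python
import re

# "(...)" captures its contents, and "\1" refers back to the first group.
repeated = re.fullmatch(r"(ab)\1", "abab")

# Literal brackets need escaping or a character class.
escaped = re.fullmatch(r"cucumber\(s\)", "cucumber(s)")
char_class = re.fullmatch(r"cucumber[(]s[)]", "cucumber(s)")

# "(?:...)" groups without capturing; "(?P<name>...)" creates a named group.
non_capturing = re.compile(r"(?:ab)+")
named = re.fullmatch(r"(?P<word>ab)", "ab")

print(repeated.group(1))
print(escaped is not None, char_class is not None)
print(non_capturing.groups)
print(named.group("word"))
```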
Further References