acuminous / yadda

A BDD javascript library
412 stars 73 forks source link

Incorrect step implementation is selected #243

Closed lolmaus closed 2 years ago

lolmaus commented 6 years ago

Hi!

I have these two step implementations:

library
  .given(                                            "the following items:\n$table", function () {})
  .given("an (open|closed) box(?: with id (.+?))? and the following items:\n$table", function () {})

The problem is that this step:

    And an open box with id 1 and the following items:
      ------------------------------------------------------------------------------------------------------------------------------
      # unrelated
      ------------------------------------------------------------------------------------------------------------------------------

the first implementation is chosen!

I've seen this problem multiple times: when a step regex contains a fragment identical to another step regex, then the latter may be matched.

I've debugged it a little bit and found that the actual regex used to match the "the following items:\n$table" step under the hood is:

/(?:[Gg]iven|[Ww]ith|[Aa]nd|[Bb]ut|[Ee]xcept)\s+the following CRs:\n([^\u0000]*)/

The unfortunate coincidence is that and is both a valid Cucumber prefix and a way to compose conditions in the middle of a step.

Here's a generalized version of conflicting steps:

library
  .given("FOO")
  .given("blablabla and FOO blablabla")
  .given("blablabla with FOO blablabla")
  // etc

This can be easily solved by introducing ^ and $ string terminators to the regex.

cressie176 commented 6 years ago

Hi @lolmaus, I'll take a look. It probably won't be for a couple of days though.

lolmaus commented 6 years ago

I've worked around the issue with this monkey patch:

const dictionary = new yadda.Dictionary();
const oldExpand = dictionary.expand;
dictionary.expand = function () {
  const result = oldExpand.apply(this, arguments);
  result.pattern = `^${result.pattern}$`;
  return result;
};

@cressie176 Do you want me to submit a PR with this fix?

cressie176 commented 6 years ago

Thanks @lolmaus. Thanks for posting the code and an offer of a PR. I want to take time to explore a few more options as I adding the ^ and $ would be a breaking change. There might also be a few complications with indentation around multiline steps.

It doesn't mean I won't make it though, I think it's reasonable, but that's what makes me suspicious is that I haven't hit this before.

I also want to check why the longest common sequence algorithm isn't selecting the closest step.

lolmaus commented 6 years ago

I adding the ^ and $ would be a breaking change.

I didn't notice any difference in the behavior in my app. Though I didn't try running Yadda's test suite.

There might also be a few complications with indentation around multiline steps.

Multiline steps are just regular steps with an \n in them. Also, Yadda does not use the multiline flag.

lolmaus commented 6 years ago

I also want to check why the longest common sequence algorithm isn't selecting the closest step.

So are partial matches intentional (regardless of conflicts)?

I would never expect a step implementation like this:

.given("a patient with anxiety and a post-traumatic stress disorder", function () {})

to be selected for this step:

Given a post

but the under-the-hood regex for the latter would be:

/(?:[Gg]iven|[Ww]ith|[Aa]nd|[Bb]ut|[Ee]xcept)\s+a post/

And it does match:

regex101.com demo
cressie176 commented 6 years ago

I didn't notice any difference in the behavior in my app. Though I didn't try running Yadda's test suite.

It's not just the Yadda test suite I have to be concerned about, but every test suite ever written by users. This doesn't mean that I won't make breaking changes, especially when they are required to fix bugs, it just means that I'm careful about making them.

Prefixing ^ to the regex would be a breaking change as it could prevent indented steps from matching. This is probably fixable by using ^\s*(?:[Gg]iven|[Ww]ith|[Aa]nd|[Bb]ut|[Ee]xcept)\s*.

Appending $ to the regex would also be a breaking change. It would be quite legitimate to do

.given("anything starting with blah")
Given anything starting with blah blah blah

So this one I probably won't fix. If you append $ to the step definition it's honoured so there's an easy workaround if necessary.

So are partial matches intentional (regardless of conflicts)?

I agree that

.given("a post", function () {})

should not match

Given a patient with anxiety and a post-traumatic stress disorder

This is a bug, which should be fixed.

What I was curious about is why in your first example

.given("the following items:\n$table", function () {})

was selected when

given("an (open|closed) box(?: with id (.+?))? and the following items:\n$table", function () {})

is a better match

It's true that the first step should not have matched, but does because of the lack of ^. But even when both steps match, yadda's step selection code should have preferred the second step, because when the groups were removed it's a closer match (using Levenstein's distance) than the first step.

I was curious whether there was another bug lurking around, so wanted to investigate a bit more. Creating a test case from your example did match both steps, but the "better" one was selected. Did you simplify / change your first example perhaps?

lolmaus commented 6 years ago

Did you simplify / change your first example perhaps?

Yes, sir. :disappointed_relieved:

After I posted the issue, I've found out that there was a typo in the step implementation that I expected to fire.

But in that situation I expected to see an Undefined step message. Instead, a different step fired without any warning -- and produced scary errors from the DB mock layer. It was a Given step that required certain other steps. Also, the arguments were wrong. I spent a good hour pulling my hair, trying to figure out why my DB seeding setup went haywire -- before I realized it was not a seeding issue at all.

But this was not the only time when I ran into the issue of a different step implementation being fired unexpectedly. I had a couple of those earlier.

Frankly, I don't understand why would anyone want step matching to be heuristic. When I need phrasing to be flexible, I implement it explicitly with regular expressions.

Different step implementations have different intentions and different arguments. I can't imagine a situation where it would be OK to have step A trigger an implementation intended for step B.

cressie176 commented 6 years ago

Should be fixed in 2.0.0. Thanks for reporting.

cressie176 commented 6 years ago

Frankly, I don't understand why would anyone want step matching to be heuristic. When I need phrasing to be flexible, I implement it explicitly with regular expressions.

One of the challenges of maintaining a large number of specifications, is how to avoid step clashes, without compromising the language used to write the steps. Yadda tries to help in three ways

  1. Dictionaries
  2. Runtime library selection
  3. Heuristics

The heuristics are fairly simple. In the event that multiple steps match Yadda will select the closest after removing any regular expressions. If there are still multiple matches, Yadda will select the step from the most recent step library. The former method was important before I added dictionaries, since regular expressions in step text reduce readability. I find the latter useful for functional web testing, where I might have the same feature implemented in different ways, e.g.

When I navigate to the search page
And search for marbles
When I navigate to the home page
And search for marbles

By implementing the search and home page steps in separate libraries in two libraries, I can repeat the step text, but have two quite different implementations (quick search vs advanced search). Yadda will still make the desired step selection.

I agree that heuristics have the potential for unexpected step choices. In practice I've found this to be rare, quick to debug and outweighed by the benefit of clearer features files. This has always been one of Yadda's prime directives and the reason I don't enforce Given/When/Then notation.

lolmaus commented 6 years ago

@cressie176 Thank you for the fix. :bow:

Can you please elaborate how heuristics work?

In the above example, do you mean that you have two distinct implementations for .when("search for marbles"?

Also, are both libraries loaded at the same time? It kinda makes no sense to use both at once. But if you used only one library per test, no heuristics would be necessary.

cressie176 commented 6 years ago

You have two distinct implementations for .when("search for marbles"?

This is possible providing they are in separate libraries. Adding identical steps to the same library results in an error.

Are both libraries loaded at the same time

It's up to the implementer, but say yes for the purposes of the following example

Imagine you have a website with multiple pages. The website header contains things like "login" and "search" functions. Because they are used everywhere you create steps in /steps/common.js

  1. When I login as $username
  2. When I search for $text

Search results are displayed on their own page, so you add some more steps in /steps/search.js

  1. Then I should see $number search results

However you also have a dedicated search page, containing advanced search options. You create some additional steps in /steps/search.js

  1. When I select a category
  2. When I search for $text

This means "When I search for $text" is implemented twice. One implementation targets the search box in the header, the other targets the search box in the advanced search form. When two steps match, Yadda calculates the Levenshtein distance from the normalised step pattern and picks the one with the lowest score. If both steps score the same, it prefers the step from the current library. If that fails (because there was no previous step, or the current library doesn't contain one of the matching steps), then Yadda throws an Ambiguous Step error.

There are of course other ways to achieve this. You can select which libraries to load for each scenario, or you could make sure each step has a unique name. There are advantages and disadvantages to each approach.

In the case of dynamically loading libraries, I tend to do this by listing the libraries in an annotation added to the feature / scenarios, but always feel slightly uneasy referencing libraries from specifications

The approach of ensuring each step has a unique signature, means compromising on readability. Yadda's goal has always been to prioritise feature expressiveness, hence falling back to the heuristics approach when multiple matches are encountered.

lolmaus commented 6 years ago

So in your example the Levenstein's distance is zero, so the choice is made by looking at the fact that one of two candidate step implementations belongs to the current library. Heuristics is not involved.

Can you please provide a couple of real-life examples where candidate strings aren't identical and where heuristics is expected to kick in and be practically useful?


On another matter, you've now added the ^ to the beginning of the regex string-matching regex. You decided not to add $ to the end of the regex because (as I understood) it would've broken compatibility with arbitrarily phrased steps.

I'm using both ^ and $ locally and it works well for me. If I needed to allow random text in the end of the step, I could simply add .* to the end of the regex. It's simple enough, the intention is obvious, and this relaxed matching does not spread to other steps where I'm not expecting it.

I believe, $ should be added to Yadda for consistency. An exact match is the most expected behavior that guarantees least frustration to new users.

I learned about partial matching the hard way. Matching the beginning of a string with ^ is a less painful learning experience than a partial match, but it still has frustration potential which could (and should, IMO) be avoided.

cressie176 commented 6 years ago

So in your example the Levenstein's distance is zero, so the choice is made by looking at the fact that one of two candidate step implementations belongs to the current library. Heuristics is not involved.

Using the Levenstein's distance and favouring the current library are both heuristic approaches, but you are correct that in the example I gave the choice is finally made by looking at the current library. Both steps would have the same Levenstein's distance (although not a zero distance as the normalised pattern for the step would have been something like "I search for").

Can you please provide a couple of real-life examples where candidate strings aren't identical and where heuristics is expected to kick in and be practically useful?

The Levenstein's distance was important before I implemented dictionaries. Consider the following steps

Given a $gender, $specialty patient
Given a $gender patient

Without a dictionary definition these would resolve to the following patterns

Given a (.+), (.+) patient
Given a (.+) patient

Both patterns would match the following statement, leading to the Ambiguous Step error.

Given a male, cardiovascular patient

This was annoying because it's blatantly obvious to any human which step should have been chosen. One solution was to replace $gender and $specialty with enumerations

Given a (male|female), (cardiovascular|respiratory|elderly|gastroenterology|etc) patient

But as you can imagine the list got pretty long and was repeated in multiple steps. Levenstein's distance allows Yadda to guess which step should have been chosen, but I suspect it's largely redundant with the introduction of dictionaries.

I believe, $ should be added to Yadda for consistency. An exact match is the most expected behavior that guarantees least frustration to new users.

I'll give it some more thought.

lolmaus commented 6 years ago

I would solve the challenge you described like this.

Without dictionaries:

Given a ([^,]+), ([^,]+) patient
Given a ([^,]+) patient

With dictionaries:

  .define("item", /([^,]+)/) // matches anything except comma
Given a $item, $item patient
Given a $item patient

Advanced:

  .define("item", /([^,]+)/) // matches anything except comma
  .define("items", /(.+)/, function (input, next) {
    const items = input.split(/(,| and)\s+/); // split by "," or " and", followed by one or more spaces
    next(null, items);
  })
  .given("an? $item patient with $items", function (gender, syndromes) {
    mockServer.create("patient", {gender, syndromes});
  })
  Given a male patient with headache
  Given a female patient with anxiety, phantom limb syndrome and post-traumatic stress disorder

So I'm yet to see a use case where heuristics is necessary and beneficial.

PS On a side-note, checks like (male|female) are very useful as a guard against using elderly, recurring, etc where only gender is expected.

cressie176 commented 6 years ago

a heuristic, is any approach to problem solving, learning, or discovery that employs a practical method, not guaranteed to be optimal, perfect, logical, or rational, but instead sufficient for reaching an immediate goal.

Both the Levenstein's distance and Same library approaches are heuristic. I find benefit in the Same Library approach for the reasons I have given above. The Levenstein's distance was beneficial before I implemented dictionaries, because it meant more readable step definitions. i.e.

.define('Given a ([^,]+), ([^,]+) patient')
.define('Given a ([^,]+) patient')

is less readable than

.define('Given a $gender, $specialty patient')
.define('Given a $gender patient')

I've spent far more time debugging incorrect regexp than I have spent resolving problems due to the heuristics, so my preference is to keep regexp simple. I've also found that developers tend to have less exposure to regexp than other language features, which is another reason to keep them simple.

I might in future versions of Yadda consider removing the Levenstein's distance approach, but it's not something that personally has caused me any problems, nor something that has been the cause of any issues (this issue was originally caused by the lack of a ^). Its removal could cause problems for people updating from an older version of Yadda, therefore I have no plans to remove it in the short term.

lolmaus commented 6 years ago

I've spent far more time debugging incorrect regexp than I have spent resolving problems due to the heuristics

To avoid too broad regex matches, I use various delimiters. For example, these are real-life steps from my suite:

    Given an MCR with id: %1, state: `open` and project: @platform
    Given MCRs with:
      --------------------------------
      | Id    | State    | Project   |
      | @good | `open`   | @platform |
      | @bad  | `closed` | @platform |
      --------------------------------

The corresponding step implementation is:

const steps = {
  "given an MCR with $props"(props) {
    const params = parseProps(props, {
      id:        { regex: /%(\w+)/ },
      "state":   { regex: /`(open|merged|declined)`/ },
      "project": { regex: /@(\w+)/, type: "record" },  // type: "record" fetches actual relationships from the mock DB
      "author":  { regex: /@(\w+)/, type: "record", key: "creator" },
    });

    if (!params.creator) {
      params.creator = server.create("user");
    }

    return server.create("multi-code-review", params);
  },

  "given MCRs with:\n$table"(propsArrays) {
    propsArrays.forEach((props) => {
      steps["given an MCR with $props"].call(this, props);
    });
  },
};

feedStepsIntoYadda(steps);

Dictionary:

  .define("props", /((?:(?:(?: and|,) )?[^:,]+: [^:,]+)*)/, (str, next) => {
    const result =
      str
        .split(/(?: and|,)\s+/)
        .reduce((result, pair) => {
          const [key, value] = pair.split(/:\s*/);
          result[key] = value;
          return result;
        }, {});

    next(null, result);
  });

This framework (the $props dictionary entry and the parseProps utility function) lets me add more props to step name and implementation without having to adjust the regex.


my preference is to keep regexp simple

My problem with heuristics is that it can run an unintended step without warning, and resulting errors are extremely misleading, tricking you into spending hours debugging the app rather than the test suite.

With regular expressions and delimiters I have granular control over which steps have which amount of flexibility.

Heuristics, on the other hand, spans across all steps and can trip me when I don't expect it.

cressie176 commented 6 years ago

My problem with heuristics is that it can run an unintended step without warning, and resulting errors are extremely misleading, tricking you into spending hours debugging the app rather than the test suite.

I understand that you have had a recent and frustrating experience, although believe that was the result of the now fixed ^, rather than heuristics.

I hope that you can appreciate that my experience of heuristics has been positive. I've found it rare that the wrong step is called, and trivial to diagnose when it is. Typically a stack trace will point you at the failing line. It's easy to run just a single scenario, and easy to add logging to see which steps are being executed. i.e. in the rare case where the heuristics cause the wrong step to be executed, rather than triggering an Ambiguous Step error, spotting this is trivial.

At the risk of sounding like a broken record, Yadda's goal is to optimise the feature files for clarity. Minimising step clashes without compromising the language is therefore of the upmost importance. I find the Same Library useful in this respect, and will not be removing it. While the Levenstein's distance has been superseded by dictionaries, I'm am also not going to make a breaking change, that could result in a lot of existing users having to re-write their step definitions without what I consider to be a good reason.

I suspect my decision will disappoint you, however you may be relieved to know I am in the process of a ground-up re-write of Yadda, which resolves some of the limitations in this version. Regrettably this necessitate users' re-writing their step definitions (assuming they choose to upgrade). I am unlikely to include Levenstein's distance feature in the re-write, and will consider making heuristics optional. A first release is still some way off, but I will create an issue asking for beta testers near the time.

lolmaus commented 6 years ago

Typically a stack trace will point you at the failing line.

In my cases the stack trace always left me clueless. Typically, a tightly coupled seeding step had some kind of discrepancy. I ended up debugging my mock layer instead of looking into step mismatches.

It was even worse when the unintended step did not crash. In those cases I ended up debugging the app code, trying to figure out why assertions started failing.

I suspect my decision will disappoint you...

Nope! My intention was never to persuade you get rid of heuristics entirely. I was just challenging you to provide good real-life examples that would've helped me see what I was missing. 😇

But I did intend to have you consider:

you may be relieved to know I am in the process of a ground-up re-write of Yadda, which resolves some of the limitations in this version. Regrettably this necessitate users' re-writing their step definitions

OMG! 😱 Can you please reveal what kind of changes will be necessary?

cressie176 commented 6 years ago

The breaking changes I'm planning are...

I might also follow cucumber's example and removing the need to add a $placeholder for docstrings in the step definition too. I'm undecided on this, as I would need another way to specify a converter.

lolmaus commented 6 years ago

The breaking changes I'm planning are...

Nice!

I'm using a custom wrapper: writing steps in an object, then feeding it to an utility function that feeds the steps into Yadda.

I'll only need to update the utility function.

Rather than binding context to the step function, I'm explicitly passing this in as the first step argument

So this.ctx is currently passed as the first argument? What about step name and other stuff currently bound to this?

cressie176 commented 6 years ago

So this.ctx is currently passed as the first argument? What about step name and other stuff currently bound to this?

I think it makes sense to add those to the state too.

lolmaus commented 6 years ago

My boss has made a decision to remove Cucumber from the project's test suite. :sob:

The good part is that I'm now able to share my Yadda groundwork as open-source. Gonna call the npm package yadda-opinionated or something like that.

I now need to start a pet project ASAP so that I don't lose grip on Yadda, Cucumber and BDD.

cressie176 commented 6 years ago

My boss has made a decision to remove Cucumber from the project's test suite

After all your hard work :(

yadda-opinionated is a good name.

lolmaus commented 5 years ago

I've been experimenting with a shareable steps collection and noticed this.

When I appended my shareable steps collection to an app, it ended up having two similar steps:

.when('I visit $url', function() {})
.when('I visit URL $text', function() {})

For a step name When I visit URL /foo-bar, the former step is chosen, whereas I expect it to match the latter.

I'm not saying it's a bug, it's probably just Yadda heuristics not matching my mental model.

But I'm sad that I can't use the order of step definitions to control which step wins.

And not only Yadda is not supposed to take hints the order, nor the method chain API makes it easy to change the order. I'd prefer step collections to be composable as I had suggested earlier.

@cressie176 Do you think something can be done to make step choosing more predictable and controllable?

PS No reproach intended. I'm actively enjoying Yadda.

cressie176 commented 5 years ago

Hi @lolmaus. That's strange. I'll investigate.

lolmaus commented 5 years ago

I work around this by appending $ to the end of step pattern.

cressie176 commented 5 years ago

Works as expected for me. Have I misunderstood or is there some difference? Maybe you have dictionary definitions for $url or $text?

Code

var Yadda = require('yadda');
var English = Yadda.localisation.English;

var library = English.library()
  .when("I visit $url", function(url) {
      console.log('Former', url);
  })
  .when("I visit URL $url", function(url) {
      console.log('Latter', url);
  })

new Yadda.createInstance(library).run([
  'When I visit /foo-bar',
  'When I visit URL /bar-baz',
]);

Output

Former /foo-bar
Latter /bar-baz
cressie176 commented 5 years ago

Do you think something can be done to make step choosing more predictable and controllable?

I'd ideally like the scoring mechanisms to be injectable. That way you could disable the ranking and manually resolve ambiguous steps, select which scoring algorithms you wanted or potentially write your own. I'll give it some thought.

lolmaus commented 5 years ago

I'm not entirely sure, my setup is quite elaborate. There's clearly something different, since your minimal checks produce expected results while cases in my projects don't. I may be misusing Yadda, but I don't see what I'm doing wrong, and the indeterminate nature of Yadda makes me feel fruatrated.

I believe the issue relates to the fact that in your fix you added ^ in the beginning but decided not to add $ in the end. As I remember you wanted the steps to keep match when ending them with arbitrary text?

cressie176 commented 5 years ago

You're correct in that I deliberately didn't add the $ to the end of the pattern. I'm still not willing to make this breaking change.

The heuristics in this case really aren't that complicated, they just measure the number of changes that would be required to convert the step text into a normalised pattern.

The step text would look like

s1) When I visit /foo-bar s2) When I visit URL /foo-bar

The normalised patterns would look like

p1) I visit p2) I visit URL

The number of changes to convert s1 to p1 is 14 (When/foo-bar) The number of changes to convert s1 to p2 is 18 (WhenURL_/foo-bar) Therefore p1 is preferred

The number of changes to convert s2 to p1 is 18 (When_URL/foo-bar) The number of changes to convert S2 to p2 is 14 (When__/foo-bar) Therefore p2 is preferred

cressie176 commented 2 years ago

wont fix.