cucumber-attic / gherkin

Cross platform parser for the Gherkin language. Used by Cucumber to parse .feature files.
252 stars 127 forks source link

Bring it closer to gherkin2 grammar #153

Closed aslakhellesoy closed 8 years ago

aslakhellesoy commented 8 years ago

See details here: https://groups.google.com/forum/m/#!msg/cukes/SdEJpBG4twQ/EnIRAhp8EgAJ

mattwynne commented 8 years ago

So @enkessler can you tell us some specific examples of what's missing? Your example in the google group seemed to refer to a Scenario: with no steps...?

aslakhellesoy commented 8 years ago

Actually, individual tickets for each incompatibility would be ideal (with an example to reproduce). Then we can close this ticket.

enkessler commented 8 years ago

I suppose that I could dig out the examples that I use in my own test suites and probably come up with a few more but, if you want to get back to the gherkin2 grammar, why not simply grab the grammar out of source control and reuse it? Is it not in handy form like the current grammar is and would have to be reverse engineered? I seem to recall state machines being involved.

brasmusson commented 8 years ago

We do not want to get back to the gherkin2 grammar. One deliberate deviation which comes to mind (which is backward compatible BTW) is to allow comment anywhere in the feature file (which is why they in the new AST all are associated to the feature).

enkessler commented 8 years ago

Oh, yeah. I noticed that. Now comments at the end of a file won't get lost for want of some object to attach them to like they used to. :)

So would it be fair to say that you are aiming not for something similar to the previous grammar but for something that is more flexible in general? After all, the gherkin gem can likely parse any reasonably sane feature file into something meaningful. And any upstream consumer can decide what is actionable or not (e.g. Cucumber can decide that a stepless test doesn't need to be run).

I can crank out an issue for each feature file that is potentially 'valid'. Some of them may require an update to the current grammar, some of them may already work and could just use a test (I only saw 6 tests while I was working on #155), and some of them may be thrown out entirely depending on where the line of 'usefulness' is drawn.

aslakhellesoy commented 8 years ago

That would be very helpful. Keep in mind that most tests are cross platform and live under /testdata. They are run with make

enkessler commented 8 years ago

I found them. I can't run them yet because, you know, Windows user and all that, but the tests look like straightforward input/output combinations. So, I get the gist of it.

Funny thing: several of the examples 'bad' input files are exactly the kind of use cases that I would be suggesting.

The guideline that I am thinking of is 'the ability to stop at any depth. so long as no levels were skipped on the way down'. So, scenarios without steps, examples without rows, or even an entirely empty file would be okay but things like a tag attached to nothing or a bunch of scenarios without ever declaring a feature would not be okay.

paoloambrosio commented 8 years ago

I'm not sure I agree with a feature file missing the "Feature:" header.

aslakhellesoy commented 8 years ago

@enkessler this sounds reasonable. We can have a crack at it. If it's not too much work I'm open to implementing this.

Keep in mind that it might be a several day job - there are 7 language implementations to change.

brasmusson commented 8 years ago

A quick look in gherkin.berp indicate that basically the only problem is with Scenario Outlines. Currently if you write Scenario Outline:, you have to write Examples:, and if you write Examples:, you have to write two table rows.

Apart from that, the other issue I see is whether an empty file should parse OK or not.

paoloambrosio commented 8 years ago

It should not be a problem for the users if Scenario Outline: requires an Examples: section if we allow an empty example table. Changing the example from the email thread:

  Scenario Outline: Variations on awesomeness
    Examples: We'll think of other kinds of awesome after the team happy hour 

Enforcing to have at least one empty Examples: section would make it a bit more difficult to create Scenario Outlines that were meant to be Scenarios. I'd like this to fail:

  Scenario Outline:
    Given I wrote Scenario Outline instead of Scenario
    And I don't have an Examples section
    When I run Cucumber
    Then it will be skipped
    And I won't get any error
enkessler commented 8 years ago

I wouldn't mind and would also find it helpful if something warned me about a potential mistake. However, I am of the opinion that the language itself is not that thing.

The lower it's place on the tool chain something is, the less opinionated it should be. Cucumber or SpecFlow throwing an error or warning because it was told to execute something that looked suspicious? Perfectly okay. The gherkin gem warning me about what I've written? That's rather proactive of it and could get noisy if there's not an option to turn it off but it's kind of okay. The Gherkin language not letting me write something suspicious in the first place? That's a limitation that fundamentally conflicts with its purpose.

paoloambrosio commented 8 years ago

@enkessler you convinced me :+1:

brasmusson commented 8 years ago

I created a proposal for allowing incomplete Scenario Outlines: #160.

enkessler commented 8 years ago

Good news! I finally have a machine with some flavor of Linux on it and seem to have successfully installed all of the prerequisite tools that are needed to build this project and contribute to it properly. That being said, I'm getting errors when trying to actually do that.

Would any of you have some time to help me get set up and in motion?

aslakhellesoy commented 8 years ago

Sure! Just hop on the gherkin gitter channel.

enkessler commented 8 years ago

I'll be hanging out in there today.

aslakhellesoy commented 8 years ago

Most of the core team will only be there during the work week.

enkessler commented 8 years ago

As in Monday - Friday, or 9-5? Because I don't think that my employer wants me fiddling with thus stuff while on the job.

aslakhellesoy commented 8 years ago

9-17 UK time Mon-Fri. Our family lives don't permit work outside these ours I'm afraid.

aslakhellesoy commented 8 years ago

Is there anything from this issue that hasn't been captured in other, smaller tickets?

enkessler commented 8 years ago

I'm slowly making it through all of the build problems. So far I've had to install (in addition to what is listed in the provided setup guide) 'nose' for python and the Go language. Now the build wants 'xcodebuild' and, from what I'm seeing from some internet searching, that's an OSX only thing. I thought that I needed some flavor of Linux for this project?

aslakhellesoy commented 8 years ago

The Objective-C code needs xcodebuild, which I don't think is available on Linux yet. OS X is probably the obly OS that can build all 8 languages.

Just skip the Objective-C build if you can't build it.

enkessler commented 8 years ago

Okay. I've finally got this thing humming along for everything but Objective-C and I'm taking a crack at adding test data for incomplete scenarios in a similar fashion to the incomplete outlines that were done in #160.

I was excited until I realized that outlines had all of the cool use cases and the only 'incomplete' scenario case is one with no steps. A copy, paste, and tweak later and I had test files ready to go and, unsurprisingly, scenarios without steps are already allowed. So these new test files don't force anything to change so much as they document the already met expectation and prevent it from being broken in the future.

Except for the pickles...

Maybe I'm not 100% clear on what pickle are supposed to be. When dealing with 'incomplete' objects , why is an AST and token set produced (albeit with less data) but no pickles are made? Should this be different between scenarios and outlines?

aslakhellesoy commented 8 years ago

Hi @enkessler.

I'm taking a crack at adding test data for incomplete scenarios...

Additional testdata documenting what already works is most welcome.

Maybe I'm not 100% clear on what pickle are supposed to be

Pickles are a data structure designed to simplify future Cucumber implementations. From these relatively simple structs, Cucumber will produce test cases, which "weave" in step definitions from the user.

Take a look at microcuke for an example of how that's done.

When dealing with 'incomplete' objects , why is an AST and token set produced (albeit with less data) but no pickles are made? Should this be different between scenarios and outlines?

The AST is simply a representation of the parse tree, so that should contain "incomplete" nodes.

However, the compiler should probably not produce a pickle for a scenario without steps. Or maybe it should..

The answer to that question depends on how we want such scenarios to be reported by Cucumber. If we want a Scenario without step to be reported, we need to provide an empty Pickle for it. Otherwise Cucumber won't know about it. (Future Cucumber implementations only know about pickles, not about the AST).

Thoughts?

enkessler commented 8 years ago

I try to design upstream tools without special functionality that plays favorites for or, preferably, even has knowledge of any particular downstream one. That's what library extensions are for, you know? So, I'll just pretend that when you say Cucumber you mean "Any tool that wants to utilize our consumables". ;)

Looking over microcuke and a compiler, it looks like pickles are meant to be a 'finalized' test. The extra steps and tags inherited from the background/feature are combined with those of the scenario/outline, and the template form of outlines are used to make multiple 'concrete' tests. Unless I am missing something, that's basically it. (As an aside, I am relieved that the existence of pickles does not appear to make my gem redundant.)

If the purpose of the pickles is simply to 'condense' the expanded form of a test into a single object, then my inclination would be to follow the same principles as we did with the AST and not limit downstream tools. So the compiler would go ahead and pickle anything that it could (incomplete or otherwise) and, once again, it is up to the consuming tool to do with that information what it wishes, only now with fewer objects to dig through in order to find that information.

In any case, because an outline is just a template that produces one scenario for each example row, I would suggest treating scenarios and outlines the same way. If we don't make a pickle for a scenario with no steps, then we don't do it for an outline either. If we do make a pickle for test with no steps, then we make 50 'concrete' pickles for an outline with no steps but 50 example rows.

aslakhellesoy commented 8 years ago

Gherkin's pickles is a consumer driven contract. The consumer is Microcuke, and later, other Cucumber implementations.

I think of them as the simplest possible data structure that is needed to create a test. Microcuke's TestCases are built from pickles and user-supplied step definitions.

In other words, if this is the simplest structure Microcuke can work with, then Gherkin should provide that structure. You'll find similar principles in other patterns like CQRS where read-models are optimised for consumers.

I haven't thought about or tried to design pickles to have any use or purpose for other tools than Microcuke. I don't think that would be a good idea.

If you have a suggestion for how Microcuke can be improved by making changes to the compiler and/or pickles, perhaps you could demonstrate with a pul request? Changing a single implementation is sufficient for a discussion.

enkessler commented 8 years ago

So, pickles are a feature that was added at the behest of one of our consumers. However, it is not so specific to them that they would not be of use to other future consumers. And the pickles are a CDC, so we don't want to proactively change them (e.g. by adding incomplete pickles) but rather wait until a real consumer has a real need.

...Okay! I'm on board with this.

I would also have accepted the reasoning that the pickles are not just convenient data storage objects as I had first envisioned but were, in fact, meant to represent a test. A test without steps isn't really a test and therefore we would not make a pickle for it.

I don't have a solid opinion how Cucumber should handle/report incomplete stuff, so I'll leave further arguments on changing pickle shape and usage to people who do.


Two interesting pickling cases come to mind:

  1. A scenario/outline with no steps but present with a background that does have steps
  2. An outline with steps but none of those steps use the parameters provided in the examples

In both cases, you would end up pickling a bunch of tests that do the exact same thing. Not that that's necessarily bad, it's just an interesting consequence of sticking the same data into a template over and over again.

aslakhellesoy commented 8 years ago

Thinking about it I think a Scenario without steps should not produce a pickle. Cucumber can still report it as undefined.

1 is interesting! I think it should actually not produce a pickle.

2 I think we should just allow to produce identical pickles.

enkessler commented 8 years ago

Currently we are pickling stepless scenarios but not stepless outlines. I'll get a PR going this evening that doesn't pickle either and will at least have Ruby updated.

enkessler commented 8 years ago

I'm working on incomplete backgrounds and features now.

I ran into a surprise while doing a description and no steps. Given gherkin similar to the below, the empty line between the description and the scenario tokenizes as an 'other' line instead of an empty line. Shouldn't it be an empty token?

Background:
  short description

Scenario:
  * etc.
enkessler commented 8 years ago

Just ran into a feature file that is nothing but comment lines. Glad to see that my sometimes contrived use cases actually show up 'in the wild'. Now to figure out how to adjust the parsers...

enkessler commented 8 years ago

And the answer to 'how do I adjust the parsers?' appears to be 'change the grammar and regenerate them'. I'm taking a crack at it but I suspect that I'll end up needing to rewrite the grammar entirely in order to make sense of it and/or account for the removal of some of the assumptions that are allowing it to work in its current manner (e.g. assuming that a Feature will always exist prevents the need for additional look ahead rules for tags).

Two questions for you lot: First, would you like me to make a grammar that works for the new tests and then someone can refactor it in the PR into something prettier/more performant or would you rather I just PR the test data and leave it to someone who knows better to update the grammar? Second, is there a handy description for the notation used in the grammar? I passingly familiar with context-free grammars and BNF but haven't figured out what the '!' after some of the rules is supposed to mean.

enkessler commented 8 years ago

Right, so...after further thought, a feature file that has only comments has never really had a place in Gherkin before. As I recall, they have always had to hang off of something. In Gherkin 2, the last comments in a file would always be lost because no other objects got parsed upon which to stick them. In Gherkin 3, a Feature is mandatory and all comments in a file are stuck on that Feature. However, comments are likely meant to refer to the thing that they are near and not the feature as a whole so, while that's a reasonable workaround in order to not loose them, it's harder to figure out what the comment was supposed to refer to without doing line number math.

Keeping track of comment lines does seem to be desired, but no version of Gherkin has successfully stuck them in a place that is both reliable and meaningful. Given the move towards super minimal feature files, I think that the only thing that will exist 'for sure' is the file itself (even if it is empty aside from the aforementioned comments). It doesn't get us any more meaningfulness than sticking them on the feature but it is now more reliable.

The significant change that I see resulting from this would be adjusting the AST so that it can exist and hold stuff but without a feature. That is, add a 'file' level to the AST.

Example of current AST:

{
  "comments": [ <comments go here>],
  "keyword": "Feature",
  "language": "en",
  <etc...>
}

Example of new AST:

{
  "comments": [ <comments go here>],
  "feature": {
                    "keyword": "Feature",
                    "language": "en",
                    <etc...>
                 },
  <anything else that might be appropriate to stick at the file level>
}

Example of an AST for a file with only comments:

{
  "comments": [ <comments go here>]
}

Example of AST for a completely empty file:

{
}

Your thoughts?

aslakhellesoy commented 8 years ago

@enkessler see #189

brasmusson commented 8 years ago

Following up on https://github.com/cucumber/gherkin/pull/189#issuecomment-202929266.

Does anybody see anything more to be done in this issue, after #160 (scenario outlines without examples), #175 (only background steps) and #189 (empty feature files)? I do not find anything from reading the discussion in this thread.

If there is not more, we should close this with reference to those PR:s.

enkessler commented 8 years ago

I think that incomplete Backgrounds and Features are all that is left. Unless I stumble upon something while adding 'good' test data examples (which is what led to #189). Seeing the super simple grammar change that allowed for empty features, I should be able to now finish those two use cases.

enkessler commented 8 years ago

Still an open question on an earlier comment.

brasmusson commented 8 years ago

@enkessler About your question in https://github.com/cucumber/gherkin/issues/153#issuecomment-198703247,

the empty line between the description and the scenario tokenizes as an 'other' line instead of an empty line. Shouldn't it be an empty token?

It is intentional that both empty lines in the middle of an description and after an description is tokenized as 'other'. It is to avoid lookahead, see https://github.com/cucumber/gherkin/blob/master/gherkin.berp#L44.

Do you see a large enough benefit in tokenize them as 'emtpy', to motivate the extra lookahead that is needed in that case?

enkessler commented 8 years ago

Just consistency. I thought that the 'other' token was a 'catch all' for stuff that we don't otherwise categorize, but we do categorize empty lines, so I would be expecting an 'empty' token.

That being said, I don't want to tackle that right now so I'll leave it alone.

brasmusson commented 8 years ago

I think that incomplete Backgrounds and Features are all that is left.

A quick check does not reveal any problems with the current behaviour with respect of incomplete Backgrounds and Features. The one thing could be done seems to be to add acceptance test data that specifies the current behaviour.

enkessler commented 8 years ago

Yep. That's the thing that I'm doing now. I'll have a PR soon.

enkessler commented 8 years ago

Got another PR. #194 should cover the testing bases and we can do the rest of the implementations in #189.

enkessler commented 8 years ago

I think that this is ready to be closed as soon as #192 gets merged.

Edit: ...which I will go and update now that the AST is shaped differently...all done!