PowerShell / PowerShell

PowerShell for every system!
https://microsoft.com/PowerShell
MIT License
44.62k stars 7.22k forks source link

The chain operators don't work with flow-control statements exit, return, and Throw #10967

Closed mklement0 closed 10 months ago

mklement0 commented 4 years ago

A common idiom (in the Bash world, which inspired PowerShell's && and || operators) is to conditionally exit a script when invocation of a command fails, along the lines of:

# Assume existence of /somepath and exit, if it doesn't exist.
ls /somepath || exit 1

Currently, neither exit nor return nor throw can be used on the RHS of && / ||

Steps to reproduce

{ Get-Item /nosuch || return  } | Should -Not -Throw
{ Get-Item / && return  } | Should -Not -Throw

Expected behavior

The tests should pass.

Actual behavior

The tests fail, because return is not recognized

Expected no exception to be thrown, but an exception 
"The term 'return' is not recognized as the name of a 
cmdlet, function, script file, or operable program. ..."...

Environment data

PowerShell Core 7.0.0-preview.6
vexx32 commented 4 years ago

cc @rjmholt

rjmholt commented 4 years ago

Yes the PowerShell committee discussed this and it was decided against -- the RFC was updated to reflect that and it was taken out.

Given the syntactic complexity of it, I think it was actually the right decision. There was a lot of inconsistency in the edge cases of this proposal. Instead, you can use a subexpression pretty easily: Get-Item ./doesntexist || $(exit 1). It's not beautiful, but it also doesn't set up weirdness like Get-Item ./doesntexist || throw "More things" && "other stuff" &. At that point it was starting to get very jagged between correct usage and bad. So simplicity won out.

rjmholt commented 4 years ago

Also see:

https://github.com/PowerShell/PowerShell/blob/70ab772da77d8cc1cbec2e3aa29bd328ff1de589/test/powershell/Language/Operators/PipelineChainOperator.Tests.ps1#L92-L94

rjmholt commented 4 years ago

I did advocate fairly strongly for being able to use control flow statements in pipelines; I liked the idea and I think the concept is a nice one. Even though it didn't go in, I think it got a fair run.

But syntactically it doesn't work because the big difference between PowerShell and bash is that in PowerShell statements contain pipelines, whereas the opposite it the case in bash. Trying to shoehorn certain kinds of statement where a pipeline fits just made the syntax really unhelpful, and you'd either have to make dodgy syntax breaking changes or implement very fiddly syntax rules to massage it in.

The (badly named) subexpression syntax is designed for embedding a statement within a pipeline, so it offers this neatly.

mi-hol commented 4 years ago

I wonder how the fact "syntactically it doesn't work because the big difference between PowerShell and bash" can we made clear to users. I had assumed that this new feature would basically work the same in PowerShell as in bash, hence many others will very likely start with the same assumption. At least this conclusion seems 'another nail in the coffin' for the goal 'pwsh is a replacement to bash' :(

rjmholt commented 4 years ago

But it does work:

{ Get-Item /nosuch || $(return)  } | Should -Not -Throw
{ Get-Item / && $(return)  } | Should -Not -Throw
vexx32 commented 4 years ago

@rjmholt if that already works, I don't see it being especially sensible to mandate the awkward syntax.

rjmholt commented 4 years ago

Prior to pipeline chains, the grammar was (I’m using a simplified shorthand here):

statement:
 | ...
 | ‘return’ pipeline [‘&’]
 | pipeline [‘&’]

pipeline:
 | pipeline ‘|’ command_expression
 | expression

expression:
 | command_expression
 | ‘$(‘ statement ‘)’

command_expression: ‘Get-Item /‘

The current grammar just substitutes pipeline chains for pipelines, so that a pipeline is just a pipeline chain of length 1:

statement:
 | ...
 | ’return’ pipeline_chain [‘&’]
 | pipeline_chain [‘&’]

pipeline_chain:
 | pipeline_chain ‘&&’ pipeline
 | pipeline_chain ‘||’ pipeline
 | pipeline

pipeline: <as above>

If we allow control flow statements, then you might have this:

pipeline_chain:
 | pipeline_chain ‘&&’ chain_element
 | pipeline_chain ‘||’ chain_element
 | chain_element

chain_element:
 | ‘return’ pipeline_chain [‘&’]
 | pipeline

That gives us back gi / && return, but now we have:

The biggest question is “what do I expect those to do” and if there’s no clear answer it’s not good syntax. So let’s fix it.

Putting control flow statements only at the end of chains fixes nothing because they already enforce themselves as the ends of chains; once you use one, the chain to the right is subordinate to return because they have a pipeline_chain under them in the grammar.

So let’s fix it instead by only allowing unchained pipelines under return. But now you can’t do return gi file && cat file.

Ok we can fix that if we allow chains when return is the start of the statement but not when we’re in a chain. But now we have two kinds of return statement and pipelines aren’t just degenerate pipeline chains, they have all these caveats.

And we still haven’t addressed how to deal with return 1 && 2 & vs 1 && return 2 &.

And we also still haven’t addressed how to deal with control flow statements in places where only pipelines working as expressions used to be expected.

In my original implementation that last was addressed by having two kinds of AST, so that return by itself wouldn’t just start working I’m an if or a while. But that meant tool implementors overriding VisitPipelineChainAst wouldn’t hit all the things that looked like &&/||, they also needed VisitStatementChain. Which was badly named since it didn’t apply to all statements. But then to make it apply to more statements we’d need to make even bigger changes in the syntax around job control and statement separation. And with each change we risk breaking compatibility with PS 5.1 and becoming more like Raku.

So it’s a bit like the conversation Roy Batty has with Eldon Tyrell; every attempt to save the situation has some ugly consequence. To save that one tree we have to move the forest around it, and we’ve introduced more problems. A programming language’s grammar is a human interface that must be above all else consistent and composable. Inserting chaotic corner cases into the grammar makes it impossible to reason about or teach. And PowerShell needs to be defensible in 5 or 10 years time.

vexx32 commented 4 years ago

Right, but if it's already possible to make it "work" by using subexpression we should probably make an effort to formalize it in a way that is reasonably intuitive. Otherwise, the behaviour of such subexpressions will always be pretty ambiguous, unless you intend to completely disable keyword use in a pipeline chain.

SteveL-MSFT commented 4 years ago

An alternative is to make sure we include flow-control subexpressions in the examples for the documentation of this operator. Sub expressions is an existing concept and maybe users should use it more.

mi-hol commented 4 years ago

@SteveL-MSFT, adding this to the documentation would be great! A search for the term gives 0 results currently!

image

or unrelated hits image

SteveL-MSFT commented 4 years ago

@PowerShell/powershell-committee reviewed this and we agree to that we will stay with what we agreed in the RFC which is that supporting flow-control statements complicates the implementation and using sub-expressions is the right way to use this. We should improve our documentation on sub-expressions if PowerShell users are not using them.

mklement0 commented 4 years ago

@SteveL-MSFT, a quick tangent, though I believe it reinforces the point that we shouldn't force the awkward $(...) syntax (sorry for being late here).

Sub expressions is an existing concept and maybe users should use it more.

If anything, users should be using the (unfortunately named, as @rjmholt pointed out) sub-expression operator ($(...)) less:

rkeithhill commented 4 years ago

It is (...) that deserves promoting, but for that it needs a name

This is from the early days of PS but IIRC (...) was referred to as a grouping expression, $(...) as a subexpression and @(...) as an array subexpression.

mklement0 commented 4 years ago

Thanks, @rkeithhill.

The latter two definitely officially have that name in about_Operators - with suffix operator - but (...) isn't covered there and I wasn't aware of a name for it - is that name in any existing documentation? A quick search revealed nothing.

By coincidence, grouping operator is what I'm proposing in https://github.com/MicrosoftDocs/PowerShell-Docs/issues/5154.

rkeithhill commented 4 years ago

It was called that in the first edition of Bruce Payette's "Windows PowerShell in Action" section 5.3 (page 119). And it's the same in the second edition (page 156).

rkeithhill commented 4 years ago

Hmm, in the second edition, it looks like "grouping expressions" refer to the set of three operators. It goes on to say:

Parentheses group expression operations and may contain either a simple expression or a simple pipeline. They may not contain more than one statement or things like while loops

For subexpressions it says:

Subexpressions group collections of statements as opposed to being limited to a single expression.

So maybe it is just parentheses operator? You might want to read through section 5.3 to see what you think.

mklement0 commented 4 years ago

@rjmholt

Thank you for putting up a good fight and for laying out the rationale.

Before I even try to understand the intricacies of the grammar, let me make a plea purely from a user's perspective:

If we force the use of $(...):

As an aside: The same arguments apply to the unfortunate decision to require {...} around identifiers in order to be able to use the null-conditional member-access operator, ?..

So, for the sake of the end-users' experience, can we look for a solution that works as expected, is easy to explain, and technically not too tortu[r]ous?

Personally, I'm not worried about cases such as Get-Item ./doesntexist || throw "More things" && "other stuff" &, because I wouldn't expect many people to attempt such acrobatics.

mklement0 commented 4 years ago

I appreciate the pointers, @rkeithhill: I found "simple parenthetical notation" in that chapter for (...) specifically, and "expression and statement grouping operators" indeed seems to be the umbrella term.

In other words: (...) currently has no name, so we can and should pick one going forward: parentheses operator is problematic, because it doesn't reflect the operator's purpose, only its syntactic form.

I encourage you to join the conversation at https://github.com/MicrosoftDocs/PowerShell-Docs/issues/5154 to either endorse the suggested grouping operator name or to suggest alternatives.

rjmholt commented 4 years ago

Before I even try to understand the intricacies of the grammar, let me make a plea purely from a user's perspective

I very much understand and even sympathise, but the problem is that the grammar is the language. It's our most fine-grained API and also our primary user interface. PowerShell has pipelines contained by statements, not statements contained by pipelines. That's always been the case, and the way to put a statement within a pipeline is to use the $(...) syntax — that's its only purpose to exist outside of strings in the language. In addition to that being a huge change in PowerShell's grammar (effectively creating a fork in the language), there's no semantic support in PowerShell for general statement success; $? after if or while has no meaning.

Essentially, with such a syntactic change you're asking for a new language, with a different treatment of syntactic and semantic constructs like expressions, pipelines and statements.

this seemingly artificial requirement will be an ongoing source of frustration

Just to reiterate, it's not artificial. This isn't a case of us setting the rules because of an opinion. Programming language syntaxes are formal languages with specific requirements.

In our case, PowerShell is (mostly...) a traditionally LL(k)-parseable grammar (like most interpreted languages, since this is a big asset when you're reading the program on the fly). As the parser proceeds from left to right in the program, it decides what syntactic element it sees based on the state it's currently in (captured in our recursive descent parser mostly by the method we're in) and the element it sees right now. As the parser proceeds within each state, it has some way of deciding when to move to the next state or when to return to the state it came from. For example, in if (...) { ... }, it's expecting a statement at the outset and sees if, which in the "expect statement" state makes it transition to the "if statement state". Just as important is the final } at the end. It signals that we have finished seeing the if statement, and can now look for the next statement at that level. Every syntactic element in PowerShell (and in every formal language) has some way to declare "end" (think ), }, ;, <newline>, <end-of-file> -- just like </div> in HTML).

The problem arises because pipelines, being the default form of statement, terminate the same way as arbitrary statements. So when you embed a statement within a pipeline, it's not clear what you're terminating, unless you have a syntax to delineate the extent of the statement. That's precisely the purpose of $(...); when ) is seen, we go back to expression parsing. Without that (or some other syntax to wrap statements within pipelines), there's no terminator to say "ok statement's done, back to the pipeline state". So instead, you get a deepening syntactic hole into embedded statements (e.g. action1 && return action2 && return action3, which is grouped as action1 && return (action2 && return action3)); there's no way to indicate syntactically that the top level pipeline chain should be continued.

That's true for any implementation that removes a container syntax for statements in pipelines or pipeline chains, so it's not an implementation detail. In fact, if PowerShell used an LR parser, it would be a major issue in the parser (a reduce-reduce conflict). To quote that link from the GNU Bison docs:

A reduce/reduce conflict occurs if there are two or more rules that apply to the same sequence of input. This usually indicates a serious error in the grammar.

This is particularly important because syntaxes aren't a concept required by computers (those do fine with finite-block input like byte code and machine code). Syntaxes are a human interaction requirement, and a flaw in the syntax is as much a usability issue as it is a technical one. We could write the parser to make a call on the grammatical ambiguity (like C does), but it would still be confusing to users to pick apart write-error "bad" && return 0 || exit 1.

The same arguments apply to the unfortunate decision to require {...} around identifiers in order to be able to use the null-conditional member-access operator, ?..

I agree that the scenario is similar, but also think there that not breaking the way we parse existing valid syntax was the right way to go. Again, a change there isn't just a case of us making a decision to support a small number of wild programs out there — we've made numerous breaking changes in cases where we thought the pros outweighed the cons (not that I personally agree with all of them, or that those do or don't justify others). The issue is that changing some aspect of the tokenizer or parser means that two PowerShell versions will no longer generate the same AST for some scripts, so two PowerShells will "see" the same script differently. That means we can't even read the syntax the same way, so there's no PSScriptAnalyzer rule you can write to pick up possible issues around it; PSScriptAnalyzer will see a different syntax from one PowerShell version to the next.

Unlike C#, PowerShell can't pre-compile a different syntax to a compatible format for later execution, meaning that a syntactic change is tethered to the runtime. Once PowerShell makes a syntactic break, we reduce the number of scripts that can work against different versions, meaning users are forced to write two scripts, which is a serious issue for a shell and a scripting language. PowerShell is supposed to be dynamic enough to bend past all the differences with logic inside one script, but the syntax is the one non-negotiably static thing we have. And making a syntax change where both before and after are valid is especially pernicious, since there's no simple way to detect it, there's no warning for it and even after executing it in both environments, users might not know that a different behaviour occurred.

mklement0 commented 4 years ago

I appreciate the thoughtful, in-depth answer, @rjmholt.

Re the ?. syntax issue, I suggest we continue the conversation at https://github.com/PowerShell/PowerShell/issues/3240#issuecomment-565745004, where I've quoted the relevant part of your answer in my response.

mklement0 commented 4 years ago

Quoting from your earlier comment:

So let’s fix it instead by only allowing unchained pipelines under return. But now you can’t do return gi file && cat file.

I don't see that as a problem, because I see little value in being able to do that. (Bash users won't attempt it anyway, because to them return only ever takes a numeric argument - a function's exit code).

Telling users that return only ever works on a single pipeline makes for a straightforward rule and means that return gi file && cat file is a pointless command, because the cat file command never gets to execute.

In fact, that's exactly what happens in Bash:

$ foo() { return 0 && ls /; }; foo
# no output; exit code is 0

Telling users that they must use $(...) if they want to return an entire chain strikes me as perfectly reasonable:

# To return a *chain*, you must use $(...)
return $(gi file && cat file) 

Requiring $(...) for this exotic case seems much more reasonable that requiring it for the simple and much more common foo || return case.

Of course, you can avoid $(...) altogether if you simply don't use return or simply make it the next statement:

gi file && cat file
return # only needed if there are additional statements in the scope.

And we still haven’t addressed how to deal with return 1 && 2 & vs 1 && return 2 &.

Again, if return only ever operators on a single pipeline, it's easy to reason that this particular chain is pointless, because the first chain segment exits unconditionally.

And we also still haven’t addressed how to deal with control flow statements in places where only pipelines working as expressions used to be expected.

Can you say more about that?

mklement0 commented 4 years ago

@rjmholt , after some more conceptual fog lifted for me based on your feedback and the discussion in #6817, I've scrapped the previous comment['s proposal].

Let me try to bring some closure to this conversation:

I think all conceptual problems you raise in https://github.com/PowerShell/PowerShell/issues/10967#issuecomment-549167463 would go away - and with that any potential confusion for users - if we limited the scope of the operand for return / exit/ throw to a single pipeline (as before, by default, given that there were no chains) and - now - also to a single pipeline chain element (link); ditto for postpositional & (background operator).

In short: a pipeline chain is then a chain of pipelines that may each be preceded by return / exit / throw and may each be followed by & - other than chaining such pipelines with && and ||, there's no syntax that pertains to the chain as a whole.

It's only worth considering feasibility if we agree that the above is the desirable behavior.


Here's what the proposal might look like in terms of a grammar if there weren't backward-compatibility concerns.

I do realize that these concerns are paramount, I just can't personally answer the question whether there is a non-breaking way to implement this without the solution becoming too complex and cumbersome from a code-maintenance perspective.

statement:
 | ...
 | pipeline_chain 

pipeline_chain:
 | pipeline_chain ‘&&’ chain_element
 | pipeline_chain ‘||’ chain_element
 | chain_element

chain_element:
 | ‘return’ pipeline [‘&’]
 | ‘exit’ pipeline [‘&’]
 | ‘throw’ pipeline [‘&’]
 | pipeline

That is, return / exit / throw would become part of a chain element (link) only, and postpositional & would similarly only be allowed per chain element.

The only conceivable downside is one you've already mentioned: you'd then be able to write the following non-sensical code:

if (exit 1) { ... } # will unconditionally exit the enclosing script - block is never entered
exit 1 && ...   # ditto - && ... is effectively ignored

But I wouldn't consider that a problem, given that it's fair to assume that users know the meaning of return and exit and throw; there's generally never a guarantee that a syntactically valid statement [sequence] isn't semantically pointless in practice (e.g., if ($false) ... or exit 1; ...).


So, the - at this point potentially rhetorical in effect - question is:

Can the above be implemented:

yecril71pl commented 4 years ago
0 && return 1

RETURN: The term 'RETURN' is not recognized as the name of a cmdlet, function, script file, or operable program.

0 && 1

0 1

Should I understand that 1 is not a term? I would rather say that RETURN is not.

SeeminglyScience commented 4 years ago

@yecril71pl I'm not sure I understand the question. Can you elaborate?

yecril71pl commented 4 years ago

Why is 0 && 1 not in error? If 1 is a term, then, according to the error message reported by 0 && return 1, it must be "the?? a name of a cmdlet, function, script file, or operable program" (note that the definite article is definitely out of place there). But 1 is none of them, at least at my place, so I conclude it must not be a term, or otherwise the error message would equally apply to it.. Also, what does an "operable program" mean? An operable application, that one I know: is an event-driven application that responds to input from the operator. If it stops responding, it becomes inoperable, a.k.a. hangs. I still do not know what an "operable program" is supposed to mean.

SeeminglyScience commented 4 years ago

Why is 0 && 1 not in error?

So in that parsing mode any CommandBaseAst is allowed. That includes:

  1. A command, like Get-ChildItem -Recurse
  2. A command expression, which is basically a wrapper for any expression.

The term expression here refers to a specific subset of language elements. In this case, it's a constant expression (the literal integer 1). Not included in this subset are statements. Statements are another subset that include things like if, return, foreach etc.

So basically because statements cannot be parsed, the snippet return 1 is parsed as a command with the literal 1 as an argument.

Also, what does an "operable program" mean? An operable application, that one I know: is an event-driven application that responds to input from the operator. If it stops responding, it becomes inoperable, a.k.a. hangs. I still do not know what an "operable program" is supposed to mean.

It just means an executable file found by command discovery. In Windows it means a file in the Portable Executable format, or just anything that can be invoked with the shell verb "Invoke" (which is pretty much any file system item).

yecril71pl commented 4 years ago

The term expression here refers to a specific subset of language elements.

The error message I get gives me a specific subset of that subset, and that subset does not cover 1.

So basically because statements cannot be parsed, the snippet return 1 is parsed as a command with the literal 1 as an argument.

I do not think it should be parsed as a command because RETURN is not a bareword, it is a keyword, and it cannot begin a command.

Also, what does an "operable program" mean? An operable application, that one I know: is an event-driven application that responds to input from the operator. If it stops responding, it becomes inoperable, a.k.a. hangs. I still do not know what an "operable program" is supposed to mean.

It just means a command found by command discovery. Usually an executable application in %PATH%, function, binary cmdlet, alias, or script file.

So PowerShell has its own definition what "operable" means. We should definitely change that.

SeeminglyScience commented 4 years ago

The error message I get gives me a specific subset of that subset, and that subset does not cover 1.

Not sure what you mean. If you'd like clarification on something I've said feel free to ask.

I do not think it should be parsed as a command because RETURN is not a bareword, it is a keyword, and it cannot begin a command.

Anywhere where statements are not allowed it will be parsed as a command. If you'd like to know more about why statements are not allowed in this syntax, please read through the rest of the thread. After that if you have questions about a specific part feel free to ask.

So PowerShell has its own definition what "operable" means. We should definitely change that.

That's out of scope for this issue. If you'd like to discuss that I'd recommend opening a new one.

yecril71pl commented 4 years ago

I do not think it should be parsed as a command because RETURN is not a bareword, it is a keyword, and it cannot begin a command.

Anywhere where statements are not allowed it will be parsed as a command. If you'd like to know more about why statements are not allowed in this syntax, please read through the rest of the thread. After that if you have questions about a specific part feel free to ask.

I think it is a wrong thing to do, UX-wise. I would expect 0 && return 1 to fail even if there is some return.exe in the executable search path.

SeeminglyScience commented 4 years ago

I think it is a wrong thing to do. I would expect 0 && return 1 to fail even if there is some return.exe in the executable search path.

Yeah I agree. Problem is that logic is very heavily relied on with the foreach alias pointing to ForEach-Object. That logic can't be changed without breaking a lot of folks.

yecril71pl commented 4 years ago

So, returning to the problem at hand, if we allow && return then we should also allow && foreach, and then we rely on the fact that nobody would use Foreach-Object as the first pipline element, amirite? But what about nutjobs that would write 0 && foreach { $_ } -I 0 , and then they are broken?

SeeminglyScience commented 4 years ago

So, returning to the problem at hand, if we allow && return then we should also allow && foreach, and then we rely on the fact that nobody would use Foreach-Object as the first pipline element, amirite?

Believe it or not plenty of folks do. Especially if they are taking a ScriptBlock from an external source and want to emulate a process block. There are better ways to do it, but using ForEach-Object in the first pipeline slot is a pretty common way to do it.

0 && foreach { $_ } -I 0 , and then they are broken?

You got it. Not only would that no longer work, it would be a parse time error :/

microsoft-github-policy-service[bot] commented 10 months ago

This issue has not had any activity in 6 months, if this is a bug please try to reproduce on the latest version of PowerShell and reopen a new issue and reference this issue if this is still a blocker for you.

microsoft-github-policy-service[bot] commented 10 months ago

This issue has not had any activity in 6 months, if this is a bug please try to reproduce on the latest version of PowerShell and reopen a new issue and reference this issue if this is still a blocker for you.

microsoft-github-policy-service[bot] commented 10 months ago

This issue has not had any activity in 6 months, if this is a bug please try to reproduce on the latest version of PowerShell and reopen a new issue and reference this issue if this is still a blocker for you.

microsoft-github-policy-service[bot] commented 10 months ago

This issue has been marked as "No Activity" as there has been no activity for 6 months. It has been closed for housekeeping purposes.

amis92 commented 10 months ago

It's definitely not completed...