att / ast

AST - AT&T Software Technology
Eclipse Public License 1.0
556 stars 152 forks source link

Empty removal of quoted string within parameter expansion #1006

Open McDutchie opened 5 years ago

McDutchie commented 5 years ago

Description of problem: Erroneous empty removal of a quoted empty substitution string within a parameter expansion.

Ksh version: Version A 2017.0.0-devel-2034-g6542e5a

How reproducible:

build/src/cmd/ksh93/ksh -c 'unset var; set -- ${var-""}; echo $#'
build/src/cmd/ksh93/ksh -c 'var=; set -- ${var:-""}; echo $#'
build/src/cmd/ksh93/ksh -c 'var=; set -- ${var+""}; echo $#'
build/src/cmd/ksh93/ksh -c 'var=x; set -- ${var:+""}; echo $#'

Actual results: 0

Expected results: 1 (as on every non-ksh93 shell)

krader1961 commented 5 years ago

Note that this behavior dates back to at least 2012 with ksh93u and is also present in the ksh93v beta release. So this behavior was not introduced recently.

At first glance it appears to me ksh is right and zsh and bash are wrong. In as much as word splitting occurs in ksh after the var expansion as I would expect. The bash man page "Parameter Expansion" and "Word Splitting" sections are silent about the behavior when the alternative word is quoted. It seems to me the portable solution is, for example, "${var-}" rather than ${var-""}.

Does POSIX mandate either the ksh or bash behavior? Can we risk breaking existing ksh scripts by changing the behavior?

kusalananda commented 5 years ago

It seems to me that the ksh behaviour is to do quote removal on the result of the parameter expansion before field splitting occurs, which I believe is not what the POSIX standard says should happen.

The POSIX order of things is

The "" that the ${var-""} parameter expansion would result in if var is unset, would be counted as one field if the quotes were still there at the field splitting stage.

Reference: http://pubs.opengroup.org/onlinepubs/9699919799.2018edition/utilities/V3_chap02.html#tag_18_06

krader1961 commented 5 years ago

@kusalananda Thanks for the comment but the expected behavior is still unclear. Which is not surprising since the POSIX spec was based on existing implementations at the time it was written. The question is whether a word that occurs inside ${...} should be treated literally or not. The answer seems obvious at first: Of course the string should be treated literally and chars like double-quotes retained. The problem is that the spec allows such things as nested command substitutions.

Consider what happens with echo ${var:-$(echo " wtf ")} if IFS is its default value or an empty string. So the answer is obviously context dependent. We can't simply retain every char in word verbatim since we have to allow for things like embedded command substitutions which may themselves contain double-quoted strings.

For more fun create a program named f that does nothing more than report its argv values. Then invoke it from ksh and bash via ./f ${var:-(echo " wtf ")}. Both report this (using fish's set --show argv since it was a convenient way to test this):

$argv: not set in local scope
$argv: set in global scope, unexported, with 2 elements
$argv[1]: length=5 value=|(echo|
$argv[2]: length=6 value=| wtf )|
$argv: not set in universal scope

Notice that the embedded double-quotes are elided. Which is surprising given the POSIX rules you linked to.

krader1961 commented 5 years ago

Even if the current ksh behavior is "wrong" by virtue of being different than bash and zsh it is not obvious it violates the POSIX spec. What is worse is that changing the behavior is not backward compatible with previously published versions of ksh. So even if we conclude the current behavior is "wrong" it is not obvious that we can safely change it short of a major release that explicitly does so.

kusalananda commented 5 years ago

I don't know what's happening in your example with ./f ${var:-$(echo " wtf ")} as I don't know what the fish shell is doing or what your f looks like. I'm also not quite sure of what you want to show me with that. POSIX is very specific about what should happen to the word in the expansion:

In each case that a value of word is needed (based on the state of parameter, as described below), word shall be subjected to tilde expansion, parameter expansion, command substitution, and arithmetic expansion. If word is not needed, it shall not be expanded.

Reference: http://pubs.opengroup.org/onlinepubs/9699919799.2018edition/utilities/V3_chap02.html#tag_18_06_02

It does not mention quote removal as that is done later.

After command substitution, ./f ${var:-$(echo " wtf ")} would be ./f ${var:- wtf } (the quotes inside the command substitution are removed as the last step of that command's evaluation). With var unset, this would become ./f wtf, and with the default IFS this would call f with the three character string wtf. Would we have wanted to retain the flanking spaces, we would have used ./ ${var:-"$(echo " wtf ")"} or ./f "${var:-$(echo " wtf ")}".

The issue here (in this GitHub issue) is that the quotes in ${var-""} is treated differently from the quotes in e.g. ${var-"*"} when var is unset (the former expands to nothing, not even an empty string that can be detected, while the latter expands to a literal *). There seems to be a special case for empty quotes here. And this is different from how empty quotes are treated in e.g. set -- "" (sets $1 to an empty string and sets $# to one, not zero).

BTW, while playing around with this I noticed another (related) ksh-specific oddity:

$ unset var; set -- "${var:-"$(echo " wtf ")"}"
$ echo "$#"
3
$ printf 'Arg: >>%s<<\n' "$@"
Arg: >><<
Arg: >>wtf<<
Arg: >><<

This is clearly wrong as the parameter expansion is quoted and should only ever be expanded to a single string. (edit: maybe not so clear, sorry, I need to think about this).

Any other shell would have done this:

$ unset var; set -- "${var:-"$(echo " wtf ")"}"
$ echo "$#"
1
$ printf 'Arg: >>%s<<\n' "$@"
Arg: >> wtf <<
krader1961 commented 5 years ago

@kusalananda wrote:

I don't know what's happening in your example with ./f ${var:-$(echo " wtf ")} as I don't know what the fish shell is doing or what your f looks like.

My apologies. I should have included more context. My ./f script is a single line fish script that simply does set --show argv to display the state of a variable. In this case the content of the CLI args passed to the command. In fish argv is analogous to $@. Which begs the question why isn't typeset -p '@' legal and produce output about that var?

After command substitution, ./f ${var:-$(echo " wtf ")} would be ./f ${var:- wtf } (the quotes inside the command substitution are removed as the last step of that command's evaluation)

This doesn't make sense to me. Since that statement involves a command substitution the quote removal should happen as part of evaluating the result of the command in that substitution. It should not be deferred to the evaluation of the surrounding context.

There seems to be a special case for empty quotes here.

Yes, and that seems to be the source of the confusion. I don't see any reason for that special-case handling of empty quotes inside a parameter expansion other than "it's what people expect to happen". Which is fine but I don't see how the existing documentation (including the POSIX standard) justifies that special-case.

Which gets us back to the question of whether we can safely change the behavior. Note that I'm all for fixing inconsistencies such as this one. But one of things I've heard repeatedly from distro maintainers like RedHat is that any change that might change the behavior of existing scripts without good reason is not acceptable. And so far I fail to see a good reason to change this ksh behavior.

krader1961 commented 5 years ago

This is also another great example for why the SHOPT_BASH build time symbol needs to be killed. The ast.beta branch enables that feature. Yet it behaves no differently from ksh when invoked as "bash" using any of the tests in the original comment. Attempting to be bug-for-bug compatible with bash is a fools errand. See issue #238 (also #611).