elves / elvish

Powerful scripting language & versatile interactive shell
https://elv.sh/
BSD 2-Clause "Simplified" License
5.71k stars 302 forks source link

Concatenating a string and non-string (sometimes) throws exception #1269

Open krader1961 opened 3 years ago

krader1961 commented 3 years ago

I was working on implementing an "after-command" callback (see issue #1029) and noticed the following, somewhat surprising, behavior. This reproducer isn't, obviously, what I was really doing but is semantically equivalent. In my real code I was concatenating an exception object that was nil in a callback.

> echo "err is "$nil
Exception: cannot concatenate string and nil

Note that a float64 is implicitly converted to a string:

> echo "1.0 is "(float64 1.0)
1.0 is 1

I understand why float64 types need to be special-cased in the context of concatenating strings. My question is whether that special-casing should be replaced with a more general rule that coerces every non-string type to its to-string representation if Elvish knows how to do that conversion. The documentation regarding compounding strings actually implies that should happen:

A compound expression evaluates to a string concatenation of all the constituent expressions

That portion of the documentation makes no mention of non-string types.

krader1961 commented 3 years ago

Requiring an explicit conversion (e.g., "var is "(to-string $var)) for types other than string and float64 might be acceptable but needs to be documented. Personally, I think an implicit to-string when "compounding" a string is friendly and unlikely to cause problems in practice.

hanche commented 3 years ago

FWIW, one can currently concatenate numbers too:

⬥ echo (float64 4)(float64 2)
42

Perhaps one could generalize this to work with arbitrary values, so long as they can be meaningfully converted to strings. But if so, what about lists? It seems reasonable to assume that the concatenation of two lists should be a new list, so that concatenating [a b] and [c d] should result in [a b c d] and not '[a b][c d]'. And what about a string and a list? Should concatenating a with [b c] produce the string 'a[b c]', or the list [ab ac]? I would expect the latter, if anything at all.

I think there are dragons lurking here. Proceed with caution.

krader1961 commented 3 years ago

Perhaps one could generalize this to work with arbitrary values, so long as they can be meaningfully converted to strings.

@hanche, I know you know that all value types can be "meaningfully converted" to a string. That's the point of the repr and to-string commands. See also the documentation of compounding values which explicitly documents that the syntax in question currently expects all values to be strings (whether literal or coerced from a special-cased type).

I'm pretty confident in stating that we don't want compounding two lists to generate a new list. That is, l1 = [1]; l2 = [2]; put $l1$l2 should not output [1 2]. Whether it should continue to throw an exception or coerce the arguments to strings then concatenate the strings is the question.

The core question is whether float64 should be special-cased and, if so, why? I'm okay with retaining the current behavior but it needs to be documented.

I think there are dragons lurking here. Proceed with caution.

Emphatic agreement from me. With regard to both the existing behavior and my proposed change in behavior.

At the end of the day I don't see why this is okay:

> put (float64 1)(float64 2)
▶ 12

But this is not:

> put (float64 1)(float64 2)(put [])
Exception: cannot concatenate string and list
[tty 54], line 1: put (float64 1)(float64 2)(put [])

Note that I'm using an empty list in that example as a placeholder for more fundamental types such as nil and bool which is what caused me to open this issue.

Also, from the compounding valuescompounding values documentation why does this example

> li = [$false $true]
> put {a b}-$li[0 1]

result in Exception: cannot concatenate string and bool rather than output this:

▶ a-$false
▶ a-$true
▶ b-$false
▶ b-$true

I can live with the current behavior. But the special-casing of float64 (and other types?) needs to be documented. On the other hand special-cases such as this one are undesirable in my opinion.

hanche commented 3 years ago

I know you know that all value types can be "meaningfully converted" to a string. That's the point of the repr and to-string commands.

Yes, but one might doubt whether something like <closure 0xc000169d40> is a “meaningful” representation of the underlying object?

I'm pretty confident in stating that we don't want compounding two lists to generate a new list. That is, l1 = [1]; l2 = [2]; put $l1$l2 should not output [1 2]. Whether it should continue to throw an exception or coerce the arguments to strings then concatenate the strings is the question.

It feels wrong, somehow, for this to result in the string '[1][2]', and in some ways, more wrong than[1 2]. When expectations clash in this way, I think an exception is the better outcome.

I think I'd propose the following compromise: Restrict the compounding values mechanism so it works only if the first item is a string (or strings), and raises an exception otherwise. Thus, (float64 4)(float64 2) should fail, but ''(float64 4)(float64 2) should succeed. The idea is that the initial string would signal the user's intention to build a string (or strings) by concatenation. It would also leave the door open for expanding the notion of compounding other kinds of values in interesting ways in the future.

krader1961 commented 3 years ago

The string representation of a lambda is not meaningful, let alone optimal. But the string representation of most (all?) other object types is pretty good; i.e., meaningful.

@hanche, I like the idea of limiting the coercion of non-string types to a string only if a string is present but it shouldn't require a string to be the first value in the compound expression. In other words the following should be equivalent:

''(float64 4)(float64 2)
(float64 4)''(float64 2)
(float64 4)(float64 2)''

Primarily because I think we should be able to write something like $v1':'$v2 where the value of both variables is not a string. We shouldn't have to write ""$v1':'$v2. This also begs the question of whether a literal string is acceptable or a var reference to a string is sufficient. Should this be equivalent to your proposal:

var s = ''
$s(float64 4)(float64 2)
xiaq commented 3 years ago

I'm in favor of implicitly coercing values to strings when there is at least one string involved. So concatenating two (typed) numbers should be disallowed while concatenating a string to anything or anything to string should be allowed.

krader1961 commented 3 years ago

I'm in favor of implicitly coercing values to strings when there is at least one string involved....

Does this rule require a literal string in the compound expression (even if just a bareword) or does it also allow vars whose value is a string? I can see arguments for both cases but favor requiring a literal string. Primarily because of how easy it is to start with a string assigned to a var and end up with a float64 assigned to the same var:

> var f = 3.14
> put $f$f
3.143.14
> set f (math:sqrt $f)
> put $f$f
# exception thrown here
hanche commented 3 years ago

There are certainly advantages to requiring a literal string, one of which is that you can catch violations at compile time. On the other hand, I worry that it might violate PITA. For sure, after var foo = bar, the user may expect to be allowed to replace any occurrence of bar in his code by $foo, right? (Subject only to the obvious syntax requirements.)

I am conflicted on this.

krader1961 commented 3 years ago

I am conflicted on this.

Agreed but I think the only safe approach is to require a literal string in the compound expression. Note that whether or not a literal string is present we can't catch violations at compile time since, like Python, a var can hold any value type. With the obvious exception of namespace (ns:) and lambda (fn~) vars. Without this change to the semantics both of these work today:

> var v = 1
> put $v$v
▶ 11
> set v = (float64 1)
> put $v$v
▶ 11

My proposal is to require writing put ""$v$v (with the literal string allowed anywhere in the expression) if you want to force non-string values to be coerced to strings before being concatenated. The literal string isn't needed if you know that all values are strings. This also, obviously applies to command capture:

> put ""(float64 4)(float64 2)
▶ 42
xiaq commented 3 years ago

There are very few places in Elvish's semantics that violate the substitution principle - as @hanche pointed out, after var foo = bar, it's natural to expect $foo and bar to be equivalent.

The "string becoming number" issue is theoretically a problem, but how often do people intentionally concatenate a number with another number? I struggle to come up with a use case where I have a number and I want to build a string that repeats the number twice.

krader1961 commented 3 years ago

I agree it is extremely unlikely someone would intentionally concatenate two number types and unlikely they would do so unintentionally. I agree that the rule should be to perform an implicit to-string on each operand that is not a string as long as there is at least one string operand present, and that string operand does not need to be a string literal. If a user is worried that none of the vars being compounded is a string they can always force the conversion to strings by including a literal empty string. It's sufficient to note that in the documentation for compound expressions. It's probably also worth adding a sentence to the effect that ""$var and (to-string $var) are equivalent.