golang / go

proposal: Go 2: string interpolation evaluating to string and list of expressions #50554

Closed Cookie04DE closed 1 year ago

Cookie04DE commented 2 years ago

Author background

Related proposals

Proposal

The returned string contains everything inside the quotation marks except the curly brackets and their contents. For example, for $"Hi, I am %s{name} and %d{age} years old" the string is "Hi, I am %s and %d years old", while the slice contains the values of name and age. The slice is never nil but can be empty if no expressions were provided.
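
A minimal sketch of the evaluation described above, written in today's Go. The helper names formatParts and render are hypothetical; formatParts is a hand-written stand-in for what the proposed literal would produce, since the $"" syntax itself does not exist.

```go
package main

import "fmt"

// formatParts is a hypothetical, hand-written stand-in for the proposed literal
//   $"Hi, I am %s{name} and %d{age} years old"
// It returns the text with the braces and their contents removed,
// plus a non-nil slice holding the values of the embedded expressions.
func formatParts(name string, age int) (string, []interface{}) {
	return "Hi, I am %s and %d years old", []interface{}{name, age}
}

// render passes both results to fmt.Sprintf, the intended common use.
func render(name string, age int) string {
	format, args := formatParts(name, age)
	return fmt.Sprintf(format, args...)
}

func main() {
	fmt.Println(render("Gopher", 12))
}
```

Under the proposal the two results could be passed straight to the Printf family; here they are unpacked by hand.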

Costs

jfesler commented 2 years ago

Would this proposal also cover $` (where ` can easily embed double quotes and use multiline strings), or only $"?

Cookie04DE commented 2 years ago

I didn't initially think about the multiline strings but yes, I think the proposal should cover them too. Although I am not quite sure how you would handle literal curly braces in them, since they don't work with escape sequences like normal double quotes do.

slycrel commented 2 years ago

Thank you for kick-starting this discussion again, and for your proposal!

A few thoughts.

First... I would want to see having %v be an unspecified default. This would essentially allow the assumption of value-based string output, with a possible override if the type needs to be specified. It's also consistent with the existing fmt intent:

For each Printf-like function, there is also a Print function that takes no format and is equivalent to saying %v for every operand.

So in your example:

emailBody := fmt.Sprintf($"Hello %s{name}. The item %s{product} you bookmarked on %s{date} is available now. Consider purchasing now (%v{whatever}) since there are only %d{amount} left")

could be

emailBody := fmt.Sprintf($"Hello %{name}. The item %{product} you bookmarked on %{date} is available now. Consider purchasing now (%{whatever}) since there are only %d{amount} left")

to get the same results.

Second... if this were done as a string and []interface{} combo at the language level via $"", a slight tweak of implementing string and []string* would allow the compiler to optimize/concat that directly... right? If we're going to all the trouble to implement string and []interface{}, it might be worth exploring that a little further and instead create a specific interface that is always resolvable to a string. Maybe that's a tangent, but I think worth bringing up, as this proposal makes string interpolation a second class citizen, with still needing first class language changes.

Third... I'm a little unclear on what this does to { and } within existing strings, and if escaping those only apply to format strings or all strings. This seems like a backwards compatibility issue if you have to escape braces that would not normally be needed for existing strings, especially once you introduce other string variables into the mix.

Fourth... is it going to be an issue to parse out %...{ at the compiler level to allow dropping in code directly? How would that get parsed there that doesn't have all of the inherent need for the fmt semantics at the compiler level? Maybe I am misunderstanding something.

Honestly I'd like to see string interpolation fully at the language level over something sprintf-based. Can you share the reason you see it being better this way? Is this proposal different to simply overcome the resistance of making it a first class feature of the language itself, as mentioned in the referring proposal?

* Or an interface array that derives a string is likely better than a []string directly; I think both could work.

ianlancetaylor commented 2 years ago

I would want to see having %v be an unspecified default.

I don't think that works in the context of this proposal. The proposal is specifically not saying anything at all about format specifiers, which is a good thing. A $"" string evaluates to two values: a string and a []interface{}. The string will have the {} removed. You can choose to pass these values to fmt.Printf if you like, and that will obviously be the most common use case, but it can be used in other ways as well. So a string like $"%{x}" would evaluate to a string of "%" and a []interface{} with the value of x. It wouldn't make sense to pass that to fmt.Printf.

Note that it doesn't work to use a $"" string with fmt.Print, because it won't put the values at the right point in the string. It only works with fmt.Printf and friends.
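
To make the difference concrete, here is a small sketch in current Go. The helpers viaSprint and viaSprintf are hypothetical names, and the string and slice are written by hand to stand in for the two values a $"" literal would produce.

```go
package main

import "fmt"

// viaSprint hands the string and the values to fmt.Sprint as plain operands.
// Sprint does not interpret verbs, so the values land after the text.
func viaSprint(format string, args []any) string {
	operands := append([]any{format}, args...)
	return fmt.Sprint(operands...)
}

// viaSprintf hands the same values to fmt.Sprintf, which substitutes
// each value at the position of its verb.
func viaSprintf(format string, args []any) string {
	return fmt.Sprintf(format, args...)
}

func main() {
	format, args := "%s is %d years old", []any{"Kim", 22}
	fmt.Println(viaSprint(format, args))  // verbs left untouched, values appended
	fmt.Println(viaSprintf(format, args)) // values substituted in place
}
```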

I'm a little unclear on what this does to { and } within existing strings

It doesn't do anything.

is it going to be an issue to parse out %...{ at the compiler level to allow dropping in code directly

Yes, it absolutely would be an issue, which is why it is good that this proposal doesn't require that.

Honestly I'd like to see string interpolation fully at the language level over something sprintf-based.

It's not simple. See all the discussion at #34174.

ianlancetaylor commented 2 years ago

Because the curly braces may contain any expression, we need to specify the order of evaluation. I suppose the simplest is to say that expressions within curly braces are evaluated as individual operands as described at https://go.dev/ref/spec#Order_of_evaluation.

It's perhaps unfortunate that this doesn't provide a mechanism to do simple string interpolation, as in $"name: {firstName} {lastName}". Of course we can do that by writing "name: " + firstName + " " + lastName. But if we add this proposal to the language, we're going to be so close to that simple interpolation that I'm concerned that people are going to keep reaching for it and finding that it is not there.

One possibility would be that $"" also produces a third result which is a []int with the byte offsets into the returned string where each {} was found. Then it becomes possible to write a function to do simple interpolation with fmt.Print style formatting. But then simply passing the $"" string to fmt.Printf doesn't work. And overall having to call a special function for string interpolation is a bit awkward though maybe manageable.
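
A rough sketch of such a function, assuming the third result is a slice of ascending byte offsets into the brace-free string. The name spliceAtOffsets is hypothetical, and the three inputs in main are written by hand to mimic what $"name: {firstName} {lastName}" might produce.

```go
package main

import (
	"fmt"
	"strings"
)

// spliceAtOffsets (hypothetical) inserts each argument into text at the
// matching byte offset, formatting it the way fmt.Print would.
// It assumes offsets are ascending and within the bounds of text.
func spliceAtOffsets(text string, offsets []int, args []any) string {
	var b strings.Builder
	prev := 0
	for i, off := range offsets {
		b.WriteString(text[prev:off])
		if i < len(args) {
			fmt.Fprint(&b, args[i])
		}
		prev = off
	}
	b.WriteString(text[prev:])
	return b.String()
}

func main() {
	// Hand-written stand-in for $"name: {firstName} {lastName}":
	// the braces are removed, leaving "name: " followed by one space.
	text := "name:  "
	offsets := []int{6, 7}
	args := []any{"Ada", "Lovelace"}
	fmt.Println(spliceAtOffsets(text, offsets, args))
}
```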

ALTree commented 2 years ago

Changing the base language by introducing a new kind of literal that explodes a string in a way that is closely tailored to a specific standard library function (Printf) and makes little sense in any other context certainly feels weird.

It's basically baking a Sprintf-like "macro" into the language that expands a value into something else for the Printf function's convenience. But these $"" would be in the spec, and thus also exist and be allowed everywhere else, even if they don't really make sense outside the context of Printf calls.

IMO a base language feature (especially at a level this low: we're talking about a new kind of literal, and literals are the lowest, most basic "pieces" of a language in the grammar hierarchy) should make sense in every context, and be generally useful, to be worth adding.

Cookie04DE commented 2 years ago

I disagree that it makes little sense in every other context (see the sql statement as an example), but I agree that it is somewhat limited in its usage. Although I think some kind of language feature is necessary to elegantly solve the problem outlined in my proposal.

jimmyfrasche commented 2 years ago

Since the special kind of string really only makes sense if it's used with a function of a specific signature maybe we can go about this differently and have a special kind of function call with a regular string literal.

Rough sketch:

Something like funcName"literal string" where it must be a string literal with no ().

This would be similar to javascript's tagged template literals https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#tagged_templates

There could be functions in fmt that are like Sprint/Sprintln but with the correct signature.

That wouldn't work with the Printf formatting but the idea could be extended to allow passing extra info to the tag function so that something like fmt.Literalf"{x:d}" would be rewritten by the compiler to fmt.Literalf([]string{"", ""}, []any{x}, []string{"d"}) and in this case would have the same output as fmt.Sprintf("%d", x)
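
As an illustration only, a tag function with that shape could look roughly like the following. The name literalf is hypothetical (no such function exists in fmt), and the sketch assumes the compiler always passes one more string part than there are arguments.

```go
package main

import (
	"fmt"
	"strings"
)

// literalf is a hypothetical tag function: parts are the literal text
// between the holes, args are the hole values, and verbs are the optional
// formatting directives for each hole ("" means use %v).
// It assumes len(parts) == len(args)+1.
func literalf(parts []string, args []any, verbs []string) string {
	var b strings.Builder
	for i, arg := range args {
		b.WriteString(parts[i])
		verb := "v"
		if i < len(verbs) && verbs[i] != "" {
			verb = verbs[i]
		}
		fmt.Fprintf(&b, "%"+verb, arg)
	}
	b.WriteString(parts[len(args)])
	return b.String()
}

func main() {
	x := 42
	// What the sketched fmt.Literalf"{x:d}" might be rewritten to.
	fmt.Println(literalf([]string{"", ""}, []any{x}, []string{"d"}))
}
```

Because the function receives the text in pieces, it always knows where each hole was, which is the property the string-with-holes-removed design gives up.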

Cookie04DE commented 2 years ago

If at all possible I would like to preserve at least this aspect of the proposal: It is a drop in replacement for manually providing the arguments to the Printf family of functions. I think that's appealing because it integrates well into the existing ecosystem and doesn't require adding more functions to the different packages.

jimmyfrasche commented 2 years ago

That's certainly understandable and admirable. My main concern is that it seems like it would be somewhat hard to use correctly since you have to manually pair formatting directives with the interpolation points. Go vet could find them when you're working with fmt but not in general. That's much less of an issue when you can tell the function where the holes are supposed to be by passing a []string instead of a string with the holes already cut out.

Cookie04DE commented 2 years ago

I understand your point, but I don't see how this could be resolved elegantly. We could use %v as the default, as suggested by @slycrel, so you can't forget to add it. The problem is that this ties the feature directly, or at least indirectly, to the fmt package and also complicates the proposal (it would need to detect whether a formatting verb is already present and otherwise add the %v). The currently proposed behavior is also in line with the current behavior of Printf (for better or worse), since doing something wrong doesn't result in a compile time error but rather a runtime one.

jimmyfrasche commented 2 years ago

Yeah, you either have to leave a special character where there was a cut or return an []int of the cuts or return a []string instead of a string. In any of those cases you lose the "works with Printf specifically" property but let it work with more stuff in general even though you need to add a SomethingNewf function to fmt to target the original string interpolation use case.

I don't think adding something that only really works with Printf is worth changing the language.

I'm not sure my suggestion is worth it, either, but it would provide something that can be used for string interpolation while still being fairly general (there's no restriction that the tag need return a string, for example).

deanveloper commented 2 years ago

I actually really like this proposal in its current state. Specifically, it means that I don't need to re-learn things like formatting specifiers. The only thing I don't like (as Ian mentioned) is that we're adding something in the spec that's tailored to a library function.

I'm personally not a fan of prefixed strings (i.e. $"..."), though. To me, they feel like a hack that's brought in that only exists because a language didn't plan for extensibility in the spec. Go already has an extensible language feature within strings, which is string escaping with the \ character. Perhaps an alternative syntax could be something like fmt.Sprintf("%d\(var)"), similar to Swift and #34174.

I'd also like to mention that this proposal bears some resemblance to @bradfitz's suggestion, granted I think the language omitting the identifiers makes a lot more sense: https://github.com/golang/go/issues/34174#issuecomment-532416737

fzipp commented 2 years ago

My experience with string interpolation has been that you have to tediously change it back to format strings when you want to support translations into different languages (i18n). So even in programming languages that support string interpolation I developed the habit of writing user-facing strings (this includes error and log messages) as format strings by default.

fzipp commented 2 years ago

Take this sql statement as an example: dbConn.QueryContext(context.Background(), $"SELECT name FROM app_user WHERE email = $1{email} AND profile_type = $2{profileType}").

For SQL statements I'd prefer to use a string templating function that is aware of the syntax within the string, one which applies the correct escaping and quoting depending on context and the data types of the arguments to avoid injection vulnerabilities, like html/template does for HTML+JS.

Cookie04DE commented 2 years ago

@deanveloper I'd be totally fine with replacing the prefixed string with the backslash brace syntax. And I forgot to mention it in the original post, but yes Brad's suggestion was indeed the inspiration for it.

@fzipp The SQL statement is actually injection safe. The $1 before the {email} is a positional parameter for PostgreSQL inside a prepared statement (the equivalent for MySQL would be ?). No quoting or escaping is necessary nor done in this case.
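
A small sketch of why that holds, using a hand-written stand-in for the format string (the helper name userQuery is hypothetical). The values never become part of the SQL text; they travel separately as query arguments.

```go
package main

import "fmt"

// userQuery is a hypothetical, hand-written stand-in for
//   $"SELECT name FROM app_user WHERE email = $1{email} AND profile_type = $2{profileType}"
// The braces and their contents are removed, so only the PostgreSQL
// positional parameters $1 and $2 remain in the SQL text.
func userQuery(email, profileType string) (string, []any) {
	query := "SELECT name FROM app_user WHERE email = $1 AND profile_type = $2"
	return query, []any{email, profileType}
}

func main() {
	query, args := userQuery("kim@example.com'; DROP TABLE app_user; --", "admin")
	// With database/sql this pair would be passed as
	//   db.QueryContext(ctx, query, args...)
	// so the driver sends the values as parameters, not as SQL text.
	fmt.Println(query)
	fmt.Println(len(args), "arguments passed separately")
}
```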

jimmyfrasche commented 2 years ago

I wrote a short example of what a Sprint/Sprintf would look like if the hole position and optional tags are passed in: https://go.dev/play/p/6VBSpkADqmH

@deanveloper The problem with \(v) in undecorated strings is that, without actual string interpolation, the literal evaluates to either a regular string (itself) or to multiple values depending on the contents of the literal. You can't tell at a glance which is which—you have to scan the entire contents of the string or be sufficiently familiar with the context it's used to figure out what kind it is. Unless the result is always a string, you need some way, lexical or syntactic, to distinguish these two very different cases or readability takes a hit.

I prefer the syntactic form I sketched above over the lexical form since you would always need a function to interpret the results so 99.999% of the time you'd end up writing f($"...") anyway so why not just f"..."? I wouldn't mind f($"...") though so I'll leave that as having stated a personal preference. The lexical form does have an advantage if it's allowed to splice in a way multiple-return expressions normally aren't, as in the original proposal, and you can write f(w, $"..."). I'm not sure how I feel about that, though I can definitely see the utility.

@fzipp

My experience with string interpolation has been that you have to tediously change it back to format strings when you want to support translations into different languages (i18n).

The js tagged template literals have been used for i18n: https://i18n-tag.kolmer.net/docs/index.html#currency-formatting Note that it passes formatting directives in the string portion after the interpolation points, as there is no facility for in-band formatting directives in js. The first example from the link

i18n`Hello ${ name }, you have ${ amount }:c in your bank account.`

would be

i18n"Hello { name }, you have { amount : c } in your bank account."

using the version I sketched above.

HALtheWise commented 2 years ago

Thanks for putting this proposal together, overall the broad strokes seem really good. I do wonder whether we're giving up a little too much in service of reusing the existing fmt.*Printf functions.

In particular, because the compiler completely strips all information about where and how the values were embedded in the string...

I think resolving these requires either passing the full unparsed string to the target function or packaging more information (integer offsets, maybe expression strings) into the parsed data slice. Either would probably require making a new family of fmt.*Print*() functions, making the short-term changes more significant, but the long-term readability of the language better. On balance, I personally favor taking that tradeoff.

AndrewHarrisSPU commented 2 years ago

If at all possible I would like to preserve at least this aspect of the proposal: It is a drop in replacement for manually providing the arguments to the Printf family of functions.

A gadget that would do this seems macro-like, to munge around syntax, and I could imagine a lot of different hygiene rules.

The simplest hygiene rule might be, just look up string variables named like fields in scope.

const FruitReport = macro"I have {fruit} fruit: {apples} apples and {oranges} oranges."

// FruitReport compiles to a function, taking current environment values, returning a string, or something...
reportString := FruitReport($) 

It seems like almost immediately, for a little bit of flexibility, one wants some way to call the FruitReport with different arguments, really in a lot of variant ways, entirely sugary, to eliminate syntax overhead. I wonder if pure hygiene rules are reasonable here - strings and native value types are OK, or structs or slices composed thereof ... Anything that might be less hygienic seems like the wrong kind of gadget to me. But maybe something here could be tasteful and useful.

rodcorsi commented 2 years ago

In terms of simplicity, a newcomer might not understand why it is possible to use fmt.Printf(".. {foo}") but not fmt.Println(".. {foo}") or v := ".. {foo}"

IMO tagged template, $"" or f"" calls are more concise, and well known from other languages

runeimp commented 2 years ago

I would be ever so happy if

emailBody := $`Hello {name},
The amount owed is $%.02f{amount}. You have {days} days to pay.
Otherwise the amount of $%.0f{amount} will begin to gain interest
at 184%% per month.
`

was just translated to

emailBody := fmt.Sprintf(`Hello %s,
The amount owed is $%.02f. You have %d days to pay.
Otherwise the amount of $%.0f will begin to gain interest
at 184%% per month.
`, name, amount, days, amount)

before compilation. This simple translation would answer all my string interpolation dreams. No need to specially prepare anything for fmt input. Just a special string that generates the code that would return the string the compiler knows. fmt.Sprintf is usable in all situations that expect a string, no?

Common code to both code blocks

name := "The Dude" // Defaults to string
amount := 13.42007 // Defaults to float64
days := 5          // Defaults to int

Cookie04DE commented 2 years ago

I put together a little prototype: https://github.com/Cookie04DE/gof It does not implement everything but it covers the basics.

ianlancetaylor commented 2 years ago

A problem with this approach is that in some sense each argument must be mentioned twice: once with the format character and once with the name. That applies to any use of this style of interpolation: since the curly braces disappear in the final version of the string, there always has to be something in the string, outside of the curly braces, that marks where the argument should go. If you accidentally omit the %v or whatever, there is no way for fmt.Printf or whatever to know where the arguments were in the formatting string.

We could make this even more specific to fmt.Printf by having the compiler actually search for a % expression before the curly brace, and, if missing, replace it with %v.

Or we could make this slightly less specific to fmt.Printf by using %, and say that anything between the % and the " is inserted in place of the curly braces: %v"{name} {address}". Although that is rather cryptic.

Or perhaps we could use a predeclared builtin function.

func interpolate(s string, marker string) (string, []any)
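
A real builtin would evaluate the expressions at compile time, which a library function cannot do. As a rough runtime approximation only, the sketch below adds a hypothetical env parameter that maps each brace content to a value, so the marker substitution can be tried out in current Go.

```go
package main

import (
	"fmt"
	"strings"
)

// interpolate approximates the sketched builtin at run time. Each {name}
// in s is replaced by marker, and the value looked up in env (a
// hypothetical stand-in for evaluating the expression) is collected.
// The returned slice is never nil.
func interpolate(s, marker string, env map[string]any) (string, []any) {
	var b strings.Builder
	args := []any{}
	for {
		open := strings.IndexByte(s, '{')
		if open < 0 {
			break
		}
		end := strings.IndexByte(s[open:], '}')
		if end < 0 {
			break // unclosed brace: keep the rest of the text as is
		}
		end += open
		b.WriteString(s[:open])
		b.WriteString(marker)
		args = append(args, env[s[open+1:end]])
		s = s[end+1:]
	}
	b.WriteString(s)
	return b.String(), args
}

func main() {
	env := map[string]any{"name": "Kim", "address": "1 Main St"}
	format, args := interpolate("{name} {address}", "%v", env)
	fmt.Println(format)
	fmt.Println(fmt.Sprintf(format, args...))
}
```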

Cookie04DE commented 2 years ago

Perhaps an alternative proposal could be that you have to specify a replacement string for the expressions you insert. Like this: fmt.Printf($"Hello {name, "%s"}"). In this case the returned string is Hello %s, while the slice contains the value of the name variable. But format strings can have a default replacement which is used if no replacement is specified: fmt.Printf($("Hello {name}, you are {age, "%d"} years old.", "%v")). Perhaps functions could specify their own default replacement, which gets used if the format string specifies none (fmt.Printf could use %v, for example): fmt.Printf($"Hello {name}, you are {age} years old"). Notice how you can leave out the parentheses, since you don't need to provide a default replacement (fmt.Printf already does this for you); you can also do that if all your expressions provide a replacement.

ianlancetaylor commented 2 years ago

I don't think a comma would be the best choice, because a Go expression can include a comma. But I think we could use a colon.

fmt.Printf($"Hello {name:%s}")

This would become

fmt.Printf("Hello %s", name)

Then we would give an error for a curly brace in one of these strings without a colon. Or perhaps if there were no colon we could replace it with %v.
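
The rewrite rule can be tried out with a runtime approximation. This is only a sketch: the function name expand and the env map (standing in for compile-time evaluation of the expressions) are hypothetical, and it takes the second option above, substituting %v when the colon is missing.

```go
package main

import (
	"fmt"
	"strings"
)

// expand rewrites "{expr:verb}" holes into verbs and collects the values.
// env is a hypothetical stand-in for evaluating each expression.
// A hole without a colon falls back to %v.
func expand(s string, env map[string]any) (string, []any) {
	var b strings.Builder
	args := []any{}
	for {
		before, rest, found := strings.Cut(s, "{")
		if !found {
			break
		}
		inner, after, closed := strings.Cut(rest, "}")
		if !closed {
			break // unclosed brace: keep the remaining text as is
		}
		name, verb, hasVerb := strings.Cut(inner, ":")
		if !hasVerb {
			verb = "%v"
		}
		b.WriteString(before)
		b.WriteString(verb)
		args = append(args, env[name])
		s = after
	}
	b.WriteString(s)
	return b.String(), args
}

func main() {
	format, args := expand("Hello {name:%s}", map[string]any{"name": "Kim"})
	fmt.Println(format)
	fmt.Println(fmt.Sprintf(format, args...))
}
```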

@bradfitz suggests that we could use single quote instead of $"", as single quote currently only permits a single character.

@griesemer observes that fmt.Printf takes a string and a variadic list of arguments. So to cleanly call fmt.Printf $"" should evaluate to a list of expressions, as though calling a function that returned that many values. The first would be type string and the rest would be type any. Or perhaps the rest should be the type of the expression named in the string, so if count is a value of type int then the result of $"total is {count:%d}" would be "total is %d", count of type string, int.

Cookie04DE commented 2 years ago

A colon would indeed be better suited as a separator. We also need some way for a function to signal its default replacement. fmt.Printf would use %v as its default replacement for example. One way that might work is a comment above the function which specifies that. Although I don't like magical comments. We also should consider that a function might be called with multiple format strings. We could avoid all that by just saying %v is the default for every function, but that might couple this language change too closely to the fmt package which this proposal aims to avoid.

I have two slight concerns using '' for format strings. First: It might be confusing to new people learning Go since format strings and rune literals would share the same starting rune (the ') but are two totally different things. Second: This would remove the ability to create format strings with just one character. 'a' would be considered a rune literal by the compiler not a format string with the string just being "a". Although I don't know how useful this kind of format string could be anyways, so that might just be a slight inconvenience.

Changing the return types of format strings to no longer be (string, []any) would only affect the way variable assignment of format strings works. As far as I can tell that wouldn't be dramatic. I think it would be more useful if the return values had the same type as the input expressions.

gazerro commented 2 years ago

I suggest to use this syntax

'Hello {name:s}'

I find it clearer compared to this one

$"Hello {name:%s}"

We can disallow a single quoted string with no embedded expressions. So 'a' can only be a rune, without ambiguity.

As previously suggested, this expression can be used where an expression list can be used. The compiler expands its embedded expressions. For example 'Hello {name:s}' is expanded into "Hello %s", name.

As a special case, if an embedded expression is convertible to a string, according to the Go specification, the verb can be omitted and the expression is converted to a string, possibly with % converted to %%. An embedded expression without a verb is not expanded. For example if name is convertible to a string, you can write

fmt.Println('Hello {name}')

and the compiler converts it to

fmt.Println("Hello " + string(name))

Embedded expressions with and without a verb can be used in the same single-quoted string. For instance

fmt.Errorf('field {name} is invalid: {err:w}')

is compiled as

fmt.Errorf("field " + string(name) + " is invalid: %w", err)

Since expressions without a verb are not expanded, if name is convertible to a string, you can write

s := 'Hello {name}'

Some examples adapted from this proposal and other proposals:

fmt.Printf("Hello, %s! Today is %v, it's %02v:%02v now.", name, date.Weekday(), date.Hour(), date.Minute())

// can be written:

fmt.Printf('Hello, {name}! Today is {date.Weekday():v}, it's {date.Hour():02v}:{date.Minute():02v} now.')

emailBody := fmt.Sprintf("Hello %s. The item %s you bookmarked on %s is available now. Consider purchasing now (%v) since there are only %d left.", name, product, date, whatever, amount)

// can be written:

emailBody := fmt.Sprintf('Hello {name}. The item {product:s} you bookmarked on {date:s} is available now. Consider purchasing now ({whatever:v}) since there are only {amount:d} left.')

ianlancetaylor commented 2 years ago

Upon further reflection, using a single quote seems too obscure and easy to miss.

Since the main use of this would be for fmt.Printf and friends, instead of $"" we could use %"". The % perhaps suggests the relationship to fmt.Printf. We can't use % with a string, so this would be unambiguous. We could accept %"" and %``, the latter being a raw string literal.

The idea of supporting any string conversion if there is a missing colon might be troublesome, as Go permits converting int to string (although it doesn't do what most people expect). But we could permit the missing colon if the type of the expression is simply string. And give an error if the type is something else--an error saying that the colon is required.
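
For readers unfamiliar with that conversion, a short illustration follows. The helper names are hypothetical. Converting an integer to a string yields the UTF-8 encoding of that code point, not its decimal digits; the conversion is written via rune here because go vet flags a direct string(n) on an int.

```go
package main

import (
	"fmt"
	"strconv"
)

// asCodePoint shows what the integer-to-string conversion does:
// the integer is interpreted as a Unicode code point.
func asCodePoint(n int) string {
	return string(rune(n))
}

// asDecimal shows what most people expect instead: the decimal digits.
func asDecimal(n int) string {
	return strconv.Itoa(n)
}

func main() {
	fmt.Println(asCodePoint(65)) // the letter A
	fmt.Println(asDecimal(65))   // the digits 65
}
```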

gazerro commented 2 years ago

@ianlancetaylor

Since the main use of this would be for fmt.Printf and friends, instead of $"" we could use %"". The % perhaps suggests the relationship to fmt.Printf. We can't use % with a string, so this would be unambiguous. We could accept %"" and %``, the latter being a raw string literal.

%"" is much better than $"". I also think that on balance it might be the best solution.

The idea of supporting any string conversion if there is a missing colon might be troublesome, as Go permits converting int to string (although it doesn't do what most people expect). But we could permit the missing colon if the type of the expression is simply string. And give an error if the type is something else--an error saying that the colon is required.

It seems acceptable to me. It might be allowed to omit the colon even for integers but then I understand that someone might wonder why for strings and integers it can be omitted and for other types not.

gazerro commented 2 years ago

Let's assume for a moment that we can change how the Printf function interprets its arguments, that is, instead of

fmt.Printf("my name is %v and I am %d years old", name, age)

you have to write

fmt.Printf("my name is ", name, "%v and I am ", age, "%d years old")

that is, expressions are interleaved with formatting strings and verbs are placed at the beginning of the string following the expression.

If the verb is not present for an expression, %v is assumed. The previous example can be written as

fmt.Printf("my name is ", name, " and I am ", age, "%d years old")

Let the compiler split the following string according to the expressions in curly braces

%"my name is {name} and I am {age}%d years old"

in this way

"my name is ", name, " and I am ", age, "%d years old"

So, putting the two together, you could write

fmt.Printf(%"my name is {name} and I am {age}%d years old")

This solution would have the following advantages

  1. Don't depend on any particular verb. The spec knows nothing about verbs.

  2. If there is no verb the called function can use a default verb. Printf could use %v.

  3. Do not place restrictions on the type of expressions.

  4. It places no restrictions on the content of the string. Only the occurrences of { and } must be escaped as in "\{ {value}%s \}".

  5. Does not require adding new functions and methods to the fmt and sql packages.

If compatibility is guaranteed only for programs that correctly call Printf, that is, for which vet has nothing to report, we can extend the Printf function in this way. Note that besides this there are other ways to extend Printf.

Below is an example that can be used with an extended version of the sql's Query method

db.Query(%"SELECT * FROM products WHERE price < {price} AND category = {category}")

and with an extended Sscanf method

n, err := fmt.Sscanf("Kim is 22 years old", %"{&name} is {&age} years old")

ianlancetaylor commented 2 years ago

Seems to me that {age:%d} is clearer than {age}%d. It avoids relying on % as special. Go does not permit : in an expression, so {arg:%d} is unambiguous.

gazerro commented 2 years ago

@ianlancetaylor % would be special only for Printf but not for the compiler in the same way as {age:%d}.

Using {age:%d} instead, the compiler could expand the literal string depending on whether the verb is present or not for each expression as follows

"boo {expr} foo" is expanded to "boo ", expr, " foo"

"boo {expr:verb} foo" is expanded to "boo verb foo", expr

For example, the following call

fmt.Printf(%"my name is {name} and I am {age:%d} years old")

is expanded as

fmt.Printf("my name is ", name, " and I am %d years old", age)

name would be formatted by Printf with the %v verb.

fmt could be extended to also allow arguments passed in this way and the same could be done for sql package.

magical commented 2 years ago

fmt.Printf("my name is ", name, " and I am %d years old", age)

I don't think that works? Suppose name is a string - how is fmt.Printf supposed to tell that it should expand verbs in the 3rd argument, " and I am %d years old", but not in the 2nd argument, name?

gazerro commented 2 years ago

@magical Verbs and arguments are first parsed as they are now. If there are too many arguments the first extra argument is formatted with %v and then appended to the result. If there is an additional argument, it must be a string, it is considered a formatting string and the process starts again.

I don't think that works? Suppose name is a string - how is fmt.Printf supposed to tell that it should expand verbs in the 3rd argument, " and I am %d years old", but not in the 2nd argument, name?

In this case, "my name is " does not consume arguments, so name is formatted with %v and appended to "my name is ". Then " and I am %d years old" is considered a formatted string and the process starts again.
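
The processing order described here can be sketched as an ordinary function. This is only an illustration, not how fmt works: sprintInterleaved and countVerbs are hypothetical names, and the verb counter is a simplification that ignores explicit argument indexes such as %[1]d and '*' widths.

```go
package main

import (
	"fmt"
	"strings"
)

// countVerbs counts formatting verbs in format, skipping escaped "%%".
// Simplification: %[n] indexes and '*' widths are not handled.
func countVerbs(format string) int {
	n := 0
	for i := 0; i < len(format); i++ {
		if format[i] != '%' {
			continue
		}
		if i+1 < len(format) && format[i+1] == '%' {
			i++ // skip the escaped percent sign
			continue
		}
		n++
	}
	return n
}

// sprintInterleaved (hypothetical) formats as usual, then treats the first
// extra argument as a %v value and the argument after it as a new format.
func sprintInterleaved(first string, rest ...any) (string, error) {
	var b strings.Builder
	format, args := first, rest
	for {
		n := countVerbs(format)
		if n > len(args) {
			n = len(args)
		}
		fmt.Fprintf(&b, format, args[:n]...)
		args = args[n:]
		if len(args) == 0 {
			return b.String(), nil
		}
		fmt.Fprint(&b, args[0]) // extra value: default formatting, verbs inside it are not expanded
		args = args[1:]
		if len(args) == 0 {
			return b.String(), nil
		}
		next, ok := args[0].(string)
		if !ok {
			return "", fmt.Errorf("expected a format string after an extra value, got %T", args[0])
		}
		format, args = next, args[1:]
	}
}

func main() {
	s, err := sprintInterleaved("my name is ", "Kim", " and I am %d years old", 22)
	fmt.Println(s, err)
}
```

Under this scheme a name that happens to contain a verb is printed verbatim, because it is consumed as a value before the next format string is read.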

ianlancetaylor commented 2 years ago

@gazerro I think I misunderstood. You seem to be suggesting that we compile fmt.Printf specially. We aren't going to do that.

ianlancetaylor commented 1 year ago

Perhaps it would be useful to consider a simpler approach: #57616 .

ianlancetaylor commented 1 year ago

Per the discussion in #57616 this is a likely decline. Leaving open for four weeks for final comments.

You can get a similar effect using fmt.Sprint, with custom functions for non-default formatting. So it can already be done in Go, it just looks a bit different. fmt.Sprint("This house is ", measurements(2.5), " tall") where measurements is a function that returns strings like "two feet six inches".
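
A minimal sketch of that approach follows. The body of measurements is an assumption made for illustration: it prints digits rather than spelled-out numbers to stay short, and describe is a hypothetical wrapper around the fmt.Sprint call.

```go
package main

import (
	"fmt"
	"math"
)

// measurements is a hypothetical custom formatter: it turns a length in
// feet into a "feet and inches" string. Digits are used instead of words.
func measurements(feet float64) string {
	whole := math.Floor(feet)
	inches := math.Round((feet - whole) * 12)
	return fmt.Sprintf("%d feet %d inches", int(whole), int(inches))
}

// describe builds the sentence with fmt.Sprint. Since every operand is a
// string, Sprint adds no extra spaces between them.
func describe(feet float64) string {
	return fmt.Sprint("This house is ", measurements(feet), " tall")
}

func main() {
	fmt.Println(describe(2.5))
}
```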

ianlancetaylor commented 1 year ago

No further comments.