Ada-Rapporteur-Group / User-Community-Input

Ada User Community Input Working Group - Github Mirror Prototype

New Unicode-related String Interpolation standard -- consider for Ada? #37

Open sttaft opened 1 year ago

sttaft commented 1 year ago

At AdaCore we have been discussing possible ways of supporting string "interpolation" where a special syntax for string literals allows direct "interpolation" of the values of variables and expressions into the string, such as:

"Name = {First_Name} {Last_Name}, Address = {Address}, and Age = {(Now - Birthday) / Year}."

Of course, we would need some way of distinguishing such strings from "normal" string literals, and we have considered various options, such as:

  {" ... "}

  F" ... "

  $" ... "

Today I noticed that the Unicode consortium is working on a standard for something that approximates string interpolation, which they call "Message Format 2" (great name ;-):

Message Format 2.0 syntax

which is a follow-on to a relatively old existing standard "ICU MessageFormat", which had some "pain points":

ICU MessageFormat pain points

Here are a couple of simple examples (drawn from Message Format 2.0 syntax):

  {Hello, {$userName}!}

A message with an interpolated $date variable formatted with the :datetime function:

  {Today is {$date :datetime weekday=long}.}

If we want to consider something like this for standardizing, it would make sense to look at the work the Unicode consortium is doing, as it seems to be based on significant experience, both bad and good, with the ICU MessageFormat.

-Tuck

Richard-Wai commented 1 year ago

As much as it makes me cringe to imagine Ada with curly braces, I get the value of such format strings. However, to me it seems the usual way of concatenating strings is very close to what we'd get from any additional complex syntax.

Maybe there is a way we can add a simple syntactic sugar approach that collapses the " & ... & " sequence, such as via the dollar sign. For example, what if we say that a '$' within a string, with a matching '$' more than zero characters away, is exactly equivalent to (using single quotes to delimit) '" & ' for the first and ' & "' for the second?

Put_Line ("My name is $First_Name$ and my age is$Year'Image(Current_Year - Birth_Year)$");

This would be syntactically equivalent to

Put_Line ("My name is " & First_Name & " and my age is" & Year'Image(Current_Year - Birth_Year) & "");

And could be formed through simple text replacement.

Also, we could follow the same convention as for double quotes, where two '$' in a row are replaced by a single '$'.

Put_Line ("Your total is $$$Money'Image(Total)$.");

Which would become

Put_Line ("Your total is $" & Money'Image(Total) & ".");
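For what it's worth, the rewrite rule described above is small enough to sketch as plain text replacement. The Python below is only an illustration, not part of any proposal: `desugar` is a hypothetical helper name, it works on the text between the quotes, and it assumes '$$' is always treated as an escape (so an empty '$$' pair never starts an expression).

```python
def desugar(body: str) -> str:
    """Rewrite the text between the quotes of a '$'-sugared literal
    into an ordinary Ada concatenation (hypothetical helper)."""
    out = ['"']
    i = 0
    while i < len(body):
        if body[i] != "$":
            # Ordinary character: copy it through unchanged.
            out.append(body[i])
            i += 1
        elif body.startswith("$$", i):
            # Doubled '$' stands for one literal '$'.
            out.append("$")
            i += 2
        else:
            # Opening '$': everything up to the matching '$' is an expression.
            end = body.find("$", i + 1)
            if end == -1:
                raise ValueError("unmatched '$' in literal")
            out.append('" & ' + body[i + 1:end] + ' & "')
            i = end + 1
    out.append('"')
    return "".join(out)


print(desugar("Your total is $$$Money'Image(Total)$."))
```

Applied to the two literals in this comment, the sketch yields the same concatenations shown above, including the trailing `& ""` in the first one.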

I'm a bit wary of introducing the complexity as given in the Unicode standard, particularly given that Ada 2022 has such rich user-defined image facilities.

sttaft commented 1 year ago

Now that Ada has a universal 'Image, it makes it very annoying to have to specify it all over the place when producing textual output. So the goal is to replace " & X'Image & " (or worse, " & Integer'Image(X + Y) & "), with simply $X or $(X + Y) when appearing in the middle of an "interpolated" string literal.

ARG-Editor commented 1 year ago

How are you proposing to differentiate "interpolated" string literals from regular ones? It would seem wildly incompatible to do it always (anything that happened to contain the trigger strings would get clobbered, and that would be a behavior change without the possibility of compile-time detection).

                            Randy.

sttaft commented 1 year ago

How are you proposing to differentiate "interpolated" string literals from regular ones?

The string literal would start (and possibly end) with a unique sequence, such as:

  {" ... "}

or

  F" ... "

or

 $" ... "

as mentioned above in the original note.

So a complete interpolated string literal might be:

Put_Line ({"Name = {First_Name} {Last_Name}, Address = {Address}, and Age = {(Now - Birthday) / Year}."});

or

Put_Line ($"Name = $First_Name $Last_Name, Address = $Address, and Age = $((Now - Birthday) / Year).");

jprosen commented 1 year ago

On 12/01/2023 at 02:04, S. Tucker Taft wrote:

Now that Ada has a universal 'Image, it makes it very annoying to have to specify it all over the place when producing textual output. So the goal is to replace " & X'Image & " (or worse, " & Integer'Image(X + Y) & "), with simply $X or $(X + Y) when appearing in the middle of an "interpolated" string literal.

I see nothing annoying in having to type a few more characters. This looks like another feature justified by ease-of-writing, and because some other interpreted popular language has something like it. This increases the complexity of the language, defeats orthogonality, and I doubt it will have any effect on the popularity of the language...

--
J-P. Rosen
Adalog
2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX
https://www.adalog.fr
https://www.adacontrol.fr

briot commented 1 year ago

On 2023-01-13 10:29, Jean-Pierre Rosen wrote:

I see nothing annoying in having to type a few more characters. This looks like another feature justified by ease-of-writing, and because some other interpreted popular language has something like it. This increases the complexity of the language, defeats orthogonality, and I doubt it will have any effect on the popularity of the language...

One case where it might be useful, though, is when you have a user-facing application and you want to be able to translate the output, for instance. The string to be translated would be something like:

    "You have ${count} eggs"

which could be translated to

    "Vous avez ${count} oeufs"

(talking about eggs is likely pretty rare in Ada applications...)
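A minimal sketch of that translation use case, using Python's standard `string.Template`, which happens to accept the same `${count}` placeholder form. The `CATALOG` table and `render` function are hypothetical names for illustration; a real application would presumably load its translations from resource files rather than a dictionary.

```python
from string import Template

# Hypothetical message catalog: one template per locale, same placeholder name.
CATALOG = {
    "en": Template("You have ${count} eggs"),
    "fr": Template("Vous avez ${count} oeufs"),
}


def render(locale: str, **values) -> str:
    """Look up the template for `locale` and fill in its placeholders."""
    return CATALOG[locale].substitute(**values)


print(render("en", count=3))
print(render("fr", count=3))
```

The program text is the same for every locale and only the catalog entry changes. Note that the placeholder here is a bare name, not an arbitrary expression, which is what lets a translator edit the string without touching the code.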

One other advantage could be performance (though if you are dynamically building strings in a performance-sensitive loop you are likely doing it wrong of course). With the proposed syntax, the compiler could possibly first compute the overall string length, then allocate it once (heap or stack), and finally build it in place.
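The single-allocation idea can be sketched as a two-pass build. This is Python for illustration only, not what a compiler would actually emit; `build_once` is a hypothetical name, and the parts are assumed to be strings that have already been imaged.

```python
def build_once(parts: list[str]) -> str:
    """Two-pass build: measure everything, allocate once, copy in place."""
    encoded = [p.encode("utf-8") for p in parts]

    # Pass 1: compute the overall length before allocating anything.
    total = sum(len(p) for p in encoded)

    # Single allocation of the final buffer.
    buf = bytearray(total)

    # Pass 2: copy each piece into its final position.
    pos = 0
    for p in encoded:
        buf[pos:pos + len(p)] = p
        pos += len(p)
    return buf.decode("utf-8")


print(build_once(["You have ", "3", " eggs"]))
```

With a chain of "&" operators, each concatenation may produce an intermediate result; knowing all the pieces up front is what makes the single allocation possible.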

I am a big fan of Python's f-strings (which are similar), though I must admit I have never really missed that feature all that much in Ada. The main place where we have to build strings is for logging, and we have a much more efficient API there that delegates the building of the string (and calling Image) to a background task.

Emmanuel

sttaft commented 1 year ago

I see nothing annoying in having to type a few more characters. This looks like another feature justified by ease-of-writing, and because some other interpreted popular language has something like it. This increases the complexity of the language, defeats orthogonality, and I doubt it will have any effect on the popularity of the language...

It really makes quite a difference on readability, and reduction in silly errors. An intern and I were writing a compiler in ParaSail that generated LLVM intermediate representation (which has a textual form), and at some point we realized all of the calls on ToString and the various concatenation operations were making the code unbelievably hard to read. Since we could, we added string interpolation to ParaSail, and the improvement was enormous. Yes it made it easier to write, but it also made it much easier to read, and hence much easier to notice mistakes.

jprosen commented 1 year ago

On 13/01/2023 at 14:35, S. Tucker Taft wrote:

It really makes quite a difference on readability, and reduction in silly errors. An intern and I were writing a compiler in ParaSail that generated LLVM intermediate representation (which has a textual form), and at some point we realized all of the calls on ToString and the various concatenation operations were making the code unbelievably hard to read. Since we could, we added string interpolation to ParaSail, and the improvement was enormous. Yes it made it easier to write, but it also made it much easier to read, and hence much easier to notice mistakes.

Fair enough. But couldn't you achieve the same thing with a couple of subprograms?

sttaft commented 1 year ago

Fair enough. But couldn't you achieve the same thing with a couple of subprograms?

I don't see how. String interpolation requires the Ada lexer, parser, and semantic analysis to work together. For example, an interpolated string literal like:

$"The solution to your problem is $(F(X + Y/3, "abc") + G('a', 7.5)) presuming X = $X"

I can't quite imagine how a couple of subprograms could handle that. The point is that we are interpolating the 'Image of the value of an arbitrary Ada expression into the middle of a string literal. The equivalent non-interpolated syntax would be:

"The solution to your problem is " & My_Type'Image(F(X + Y/3, "abc") + G('a', 7.5)) & " presuming X = " & X'Image

Both readability and writability are improved in the interpolated version, I would claim.
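To make the point a little more concrete: even finding where each interpolated expression ends means tracking nested parentheses and nested string literals. A rough Python sketch of only that first step (splitting the literal into text and expression pieces) is below. `split_interpolated` is a hypothetical name, the `$name` and `$(expr)` forms are the ones used in the example above, and the sketch deliberately ignores character literals such as '(' and doubled quotes inside nested strings. Choosing the right 'Image for each expression would still need the compiler's type resolution.

```python
def split_interpolated(body: str) -> list[tuple[str, str]]:
    """Split the text between $" and the closing " into
    ("lit", text) and ("expr", source) pieces (hypothetical helper)."""
    parts: list[tuple[str, str]] = []
    lit: list[str] = []
    i = 0
    while i < len(body):
        if body[i] != "$":
            lit.append(body[i])
            i += 1
            continue
        if lit:
            parts.append(("lit", "".join(lit)))
            lit = []
        if body.startswith("$(", i):
            # Parenthesized expression: scan to the matching ')'.
            depth, j = 0, i + 1
            while j < len(body):
                c = body[j]
                if c == '"':
                    # Skip a nested string literal such as "abc".
                    j = body.index('"', j + 1)
                elif c == "(":
                    depth += 1
                elif c == ")":
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            else:
                raise ValueError("unbalanced parentheses after '$'")
            parts.append(("expr", body[i + 2:j]))
            i = j + 1
        else:
            # Simple name: letters, digits and underscores.
            j = i + 1
            while j < len(body) and (body[j].isalnum() or body[j] == "_"):
                j += 1
            if j == i + 1:
                raise ValueError("'$' must be followed by a name or '('")
            parts.append(("expr", body[i + 1:j]))
            i = j
    if lit:
        parts.append(("lit", "".join(lit)))
    return parts


for kind, text in split_interpolated('Sum is $(F(X, "abc") + 1) for X = $X'):
    print(kind, repr(text))
```

A subprogram that receives an already-built String has no access to this structure at compile time, which is roughly why the feature has to live in the lexer and parser.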

-Tuck

Richard-Wai commented 1 year ago

I don't see how.

My interpretation of JP's point, which is one I sympathize with, is that the "solution to your problem" should be produced by a function itself. Particularly if that function was nearer in scope, and would see Y, G, et al directly. In such a case you'd simply say:

"The solution to your problem is " & My_Type'Image(Compute_Solution (X)) & " presuming X = " & X'Image

Or even better, you could have another expression function that returned My_Type'Image of Compute_Solution at X:

"The solution to your problem is " & Solution_String (X) & " presuming X = " & X'Image

I have used this kind of approach regularly, and I'm having a hard time seeing this proposal as being anything more than yet another lazy programmer feature, which alienates people (like myself) who really don't want to see Ada go that route, and does nothing to satisfy people using languages that are structurally faster to type, such as Rust.

Sure it might be more readable in isolation, but I don't think you can as easily argue that it is any more readable than abstracting things out to more specialized subprograms.

eggrobin commented 1 year ago

Writing here what I said in the ARG: One should bear in mind that MessageFormat 2.0 (like its ancestor, ICU MessageFormat) is about localized strings; as such, it comes with a rather fancy domain-specific language which is needed to handle the complexities of grammar in localized strings (the main example being pluralization; English is easy here, with just two plural cases—singular for 1, plural for everything else—but many languages are more interesting; consider Russian’s 4 plural cases or Arabic’s 6, depending on the last two digits).

See https://unicode-org.github.io/icu/userguide/format_parse/messages/#complex-argument-types in the old MessageFormat and https://github.com/unicode-org/message-format-wg/blob/main/spec/syntax.md#complex-messages in the draft new one.

String interpolation syntaxes in programming languages usually do not deal with that; as Tucker mentioned in the ARG meeting, it is common to see such things as the following Python:

f"{n} cat{'' if n == 1 else 's'}"

The reason why the MessageFormat syntax exists is that the above construct is impossible to localize: translators do not get to change the program, and no amount of playing with the s and the cat will yield 1 кошка, 2 кошки, 5 кошек, 21 кошка.
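As an illustration of why a suffix toggle is not enough, here is a small Python sketch of the selection logic a Russian message needs. The category names follow the CLDR convention ("one", "few", "many"); the function and table names are hypothetical, and only non-negative integers are handled.

```python
def russian_plural_category(n: int) -> str:
    """Plural category for a non-negative integer n in Russian."""
    if n % 10 == 1 and n % 100 != 11:
        return "one"        # 1, 21, 31, ... but not 11
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"        # 2-4, 22-24, ... but not 12-14
    return "many"           # 0, 5-20, 25-30, ...


# Hypothetical per-category forms of the word for "cat".
CAT_FORMS = {"one": "кошка", "few": "кошки", "many": "кошек"}


def count_cats(n: int) -> str:
    """Format n together with the matching form of the noun."""
    return f"{n} {CAT_FORMS[russian_plural_category(n)]}"


print(", ".join(count_cats(n) for n in (1, 2, 5, 21)))
# prints: 1 кошка, 2 кошки, 5 кошек, 21 кошка
```

In a message format, this selection lives in the translated message rather than in the program, so each locale can carry its own set of cases.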