AdaCore / ada-spark-rfcs

Platform to submit RFCs for the Ada & SPARK languages

[RFC] Compile time string formatting #26

Open Fabien-Chouteau opened 5 years ago

Fabien-Chouteau commented 5 years ago

So I was thinking about string formatting in Ada and I want to discuss the concept below. I don't know if something similar was already proposed.

The idea would be to add something that is essentially syntactic sugar.

Syntax is discussed at the end.

This:

function Plop (Name, Profession : String) return String
is ("\s is a \s" with (Name, Profession));

Would be equivalent to this:

function Plop (Name, Profession : String) return String
is (Name & " is a " & Profession);

It would be a compile-time feature, so only applicable to string literals. This is not allowed:

function Plop (Fmt, Name, Profession : String) return String
is (Fmt with (Name, Profession));

Bonus 1 : Escape characters

We could add escape characters: \n for ASCII.LF, \t for ASCII.HT, etc.

function Plop (Name, Profession : String) return String
is ("\t\s is a \s\n" with (Name, Profession));

vs

function Plop (Name, Profession : String) return String
is (ASCII.HT & Name & " is a " & Profession & ASCII.LF);

Bonus 2 : Automatic images

\i could mean to automatically apply attribute 'Img:

function Plop (Name, Profession : String; Year : Positive) return String
is ("\t\s is a \s since \i\n" with (Name, Profession, Year));

vs

function Plop (Name, Profession : String; Year : Positive) return String
is (ASCII.HT & Name & " is a " & Profession & " since " & Year'Img & ASCII.LF);

Bonus 3 : Inserts length

We could specify the desired length for the inserted text:

function Plop (Name, Profession : String; Year : Positive) return String
is ("\t\20s is a \s since \i\n" with (Name, Profession, Year));

(I don't know an easy way to do that with current language features)
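For reference, a fixed-width insert can be approximated with the existing standard library: `Ada.Strings.Fixed.Head` returns the first N characters of a string and pads with spaces on the right when the string is shorter. The sketch below assumes that `\20s` would mean "left-justified, 20 characters wide"; the thread does not settle that, and `Pad_Demo` is a name chosen only for illustration.

```ada
with Ada.Strings.Fixed;
with Ada.Text_IO;

procedure Pad_Demo is
   --  Head keeps the first 20 characters of Name, or pads Name with
   --  spaces on the right when it is shorter than 20 characters.
   --  Assumption: this is what "\20s" is meant to do.
   function Plop (Name, Profession : String) return String
   is (Ada.Strings.Fixed.Head (Name, 20) & " is a " & Profession);
begin
   Ada.Text_IO.Put_Line (Plop ("Ada", "programmer"));
end Pad_Demo;
```

Note that `Head` also truncates names longer than 20 characters, which may or may not be the behavior wanted from a format specifier.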

Syntax

There are two syntax aspects to this feature:

Format operator

This is the relation between the format string "\s is \s" and the strings to be inserted (Name, Profession).

I see multiple ways to do that:

Token

I used \s in the example above because that is the most familiar to me and it also allows for other escape characters \n, \t, etc. But it could be %s a la Python/Java, {0} a la C#/Rust, etc.

clairedross commented 5 years ago

Maybe it could also be a (string) subtype attribute to avoid the quote following quote problem: String'Format ("\s is \s", Name, Profession)

egilhh commented 5 years ago

The positional argument in {0} allows referring to the same parameter several times in the formatting string, and could be extended to allow users to specify the expected types/subtypes:

String'Format("{0 : String} is {1 : String} since {2 : Positive}", Name, Profession, Since);
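To make the positional idea concrete, here is a rough run-time approximation in today's Ada. It is only a sketch: `Format`, `Argument_List` and the unary "+" helper are hypothetical names, only single-digit `{N}` placeholders are handled, and the `{0 : String}` type annotations are left out. The proposal itself is about doing this at compile time, where a bad index would be rejected instead of raising `Constraint_Error`.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Positional_Demo is
   --  Hypothetical argument container, indexed from 0 to match "{0}".
   type Argument_List is array (Natural range <>) of Unbounded_String;

   function "+" (S : String) return Unbounded_String
     renames To_Unbounded_String;

   --  Replaces each "{N}" (N a single decimal digit) with Args (N).
   --  All other text is copied unchanged.
   function Format (Template : String; Args : Argument_List) return String is
      Result : Unbounded_String;
      I      : Positive := Template'First;
   begin
      while I <= Template'Last loop
         if I + 2 <= Template'Last
           and then Template (I) = '{'
           and then Template (I + 1) in '0' .. '9'
           and then Template (I + 2) = '}'
         then
            Append (Result,
                    Args (Character'Pos (Template (I + 1))
                          - Character'Pos ('0')));
            I := I + 3;
         else
            Append (Result, Template (I));
            I := I + 1;
         end if;
      end loop;
      return To_String (Result);
   end Format;
begin
   --  The same argument can be referenced more than once.
   Ada.Text_IO.Put_Line
     (Format ("{0} is a {1}; {0} likes Ada",
              (+"Alice", +"programmer")));
end Positional_Demo;
```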

sttaft commented 5 years ago

In various languages the expressions and the string are interpolated, such as:

Put_Line ("`(Name(X)) is a `(Profession(X)) and her salary is `(Get_Salary(X))");

-Tuck

yannickmoy commented 5 years ago

@sttaft do you have examples of the interpolation approach, to see what it looks like in practice? I can't think of an example myself. Thanks.

raph-amiard commented 5 years ago

@Fabien-Chouteau I love the proposal. If it's compile time only though, I wonder if string interpolation wouldn't be even better.

Also, I hate to be that guy (@sttaft ;) ) but I think that escape sequences such as \n in strings deserve their own separate RFC. I'll get on it!

raph-amiard commented 5 years ago

@sttaft I love the interpolating idea. It would be a great fit for Ada indeed. I guess the big downside is that it adds quite a bit of complexity to lexing/parsing. I took the liberty of updating your example because it was badly formatted.

@yannickmoy here is how it could look, alternatively to Tuck's proposal.

function Plop (X : Person) return String
is ("${X.Name} is a ${X.Profession}, and her salary is ${X.Salary}");

sttaft commented 5 years ago

Actually, in ParaSail, it was trivial to implement, because there was already a standard operator which combined concatenation and string-ifying (the "|" operator), so supporting string interpolation was as easy as recognizing the `( and converting it to a \" followed by a '|' and then counting parentheses until reaching the matching ')' and converting it to \". With 'Image now proposed to be universal, something analogous might be possible in Ada without a huge effort, though we would have to get rid of that silly space at the front of 'Image for numeric types. ;-0 Note that I chose the "`(" combination as it seemed vanishingly rare in existing programs (and because I liked the "unquote" ~ "backquote" analogy). I worry about "${" as it might be more likely to appear in existing programs, though it would still be quite rare I am sure.
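The parenthesis-counting rewrite described above can be sketched as a small text transformation. This is an illustration under several assumptions: `Desugar` is a hypothetical name, the output uses "&" and the informal `(expr)'Image` notation from this thread (whether that exact form is legal Ada is not settled here), and quotes or parentheses inside nested string literals are not handled.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Desugar_Demo is
   --  Inner is the text between the quotes of an interpolated literal.
   --  The result is source text in which every `(expr) has been turned
   --  into a concatenation with (expr)'Image.
   function Desugar (Inner : String) return String is
      Result : Unbounded_String := To_Unbounded_String ("""");
      Depth  : Natural := 0;  --  parenthesis nesting inside `( ... )
      I      : Positive := Inner'First;
   begin
      while I <= Inner'Last loop
         if Depth = 0
           and then I < Inner'Last
           and then Inner (I) = '`'
           and then Inner (I + 1) = '('
         then
            --  Close the literal and open the expression.
            Append (Result, """ & (");
            Depth := 1;
            I := I + 2;
         elsif Depth > 0 then
            if Inner (I) = '(' then
               Depth := Depth + 1;
            elsif Inner (I) = ')' then
               Depth := Depth - 1;
            end if;
            if Depth = 0 then
               --  Matching ')' found: close and reopen the literal.
               Append (Result, ")'Image & """);
            else
               Append (Result, Inner (I));
            end if;
            I := I + 1;
         else
            Append (Result, Inner (I));
            I := I + 1;
         end if;
      end loop;
      Append (Result, """");
      return To_String (Result);
   end Desugar;
begin
   Ada.Text_IO.Put_Line (Desugar ("`(Name) is `(Age + 1) years old"));
end Desugar_Demo;
```

A real implementation would live in the lexer and would also have to drop the empty `""` operands this naive version leaves at the ends.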

sttaft commented 5 years ago

See Wikipedia article on string interpolation for examples from other languages.

sttaft commented 5 years ago

By the way, it is confusing this thread is labeled as an "RFC" but is in the "open issues" area rather than the "pull requests" area.

kanigsson commented 5 years ago

So what's the problem that we attempt to solve here? What's wrong with Name & " is a " & Profession ?

sttaft commented 5 years ago

If everything is a string to begin with, it doesn't accomplish much. However, if you are more typically writing:

"My age is " & Me.Age'Image & " my birth date is " & Me.Birthdate'Image & ...

it can get tedious and harder and harder to read. So the notion of string interpolation is that it calls the standard "image" function as part of the semantics of the interpolation construct. For Ada 202x, we have proposed generalizing 'Image so it is available on essentially all types, and user-definable as well. Some kind of string interpolation construct could make construction of output strings much more pleasant and readable, based on my own experience at least. Many languages seem to be moving to supporting string interpolation (see the Wikipedia article). Python is an interesting example, in that they kept adding more and more formatting into strings using various special string markers, but seem to have finally settled on string interpolation as the most useful. Note also that using 'Image as a trailing attribute, rather than as the original T'Image(X) syntax, is not available for arbitrary expressions (like "X + Y") so when the string being interpolated involves some kind of numeric computation, bypassing the explicit T'Image(X+Y) syntax becomes even more attractive.

-Tuck
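As a small illustration of the two points above (the explicit `T'Image (X + Y)` form for computed values, and the leading space mentioned earlier in the thread), here is a sketch in current Ada. `Img` is a hypothetical helper, not a standard function.

```ada
with Ada.Strings;
with Ada.Strings.Fixed;
with Ada.Text_IO;

procedure Image_Demo is
   X : constant Integer := 40;
   Y : constant Integer := 2;

   --  Integer'Image puts a leading space in front of non-negative
   --  values; Trim removes it.
   function Img (N : Integer) return String
   is (Ada.Strings.Fixed.Trim (Integer'Image (N), Ada.Strings.Left));
begin
   --  Explicit form that an interpolation construct would hide.
   Ada.Text_IO.Put_Line ("Sum is" & Integer'Image (X + Y));
   --  Same text, with the space written in the literal instead.
   Ada.Text_IO.Put_Line ("Sum is " & Img (X + Y));
end Image_Demo;
```

Both lines print the same text; the difference is only where the space comes from, which is exactly the kind of detail an interpolation construct would have to define.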

briot commented 5 years ago

If everything is a string to begin with, it doesn't accomplish much. However, if you are more typically writing:

"My age is " & Me.Age'Image & " my birth date is " & Me.Birthdate'Image & …

1 - efficiency of template expansion

As you noted, the above is indeed less readable because of the use of the Image attribute. Also very important is the lack of efficiency, since everything is built and returned on the secondary stack.

We recently wrote an extensive logging library where, for instance, the "&" operators end up appending to a type similar to Unbounded_String to avoid the use of the secondary stack.

The template strings would presumably be expanded by the compiler into such calls to Append to have maximum efficiency.

2 - asynchronous templates

As part of the logging library I mentioned above, we wanted to support asynchronous logging. So the user task should do as little work as possible, so that logging has little overhead, and then a background task is in charge of formatting the message and writing it to files. We would not be able to use templates in such a context for several reasons:

Log (where_to_load, "${X.Name} is a ${X.Profession}");

would systematically expand the template, even when the log message is ultimately discarded (so a significant performance impact), and more importantly there is no way to expand the template in a separate task because X might no longer exist.

I believe any proposal with templates should take these two aspects into account, though I do not have an actual proposal yet (presumably the expansion done by the compiler should provide some hooks that users can redefine somehow).

3 - syntax

I like Tuck’s and Raphael’s interpolation idea. I think systematically using the current scope is limiting though (for instance for the async case I mentioned above), so I would introduce a syntax using Fabien’s idea:

 S : String := "${X.Name} is a ${X.Profession}" with (X => X);
 --  explicit value given for X

and

 S : String := "${X.Name} is a ${X.Profession}" with (others => <>);
 --  takes "X" from the scope instead

It would be nice if the template could be a string, as in:

T : String := "${X.Name} is a ${X.Profession}";
S : String := T with (others => <>);

So the compiler expansion would really have to be on the with itself.

4 - two-layered approach

The way I would implement this is likely via a two layered approach.

Fabien-Chouteau commented 5 years ago

So what's the problem that we attempt to solve here? What's wrong with Name & " is a " & Profession ?

@kanigsson for simple strings it is good enough, but when things get more complicated it becomes unreadable.

That's why I built up to the last example: ASCII.HT & Name & " is a " & Profession & " since " & Year'Img & ASCII.LF

Fabien-Chouteau commented 5 years ago

The way I would implement this is likely via a two layered approach.

This is probably a no-go for embedded applications because of all the dynamic allocation required for such a low-level template library.

Entomy commented 4 years ago

Instead of format strings, why not interpolated strings? Using @Fabien-Chouteau's first example

function Plop (Name, Profession : String) return String
is ("{Name} is a {Profession}");

{ and } are reserved but not used delimiters (they're inside a string anyways so it doesn't really matter) and whatever is inside of them is implicitly understood to need to be converted to a string anyways.

sttaft commented 4 years ago

One challenge with "{" is it is a relatively-widely-used character. An alternative is a character combination such as "`(". That is:

"`(Name) is a `(Profession)"

The "back tick" also suggests the notion of "un-quote" as in Lisp.

-Tuck

Glacia commented 4 years ago

Back-tick is too easy to miss in my opinion. "${" seems like a standard in other programming languages: https://en.wikipedia.org/wiki/String_interpolation

setton commented 4 years ago

What about backwards compatibility? If we introduce escape characters inside strings, this changes all the strings in all the Ada programs and libraries that happened to contain these characters before.

sttaft commented 4 years ago

Probably would need to be a new string type and/or a new I/O package. -Tuck

Entomy commented 4 years ago

$ is typically used as a special interpolated string prefix in other languages. Probably not usable for Ada, @sttaft would have a better idea on that one. But the general idea is you mark the string as specifically an interpolated one, and then only in those strings is this substitution performed. All "classic" strings would still be treated as a raw string literal. This means the only escaping necessary is whatever interpolation marker is chosen, which to keep with Ada conventions on string escaping could just be doubled like

$"{interpolated_variable} {{some raw example}}"

in my case or

$"`(interpolated_variable) ``some raw example"

in Taft's.

Entomy commented 4 years ago

Probably would need to be a new string type and/or a new I/O package. -Tuck

Oh please no. The massive convolution of char/string without reasonable thought is a good 95% of why I stopped using Ada entirely and do all my text processing code in another language.

String interpolation can be done at compile time in simple cases, by building up a single new string literal. These simple cases in Ada should be any static Character or String, or anything that has an intrinsic 'Image (not sure if you guys are going forward on the 'Image for everything proposal with Ada 202X).

In non simple cases it's normally implemented in other languages through rewriting to Format(). There's no need for new string types, just a new Format() function which, if this goes forward, I would put in Ada.Strings.

jere-software commented 4 years ago

What about a specialized, compile-time-only form of a qualified expression for a string? In addition to String'(expression), allow String'(Format_String, [Parameter_N]*). Format_String can use whatever rules it wants since this is a different syntax from existing Ada, and it can be checked at compile time to ensure type safety and that the correct number of parameters is supplied. Then you could do it however you like:

Thing : String := String'("{} and {}", v1,v2);  -- v1 and v2 have a predefined 'Image attribute

OR

Thing : String := String'("\s and \s", v1,v2);  -- v1 and v2 have a predefined 'Image attribute

OR

Thing : String := 
   String'(Format_String => "{} and {}", 
           Parameter_1   => v1, 
           Parameter_2   => v2);  -- If you wanted named association

OR

Thing : String := 
   String'(Format_String => "\s and \s", 
           Parameter_1   => v1, 
           Parameter_2   => v2);  -- If you wanted named association

OR whatever format syntax you want

If no additional parameters are specified, then the qualified expression is parsed as it normally would be.

The \s, {}, or whatever is used would have no special meaning in any other strings, so it wouldn't cause backwards compatibility issues. The compiler would just internally parse the Format_String parameter and the additional parameters, verify that the number of parameters matches what the format string specifies, verify that the parameters can correctly be inserted into the format string, and then do the operation at compile time. You can either be verbose using named association if that is your style, or use positional association if you are really concerned about terseness.

Glacia commented 4 years ago

@clairedross suggested String'Format; I don't see why it wouldn't work for interpolated strings.

function Plop (Name, Profession : String) return String
is ( String'Format("{Name} is a {Profession}") );

sttaft commented 4 years ago

Ada is somewhat unusual in having multiple string types and user-definable string types, so that should at least be considered as one way to ensure upward compatibility. But you might be right, if the way to handle the incompatibility is simple enough, and tools can help identify existing uses of combinations like "${" or "`(" then just making a wholesale switch to string interpolation might be the right answer. Treating string interpolation as merely syntactic sugar for ..." & (blah)'Image & "... is a nice way to define it, once something like 'Image becomes universal.

-Tuck

godunko commented 4 years ago

A few important points from an internationalization point of view are missing from this discussion:

  1. The content of the format string is subject to replacement at execution time, depending on locale settings
  2. The order of substitutions may change with such a replacement
  3. The formatting of a value may depend on locale settings

onox commented 4 years ago

I don't find format strings with a separate list of strings to be very readable. It's easy to forget a string or put them in the wrong order. So I would prefer something like

function Plop (Name, Profession : String; Year : Positive) return String
is (String'Format ("\t{Name} is a {Profession} since {Year}\n"));

over

function Plop (Name, Profession : String; Year : Positive) return String
is ("\t\s is a \s since \i\n" with (Name, Profession, Year));

It's just a matter of detecting whether it's a regular String or a String that needs to be formatted. That should preserve backward compatibility since a regular String will not get formatted. In Python 3.6 you can use f-strings by prefixing the string with an f:

print(f"\t{name} is a {profession} since {year}\n")

I find this much more readable than the old methods of using "format string".format(a, b) or "format string" % (a, b). In ES6, you would use backticks:

console.log(`\t${name} is a ${profession} since ${year}\n`)

In a proper IDE syntax highlighting will clearly indicate which parts are variables or expressions and which parts are strings.

mhatzl commented 3 years ago

Hardcoding \n in a string might not be the best solution as Windows, for example, wants a newline as \r\n. Even if Ada converts this correctly later, it is not as clear as using for example OS.newline. (As you can write on bare metal, OS.newline won't be possible, but a similar way would be nice to have.)

As I am really new to Ada, I don't know if newline independence is already covered otherwise.

mgrojo commented 3 years ago

I don't find format strings with a separate list of strings to be very readable. It's easy to forget a string or put them in the wrong order. So I would prefer something like

function Plop (Name, Profession : String; Year : Positive) return String
is (String'Format ("\t{Name} is a {Profession} since {Year}\n"));

over

function Plop (Name, Profession : String; Year : Positive) return String
is ("\t\s is a \s since \i\n" with (Name, Profession, Year));

I agree with that, but why add special escape characters? That would be the first C-ism to enter Ada. I think it would be better to stick to current Ada entities in interpolated strings:

function Plop (Name, Profession : String; Year : Positive) return String
is (String'Format ("{ASCII.HT}{Name} is a {Profession} since {Year}{ASCII.LF}"));

or going for a shorter option, assume there is a use ASCII clause inside interpolated strings:

function Plop (Name, Profession : String; Year : Positive) return String
is (String'Format ("{HT}{Name} is a {Profession} since {Year}{LF}"));

And the way to escape { or } inside String'Format should be to simply double it, just like " inside a string literal.

function Plop (Name, Profession : String; Year : Positive) return String
is (String'Format ("{{""{Name}"" is a {Profession} since {Year}}}"));

sttaft commented 3 years ago

Your suggestions look very nice. The user could presumably do the "use ASCII" should they want to, so it doesn't seem necessary to make it implicit. Also I presume {...} would allow arbitrary expressions inside. -Tuck

reznikmm commented 3 years ago

How could this work for a locale-enabled application? Given the code

Put_Line ("The cost as of {Date} is {Cost}{Currency}");

it should print for an American

The cost as of January 1 is $19.99

While for a German

Die Kosten für den 1. Januar betragen 19.99€

And for a Russian

Цена на 1 января 19,99₽

mgrojo commented 3 years ago

I guess that wouldn't be possible, since this would be a compile-time feature, so the localization library would receive the processed string.

Entomy commented 3 years ago

@mgrojo .NET does this with localization. It'd be worth looking at how it's done there.

Typically these strings are processed into an equivalent string concat or string Format() call, with interpolation being syntactic sugar, not an outright new feature.