sttaft commented 3 years ago

This is the first draft of an RFC for the string interpolation feature.

glacambre commented 3 years ago

Link to the rendered version: https://github.com/sttaft/ada-spark-rfcs/blob/topic/rfc-string-interpolation/considered/rfc-string-interpolation.rst

sttaft commented 3 years ago

Rendered version also here (click on "Files changed" at the top, and then "View File" on the "..." menu on the right): https://github.com/AdaCore/ada-spark-rfcs/blob/7559ac5bd340373b344c4b92d755aadc682f47cb/considered/rfc-string-interpolation.rst

Note that I just fixed a few typos, so this rendered version is somewhat different than what was linked to originally.

Fabien-Chouteau commented 3 years ago

Thanks @sttaft,

My two cents on this proposal.

1. Referencing variable names

I am not a huge fan of referencing variable names in the template. I think especially with Ada and its most common programming style, names tend to be very long and will result in a template string that is not readable. If I take an example from the original issue:

Put_Line ($"(Name) is a (Profession)"$)

I think it's going to be pretty rare to have such a clean template. In practice you will have names that are part of a record, or something else more complex. So in my opinion we will most likely see patterns that look like this:

Put_Line ($"$(My_Object.Name) is a $(Professions_Img_Array (My_Object.Profession_Index))"$);

Of course there will be the option to declare renames for strings:

declare
   Name : String renames My_Object.Name;
   Profession : String renames Professions_Img_Array (My_Object.Profession_Index);
begin
   Put_Line ($"$(Name) is a $(Profession)"$);
end;

But in the end I don't think this brings as much readability improvement as we could get. Of course I would like to have other opinions on this.

I see two other ways around this, and I am introducing a placeholder "syntax" $" "$ (...) just for the examples:

a. Positional:

Put_Line ($"$(0) is $(1)"$ --  This is a lot of `$` I must say
         (My_Object.Name, 
          Professions_Img_Array (My_Object.Profession_Index));

b. Another variation on naming (I saw this in Python at least)
```
Put_Line ($"$(Name) is $(Profession)"$
         (Name       => My_Object.Name, 
          Profession => Professions_Img_Array (My_Object.Profession_Index));
```
2. Expression in templates:
```
Put_Line ($"Plop is $(X + Y)"$);
```
In the same vein as my comment above, I think this is likely to produce templates that are hard to read. I am not an implementer, but I am guessing that this will introduce complexity that, in my opinion, can be avoided.

3. Formatting options

In the RFC $(X+Y, Width => 13) is given as an example. I am personally a fan of printf's %2.3f, %5x, %04d, etc. If we can make something as compact as that it would be great.

Looking in Python's direction, it could be something like:

Put_Line($"Plop is $(0.2f:Angle)"$)

4. Embedded

As most of you know, my focus is on embedded, and in particular bare-metal and small devices. So I am always concerned about how a given feature can be implemented and used in this kind of situation. I don't know how compatible the current proposal is with embedded, but I want to stress that it is important to keep that in mind.

sttaft commented 3 years ago

On Wed, Jun 16, 2021 at 5:04 AM Fabien Chouteau @.***> wrote:

Thanks @sttaft,

My two cents on this proposal.

Referencing variable names

I am not a huge fan of referencing variable names in the template. I think especially with Ada and its most common programming style, names tend to be very long and will result in a template string that is not readable. If I take an example from the original issue:

Put_Line ($"(Name) is a (Profession)"$)

I think it's going to be pretty rare to have such a clean template. In practice you will have names that are part of a record, or something else more complex. So in my opinion we will most likely see patterns that look like this:

Put_Line ($"$(My_Object.Name) is a $(Professions_Img_Array (My_Object.Profession_Index))"$);

Of course there will be the option to declare renames for strings:

declare
   Name : String renames My_Object.Name;
   Profession : String renames Professions_Img_Array (My_Object.Profession_Index);
begin
   Put_Line ($"$(Name) is a $(Profession)"$);
end;

But in the end I don't think this brings as much readability improvement as we could get. Of course I would like to have other opinions on this.

I have been using a language with string interpolation over the past few years, and my experience is that it is much more readable and quite intuitive. This is something where I would suggest we do some prototyping, and then take some existing programs that do a lot of concatenation of literal strings and 'Image (or Put) calls, and see how they look with various approaches to string interpolation.

I see two other ways around this, and I am introducing a placeholder "syntax" $" "$ (...) just for the examples:

a. Positional:

Put_Line ($"$(0) is $(1)"$ -- This is a lot of $ I must say
         (My_Object.Name,
          Professions_Img_Array (My_Object.Profession_Index));

b. Another variation on naming (I saw this in Python at least)

Put_Line ($"$(Name) is $(Profession)"$
         (Name       => My_Object.Name,
          Profession => Professions_Img_Array (My_Object.Profession_Index));

Using integers as placeholders followed by a list of names adds complexity and leads to the kinds of bugs that are common with printf, where you need to match up parameters with placeholders, and it is easy to introduce problems during maintenance.

Using named parameters is an interesting alternative, but it feels like an extra step, and it requires that any use of an interpolated string be in a context where it makes sense to add named parameters. String literals can appear in many contexts in Ada, and defining exactly where the extra named parameters would go would be a completely new concept in Ada, potentially adding complexity to overload resolution, which is already pretty complex in Ada.

Expression in templates:

Put_Line ($"Plop is $(X + Y)"$);

In the same vein as my comment above, I think this is likely to produce templates that are hard to read. I am not an implementer, but I am guessing that this will introduce complexity that, in my opinion, can be avoided.

I don't see the added implementation complexity here, since our general approach is to apply 'Image, and whether it is a name or an expression makes little difference in Ada, particularly since function calls are considered "name"s in Ada, and both can involve overloading. You have to be able to resolve the type without external context, but that is true in other contexts in Ada (e.g. the operand of a type conversion), so this is not an added complexity.

Formatting options

In the RFC $(X+Y, Width => 13) is given as an example. I am personally a fan of printf's %2.3f, %5x, %04d, etc. If we can make something as compact as that it would be great.

Looking in Python's direction, it could be something like:

Put_Line($"Plop is $(0.2f:Angle)"$)

Seems like an interesting approach. It is somewhat redundant to specify the type (with the "f") in this case, and using letters to specify the radix seems a bit odd given that Ada currently never does that, but some kind of Ada-oriented pattern such as "3.4E2" for specifying the format of a floating-point number, and "16#5#" for specifying a hex integer, might be pretty intuitive. An empty field might be used for cases where the programmer doesn't care, such as "16##" for hex with no specified number of digits. We probably still need to think about whether we want to pass this to the 'Image function, or do it all with postprocessing (which is clearly harder for some of these patterns).
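As a rough illustration of the Ada-oriented patterns suggested above, here is a small Python model. Everything in it is an assumption made for the sake of the example: the pattern grammar, the helper names (`format_with_pattern`, `put_float`, `put_based`), and the choice that "B#W#" yields bare zero-padded digits rather than a based literal like `16#FF#`. The float branch reads "3.4E2" as Fore => 3, Aft => 4, Exp => 2 and follows the layout rules of Ada's `Float_IO.Put`.

```python
import re

# Hypothetical pattern grammar, loosely modeled on Ada conventions:
#   "F.A" or "F.AEX" -> floating point, Fore/Aft/Exp as in Ada's Float_IO.Put
#   "B#W#" or "B##"  -> integer in base B, zero-padded to at least W digits
FLOAT_PATTERN = re.compile(r"(\d+)\.(\d+)(?:E(\d+))?")
BASED_PATTERN = re.compile(r"(\d+)#(\d*)#")
DIGITS = "0123456789ABCDEF"


def put_float(x: float, fore: int, aft: int, exp: int) -> str:
    """Mimic the layout rules of Ada's Float_IO.Put (Fore, Aft, Exp)."""
    aft = max(aft, 1)  # Ada always outputs at least one fractional digit
    sign = "-" if x < 0 else ""
    if exp == 0:
        # No exponent: as many integer digits as needed
        int_part, frac = f"{abs(x):.{aft}f}".split(".")
        exp_part = ""
    else:
        # Exponent form: a single digit before the point
        mantissa, e = f"{abs(x):.{aft}e}".split("e")
        int_part, frac = mantissa.split(".")
        e_val = int(e)
        # Exp counts the sign too, so pad the digits to Exp - 1 with zeros
        exp_part = "E" + ("+" if e_val >= 0 else "-") + str(abs(e_val)).zfill(exp - 1)
    # Leading spaces make the integer part (with sign) at least Fore wide
    return (sign + int_part).rjust(fore) + "." + frac + exp_part


def put_based(n: int, base: int, width: int) -> str:
    """Digits of n in the given base, zero-padded to at least width digits."""
    if not 2 <= base <= 16:
        raise ValueError(f"base out of range: {base}")
    m, digits = abs(n), ""
    while True:
        digits = DIGITS[m % base] + digits
        m //= base
        if m == 0:
            break
    return ("-" if n < 0 else "") + digits.zfill(width)


def format_with_pattern(value, pattern: str) -> str:
    """Dispatch on the (hypothetical) pattern syntax."""
    if m := BASED_PATTERN.fullmatch(pattern):
        return put_based(int(value), int(m[1]), int(m[2]) if m[2] else 0)
    if m := FLOAT_PATTERN.fullmatch(pattern):
        exp = int(m[3]) if m[3] else 0
        return put_float(float(value), int(m[1]), int(m[2]), exp)
    raise ValueError(f"unrecognized format pattern: {pattern!r}")
```

With these assumptions, `format_with_pattern(123.456, "3.4E2")` yields `"  1.2346E+2"`, which should match what `Put (123.456, Fore => 3, Aft => 4, Exp => 2)` prints, and `format_with_pattern(255, "16#5#")` yields `"000FF"`.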

Embedded

As most of you know, my focus is on embedded, and in particular bare-metal and small devices. So I am always concerned about how a given feature can be implemented and used in this kind of situation. I don't know how compatible the current proposal is with embedded, but I want to stress that it is important to keep that in mind.

This is syntactic sugar, so there is no magic here. It is going to be equivalent to a string concatenation and calls on 'Image, and the programmer will need to keep that in mind.

-Tuck
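Since the feature is meant to be pure syntactic sugar, one way to pin down its meaning is a source-to-source rewrite. The Python sketch below turns the body of a `$"..."$` literal into an Ada concatenation expression. `desugar` and `ada_string_literal` are hypothetical names, `Image` is a placeholder for whichever `T'Image` the compiler would pick after resolving the expression, and the `Trim (..., Both)` wrapper (from `Ada.Strings.Fixed`) reflects the trimming discussed later in this thread.

```python
def ada_string_literal(text: str) -> str:
    """Quote text as an Ada string literal, doubling embedded quotes."""
    return '"' + text.replace('"', '""') + '"'


def desugar(template: str) -> str:
    """Rewrite the body of $"..."$ as an Ada concatenation expression.

    Each $(expr) becomes Trim (Image (expr), Both), where Image is a
    hypothetical stand-in for the T'Image chosen after resolving expr.
    Limitation: parentheses inside string literals within expr are not handled.
    """
    parts = []
    literal = ""
    i = 0
    while i < len(template):
        if template.startswith("$(", i):
            # Find the matching closing parenthesis, allowing nesting
            depth, j = 1, i + 2
            while j < len(template) and depth > 0:
                if template[j] == "(":
                    depth += 1
                elif template[j] == ")":
                    depth -= 1
                j += 1
            if depth != 0:
                raise ValueError("unbalanced $( in template")
            if literal:
                parts.append(ada_string_literal(literal))
                literal = ""
            expr = template[i + 2 : j - 1].strip()
            parts.append(f"Trim (Image ({expr}), Both)")
            i = j
        else:
            literal += template[i]
            i += 1
    if literal:
        parts.append(ada_string_literal(literal))
    return " & ".join(parts) if parts else '""'
```

For instance, `desugar("Plop is $(X + Y)")` returns `"Plop is " & Trim (Image (X + Y), Both)`. A real implementation would reuse the Ada lexer rather than counting parentheses.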

mgrojo commented 3 years ago

I hope editors will start highlighting the interpolated variables differently from the surrounding characters; otherwise this would be less readable instead of more.

I suppose escape characters (e.g. "\t\n") will not work in traditional string literals, for compatibility. But then people new to the language will start to complain about being unable to write "\tHello, World!\n". Is the introduction of this C-ism really needed? Wouldn't using ASCII entities be more respectful of current Ada: use ASCII; $"$(HT)Hello, World!$(LF)"$?

This is syntactic sugar, so there is no magic here. It is going to be equivalent to a string concatenation and calls on 'Image, and the programmer will need to keep that in mind.

Does that mean that interpolated positive integers will have a leading space? I would understand it for the sake of consistency, but that would shock many.

sttaft commented 3 years ago

On Sat, Jun 19, 2021 at 8:16 AM Manuel @.***> wrote:

...

This is syntactic sugar, so there is no magic here. It is going to be equivalent to a string concatenation and calls on 'Image, and the programmer will need to keep that in mind.

Does that mean that interpolated positive integers will have a leading space? I would understand it for the sake of consistency, but that would shock many.

No, there would be a "Trim" applied to the result of 'Image so no extra spaces. The RFC should have made that clear.

-Tuck

sttaft commented 3 years ago

Thanks. I fixed it in both places, and added a mention of trimming leading and trailing white space.

-Tuck

On Sat, Jun 19, 2021 at 12:04 PM Manuel @.***> wrote:


In considered/rfc-string-interpolation.rst https://github.com/AdaCore/ada-spark-rfcs/pull/77#discussion_r654812364:

+the beginning of all of them.
+
+Drawbacks
+=========
+
+Hopefully the semantics will be fairly intuitive, but this is certainly
+adding complexity to string literals, and
+
+Prior art
+=========
+
+String interpolation has begun to show up in many languages. Python has a number
+of string literal syntaxes, chosen by a prefix letter, but our sense is that
+the string interpolation syntax has emerged as the favorite. We do not want
+to have lots of different syntaxes, so we have included the escape mechanism
+as part of both of the new string literal syntaxes. We have chosen '\' as the

This is rendered as ''. It probably needs to be quoted using backticks, like this: `\`


raph-amiard commented 3 years ago

Overall, looking really good!

- 'Image doesn't help when you want to interpolate, say, an unbounded string into a string, because the image will typically have quotes around it, which you don't want. How do we accommodate this common use case?
- One drawback of multi-line strings is that they'll be the first multi-line token in the Ada language. Having only single-line tokens has made Ada very easy to tokenize and highlight so far.
- Not a fan of the terminating $.

Fabien-Chouteau commented 3 years ago

@zertovitch I was just thinking about this, maybe it would be interesting to prototype this string interpolation in HAC.

zertovitch commented 3 years ago

Why not? A few hints on how to do it: there is an internal type called String_Literals.

For adding the string interpolation feature, look at the lines 163+ (currently) in src/hac_sys-parser-expressions.adb .

You can add a Boolean mode switch, say Interpolated, activated by the leading '$', then parse the literal string in the special mode if Interpolated = True.

procedure Primary (FSys_Prim : Symset; X : out Exact_Typ) is  --  RM 4.4 (7)
   F           : Opcode;
   Ident_Index : Integer;
begin
   X := Type_Undefined;
   Test (CD, Primary_Begin_Symbol + StrCon, FSys_Prim, err_primary_unexpected_symbol);
   case CD.Sy is
>>>   when StrCon =>
         X.TYP := String_Literals;
         Emit_1 (CD, k_Push_Discrete_Literal, Operand_2_Type (CD.SLeng));  --  String Literal Length
         Emit_1 (CD, k_Push_Discrete_Literal, Operand_2_Type (CD.INum));   --  Index To String IdTab
         InSymbol (CD);
      when IDent =>

dsibai commented 2 years ago

As discussed with Yannick/Tucker, the 'double $' syntax is extremely difficult to read.

I'd rather have either a letter indicating a format string, such as F: F"This is $X and $(X+Y)"

or a whole word : Format"This is $X and $(X+Y)"

The main problem with the word approach is that it looks like a mistyped function call, so a special, compiler-supported function such as Format("This is $X and $(X+Y)") might be more readable.

I'll also add that bash/Python/Rust/etc. reflexes make me want to write: F"This is ${X} and ${X+Y}" or Format("This is ${X} and ${X+Y}")

yannickmoy commented 2 years ago

As discussed with Yannick/Tucker, the 'double $' syntax is extremely difficult to read.

agreed

I'd rather either have a letter indicating a formatting string such as F : F"This is $X and $(X+Y)"

let's see what others think?

or a whole word : Format"This is $X and $(X+Y)"

I also like that, as it is clearer that this starts a format string.

The main problem with the word approach is that it looks like a mistyped function call, so a special, compiler-supported function such as Format("This is $X and $(X+Y)") might be more readable.

That would conflict with a user function called Format taking a string and returning a string, so I don't think we can do this.

I'll also add that bash/Python/Rust/etc. reflexes make me want to write: F"This is ${X} and ${X+Y}" or Format("This is ${X} and ${X+Y}")

I'd rather avoid useless curly braces, unless there is both an advantage in readability and uniformity.

glacambre commented 2 years ago

I'd rather either have a letter indicating a formatting string such as F : F"This is $X and $(X+Y)"

I'm not a fan of format specifiers. What do you think of using backticks? e.g.

Put_Line (`X = $(X)`);

This way interpolated strings are as light as regular strings in terms of syntax but still distinguishable from regular strings.

raph-amiard commented 2 years ago

I'm not a fan of backticks because they introduce a new token in the lexer, and probably new escaping rules in strings. I prefer prefixed strings simply because they're simpler to implement, and also pretty familiar to people because of Python. They're also a more generic syntax extension, which you might be able to reuse if you need another kind of literal someday.

kevlar700 commented 2 years ago

I can't say that I have analysed this discussion in great detail but I shall throw in a couple of cents from my experience with Go and Dart.

Dart has string interpolation and Go does not.

Backticks allow multi-line strings in Go. Perhaps a multi-line feature could allow the nicer interpolation that Dart has, even with longer names?

Go has Printf, but Print is what is generally used. Print is type-aware, so you can do the following, but I often leave out spaces:

fmt.Print("Hello", worldVar, "oops worldVar has no leading space")

To be clear, Go's multi-line strings are not compatible with formatting, but are used for constants, etc.

However, you can do multi-line strings with formatting, so long as the last character of each line is + (& for Ada). For my SQL statements, though, I switched to a string builder anyway, which, strangely, seems nicer as well as being more efficient (when pre-buffered).

Ada has terminating semi colons that might help?

dsibai commented 2 years ago

I found this interesting comment by Claire Dross: https://github.com/AdaCore/ada-spark-rfcs/issues/26#issuecomment-519007316 She suggests having an attribute 'Format.

So we could have an interpolation function as an attribute: String'Format("$(X+Y) = $(X) + $(Y)")

- We'd have a readable word.
- It would look like a function and wouldn't look mistyped like Format"...".
- More importantly, the function would be allowed to be magical, since it's an attribute.
- There couldn't be any confusion with a regular Format(String) -> String function.

mhatzl commented 2 years ago

So we could have an interpolation function as an attribute: String'Format("$(X+Y) = $(X) + $(Y)")

I like this syntax mentioned by @dsibai

As an addition, the formatting options mentioned by @sttaft could optionally be used like String'Format("$(X+Y, "3.4E2") = $(X) + $(Y)")

Seems like an interesting approach. It is somewhat redundant to specify the type (with the "f") in this case, and using letters to specify the radix seems a bit odd given that Ada currently never does that, but some kind of Ada-oriented pattern such as "3.4E2" for specifying the format of a floating-point number, and "16#5#" for specifying a hex integer, might be pretty intuitive. An empty field might be used for cases where the programmer doesn't care, such as "16##" for hex with no specified number of digits. We probably still need to think about whether we want to pass this to the 'Image function, or do it all with postprocessing (which is clearly harder for some of these patterns).

Having the option to set formatting options by passing a variable instead of a literal might also be useful, for example String'Format("$(X+Y, myFormat) = $(X) + $(Y)"). However, I am not sure how invalid formatting options should be handled in this case.

pyjarrett commented 2 years ago

A major strength of Ada is the ability to intuit what code is doing, thanks to minimal symbology, and the fact that many of the things you come across can immediately be thrown into a search engine with good results. Adding a dollar sign introduces additional symbology, reduces readability, and complicates searching for this syntax. From one of these, I also can't tell the type of the formatted string. In this way, I don't think it agrees with the Ada principles of readability or strong typing.

Both Rust and C++ allow a format function, with contained elements in braces, followed by a variable number of parameters in the call. Indexing parameters by integers tends to be very error prone.

In usual fashion, I would want to describe exactly what I'm doing, with double braces being a literal brace:

 Put_Line (String'Format ("{X + Y} = {X} + {Y}"));

For formatting multi-line strings, it would be nice to just have String'Format_Multiline, but there probably does need to be a new type of string literal that can cross lines. Python uses triple quotes. The $" ... " syntax might be useful here.

Format specifiers are convenient and terse, but they can be hard to get right and unintuitive. Combining this with named keys, as in Rust or Python f-strings, makes it read like what it does, without any googling:

-- Using "when" to indicate named parameters with attached aspect-like
Put_Line (String'Format("{Sum} = {X} + {Y}",
    when Sum => X + Y with Justify => Right, Fill => '0', Width => 5, Precision => 3,
    when X => X with Justify => Right, Fill => ' ', Width => 4,
    when Y => Y with Justify => Left,
    when others => <> -- use interpolation for anything left over, only needed if any renames exist
   ));

-- Renames to indicate named parameters with attached aspect-like
-- Parameters with aggregate-like to remove ambiguities due to commas
Put_Line (String'Format("{Sum} = {X} + {Y}",
    renames
    Sum => X + Y with (Justify => Right, Fill => '0', Width => 5, Precision => 3),
    X => X with (Justify => Right, Fill => ' ', Width => 4),
    Y => Y with (Justify => Left),
    others => <> -- use interpolation for anything left over, only needed if any renames exist
   ));

Obviously neither is a final syntax, just an idea.
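The option names in the sketches above (Justify, Fill, Width, Precision) map fairly directly onto padding and rounding. The Python model below is illustrative only: `render` is a hypothetical helper, "Right"/"Left" stand in for an enumeration, and Python's `str.format` named fields play the role of the `{Sum}`, `{X}`, `{Y}` keys.

```python
from typing import Optional


def render(value, *, justify: str = "Right", fill: str = " ",
           width: int = 0, precision: Optional[int] = None) -> str:
    """Apply hypothetical Justify/Fill/Width/Precision options to one value."""
    if len(fill) != 1:
        raise ValueError("Fill must be a single character")
    # Precision, when given, fixes the number of fractional digits
    text = f"{float(value):.{precision}f}" if precision is not None else str(value)
    padding = fill * max(width - len(text), 0)
    if justify == "Right":
        return padding + text  # note: a zero fill lands before any minus sign
    if justify == "Left":
        return text + padding
    raise ValueError(f"unknown Justify value: {justify!r}")


x, y = 1.5, 2.25
line = "{Sum} = {X} + {Y}".format(
    Sum=render(x + y, justify="Right", fill="0", width=5, precision=3),
    X=render(x, justify="Right", fill=" ", width=4),
    Y=render(y, justify="Left"),
)
```

With X = 1.5 and Y = 2.25 this produces `"3.750 =  1.5 + 2.25"`. One design point it exposes: a naive zero fill lands in front of a minus sign, so a real `Fill => '0'` would probably need to be sign-aware.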

godunko commented 2 years ago

It is assumed that the "string" is a literal of one of the standard string types, which is not the case for i18n applications. There, the string literal needs to be translated into another language, and it is not reasonable to use the predefined string (array) types (at the very least, they are not safe in terms of stack overflow).

Another unresolved issue is the use of some kind of formatter that can apply locale settings to generate the output the user expects (decimal separator character, thousands separator, units of measure, etc.).

yannickmoy commented 2 years ago

It is assumed that the "string" is a literal of one of the standard string types, which is not the case for i18n applications. There, the string literal needs to be translated into another language, and it is not reasonable to use the predefined string (array) types (at the very least, they are not safe in terms of stack overflow).

Then what are you proposing for that case?

Another unresolved issue is the use of some kind of formatter that can apply locale settings to generate the output the user expects (decimal separator character, thousands separator, units of measure, etc.).

This is related to localization, for which I think we need a library approach rather than this simpler format-string feature discussed on this PR.

onox commented 2 years ago

This is related to localization, for which I think we need a library approach rather than this simpler format-string feature discussed on this PR.

I agree that a library is more appropriate and flexible to deal with translations and language-related formatting of numbers, plural rules for numbers, currencies, units, date/times, relative times, etc.

reznikmm commented 2 years ago

A few comments:

- Issue: the meaning of \n is unclear. What character(s) does a new line represent? The ARM allows many characters as a line separator:
  - 2.1 (16/3) The character pair CARRIAGE RETURN/LINE FEED (code points 16#0D# 16#0A#) signifies a single end of line (see 2.2); every other occurrence of a format_effector other than the character whose code point position is 16#09# (CHARACTER TABULATION) also signifies a single end of line.
  - 2.1 (13/3) format_effector The characters whose code points are 16#09# (CHARACTER TABULATION), 16#0A# (LINE FEED), 16#0B# (LINE TABULATION), 16#0C# (FORM FEED), 16#0D# (CARRIAGE RETURN), 16#85# (NEXT LINE), and the characters in categories separator_line and separator_paragraph.

  If we hard-code \n as LF, it will make no sense on Windows (and, perhaps, on other platforms). If we hard-code it as CR+LF, it will make no sense on Unix (and, perhaps, on other platforms). There is no new line character common to all platforms. Leaving this meaning implementation-defined is even worse, because every use of \n makes the program non-portable. Solution: exclude \n from this RFC entirely.
- Issue: new line representation in multi-line string literals. The same problem as for \n. Solution: introduce a new type (or interface?) String_Vector with a Join method. Join should accept the desired separator as a parameter and return the concatenation of all elements with the separator between them. Let the multi-line string literal return a String_Vector. Each element of the vector is a separate line with no new line character at the end. See Virtual_String_Vector for an example:
```
function Join_Lines
  (Self           : Virtual_String_Vector'Class;
   Terminator     : VSS.Strings.Line_Terminator;
   Terminate_Last : Boolean := True)
   return VSS.Strings.Virtual_String;
--  Join all string vector's strings with each element separated by given
--  Terminator. When Terminate_Last is True line terminator is added after
--  last line.
```
- Issue: why \" if we already have ""? This is just extra complexity in the language.
- (Minor issue): Use $$ instead of \$, as it is more in line with the "" escape sequence. If we drop \$, \" and \n, then we don't need the \ escape character any more. This also makes Ada source code generation simpler, because you need to escape only two characters ($, ") instead of three.
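A rough Python model of the String_Vector idea: the multi-line literal yields its lines with no terminator attached, and the terminator is chosen only when joining. `LineTerminator` and `join_lines` are hypothetical Python names that mirror the `Terminator` and `Terminate_Last` parameters of the `Join_Lines` declaration quoted above; they are not the VSS API itself, and returning an empty string for an empty vector is an assumption.

```python
from enum import Enum


class LineTerminator(Enum):
    """A hypothetical subset of possible line terminators."""
    LF = "\n"
    CR = "\r"
    CR_LF = "\r\n"
    NEL = "\x85"


def join_lines(lines: list, terminator: LineTerminator,
               terminate_last: bool = True) -> str:
    """Join lines with the terminator; optionally terminate the last line too."""
    text = terminator.value.join(lines)
    if terminate_last and lines:
        text += terminator.value
    return text


# A multi-line literal would yield its lines without any terminator attached
literal_lines = ["SELECT *", "FROM t"]
```

`join_lines(literal_lines, LineTerminator.CR_LF, terminate_last=False)` gives `"SELECT *\r\nFROM t"`, so the same literal can serve Unix and Windows output without baking a newline convention into the source.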

Fabien-Chouteau commented 2 years ago

* Let the multi-line string literal return String_Vector. Each element in the vector is a separate line with no _new line_ character at the end.

This is interesting. What would the String_Vector type look like? If it's based on Ada.Containers, it's a no-go for me.

An issue I see with that approach is that you will use extra memory and CPU to build the final string.

godunko commented 2 years ago

Then what are you proposing for that case?

An interesting solution would be general "syntactic sugar" over a replaceable library implementation. For example, user-defined string types could provide their own definition of the 'Format attribute if they support interpolation.

yannickmoy commented 2 years ago

After discussion at AdaCore, we've come up with the following proposal: (1) Use the syntax F"..." for format strings. (2) Use braces {Expr} for insertions. (3) Do not allow extra parameters or multi-line strings in this initial design.

The syntax String'Format(...) was also preferred by some of us, but the need to recognize this construct at the syntax level pushed for solution (1).

We also agreed that we need to fix the problem that, with the definition of 'Image on scalar types in Ada, a non-negative value of any integer type is printed with an extra space in front of the number (initially designed this way in Ada 83 so that positive and negative numbers of the same magnitude take the same amount of space, as the '-' character for negative numbers becomes a blank for non-negative numbers).

We had some disagreements on how to solve that problem:

- remove the extra blank character just for non-negative values of integer types
- trim the result of 'Image on both sides always
- trim by default, but offer the option to not trim

@sttaft will update the RFC with this proposal, and lay out the options for the whitespace problem.
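The three options can be compared on a tiny Python model. `integer_image` mimics Ada's `Integer'Image`, which puts a blank where the sign would be for non-negative values; `option_1` to `option_3` are hypothetical names for the three approaches listed above, and `is_integer_type` stands for information the compiler would know statically. The string `" padded "` stands for a hypothetical user-defined `'Image` that adds surrounding spaces.

```python
def integer_image(n: int) -> str:
    """Model of Ada's Integer'Image: a blank stands in for the sign of n >= 0."""
    return f" {n}" if n >= 0 else str(n)


def option_1(image: str, is_integer_type: bool) -> str:
    """Remove the sign blank, but only for images of integer types."""
    if is_integer_type and image.startswith(" "):
        return image[1:]
    return image


def option_2(image: str) -> str:
    """Always trim spaces on both sides, whatever the type."""
    return image.strip(" ")


def option_3(image: str, trim: bool = True) -> str:
    """Trim by default, but let the programmer opt out."""
    return option_2(image) if trim else image
```

The `" padded "` cases show where the options diverge: option 1 leaves a non-integer image untouched, while option 2 trims it as well.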

pyjarrett commented 2 years ago

Very nice! I should have mentioned it before, but I'm concerned about the non-standard use of \ escaping in strings, since it doesn't match the rest of the language and may lead to confusion or bugs. It feels out of place for Ada. Is it necessary when something like F"{Latin_1.CR}" would be legal, and we could use {{ and }} for literal braces like Python already does?

yannickmoy commented 2 years ago

Yes, the only escaped characters would be {{ and }} like in Python/C++/Rust, no need for using \.
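To make these rules concrete ({Expr} insertions, with {{ and }} as the only escapes), here is a minimal Python sketch of splitting the body of an `F"..."` literal into text and expression segments. `split_format_string` is a hypothetical name; it assumes Ada's `""` has already been collapsed to a single quote by the lexer, and that an expression never contains a `}` of its own (for example inside a nested string literal), which a real implementation would have to handle.

```python
from typing import List, Tuple


def split_format_string(body: str) -> List[Tuple[str, str]]:
    """Split the body of F"..." into ("text", ...) and ("expr", ...) segments."""
    segments: List[Tuple[str, str]] = []
    text = ""
    i = 0
    while i < len(body):
        c = body[i]
        if body.startswith("{{", i):
            text += "{"  # escaped opening brace
            i += 2
        elif body.startswith("}}", i):
            text += "}"  # escaped closing brace
            i += 2
        elif c == "{":
            end = body.find("}", i + 1)
            if end == -1:
                raise ValueError("unterminated '{' in format string")
            expr = body[i + 1 : end].strip()
            if not expr:
                raise ValueError("empty {} insertion")
            if text:
                segments.append(("text", text))
                text = ""
            segments.append(("expr", expr))
            i = end + 1
        elif c == "}":
            raise ValueError("a single '}' must be written as '}}'")
        else:
            text += c
            i += 1
    if text:
        segments.append(("text", text))
    return segments
```

A lone `}` is rejected here, following Python's rule; each "expr" segment would then be resolved and imaged by the compiler.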

yannickmoy commented 2 years ago

In a recent discussion, Tuck also proposes to use {"..."} as syntax for format strings instead of F"...", and many at AdaCore (not me) prefer indeed that syntax.

pyjarrett commented 2 years ago

I don't understand where {"..."} comes from as a syntax; I've used quite a few languages and don't ever remember coming across something like it before. I understand they probably want a new token to simplify language tooling, but it would preclude an inline dictionary (map) syntax later on. Prefixing strings isn't unheard of for special string types: f-strings in Python, L"..." for wide strings in C++, R"(...)" for raw strings in C++, Rust raw string literals, etc.

onox commented 2 years ago

Does {"..."} provide some safety benefit over F"..." or is it just a preference? I think F"..." is preferred because it is similar to how other languages do it as @pyjarrett said. (New) Ada users who would see the syntax for the first time can probably guess what it does if they have experience with one of these languages.

yannickmoy commented 2 years ago

Does {"..."} provide some safety benefit over F"..." or is it just a preference?

It's just a syntax discussion, no change in semantics.

I think F"..." is preferred because it is similar to how other languages do it as @pyjarrett said. (New) Ada users who would see the syntax for the first time can probably guess what it does if they have experience with one of these languages.

That's also my preference. Others prefer {"..."} because Ada does not otherwise use letters as syntactic elements, but I think that's the same for other languages using these F-strings.

mgrojo commented 2 years ago

The first time I saw those prefixes in other languages, they seemed like odd syntax to me. Now they have a small advantage of familiarity, but setting that aside, I like the symmetry of {"..."}. Ideally, there would be another pair of quote symbols, but there are no more in ASCII. Some languages use `grave accents` for some string constructs, but I suppose they have been ruled out because they are difficult to type on some national keyboard layouts, and they are not actually quotes.

yannickmoy commented 2 years ago

I agree with @reznikmm that the current proposal for using \ as escaping character, in particular for newline as \n is not convincing at all. I'd rather use only {{ and }} as additional escaping sequence, like in other languages, in addition to the existing "" sequence in Ada for a double quote " inside a string.

I agree with @pyjarrett that the use of a different syntax {"..."} than all other mainstream programming languages is an issue if we want to reach out to people outside of the existing small Ada user community, especially as multiple of these languages have adopted the same syntax F"...". Do we really want Ada to be an outlier again?

I disagree with the current proposal to trim space characters in front of a digit during replacement. I think this should be reserved for the insertion of a single integer value, in effect using a special version of the predefined T'Image in that case that does not have the initial space problem, rather than forcing that solution on T'Image of all types, including non-integer ones.

sttaft commented 2 years ago

Yannick Moy wrote:

I disagree with the current proposal to trim space characters in front of a digit during replacement, I think this should be reserved for the insertion of a single integer value, so in effect using a special version of the predefined T'Image ...

It is likely that 'Image for a private type that is used to represent a numeric type (e.g. Ada 2022's big numbers) will also include a space in front of the first digit, to be consistent with Ada's builtin numeric types. So it doesn't make sense to omit the space when interpolating integer'Image and float'image, but not omit the space when interpolating big_integer'image. So it is safer to base the removal of the leading space on the string representation of the 'Image, rather than the type of the value. Furthermore, by default a private type uses the 'Image of its full type, so for any private type that happens to be implemented using a numeric type (e.g. something like GNAT's node-ids), you are breaking privacy if you say the leading space will be stripped if it happens to be implemented by a numeric type, but not otherwise.

sttaft commented 2 years ago

By the way, it seems a bit inconsistent to avoid use of "\" even though it is widely used in other languages, and then say that we should not use {" ... "} because it is not used widely. I believe we should focus on readability and usability for Ada, and make choices that are sensitive to both Ada's tradition (e.g. mirrored syntaxes for bracketing syntax) and to conventions adopted widely (e.g. "\"), with readability being a very high priority.

One problem with doubling is that you always have to know exactly which characters are and are not to be doubled, whereas with "\" you can presumably use it with characters that you are not sure about. So for example, in these new strings, would you have to double "}"? It is not really necessary. And what about single apostrophe (')?
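One way to look at the doubling question is from the side of a code generator. Under the doubling-only rules proposed above (assumed here: the `F"..."` syntax, Ada's `""` for a quote, and Python's rule that both `{` and `}` are doubled, even a lone `}`), the set of characters to escape is small and fixed, and everything else, including the apostrophe and `\`, passes through unchanged. `to_format_literal` is a hypothetical helper, shown in Python.

```python
# Under the assumed rules, exactly these characters are doubled inside F"..."
DOUBLED = frozenset('"{}')


def to_format_literal(text: str) -> str:
    """Emit Ada source for an F"..." literal whose value is exactly text."""
    escaped = "".join(c * 2 if c in DOUBLED else c for c in text)
    return 'F"' + escaped + '"'
```

This does not settle the readability question, but it shows that, under these rules, "which characters must be doubled" has a short, fixed answer.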

yannickmoy commented 2 years ago

If the only concern is to deal with the standard Ada 2022 big numbers, we can include them in the special case. And we can adapt the RM wording to deal with the privacy issue, I don't see it being an issue. The surprising removal of initial space for arbitrary T'Image looks like a bad idea, which is sure to surprise users.

yannickmoy commented 2 years ago

Regarding doubling: other languages are happy to double { and } to escape them, so why do it differently? As for \, it is precisely not used for the newer format strings in these other languages, and @reznikmm did a good job of pointing out all the problems with it.
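For comparison, under the doubling convention used by Python's str.format and f-strings, only { and } need escaping, and a lone } is rejected outright (str.format raises ValueError) even though it could not be confused with a placeholder. The following minimal Python sketch of a doubling-based scanner illustrates that rule; `interpolate_doubled` is a hypothetical name, and placeholders are looked up in a dictionary rather than evaluated as Ada expressions.

```python
def interpolate_doubled(template: str, values: dict[str, str]) -> str:
    """Expand {Name} placeholders; "{{" and "}}" stand for literal braces."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if template.startswith("{{", i) or template.startswith("}}", i):
            out.append(ch)  # doubled brace -> one literal brace
            i += 2
        elif ch == "{":
            end = template.find("}", i + 1)
            if end == -1:
                raise ValueError("unterminated '{' in template")
            out.append(values[template[i + 1:end]])
            i = end + 1
        elif ch == "}":
            # As in Python's str.format, a lone '}' is an error, so '}' must
            # always be doubled even though it would be unambiguous.
            raise ValueError("single '}' in template")
        else:
            out.append(ch)  # everything else, including ' and ", is literal
            i += 1
    return "".join(out)


print(interpolate_doubled("{X} is written as {{X}} in the template", {"X": "42"}))
```

With this convention the answer to "what needs doubling?" is exactly { and }; apostrophes and double quotes inside the body pass through unchanged (how the enclosing literal itself is delimited is a separate question).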

raph-amiard commented 2 years ago

> Regarding the use of \, precisely it is not used for the newer format strings in these other languages, and @reznikmm did a good job at pointing at all the problems with it.

That's incorrect: most languages have escape sequences (Python, Rust, Swift, C/C++, JavaScript). In fact, it's harder to find a language that doesn't have them. I disagree completely with @reznikmm's justification for not adding them. Newlines are hard, yes. So let's not leave that problem for the user to solve...

> I agree with @pyjarrett that using a syntax, {"..."}, that differs from all other mainstream programming languages is an issue if we want to reach people outside the existing small Ada user community, especially as several of these languages have adopted the same syntax, F"...". Do we really want Ada to be an outlier again?

I agree with this. I'd rather use F"...", not because I vastly prefer it to {"..."}, but because it will be more familiar to people coming from other languages, and statistical familiarity is almost the only objective criterion you can apply to syntax.

sttaft commented 2 years ago

My own preference is to strip both leading and trailing whitespace, as it seems the simplest and most uniform rule, allowing the implementor of 'Image to use extra whitespace for standalone situations where some kind of alignment might be appropriate.

I think it might be a bigger surprise for a user if they write:

 {"The input is {X} and the result is {F(X)}."}

and they end up with extra white space before or after the interpolation, when clearly the user did not expect that since they allowed for exactly one space on either side.

One simple way, if you really want to see all of the whitespace that 'Image produces, might be to write:

{"The image of the input is {X'Image} and the image of the result is {F(X)'Image}"}

But normally I would say you are more interested in seeing the value of X or F(X), not its verbatim image which might have additional whitespace to provide some kind of formatting in a standalone usage.

The other option, as we have discussed, would be to allow for a second parameter inside the { ... } to provide control over how the 'Image is transformed during interpolation:

{"The image of the input is '{X, Verbatim}' and the image of the result is '{F(X), Verbatim}'"}
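Stripping by default with an opt-out could look roughly like the following minimal Python sketch. The `{Name, Verbatim}` spelling comes from the comment above; restricting placeholders to simple names, passing precomputed images in a dictionary, and the name `interpolate_trimmed` are assumptions made to keep the example small.

```python
import re

# {Name} or {Name, Verbatim}; only simple names are supported in this sketch.
PLACEHOLDER = re.compile(r"\{\s*(\w+)\s*(?:,\s*(Verbatim)\s*)?\}")


def interpolate_trimmed(template: str, images: dict[str, str]) -> str:
    """Replace each placeholder with its image, stripped of leading and
    trailing whitespace unless the Verbatim option is given."""
    def substitute(match):
        image = images[match.group(1)]
        verbatim = match.group(2) is not None
        return image if verbatim else image.strip()

    return PLACEHOLDER.sub(substitute, template)


# Images as 'Image might produce them: a leading space for non-negative
# numbers, plus (hypothetically) trailing padding from a custom 'Image.
images = {"X": " 42", "Y": " 1.50000E+00  "}
print(interpolate_trimmed("The input is {X} and the result is {Y}.", images))
print(interpolate_trimmed("The image of the input is '{X, Verbatim}'.", images))
```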

yannickmoy commented 2 years ago

> That's incorrect, most languages have escape sequences (Python, Rust, Swift, C/C++, Javascript). In fact it's harder to find a language that doesn't have them. I disagree completely with @reznikmm justification for not adding them. Newlines are hard yes. So let's not let the user solve that problem ...

Thanks for the correction. Indeed, it's not mentioned explicitly in most of the documentation pages I've seen (for example the Rust online doc https://doc.rust-lang.org/std/fmt/index.html), but \n is indeed interpreted as a newline. But what's the purpose of allowing a large number of escape sequences if this is only meant for newlines? In particular, why adopt \{ and \}, which are not used elsewhere? (But maybe you'll correct me on that too!)

If newlines are to be supported, the RFC should describe how they are handled on the various platforms (Unix/Windows/other). Also, if this is only to support that one case, we could use a different syntax, e.g. concatenation between F-strings to indicate a newline: F"hello" & F"world" instead of F"hello\nworld".

Fabien-Chouteau commented 2 years ago

> My own preference is to strip both leading and trailing whitespace,

I am worried about the run-time penalty for this.

onox commented 2 years ago

> e.g. use concatenation between F-strings to indicate newline: F"hello" & F"world" instead of F"hello\nworld".

I think it would surprise users if implicit newlines appeared in the output because of concatenation. Also, how would you avoid an unwanted newline if you need to split an F-string into multiple parts to avoid long lines (> 79 characters) because of style checks (-gnatyM)?

pyjarrett commented 2 years ago

> it seems a bit inconsistent to avoid use of "\" even though it is widely used in other languages, and then say that we should not use {" ... "} because it is not used widely.

My argument against escape characters is based on their breaking the conceptual integrity of how strings behave in the language, not on popularity. As a C++ programmer who came to Ada recently, I was confused by this behavior, but it would be even more confusing for escapes to work in one kind of string but not in another. On the other hand, some languages do differ in this behavior between raw and plain strings, so perhaps it's fine to do so anyway?

My argument against {"..."} is that I can't recall this construct from any of the dozen or so languages I've worked in. If the argument for escape characters is popularity, it seems odd to deliberately include a syntax that has no precedent in any mainstream language. I also feel that braces should be kept in reserve for as long as possible, since some of the clunkiness of Ada syntax I've dealt with reminds me of Objective-C before it added the @[a, b, ...] and @{ ...} syntax for NSArray and NSDictionary literals, respectively. The only places I can think of where something like {" "} might even be valid syntax are an implicit function return in Rust, fn foo() -> &'static str { "foo" }, or a braced initializer in C++: StringType foo { "foo" };.

There's merit to wanting a balanced delimiter, but " itself isn't a balanced element in the way that (), {} or <> are. With all of these considerations from the previous discussion, the original recommendation of $"..." might actually be best: it's a formatted string with a non-letter prefix, albeit another non-standard syntax, and it keeps braces available for future use.

yannickmoy commented 2 years ago

Hi Tuck, currently it reads "trimmed of a leading space if the second character is a digit", I assume that's just a leftover from the previous version?

Also, you allowed the use of "" for the double-quote character in format strings, and we discussed that this was also a leftover.

yannickmoy commented 2 years ago

@sttaft you mention in the first paragraph the use of backslash to enter Unicode characters; is that intentional? It's not described in the rest of the RFC, and we did not discuss it.

yannickmoy commented 2 years ago

You also mention \n as the character used for newline in multi-line strings, but shouldn't it depend on the platform, like the characters inserted by New_Line, so that it would be \r\n on Windows?

yannickmoy commented 2 years ago

Small fix: the rule for interpolated_string_literal should have double quotes at the start and end, not only curly braces.

yannickmoy commented 2 years ago

You say: "An escaped_character represents the given graphic_character", which seems to indicate that \k would be interpreted as just the character k. Shouldn't we require an error in such a case, and have an explicit rule that \\ and \" denote the backslash and the double quote, respectively?
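One way to read the stricter rule suggested here, as a minimal Python sketch: only a fixed set of escapes is accepted and anything else is an error. The escape set below (\\, \", \{, \} and \n) is an assumption pieced together from this thread, not the RFC's final list, and mapping \n to a single LF character deliberately leaves aside the Windows question raised above. `decode_escapes` is a hypothetical name.

```python
# Hypothetical escape table for the proposed interpolated strings.
ESCAPES = {"\\": "\\", '"': '"', "{": "{", "}": "}", "n": "\n"}


def decode_escapes(body: str) -> str:
    """Decode the body of a literal, rejecting unknown escapes such as \\k."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 == len(body):
            raise ValueError("trailing backslash")
        nxt = body[i + 1]
        if nxt not in ESCAPES:
            # Instead of silently treating \k as k, report an error.
            raise ValueError(f"unknown escape sequence \\{nxt}")
        out.append(ESCAPES[nxt])
        i += 2
    return "".join(out)


print(decode_escapes(r'Path: C:\\temp, brace: \{, quote: \"'))
try:
    decode_escapes(r"\k")
except ValueError as err:
    print(err)
```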

AdaCore / ada-spark-rfcs

[RFC] String Interpolation #77
