Additional format specification languages

awvwgk commented 3 years ago

Formatting and pretty-printing output is a common task in any programming language. Fortran's format specifiers are unusual in that they differ markedly from the C-like or Python-like format specifiers used elsewhere. While Fortran's format specifiers are probably older than the C-like or Python-like ones, the latter have found wide adoption across several programming languages.

This raises the question of whether stdlib could offer a stdlib_format module to allow formatting of strings using C-like and/or Python-like format specifiers.

Original post by @ivan-pi in https://github.com/fortran-lang/stdlib/issues/337#issuecomment-798117247:

Should the user formatting re-use the Fortran formatting conventions or do we want to adopt something like the Format Specification Mini-Language in Python?

Personally I would be in favor of having a function called format (like the Python .format() or the C++ 20 std::format) for formatted string conversion, but I'm not sure we can really get there in standard Fortran. Probably we would need to limit the number of arguments and use the class(*) approach like in M_msg.
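A minimal sketch of what the class(*) approach with a fixed number of optional arguments could look like. The names fmt_sketch, fmt and to_string are hypothetical (this is not the M_msg API), only three arguments and three intrinsic types are handled, and the function is called fmt rather than format to avoid confusion with the FORMAT statement:

```fortran
module fmt_sketch
  implicit none
  private
  public :: fmt

contains

  !> Replace each "{}" in template with the next supplied argument.
  !> (Hypothetical helper: at most three arguments are supported.)
  function fmt(template, a1, a2, a3) result(res)
    character(len=*), intent(in) :: template
    class(*), intent(in), optional :: a1, a2, a3
    character(len=:), allocatable :: res
    integer :: pos, k, iarg

    res = ""
    pos = 1
    iarg = 0
    do
      k = index(template(pos:), "{}")
      if (k == 0) exit
      iarg = iarg + 1
      ! Copy the text in front of the placeholder
      res = res // template(pos:pos+k-2)
      select case (iarg)
      case (1)
        if (present(a1)) res = res // to_string(a1)
      case (2)
        if (present(a2)) res = res // to_string(a2)
      case (3)
        if (present(a3)) res = res // to_string(a3)
      end select
      pos = pos + k + 1
    end do
    ! Copy whatever follows the last placeholder
    res = res // template(pos:)
  end function fmt

  !> Convert one unlimited polymorphic argument to text.
  function to_string(arg) result(str)
    class(*), intent(in) :: arg
    character(len=:), allocatable :: str
    character(len=64) :: buffer

    select type (arg)
    type is (integer)
      write(buffer, '(i0)') arg
      str = trim(buffer)
    type is (real)
      write(buffer, '(g0)') arg
      str = trim(buffer)
    type is (character(len=*))
      str = arg
    class default
      str = "?"
    end select
  end function to_string

end module fmt_sketch

program demo_fmt
  use fmt_sketch, only: fmt
  implicit none
  print '(a)', fmt("Color {}, Number {}", "Red", 3)   ! Color Red, Number 3
end program demo_fmt
```

In this sketch a placeholder without a matching argument is silently dropped, which illustrates the argument-count problem rather than solving it.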

epagone commented 3 years ago

I'm not sure about this. I think that format specification in Fortran is different but works quite well. What I felt Fortran really needed in the past was the g0 descriptor, which (luckily) we have had for a few years now.

I might be missing something, though. Is there a "feature" advantage in adopting C or Python like format specifiers (i.e. something that Fortran cannot do or does in an inconvenient way)?

milancurcic commented 3 years ago

Related: #19

I like and support this idea (whether C- or Python-style, or both). I agree that Fortran formatting works well, but alternative format specifications could be helpful to newcomers who are familiar with some other language.

ivan-pi commented 3 years ago

I think the easiest way to enjoy the best of both worlds is to have a converter function, e.g. given the following C format string

"Color %s, Number %d, Float %4.2f"

it should produce the Fortran equivalent

"('Color ',A,', Number ',I0,', Float ',F4.2)"

This way you could easily inline it:

character(len=5) :: str = "Red"
integer :: i = 3
real :: a = 42.0
write(*,cfmt("Color %s, Number %d, Float %4.2f")) str, i, a

Addendum: I imagine a regex would be the way to do this: search for patterns of % followed by the format letters, and replace them with Fortran specifiers. Escape characters need care.
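A rough sketch of such a converter, under several assumptions: Fortran has no built-in regex, so this uses a hand-written scanner instead; the module name cfmt_sketch is hypothetical; and only %s, %d, %f (with optional width.precision) and %% are recognised, with any width on %s or %d ignored.

```fortran
module cfmt_sketch
  implicit none
  private
  public :: cfmt

contains

  !> Translate a printf-style string into a Fortran format string.
  !> Only %s, %d, %f (with optional width.precision) and %% are handled.
  function cfmt(cstr) result(ffmt)
    character(len=*), intent(in) :: cstr
    character(len=:), allocatable :: ffmt, lit, spec
    integer :: i, j, n

    n = len(cstr)
    ffmt = "("
    lit = ""
    i = 1
    do while (i <= n)
      if (cstr(i:i) /= "%") then
        ! Literal text; an apostrophe must be doubled inside '...'
        if (cstr(i:i) == "'") then
          lit = lit // "''"
        else
          lit = lit // cstr(i:i)
        end if
        i = i + 1
        cycle
      end if
      ! Scan past the optional width.precision to the conversion letter
      j = i + 1
      do while (j <= n)
        if (verify(cstr(j:j), "0123456789.") /= 0) exit
        j = j + 1
      end do
      if (j > n) error stop "cfmt: incomplete conversion specification"
      spec = cstr(i+1:j-1)
      select case (cstr(j:j))
      case ("%")
        lit = lit // "%"              ! "%%" is an escaped percent sign
      case ("s")
        call append("A")
      case ("d")
        call append("I0")
      case ("f")
        if (len(spec) == 0) then
          call append("F0.6")         ! printf defaults to six decimals
        else if (index(spec, ".") == 0) then
          call append("F" // spec // ".6")
        else if (spec(1:1) == ".") then
          call append("F0" // spec)
        else
          call append("F" // spec)
        end if
      case default
        error stop "cfmt: unsupported conversion"
      end select
      i = j + 1
    end do
    call append("")
    ffmt = ffmt // ")"

  contains

    !> Flush pending literal text, then add one edit descriptor (if any).
    subroutine append(item)
      character(len=*), intent(in) :: item
      if (len(lit) > 0) then
        if (len(ffmt) > 1) ffmt = ffmt // ","
        ffmt = ffmt // "'" // lit // "'"
        lit = ""
      end if
      if (len(item) > 0) then
        if (len(ffmt) > 1) ffmt = ffmt // ","
        ffmt = ffmt // item
      end if
    end subroutine append

  end function cfmt

end module cfmt_sketch

program demo_cfmt
  use cfmt_sketch, only: cfmt
  implicit none
  character(len=3) :: str = "Red"
  integer :: i = 3
  real :: a = 4.2
  ! Prints ('Color ',A,', Number ',I0,', Float ',F4.2)
  print '(a)', cfmt("Color %s, Number %d, Float %4.2f")
  ! Prints Color Red, Number 3, Float 4.20
  write(*, cfmt("Color %s, Number %d, Float %4.2f")) str, i, a
end program demo_cfmt
```

One caveat with the literal mapping of %4.2f to F4.2: the printf width is a minimum, whereas the Fortran field width is fixed, so a value such as 42.0 prints as 42.00 in C but as asterisks with F4.2. A real converter would have to decide how to handle that difference.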

epagone commented 3 years ago

Thinking more about it, probably there is also the advantage that C/C++/Python format specification is more compact and avoids the (potentially annoying) typical combination of single and double quotes. +1 also for me.

ivan-pi commented 3 years ago

I see now that my previous idea to directly mimic printf (C) or format (C++) is not well suited to Fortran due to the absence of variadic functions.

Instead we just need these format specifier "adaptors", e.g. the C++ code:

 std::cout << std::format("Hello {}!\n", "world");

would be in Fortran

use iso_fortran_env, only: stdout => output_unit
write(stdout, format("Hello {}!\n")) "world"       ! obviously, this is a contrived example

One downside is that, since this would not be a built-in function, syntax errors in the format specifier or in the number of arguments can't be caught at compile time. Instead, the program terminates at runtime with a potentially cryptic error message. To avoid this, the format function would need to validate its input first and terminate with a helpful message, making it an impure function. Still, there would be no way to protect against a mismatch in the number of arguments. One could perhaps overcome these issues by making format a preprocessor macro, but this would again make it a non-portable solution.
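For comparison, the caller can already soften the runtime failure with the standard iostat= and iomsg= specifiers. The sketch below assumes the processor reports a malformed run-time format as an I/O error condition. That is common, but I would not rely on it everywhere, and the message text is compiler-specific. The format is held in a variable because a constant format may be rejected at compile time.

```fortran
program check_format
  implicit none
  character(len=:), allocatable :: bad_fmt
  character(len=256) :: msg
  integer :: ios

  ! Deliberately malformed: the closing parenthesis is missing
  bad_fmt = "(A,I0"
  msg = ""

  write(*, bad_fmt, iostat=ios, iomsg=msg) "n = ", 3
  if (ios /= 0) then
    ! Report the problem instead of letting the runtime abort the program
    write(*, '(a)') "format error: " // trim(msg)
  end if
end program check_format
```

This does not validate anything up front; it only turns the abort into something the program can report or recover from.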

I'd still argue that a C-style format function would be welcome and would make Fortran I/O easier in some situations. It would, however, be mostly the caller's responsibility to get the C format string right. For a programmer familiar with C or Python format specifiers, a few edit-compile-run cycles might be easier than learning the Fortran format specifications.

@certik, do you think this is worth prototyping in LFortran at some point? What I mean is the proposed format function would still be a (non-standard) stdlib thing, but LFortran would offer compile-time format syntax checking. This would imply the compiler gives stdlib some type of elevated status.

arjenmarkus commented 3 years ago

Fortran may not support variadic routines, but you can sort-of work around that, see for instance https://sourceforge.net/p/flibs/svncode/HEAD/tree/trunk/tests/strings/test_keyvars.f90.

That said, one of the major drawbacks IMHO of C-style formats is the impossibility to group them and to have repetition:

write(*, '(10(a,i5))' ) ( string(i), value(i) , i=1,100)

for instance would have to be done using an explicit do-loop in C and some logic to add a newline at the right moment. Well, just my pet peeve :).

ivan-pi commented 3 years ago

@arjenmarkus wrote:

"That said, one of the major drawbacks IMHO of C-style formats is the impossibility to group them and to have repetition:

write(*, '(10(a,i5))' ) ( string(i), value(i) , i=1,100)

for instance would have to be done using an explicit do-loop in C and some logic to add a newline at the right moment. Well, just my pet peeve :)."

That is a great feature of Fortran indeed. I have been using the "infinite list" specifiers, such as "(*(I0,:,2X))" a lot in my work lately. I see the benefit of C-/Python-like formatting mainly when I want to combine textual output (sentences) with numeric values.
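A small illustration of that specifier (the unlimited repeat count was introduced in Fortran 2008). The array contents are made up for the example:

```fortran
program unlimited_repeat
  implicit none
  integer :: values(5) = [1, 20, 300, 4000, 50000]

  ! '*' repeats the group as often as needed; ':' stops format processing
  ! once the output list is exhausted, so no separator follows the last item.
  write(*, "(*(I0,:,2X))") values
  ! Prints: 1  20  300  4000  50000

  ! Text and a variable-length list can share one format; the unlimited
  ! group has to be the last item in the format.
  write(*, "('Read ',I0,' values: ',*(I0,:,', '))") size(values), values
  ! Prints: Read 5 values: 1, 20, 300, 4000, 50000
end program unlimited_repeat
```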

epagone commented 3 years ago

Since this seems to require quite a bit of work regardless, why don't we aim at combining the best of both worlds, i.e. combining the "compactness" of C/Python style and the group repetitions of Fortran?

I have found an excellent comment (https://github.com/fortran-lang/stdlib/issues/19#issuecomment-688997345) by @14NGiestas that might be used as a starting point.

@ivan-pi wrote: "One could perhaps overcome these issues by making format a preprocessor macro, but this would again make it a non-portable solution."

If this can be achieved with fypp, I don't see a problem, since stdlib already depends on it.

PS: just realised that @ivan-pi already used even the same expression ("best of both worlds") in a previous comment above, sigh, sorry for the repetition.

arjenmarkus commented 3 years ago

Why not take this one step further:

Instead of converting/substituting values from left to right, there might be format specifiers that put the values in a specific position. This could be handy for internationalisation. For instance:

write(*, fmt) x, y

could then result in a text "value of x is y" OR "y is the value of x", if the (natural) language requires a different order. IIRC, Java allows that and with the variadic workaround I posted, that might be relatively easy to implement.

ivan-pi commented 3 years ago

@epagone wrote, in reply to my remark that making format a preprocessor macro "would again make it a non-portable solution": "If this can be achieved with fypp, I don't see a problem, since stdlib already depends on it."

The preprocessing would have to be done on the calling code. I don't think this is a viable option until fpm evolves to offer some default preprocessor. I think we should explore the format function first.

I found a few more comments in the proposal of @gronki: https://github.com/j3-fortran/fortran_proposals/issues/69, the most relevant being:

"Option 2: revive format as "function". The downside is that its quite a long word and would not be very clear when used directly in print/write/read. Example:

character(len = *), parameter :: fmt = format("important parameter = ", f6.1, " for n = ", i3)"

I guess we will need to adopt a grammar (and use it to automatically create a parser), but it might be good to create a toy implementation first (with limited support of strings, integers, and reals) just to experiment with syntax.

So the Python formatting syntax begins by defining "replacement fields", which are the text portions surrounded by curly braces {}. The grammar of these is:

replacement_field ::=  "{" [field_name] ["!" conversion] [":" format_spec] "}"
field_name        ::=  arg_name ("." attribute_name | "[" element_index "]")*
arg_name          ::=  [identifier | digit+]
attribute_name    ::=  identifier
element_index     ::=  digit+ | index_string
index_string      ::=  <any source character except "]"> +
conversion        ::=  "r" | "s" | "a"
format_spec       ::=  <described in the next section>

The field_name and conversion fields allow referencing positional or named arguments, or performing a conversion, e.g. calling the repr(), str(), or ascii() methods prior to output. Some examples of how the field names are used are shown below:

"First, thou shalt count to {0}"  # References first positional argument
"Bring me a {}"                   # Implicitly references the first positional argument
"From {} to {}"                   # Same as "From {0} to {1}"
"My quest is {name}"              # References keyword argument 'name'
"Weight in tons {0.weight}"       # 'weight' attribute of first positional arg
"Units destroyed: {players[0]}"   # First element of keyword argument 'players'.

Since the result of the stdlib format function would be detached from the actual write or print statement, I don't think we can support such use cases. This leaves us with implicitly defined positional variables, and the definition of the format_spec part.

Addendum: positional arguments can be pursued with Arjen's workaround, but I guess this involves a custom write subroutine.

I learned that Intel Fortran actually supports some form of named "variable" interpolation, limited to format strings, as one of its extensions. With this extension you can do things such as:

write(*,'(3f15.3,<nvari>f9.2)') x,y,z,(var(i),i=1,nvari)
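For reference, roughly the same output can be produced in standard Fortran without the extension. The sketch below shows two options, with made-up values for x, y, z and var: an unlimited repeat count, and a format string assembled at run time with an internal write.

```fortran
program variable_repeat
  implicit none
  integer, parameter :: nvari = 4
  real :: x = 1.0, y = 2.0, z = 3.0
  real :: var(nvari) = [0.25, 0.5, 0.75, 1.0]
  character(len=64) :: fmt
  integer :: i

  ! Option 1: unlimited repeat count (Fortran 2008)
  write(*, '(3f15.3,*(f9.2))') x, y, z, (var(i), i=1,nvari)

  ! Option 2: build the format at run time with an internal write,
  ! giving "(3f15.3,4f9.2)" here
  write(fmt, '(a,i0,a)') '(3f15.3,', nvari, 'f9.2)'
  write(*, fmt) x, y, z, (var(i), i=1,nvari)
end program variable_repeat
```

The two options are not fully equivalent: with the unlimited repeat count, surplus list items keep filling the same record, whereas a fixed count starts a new record through format reversion once the format is exhausted.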

fortran-lang / stdlib

Additional format specification languages #340