JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

[Proposal] Python-style format strings in Base #40518

Open bicycle1885 opened 3 years ago

bicycle1885 commented 3 years ago

I would like to propose having Python-style format strings and f-strings in Base. The proposal is to introduce a macro @f_str and a function format into Base so that users can easily access the rich functionality of the format specification mini-language like below:

julia> pi = Float64(π)
3.141592653589793

julia> f"π ≈ {$pi:.6f}"  # fixed-point notation with precision 6
"π ≈ 3.141593"

julia> Fmt.format(f"π ≈ {:.6f}", pi)  # function style (positional argument)
"π ≈ 3.141593"

julia> Fmt.format(f"π ≈ {pi:.6f}", pi = pi)  # function style (keyword argument)
"π ≈ 3.141593"

I am currently implementing this proposal as a package, Fmt.jl. It is not yet complete, but it should be mature enough to test usability and performance in practice. The currently supported syntax is defined as follows:

# replacement field
field      = '{'[argument]['/'conv][':'spec]'}'
argument   = number | ['$']identifier
number     = digit+
identifier = any valid variable name
digit      = '0' | '1' | '2' | … | '9'
conv       = 's' | 'r'

# format specification
spec       = [[fill]align][sign][altform][zero][width][grouping]['.'precision][type]
fill       = any valid character (except '{' and '}') | '{'[argument]'}'
align      = '<' | '^' | '>'
sign       = '+' | '-' | ' '
altform    = '#'
zero       = '0'
width      = digit+ | '{'[argument]'}'
grouping   = ',' | '_'
precision  = digit+ | '{'[argument]'}'
type       = 'd' | 'X' | 'x' | 'o' | 'B' | 'b' | 'c' | 'p' | 's' |
             'F' | 'f' | 'E' | 'e' | 'G' | 'g' | 'A' | 'a' | '%'

https://github.com/bicycle1885/Fmt.jl#syntax
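For comparison, most of the spec fields in this grammar correspond directly to Python's own format specification mini-language, from which the proposal borrows. The sketch below is plain Python, not Fmt.jl, and the variable names are purely illustrative. One visible difference: Python writes the conversion as `!r` or `!s`, whereas the grammar above uses `/`.

```python
pi = 3.141592653589793

fixed = f"{pi:.6f}"            # precision 6, fixed-point type 'f'
centered = f"{'ab':*^6}"       # fill '*', align '^' (center), width 6
signed = f"{42:+d}"            # sign '+' forces an explicit sign
alt_hex = f"{255:#x}"          # altform '#' adds the 0x prefix
zero_padded = f"{42:08.3f}"    # zero flag, width 8, precision 3
grouped = f"{1234567:,}"       # grouping ',' inserts thousands separators
percent = f"{0.25:.1%}"        # type '%' multiplies by 100 and appends '%'
converted = f"{'hi'!r}"        # conversion: Python spells it '!r'

for name, text in [("fixed", fixed), ("centered", centered),
                   ("signed", signed), ("alt_hex", alt_hex),
                   ("zero_padded", zero_padded), ("grouped", grouped),
                   ("percent", percent), ("converted", converted)]:
    print(f"{name:<12}{text}")
```

Each field is optional, so the common case stays short while the rarely needed options remain available.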

The proposed format strings would supersede their counterpart in Printf.jl. AFAIK, Fmt.jl already supports all features of Printf.jl in a more succinct syntax. Fmt.jl does not require type specifiers; {} is a generic replacement field that accepts any value. For example, f"x is {$x}." is the same as @sprintf "x is %d." x for an integer x, but the former accepts any value of x. Fmt.jl also supports dynamic fill, width, and precision, although that feature seems to be under way in #40105. The thousands separator is supported as well (#29077). At the moment, locale-dependent features are not implemented.
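The generic replacement field and the dynamic fill, width, and precision described here have direct counterparts in Python, which may help illustrate the intended behavior. This is a Python sketch of the borrowed syntax, not Fmt.jl code; all names are illustrative.

```python
value = 3.141592653589793
width, precision = 10, 3

# Nested replacement fields supply width and precision at run time.
dynamic = "{:{}.{}f}".format(value, width, precision)

# Fill and width can also come from named arguments.
named = "{v:{fill}>{w}}".format(v=42, fill="*", w=6)

# A bare {} is generic: it accepts any value, not only integers.
generic = ["x is {}.".format(x) for x in (42, 1.5, "text", None)]

# A C-style %d specifier, by contrast, rejects non-numeric values.
try:
    "x is %d." % "text"
    c_style_accepts_text = True
except TypeError:
    c_style_accepts_text = False

print(repr(dynamic), repr(named), generic, c_style_accepts_text)
```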

This kind of format syntax has been ported to other languages as part of their standard libraries. For example, Rust has the std::fmt module, and C++20 adds a new formatting library in the same style. The point of this proposal is that the Python-style format syntax is so prevalent (or at least is becoming so) that it deserves to be included in Base.

In terms of performance, Fmt.jl is much faster than other tools. This benchmark demonstrates that string construction with the format string f"({$x}, {$y})\n" is much faster than the interpolated string "($x, $y)\n" and other formatting methods. Formatting.jl is another attempt to support Python-style format strings. However, its performance is not optimal (~19x slower) and its API is not as simple.

Developers can define their own formatting functions for their types. Two functions, formatinfo and formatfield, are at the core of formatting: formatinfo calculates the size required for memory allocation, and formatfield writes the formatted content to a pre-allocated buffer. For example, the value nothing is currently formatted through these two functions. Since the format syntax itself cannot be extended by developers, this use case would be limited to some numeric types.
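The two-phase idea (measure first, then write into a buffer allocated once) can be sketched in a few lines. The Python below is only an illustration of that technique for integers; `format_info`, `format_field`, and `format_all` are hypothetical names chosen to mirror the description, and their signatures are assumptions, not Fmt.jl's actual API.

```python
def format_info(value: int) -> int:
    """Phase 1 (hypothetical): bytes needed to write `value` in decimal."""
    size = 1 if value < 0 else 0      # room for the minus sign
    magnitude = abs(value)
    size += 1                         # there is always at least one digit
    while magnitude >= 10:
        magnitude //= 10
        size += 1
    return size


def format_field(buf: bytearray, pos: int, value: int, size: int) -> int:
    """Phase 2 (hypothetical): write `value` into buf[pos:pos+size]."""
    magnitude = abs(value)
    end = pos + size
    i = end - 1
    while True:                       # fill digits from right to left
        buf[i] = ord("0") + magnitude % 10
        magnitude //= 10
        i -= 1
        if magnitude == 0:
            break
    if value < 0:
        buf[i] = ord("-")
    return end                        # position where the next field starts


def format_all(values) -> str:
    """Measure every field, allocate once, then write each field in place."""
    sizes = [format_info(v) for v in values]
    separators = max(len(values) - 1, 0)
    buf = bytearray(sum(sizes) + separators)
    pos = 0
    for k, (v, s) in enumerate(zip(values, sizes)):
        if k > 0:
            buf[pos] = ord(",")
            pos += 1
        pos = format_field(buf, pos, v, s)
    return buf.decode("ascii")


print(format_all([42, -7, 0, 1000]))
```

Because the total size is known before any byte is written, the output buffer never has to grow, which is one plausible reason for the design described above.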

The following ideas are not implemented yet:

If you like this proposal overall, I'm happy to make a pull request for code review and in-depth discussion. Even if you don't, I'm going to release Fmt.jl as a package. I would like to know whether you think this functionality is worth having in Base or in the standard library.

Thank you for your attention.

simeonschaub commented 3 years ago

Since 1.6 there is actually an unexported @format_str string macro with fairly similar functionality to what you are proposing, although not quite as full featured and with slightly different syntax: https://github.com/JuliaLang/julia/blob/f9720dc2ebd6cd9e3086365f281e62506444ef37/stdlib/Printf/src/Printf.jl#L201. Perhaps we could just document this better? More advanced features can always just live in an external package which depends on the Printf stdlib.

bicycle1885 commented 3 years ago

The Printf.jl-style format syntax (I call it the C-style syntax for short) is very different from the Python-style syntax. For example, the following C-style format using Printf.jl:

julia> using Printf: format, @format_str

julia> x = 42
42

julia> format(format"x: %8d", x)  # or @sprintf "x: %8d" x
"x:       42"

can be written as:

julia> f"x: {$x:8}"
"x:       42"

if we had the proposed f-strings in Base.
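Python itself offers both styles side by side, which makes the difference easy to see. The snippet below is plain Python rather than Julia and is meant only as an analogy to the two Julia forms above.

```python
x = 42

c_style = "x: %8d" % x            # C-style: type specifier 'd' is required
py_style = "x: {:8}".format(x)    # Python-style: width only, type inferred
f_string = f"x: {x:8}"            # f-string: the variable is named inline

# Numbers are right-aligned by default in both styles, but in the
# Python-style syntax strings are left-aligned unless '>' is given.
c_text = "%8s" % "ab"
py_text = "{:8}".format("ab")

print(repr(c_style), repr(py_style), repr(f_string))
print(repr(c_text), repr(py_text))
```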

mkitti commented 3 years ago

It would be best to continue to develop this outside of Base. The trend is to move features like this out of Base and the standard libraries, which makes it much easier to update and develop independent of the Julia release cycle.

I encourage you to register the package and perhaps do a series of prereleases before 1.0 as you build this out.

quinnj commented 3 years ago

I mostly agree with @mkitti here, in that the package should definitely be released/developed outside Base for now. But I'm also of the opinion that if the package is super stable, not likely to change much in the future, then it seems like a good candidate to be a stdlib. And I'd appreciate having a more expressive/better formatting string option more readily accessible.

bicycle1885 commented 3 years ago

I'm willing to release it as a third-party package to experiment with some features and get feedback from the community. But before doing that, I'm interested in the possibility of incorporating it into Base as a standard tool. That's why I've opened this issue here.

> The trend is to move features like this out of Base and the standard libraries, which makes it much easier to update and develop independent of the Julia release cycle.

My understanding is that the trend ended when Julia 0.7 was released. We've added several new functions and modules to Base and the standard library since then. Yes, format strings are a big deal, but this can be achieved by exporting a single macro, @f_str, from Base. I do not intend to export the format function because I expect it will be less commonly used if @f_str is there. If we refuse to add new functions and macros to Base just because it is "the trend", the Julia project will be boring.

Having this feature in Base has advantages. Users can easily access an advanced formatting syntax out of the box. I use Printf.jl very often, but I'm frustrated that I always need to write using Printf before using it. Also, if it were in Base, users would not need to search for a better third-party formatting package. I believe better stuff should be in a better place.

> if the package is super stable, not likely to change much in the future, then it seems like a good candidate to be a stdlib

The implementation is not super stable yet; it is still a very young package. But I'd say the formatting syntax is stable; so stable that it is accepted by Python, Rust, and C++ as their standard tool. Therefore, the user-facing API is not likely to change in the near future.