This PR adds a new string formatting system to replace uses of SDL_snprintf and string concatenation.
Making our own string formatting system has been briefly discussed during the review of the localization branch, and on the VVVVVV Discord. It's inspired by Python's format strings, but simpler.
This is primarily to benefit localization: strings will be easier to understand (Now using %s Tileset → Now using {area} Tileset, "%s remain" → "{n_crewmates|wordy} remain"), translators can change the word order to fit their language's grammar (printf can only do that with %1$s, which is a POSIX extension), and the system is less error-prone (a format string that doesn't match the actual arguments won't result in a crash or UB).
It also fits our needs better: "wordy" numbers no longer need a help.number_words(n).c_str() at the callsite, translators can opt in and out of wordy numbers per string, and this should also make it easier to solve #859 (see below).
This PR adds the formatting system itself, and changes one SDL_snprintf in the code to use it as a small demo (the rest should probably be done in the localization branch to avoid more unneeded work):
char buffer[64];
- SDL_snprintf(buffer, sizeof(buffer), "Now using %s Tileset", tilesets[tiles]);
+ vformat_buf(buffer, sizeof(buffer), "Now using {area} Tileset", "area:str", tilesets[tiles]);
Usage
There are three main functions:
vformat_cb - gives parts of the output string to a user-supplied callback
vformat_buf - does what SDL_snprintf does: fills a user-supplied buffer
vformat_alloc - a convenience function for when you don't want to hardcode an arbitrary limit (the result needs to be freed with SDL_free)
All three take the format string with placeholders to be filled in, an index string describing the arguments, and then the arguments themselves, in the order given by the index string.
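To picture how a buffer-filling function can sit on top of a callback-based one, here is a minimal sketch. The names chunk_cb, buf_sink, buf_sink_write and emit_demo are made up for illustration, and "Lab" is just a sample tileset name; none of this is VFormat's actual API or necessarily how it is implemented.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives one chunk of output at a time. */
typedef void (*chunk_cb)(void *userdata, const char *chunk, size_t bytes);

/* State for collecting chunks into a fixed-size buffer. */
struct buf_sink
{
    char *buffer; /* destination */
    size_t cap;   /* size of destination, including the terminator */
    size_t used;  /* bytes written so far, excluding the terminator */
};

/* Callback that appends a chunk, truncating instead of overflowing. */
static void buf_sink_write(void *userdata, const char *chunk, size_t bytes)
{
    struct buf_sink *sink = (struct buf_sink *)userdata;
    size_t room;

    if (sink->cap == 0)
    {
        return;
    }
    room = sink->cap - 1 - sink->used;
    if (bytes > room)
    {
        bytes = room;
    }
    memcpy(sink->buffer + sink->used, chunk, bytes);
    sink->used += bytes;
    sink->buffer[sink->used] = '\0';
}

/* Stand-in for a callback-style formatter: emits three fixed chunks. */
static void emit_demo(chunk_cb cb, void *userdata)
{
    cb(userdata, "Now using ", 10);
    cb(userdata, "Lab", 3);
    cb(userdata, " Tileset", 8);
}
```

Calling emit_demo(buf_sink_write, &sink) with a 64-byte buffer leaves "Now using Lab Tileset" in it; with an 8-byte buffer the text is cut off after 7 bytes and stays null-terminated.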
The placeholders in the format string look like {name} or {name|flags}. Flags are separated by |. Some examples of flags are wordy (to turn an integer into a string like "Twenty") and digits=N (which works like %0Nd or %Nd depending on the spaces flag).
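As a rough illustration of what digits=N and spaces amount to in printf terms, here is a small sketch. format_digits is a hypothetical helper written for this description, not a function in VFormat, and it uses plain snprintf rather than the SDL_ variant.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Pad n to at least `digits` characters.
 * Pads with zeros by default (like %0Nd), or with spaces when
 * `spaces` is nonzero (like %Nd).
 */
static int format_digits(char *out, size_t out_len, int n, int digits, int spaces)
{
    if (spaces)
    {
        return snprintf(out, out_len, "%*d", digits, n);
    }
    return snprintf(out, out_len, "%0*d", digits, n);
}
```

With this helper, n=3 and digits=2 gives "03", and the same call with spaces set gives " 3".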
The arguments index has a comma-separated list of argument names and their types, like "number:int, total:int, crewmate:str". There's also a but type, for replacing a certain button constant (like BUTTON_INTERACT) with a controller icon in the future.
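To show why looking arguments up by name keeps reordering safe, here is a minimal sketch of resolving a placeholder name against an index string. find_arg is a hypothetical helper for illustration only; the real parser in VFormat may work quite differently.

```c
#include <stddef.h>
#include <string.h>

/*
 * Find `name` in an index string such as
 * "number:int, total:int, crewmate:str".
 * On success, returns the zero-based argument position and copies the
 * type (e.g. "int") into type_out. Returns -1 if the name is not listed.
 */
static int find_arg(const char *index, const char *name,
                    char *type_out, size_t type_len)
{
    size_t name_len = strlen(name);
    int position = 0;
    const char *entry = index;

    while (*entry != '\0')
    {
        const char *end;
        const char *colon;

        /* Skip spaces after a comma. */
        while (*entry == ' ')
        {
            entry++;
        }
        end = strchr(entry, ',');
        if (end == NULL)
        {
            end = entry + strlen(entry);
        }
        colon = (const char *)memchr(entry, ':', (size_t)(end - entry));
        if (colon != NULL
            && (size_t)(colon - entry) == name_len
            && memcmp(entry, name, name_len) == 0)
        {
            if (type_len > 0)
            {
                size_t len = (size_t)(end - colon - 1);
                if (len >= type_len)
                {
                    len = type_len - 1;
                }
                memcpy(type_out, colon + 1, len);
                type_out[len] = '\0';
            }
            return position;
        }
        position++;
        entry = (*end == ',') ? end + 1 : end;
    }
    return -1;
}
```

A placeholder that names an argument missing from the index simply yields -1, so a formatter built this way can skip or flag it instead of reading the wrong vararg.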
Full example: (apparently GitHub's syntax highlighting breaks on this, dunno why)
char buffer[100];
vformat_buf(buffer, sizeof(buffer),
    "{crewmate} got {number} out of {total} trinkets in {m}:{s|digits=2}.{ms|digits=3}",
    "number:int, total:int, crewmate:str, m:int, s:int, ms:int",
    2, 20, "Vermilion", 2, 3, 1
);
=> "Vermilion got 2 out of 20 trinkets in 2:03.001"
The system is fully documented at the top of VFormat.h.
Benchmarks
I wanted to know how fast my new system was compared to the [SDL_]snprintf we already use, so I decided to run two technically equivalent examples 10 million times:
// .. ...
int max = 10000000;

clock_t t0 = clock();
for (int ii = 0; ii < max; ii++)
{
    char buffer[100];
    SDL_snprintf(buffer, sizeof(buffer),
        "%s got %d out of %d trinkets in %d:%02d.%03d",
        "Vermilion", 2, 20, 2, 3, 1
    );
}
double elapsed_time = (double)(clock() - t0) / CLOCKS_PER_SEC;
printf("SDL_snprintf: %fs\n", elapsed_time);

t0 = clock();
for (int ii = 0; ii < max; ii++)
{
    char buffer[100];
    vformat_buf(buffer, sizeof(buffer),
        "{crewmate} got {number} out of {total} trinkets in {m}:{s|digits=2}.{ms|digits=3}",
        "number:int, total:int, crewmate:str, m:int, s:int, ms:int",
        2, 20, "Vermilion", 2, 3, 1
    );
}
double elapsed_time2 = (double)(clock() - t0) / CLOCKS_PER_SEC;
printf("vformat_buf: %fs\n", elapsed_time2);
printf("We're only %fx as slow\n", elapsed_time2/elapsed_time);
While I did use SDL_ functions in line with the rest of the codebase, I'd like to mention I consistently got a somewhat faster result when I used vanilla strlen, memcmp, strchr, malloc, free, memcpy and strtol, but it's not overly dramatic:
(non-SDL_)
SDL_snprintf: 2.178149s
vformat_buf: 12.474551s
We're only 5.727134x as slow
(SDL_)
SDL_snprintf: 2.283219s
vformat_buf: 15.566886s
We're only 6.817956x as slow
Either way, this seems very reasonable - printf-family functions have been around for decades, don't support reorderable arguments (thus only go through the vararg list once), and don't compare names of arguments.
C++20 std::format
After designing and implementing all of this, I discovered that C++20 adds a somewhat similar Python-inspired formatting system, so I wanted to benchmark that as well, just for fun. That required benchmarking on MSVC 2019+, the only place that seems to support it currently, and then only under /std:c++latest, not /std:c++20.
(max is now only 1 million since this is a slower computer)
t0 = clock();
for (int ii = 0; ii < max; ii++)
{
    std::format(
        "{2} got {0} out of {1} trinkets in {3}:{4:02}.{5:03}",
        2, 20, "Vermilion", 2, 3, 1
    );
}
double elapsed_time_sf = (double)(clock() - t0) / CLOCKS_PER_SEC;
printf("std::format: %fs\n", elapsed_time_sf);
snprintf: 4.868000s
std::format: 64.290000s
v_format_buf: 9.640000s
We're only 1.980279x as slow as snprintf and 6.669087x as fast as std::format
So this makes VFormat's numbers look even more reasonable. :)
Controller icons
Since this branch already prepares a bit for implementing controller button icons within text (#859), my general idea is to assign constant values for each generic button purpose (for example, BUTTON_FLIP=0, BUTTON_LEFT=1, BUTTON_RIGHT=2, BUTTON_INTERACT=3, ...), and to reserve some unused Unicode characters to correspond to these values, which the text renderer will display as appropriate button icons for these actions (we could use the Private Use Area starting at U+E000). Note that something like BUTTON_FLIP might be (B) with one controller layout, (❌) with another, and just "ACTION" if you use a keyboard. So the corresponding PUA characters would be kind of dynamically adapting emojis. That way, the text renderer (and Graphics::len) have something easy to work with (just a single codepoint), and the formatting system can then be used to easily get these characters into the text:
vformat_alloc("Press {button} to activate terminal", "button:but", BUTTON_INTERACT);
=> "Press <PUA character for BUTTON_INTERACT> to activate terminal"
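To make the "single codepoint" idea concrete, here is a sketch of turning a button constant into its UTF-8 encoded Private Use Area character. The enum values follow the example numbering above, and PUA_BASE and button_to_utf8 are hypothetical names; nothing here is implemented in this PR.

```c
#include <stddef.h>

/* Hypothetical constants, following the example values in the text. */
enum
{
    BUTTON_FLIP = 0,
    BUTTON_LEFT = 1,
    BUTTON_RIGHT = 2,
    BUTTON_INTERACT = 3
};

/* Assumed start of the reserved range (first PUA codepoint). */
#define PUA_BASE 0xE000u

/*
 * Encode U+E000 + button as UTF-8. Codepoints in the BMP Private Use
 * Area (U+E000..U+F8FF) always take 3 bytes. Writes a terminator and
 * returns the number of bytes before it, or 0 if button is out of range.
 */
static size_t button_to_utf8(int button, char out[4])
{
    unsigned int cp;

    if (button < 0 || button > 0x18FF)
    {
        out[0] = '\0';
        return 0;
    }
    cp = PUA_BASE + (unsigned int)button;
    out[0] = (char)(0xE0 | (cp >> 12));
    out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (char)(0x80 | (cp & 0x3F));
    out[3] = '\0';
    return 3;
}
```

Under these assumptions BUTTON_INTERACT becomes U+E003, which is the byte sequence EE 80 83; the text renderer would then swap that codepoint for whichever icon or label matches the current input device.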
Legal Stuff:
By submitting this pull request, I confirm that...
[x] My changes may be used in a future commercial release of VVVVVV
[x] I will be credited in a CONTRIBUTORS file and the "GitHub Friends"
section of the credits for all of said releases, but will NOT be compensated
for these changes