TerryCavanagh / VVVVVV

The source code to VVVVVV! http://thelettervsixtim.es/

Implement new string formatting system (VFormat) #878

Closed Daaaav closed 2 years ago

Daaaav commented 2 years ago

Changes:

This PR adds a new string formatting system to replace uses of SDL_snprintf and string concatenation.

Making our own string formatting system has been briefly discussed during the review of the localization branch, and on the VVVVVV Discord. It's inspired by Python's format strings, but simpler.

This is primarily to benefit localization - strings will be easier to understand (Now using %s Tileset → Now using {area} Tileset, "%s remain" → "{n_crewmates|wordy} remain"), translators can change the word order for their language's grammar (%1$s is a POSIX extension), and this system is also less error-prone (a format string that doesn't match the actual arguments won't result in a crash or UB).

It also fits our needs better: "wordy" numbers no longer require a help.number_words(n).c_str() at the callsite, translators can opt in and out of wordy numbers per string, and this should also make it easier to solve #859 (see below).

This PR adds the formatting system itself, and changes one SDL_snprintf in the code to use it as a small demo (the rest should probably be done in the localization branch to avoid more unneeded work):

     char buffer[64];
-    SDL_snprintf(buffer, sizeof(buffer), "Now using %s Tileset", tilesets[tiles]);
+    vformat_buf(buffer, sizeof(buffer), "Now using {area} Tileset", "area:str", tilesets[tiles]);

Usage

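There are three main functions, including vformat_buf and vformat_alloc, both of which are used in the examples in this description.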

All need to be passed the format string with placeholders to be filled in, an index string describing the arguments, and a number of arguments as described by the index string.

The placeholders in the format string look like {name} or {name|flags}. Flags are separated by |. Some examples of flags are wordy (to turn an integer into a string like "Twenty") and digits=N (which works like %0Nd or %Nd depending on the spaces flag).
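For reference, the printf behavior that digits=N is compared to can be sketched in plain C. The helper name pad_int is hypothetical and is not part of VFormat; it only shows what %0Nd and %Nd produce, using the standard `*` width specifier.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not part of VFormat): formats `value` padded to
 * `digits` characters using the printf conversions that digits=N is
 * compared to. Returns what snprintf returns. */
int pad_int(char* out, size_t out_size, int value, int digits, int use_spaces)
{
    if (use_spaces)
    {
        /* Same as %Nd: right-aligned, padded with spaces. */
        return snprintf(out, out_size, "%*d", digits, value);
    }

    /* Same as %0Nd: padded with leading zeros. */
    return snprintf(out, out_size, "%0*d", digits, value);
}
```

With digits set to 2, the value 3 comes out as "03" with zero padding and as " 3" with space padding.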

The arguments index has a comma-separated list of argument names and their types, like "number:int, total:int, crewmate:str". There's also a but type, for replacing a certain button constant (like BUTTON_INTERACT) with a controller icon in the future.
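To make the shape of the arguments index concrete, here is a minimal sketch of how a name could be looked up in such a string. This is not VFormat's actual parser; find_arg and its signature are assumptions made for illustration, and it only handles the "name:type, name:type" layout described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not VFormat's parser: looks up `name` in an
 * arguments index such as "number:int, total:int, crewmate:str".
 * Returns the zero-based position of the argument and copies its type
 * into `type_out` (truncated to fit), or returns -1 if not listed. */
int find_arg(const char* index, const char* name, char* type_out, size_t type_size)
{
    size_t name_len = strlen(name);
    const char* cursor = index;
    int position = 0;

    while (*cursor != '\0')
    {
        const char* entry_end;
        const char* colon;

        /* Skip the spaces that may follow a comma. */
        while (*cursor == ' ')
        {
            cursor++;
        }

        /* An entry runs up to the next comma or the end of the string. */
        entry_end = strchr(cursor, ',');
        if (entry_end == NULL)
        {
            entry_end = cursor + strlen(cursor);
        }

        /* Within the entry, a colon separates the name from the type. */
        colon = memchr(cursor, ':', (size_t) (entry_end - cursor));
        if (colon != NULL
        && (size_t) (colon - cursor) == name_len
        && memcmp(cursor, name, name_len) == 0)
        {
            if (type_size > 0)
            {
                size_t type_len = (size_t) (entry_end - (colon + 1));
                if (type_len >= type_size)
                {
                    type_len = type_size - 1;
                }
                memcpy(type_out, colon + 1, type_len);
                type_out[type_len] = '\0';
            }
            return position;
        }

        position++;
        cursor = (*entry_end == ',') ? entry_end + 1 : entry_end;
    }

    return -1;
}
```

For the index above, looking up "crewmate" yields position 2 and type "str". The position is what tells a formatter which vararg to read, which is why the order of placeholders in the format string does not have to match the order of the arguments.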

Full example: (apparently GitHub's syntax highlighting breaks on this, dunno why)

char buffer[100];
vformat_buf(buffer, sizeof(buffer),
    "{crewmate} got {number} out of {total} trinkets in {m}:{s|digits=2}.{ms|digits=3}",
    "number:int, total:int, crewmate:str, m:int, s:int, ms:int",
    2, 20, "Vermilion", 2, 3, 1
);

=> "Vermilion got 2 out of 20 trinkets in 2:03.001"

The system is fully documented at the top of VFormat.h.
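The localization benefit of named placeholders (a translator can reorder them freely, and a mismatch does not crash) can be illustrated with a toy formatter. This is only a sketch under simplifying assumptions, not VFormat's implementation: toy_format is a hypothetical function, it takes parallel arrays instead of varargs, supports only string values and no flags, and silently drops unknown placeholders as its own design choice.

```c
#include <stddef.h>
#include <string.h>

/* Toy illustration only, not VFormat: fills {name} placeholders in `fmt`
 * from parallel arrays of names and string values. Unknown placeholders
 * are dropped. Output is truncated to fit and always null-terminated. */
void toy_format(
    char* out, size_t out_size, const char* fmt,
    const char* const* names, const char* const* values, size_t n_args
) {
    size_t used = 0;

    if (out_size == 0)
    {
        return;
    }

    while (*fmt != '\0')
    {
        const char* piece = fmt;
        size_t piece_len = 1;
        const char* close = (*fmt == '{') ? strchr(fmt, '}') : NULL;

        if (close != NULL)
        {
            /* Find the argument by comparing names, not by position. */
            size_t name_len = (size_t) (close - fmt - 1);
            size_t i;
            piece = "";
            piece_len = 0;
            for (i = 0; i < n_args; i++)
            {
                if (strlen(names[i]) == name_len
                && memcmp(names[i], fmt + 1, name_len) == 0)
                {
                    piece = values[i];
                    piece_len = strlen(values[i]);
                    break;
                }
            }
            fmt = close + 1;
        }
        else
        {
            /* Literal character (including an unmatched '{'). */
            fmt++;
        }

        /* Copy as much of the piece as still fits in the buffer. */
        if (piece_len > out_size - 1 - used)
        {
            piece_len = out_size - 1 - used;
        }
        memcpy(out + used, piece, piece_len);
        used += piece_len;
    }

    out[used] = '\0';
}
```

Given names {"area", "n"} and hypothetical values {"Lab", "Two"}, the format strings "{n} in {area}" and "{area}: {n}" both work with the same argument arrays, which is the kind of reordering a translation may need.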

Benchmarks

I wanted to know how fast my new system was compared to the [SDL_]snprintf we already use, so I decided to run two technically equivalent examples 10 million times:

//        ..   ...
int max = 10000000;

clock_t t0 = clock();
for (int ii = 0; ii < max; ii++)
{
    char buffer[100];
    SDL_snprintf(buffer, sizeof(buffer),
        "%s got %d out of %d trinkets in %d:%02d.%03d",
        "Vermilion", 2, 20, 2, 3, 1
    );
}
double elapsed_time = (double)(clock() - t0) / CLOCKS_PER_SEC;
printf("SDL_snprintf: %fs\n", elapsed_time);

t0 = clock();
for (int ii = 0; ii < max; ii++)
{
    char buffer[100];
    vformat_buf(buffer, sizeof(buffer),
        "{crewmate} got {number} out of {total} trinkets in {m}:{s|digits=2}.{ms|digits=3}",
        "number:int, total:int, crewmate:str, m:int, s:int, ms:int",
        2, 20, "Vermilion", 2, 3, 1
    );
}
double elapsed_time2 = (double)(clock() - t0) / CLOCKS_PER_SEC;
printf("vformat_buf: %fs\n", elapsed_time2);

printf("We're only %fx as slow\n", elapsed_time2/elapsed_time);

While I did use SDL_ functions in line with the rest of the codebase, I'd like to mention I consistently got a somewhat faster result when I used vanilla strlen, memcmp, strchr, malloc, free, memcpy and strtol, but it's not overly dramatic:

(non-SDL_)

SDL_snprintf: 2.178149s
vformat_buf: 12.474551s
We're only 5.727134x as slow

(SDL_)

SDL_snprintf: 2.283219s
vformat_buf: 15.566886s
We're only 6.817956x as slow

Either way, this seems very reasonable - printf-family functions have been around for decades, don't support reorderable arguments (thus only go through the vararg list once), and don't compare names of arguments.

C++20 std::format

After designing and implementing all of this, I discovered that C++20 adds a somewhat similar Python-inspired formatting system, so I wanted to benchmark this as well just for fun. That required me to benchmark on MSVC 2019+, the only place that seems to support it currently, and then only under /std:c++latest, not /std:c++20.

(max is now only 1 million since this is a slower computer)

t0 = clock();
for (int ii = 0; ii < max; ii++)
{
    std::format(
        "{2} got {0} out of {1} trinkets in {3}:{4:02}.{5:03}",
        2, 20, "Vermilion", 2, 3, 1
    );
}
double elapsed_time_sf = (double)(clock() - t0) / CLOCKS_PER_SEC;
printf("std::format: %fs\n", elapsed_time_sf);

snprintf: 4.868000s
std::format: 64.290000s
v_format_buf: 9.640000s
We're only 1.980279x as slow as snprintf and 6.669087x as fast as std::format

So this makes VFormat's numbers look even more reasonable. :)

Controller icons

Since this branch already prepares a bit for implementing controller button icons within text (#859), my general idea is to assign constant values for each generic button purpose (for example, BUTTON_FLIP=0, BUTTON_LEFT=1, BUTTON_RIGHT=2, BUTTON_INTERACT=3, ...), and to reserve some unused Unicode characters to correspond to these values, which the text renderer will display as appropriate button icons for these actions (we could use the Private Use Area starting at U+E000). Note that something like BUTTON_FLIP might be (B) with one controller layout, (❌) with another, and just "ACTION" if you use a keyboard. So the corresponding PUA characters would be kind of dynamically adapting emojis. That way, the text renderer (and Graphics::len) have something easy to work with (just a single codepoint), and the formatting system can then be used to easily get these characters into the text:

vformat_alloc("Press {button} to activate terminal", "button:but", BUTTON_INTERACT);

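=> "Press [PUA character] to activate terminal" (the bracketed part stands for the single PUA codepoint, which the text renderer would display as the button icon)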
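As a rough sketch of the mapping idea, a button constant could be turned into the UTF-8 bytes of its Private Use Area codepoint as follows. The constant values follow the example numbers given above; BUTTON_PUA_BASE and button_to_utf8 are hypothetical names, and nothing here is taken from an actual implementation.

```c
#include <stddef.h>

/* Hypothetical button constants, using the example values from the text. */
enum
{
    BUTTON_FLIP = 0,
    BUTTON_LEFT = 1,
    BUTTON_RIGHT = 2,
    BUTTON_INTERACT = 3
};

/* Assumed base of the reserved range: the start of the Private Use Area. */
#define BUTTON_PUA_BASE 0xE000u

/* Writes the UTF-8 encoding of the PUA codepoint for `button` into `out`,
 * followed by a null terminator. Assumes `button` is small enough that
 * the codepoint stays within the PUA (U+E000 to U+F8FF); codepoints in
 * the range U+0800 to U+FFFF always take three bytes in UTF-8. */
void button_to_utf8(int button, char out[4])
{
    unsigned int codepoint = BUTTON_PUA_BASE + (unsigned int) button;

    out[0] = (char) (0xE0 | (codepoint >> 12));
    out[1] = (char) (0x80 | ((codepoint >> 6) & 0x3F));
    out[2] = (char) (0x80 | (codepoint & 0x3F));
    out[3] = '\0';
}
```

Under these assumptions BUTTON_INTERACT maps to U+E003, encoded as the bytes EE 80 83, so the formatted string carries one codepoint per icon regardless of which controller layout is later used to draw it.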

Legal Stuff:

By submitting this pull request, I confirm that...