`archive_reader_binsafe::read_object_begin` performance

Try commented 2 years ago

Followup to: https://github.com/lmichaelis/phoenix/issues/28 I've tested std::stringstream performance today. Change:

    char name[128] = {};
    char cls [128] = {};
    sscanf_s(line.c_str(), "[%s %s %u %u]",
             name, rsize_t(sizeof(name)),
             cls,  rsize_t(sizeof(cls)),
             &obj.version, &obj.index);
    obj.object_name = name;
    obj.class_name  = cls;

Benchmark (world = oldworld.zen)

    auto buf = entry->open();
    auto time = Tempest::Application::tickCount();
    auto world = phoenix::world::parse(buf, version().game == 1 ? phoenix::game_version::gothic_1
                                                                : phoenix::game_version::gothic_2);
    time = Tempest::Application::tickCount() - time;
    Tempest::Log::d("parse time = ", time);

While change is rough one, and hopefully there is a better way than sscanf, as prototype looking quite good:

// Before (time - milliseconds)
parse time = 2371
parse time = 2769
parse time = 2713
parse time = 2447

// After (roughly 40% improvement)
parse time = 1618
parse time = 1441
parse time = 1225
parse time = 1559

lmichaelis commented 2 years ago

Generally, I wouldn't be opposed to using sscanf since it makes really easy to understand what's being parsed and a 40% performance improvement, especially in read_object_begin would be very impressive. We would not be able to use sscanf_s, though, since that's non-standard. Something like

char class_name[128];
char object_name[128];

sscanf(line.c_str(), "[%127s %127s %hd %d]", object_name, class_name, &obj.version, &obj.index);

could work, however. I'll test that out.

lmichaelis commented 2 years ago

So I've done some benchmarking. I can't quite reproduce your 40% performance improvement. Maybe it's an issue with one of the standard libraries. I am still getting about 10-15% better performance so it's probably a good idea to implement it anyways but I'm kinda confused about this difference. Here are some of my benchmark results (sadly I don't have a proper benchmark suite yet so I can't really provide you with an easy way of reproducing this):

sscanf

Compiler	File	Mode	Iterations	Min (ms)	Max (ms)	Avg (ms)	Cumulative (s)	Approx. Avg. Difference
Clang
Clang 14.0.6	NEWWORLD.ZEN	Debug	10	5944	6067	5982	59	1% (67ms)
Clang 14.0.6	NEWWORLD.ZEN	Release	100	222	293	227	22	18% (51ms)
Clang 14.0.6	OU.BIN	Debug	50	796	824	802	40	5% (46ms)
Clang 14.0.6	OU.BIN	Release	200	47	55	48	9	29% (20ms)
Clang 14.0.6	OU.CSL	Debug	10	3697	3733	3707	37	11% (498ms)
Clang 14.0.6	OU.CSL	Release	200	167	183	175	35	15% (33ms)
GCC
GCC 12.2.0	NEWWORLD.ZEN	Debug	10	4162	4260	4189	41	3% (141ms)
GCC 12.2.0	NEWWORLD.ZEN	Release	100	209	236	210	21	18% (49ms)
GCC 12.2.0	OU.BIN	Debug	50	657	753	670	33	6% (50ms)
GCC 12.2.0	OU.BIN	Release	200	43	48	43	8	34% (23ms)
GCC 12.2.0	OU.CSL	Debug	10	2877	2938	2888	28	7% (224ms)
GCC 12.2.0	OU.CSL	Release	200	143	194	154	30	15% (28ms)

stringstream

Compiler	File	Mode	Iterations	Min (ms)	Max (ms)	Avg (ms)	Cumulative (s)
Clang
Clang 14.0.6	NEWWORLD.ZEN	Debug	10	6028	6157	6049	60
Clang 14.0.6	NEWWORLD.ZEN	Release	100	248	278	259	25
Clang 14.0.6	OU.BIN	Debug	50	843	867	848	42
Clang 14.0.6	OU.BIN	Release	200	68	73	68	13
Clang 14.0.6	OU.CSL	Debug	10	4167	4384	4205	42
Clang 14.0.6	OU.CSL	Release	200	198	221	208	41
GCC
GCC 12.2.0	NEWWORLD.ZEN	Debug	10	4309	4397	4330	43
GCC 12.2.0	NEWWORLD.ZEN	Release	100	232	258	233	23
GCC 12.2.0	OU.BIN	Debug	50	710	779	720	36
GCC 12.2.0	OU.BIN	Release	200	65	71	66	13
GCC 12.2.0	OU.CSL	Debug	10	3075	3176	3112	31
GCC 12.2.0	OU.CSL	Release	200	174	197	182	36

Patch

This are the changes I've made (I've applied the fix to both archive_reader_ascii as well as archive_reader_binsafe):

Index: source/archive/archive_ascii.cc
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/source/archive/archive_ascii.cc b/source/archive/archive_ascii.cc
--- a/source/archive/archive_ascii.cc   (revision 21dbecd6dc6890510a8218d0aa0868b6cbdb0b14)
+++ b/source/archive/archive_ascii.cc   (date 1668629105666)
@@ -27,15 +27,21 @@
            return false;

        input.mark();
+
        auto line = input.get_line();
+       char class_name[128];
+       char object_name[128];

-       if (!line.starts_with('[') || !line.ends_with(']') || line.length() <= 2) {
+       auto parsed_elements =
+           std::sscanf(line.c_str(), "[%127s %127s %hu %u]", object_name, class_name, &obj.version, &obj.index);
+
+       if (parsed_elements != 4) {
            input.reset();
            return false;
        }

-       std::stringstream ss {line.substr(1, line.size() - 2)};
-       ss >> obj.object_name >> obj.class_name >> obj.version >> obj.index;
+       obj.object_name = object_name;
+       obj.class_name = class_name;
        return true;
    }

Index: source/archive/archive_binsafe.cc
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/source/archive/archive_binsafe.cc b/source/archive/archive_binsafe.cc
--- a/source/archive/archive_binsafe.cc (revision 21dbecd6dc6890510a8218d0aa0868b6cbdb0b14)
+++ b/source/archive/archive_binsafe.cc (date 1668629090506)
@@ -41,13 +41,19 @@
        }

        auto line = input.get_string(input.get_ushort());
-       if (!line.starts_with('[') || !line.ends_with(']') || line.length() <= 2) {
+       char class_name[128];
+       char object_name[128];
+
+       auto parsed_elements =
+           std::sscanf(line.c_str(), "[%127s %127s %hu %u]", object_name, class_name, &obj.version, &obj.index);
+
+       if (parsed_elements != 4) {
            input.reset();
            return false;
        }

-       std::stringstream ss {line.substr(1, line.size() - 2)};
-       ss >> obj.object_name >> obj.class_name >> obj.version >> obj.index;
+       obj.object_name = object_name;
+       obj.class_name = class_name;
        return true;
    }

Try commented 2 years ago

I can't quite reproduce your 40% performance improvement.

Windows10, mingw, debug. Also running stuff on laptop definitely not helping Today I've attempted to measure again - performance is all over the place unfortunately(sometimes better, sometimes worse).

While on that I have a question about read_string design: If you have buffer with mmap or real memory behind the hood, would it make more sense to return a string_view from buffer::get_string? That might be helpful for code that doesn't store string, but parses it instead.

PS: Reason why debug performance of load-time has value to me, is to iterate quicker on a game development.

lmichaelis commented 2 years ago

If you have buffer with mmap or real memory behind the hood, would it make more sense to return a string_view from buffer::get_string?

The idea behind buffer is, that you can supply your own backend (ie. a std::ifstream). This design necessitates copying data instead of just referring to it through a pointer. It just happens to be the case that by default, only heap and mmap backings are available:

https://github.com/lmichaelis/phoenix/blob/fdfca88edd133efa38b6fef375036c4b8fadd29d/source/buffer.cc#L12-L118

In addition to that, copying the data into a std::string makes it much safer to use and you can't accidentally refer to unmapped memory after the buffer deallocates. Before thinking about doing something like this, I'd like to explore other areas to possibly improve on.

Reason why debug performance of load-time has value to me, is to iterate quicker on a game development.

I agree. Performance is important. I'd like to avoid sacrificing memory safety for performance wherever possible though.

GothicKit / ZenKit