GothicKit / ZenKit

A re-implementation of file formats used by the early 2000's ZenGin
http://zk.gothickit.dev/
MIT License
47 stars 10 forks source link

`archive_reader_binsafe::read_object_begin` performance #37

Closed Try closed 1 year ago

Try commented 1 year ago

Followup to: https://github.com/lmichaelis/phoenix/issues/28 I've tested std::stringstream performance today. Change:

    char name[128] = {};
    char cls [128] = {};
    sscanf_s(line.c_str(), "[%s %s %u %u]",
             name, rsize_t(sizeof(name)),
             cls,  rsize_t(sizeof(cls)),
             &obj.version, &obj.index);
    obj.object_name = name;
    obj.class_name  = cls;

Benchmark (world = oldworld.zen)

    auto buf = entry->open();
    auto time = Tempest::Application::tickCount();
    auto world = phoenix::world::parse(buf, version().game == 1 ? phoenix::game_version::gothic_1
                                                                : phoenix::game_version::gothic_2);
    time = Tempest::Application::tickCount() - time;
    Tempest::Log::d("parse time = ", time);

While change is rough one, and hopefully there is a better way than sscanf, as prototype looking quite good:

// Before (time - milliseconds)
parse time = 2371
parse time = 2769
parse time = 2713
parse time = 2447

// After (roughly 40% improvement)
parse time = 1618
parse time = 1441
parse time = 1225
parse time = 1559
lmichaelis commented 1 year ago

Generally, I wouldn't be opposed to using sscanf since it makes really easy to understand what's being parsed and a 40% performance improvement, especially in read_object_begin would be very impressive. We would not be able to use sscanf_s, though, since that's non-standard. Something like

char class_name[128];
char object_name[128];

sscanf(line.c_str(), "[%127s %127s %hd %d]", object_name, class_name, &obj.version, &obj.index);

could work, however. I'll test that out.

lmichaelis commented 1 year ago

So I've done some benchmarking. I can't quite reproduce your 40% performance improvement. Maybe it's an issue with one of the standard libraries. I am still getting about 10-15% better performance so it's probably a good idea to implement it anyways but I'm kinda confused about this difference. Here are some of my benchmark results (sadly I don't have a proper benchmark suite yet so I can't really provide you with an easy way of reproducing this):

sscanf

Compiler File Mode Iterations Min (ms) Max (ms) Avg (ms) Cumulative (s) Approx. Avg. Difference
Clang
Clang 14.0.6 NEWWORLD.ZEN Debug 10 5944 6067 5982 59 1% (67ms)
Clang 14.0.6 NEWWORLD.ZEN Release 100 222 293 227 22 18% (51ms)
Clang 14.0.6 OU.BIN Debug 50 796 824 802 40 5% (46ms)
Clang 14.0.6 OU.BIN Release 200 47 55 48 9 29% (20ms)
Clang 14.0.6 OU.CSL Debug 10 3697 3733 3707 37 11% (498ms)
Clang 14.0.6 OU.CSL Release 200 167 183 175 35 15% (33ms)
GCC
GCC 12.2.0 NEWWORLD.ZEN Debug 10 4162 4260 4189 41 3% (141ms)
GCC 12.2.0 NEWWORLD.ZEN Release 100 209 236 210 21 18% (49ms)
GCC 12.2.0 OU.BIN Debug 50 657 753 670 33 6% (50ms)
GCC 12.2.0 OU.BIN Release 200 43 48 43 8 34% (23ms)
GCC 12.2.0 OU.CSL Debug 10 2877 2938 2888 28 7% (224ms)
GCC 12.2.0 OU.CSL Release 200 143 194 154 30 15% (28ms)

stringstream

Compiler File Mode Iterations Min (ms) Max (ms) Avg (ms) Cumulative (s)
Clang
Clang 14.0.6 NEWWORLD.ZEN Debug 10 6028 6157 6049 60
Clang 14.0.6 NEWWORLD.ZEN Release 100 248 278 259 25
Clang 14.0.6 OU.BIN Debug 50 843 867 848 42
Clang 14.0.6 OU.BIN Release 200 68 73 68 13
Clang 14.0.6 OU.CSL Debug 10 4167 4384 4205 42
Clang 14.0.6 OU.CSL Release 200 198 221 208 41
GCC
GCC 12.2.0 NEWWORLD.ZEN Debug 10 4309 4397 4330 43
GCC 12.2.0 NEWWORLD.ZEN Release 100 232 258 233 23
GCC 12.2.0 OU.BIN Debug 50 710 779 720 36
GCC 12.2.0 OU.BIN Release 200 65 71 66 13
GCC 12.2.0 OU.CSL Debug 10 3075 3176 3112 31
GCC 12.2.0 OU.CSL Release 200 174 197 182 36

Patch

This are the changes I've made (I've applied the fix to both archive_reader_ascii as well as archive_reader_binsafe):

Index: source/archive/archive_ascii.cc
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/source/archive/archive_ascii.cc b/source/archive/archive_ascii.cc
--- a/source/archive/archive_ascii.cc   (revision 21dbecd6dc6890510a8218d0aa0868b6cbdb0b14)
+++ b/source/archive/archive_ascii.cc   (date 1668629105666)
@@ -27,15 +27,21 @@
            return false;

        input.mark();
+
        auto line = input.get_line();
+       char class_name[128];
+       char object_name[128];

-       if (!line.starts_with('[') || !line.ends_with(']') || line.length() <= 2) {
+       auto parsed_elements =
+           std::sscanf(line.c_str(), "[%127s %127s %hu %u]", object_name, class_name, &obj.version, &obj.index);
+
+       if (parsed_elements != 4) {
            input.reset();
            return false;
        }

-       std::stringstream ss {line.substr(1, line.size() - 2)};
-       ss >> obj.object_name >> obj.class_name >> obj.version >> obj.index;
+       obj.object_name = object_name;
+       obj.class_name = class_name;
        return true;
    }

Index: source/archive/archive_binsafe.cc
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/source/archive/archive_binsafe.cc b/source/archive/archive_binsafe.cc
--- a/source/archive/archive_binsafe.cc (revision 21dbecd6dc6890510a8218d0aa0868b6cbdb0b14)
+++ b/source/archive/archive_binsafe.cc (date 1668629090506)
@@ -41,13 +41,19 @@
        }

        auto line = input.get_string(input.get_ushort());
-       if (!line.starts_with('[') || !line.ends_with(']') || line.length() <= 2) {
+       char class_name[128];
+       char object_name[128];
+
+       auto parsed_elements =
+           std::sscanf(line.c_str(), "[%127s %127s %hu %u]", object_name, class_name, &obj.version, &obj.index);
+
+       if (parsed_elements != 4) {
            input.reset();
            return false;
        }

-       std::stringstream ss {line.substr(1, line.size() - 2)};
-       ss >> obj.object_name >> obj.class_name >> obj.version >> obj.index;
+       obj.object_name = object_name;
+       obj.class_name = class_name;
        return true;
    }
Try commented 1 year ago

I can't quite reproduce your 40% performance improvement.

Windows10, mingw, debug. Also running stuff on laptop definitely not helping Today I've attempted to measure again - performance is all over the place unfortunately(sometimes better, sometimes worse).

While on that I have a question about read_string design: If you have buffer with mmap or real memory behind the hood, would it make more sense to return a string_view from buffer::get_string? That might be helpful for code that doesn't store string, but parses it instead.

PS: Reason why debug performance of load-time has value to me, is to iterate quicker on a game development.

lmichaelis commented 1 year ago

If you have buffer with mmap or real memory behind the hood, would it make more sense to return a string_view from buffer::get_string?

The idea behind buffer is, that you can supply your own backend (ie. a std::ifstream). This design necessitates copying data instead of just referring to it through a pointer. It just happens to be the case that by default, only heap and mmap backings are available:

https://github.com/lmichaelis/phoenix/blob/fdfca88edd133efa38b6fef375036c4b8fadd29d/source/buffer.cc#L12-L118

In addition to that, copying the data into a std::string makes it much safer to use and you can't accidentally refer to unmapped memory after the buffer deallocates. Before thinking about doing something like this, I'd like to explore other areas to possibly improve on.

Reason why debug performance of load-time has value to me, is to iterate quicker on a game development.

I agree. Performance is important. I'd like to avoid sacrificing memory safety for performance wherever possible though.