LandSandBoat / server

:sailboat: LandSandBoat - a server emulator for Final Fantasy XI
https://landsandboat.github.io/server/
GNU General Public License v3.0

🔨 Death to ZMQ? (IPC & RPC refresh) #6165

Open zach2good opened 3 months ago

zach2good commented 3 months ago

Describe the feature

Hello, if you're here you're interested in the clickbait-y title, and you're not the first to fall foul of this and post/message me/complain about it. Pull up a chair and let me weave you a tale.

In-progress PR: https://github.com/LandSandBoat/server/pull/6179

===

A surprising title, I know, especially coming from me - ZMQ's number one biggest fan. I simp for ZMQ.

ZMQ is a low-level library, designed to act as a highly reliable and high-performance abstraction on top of regular sockets. At this it is excellent, truly world class.

Unfortunately, our use case is a level or two of abstraction higher than just needing good/fast/reliable sockets. We need to communicate between our multiple processes easily and reliably, in a way that doesn't completely gatekeep this work to C++ nerds. A regular person shouldn't have to be dealing with bits and bytes, defining message types, serialising and deserialising them, thinking about the context in which their message is received, popping off the front ZMQ message saying where the sender is, etc. You get my point. This is how it is today, and a grand total of 3 people have confidently strode into ZMQ-land to add/fix/change things, and it hasn't always gone well.
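To make the "bits and bytes" burden concrete, here is a minimal sketch of what hand-rolling a single message can involve. It is not the server's actual wire format: the message type id, the field layout, and the two-frame shape (a routing frame followed by a payload frame) are all assumptions made for illustration.

```python
import struct

# Hypothetical message type id; not one of the server's real message types.
MSG_CHAT = 1


def pack_chat(sender: str, recipient: str, message: str) -> bytes:
    """Hand-pack a chat message: 1-byte type id, then three length-prefixed UTF-8 strings."""
    out = bytearray(struct.pack("<B", MSG_CHAT))
    for text in (sender, recipient, message):
        data = text.encode("utf-8")
        out += struct.pack("<H", len(data)) + data
    return bytes(out)


def unpack_chat(payload: bytes) -> tuple:
    """Reverse pack_chat, walking the buffer by hand with an explicit offset."""
    (msg_type,) = struct.unpack_from("<B", payload, 0)
    if msg_type != MSG_CHAT:
        raise ValueError(f"unexpected message type {msg_type}")
    offset = 1
    fields = []
    for _ in range(3):
        (length,) = struct.unpack_from("<H", payload, offset)
        offset += 2
        fields.append(payload[offset:offset + length].decode("utf-8"))
        offset += length
    return tuple(fields)


def handle_frames(frames: list) -> tuple:
    """Pop the leading routing frame (who sent this), then decode the payload frame."""
    sender_identity, payload = frames[0], frames[1]
    return sender_identity, unpack_chat(payload)
```

Every new message type needs another pair of functions like these, kept in sync by hand on both ends, which is the kind of work an IDL plus codegen is meant to remove.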

My proposal is that even in the face of larger changes over the next 12 months (splitting zone workload up onto different worker threads, moving networking onto its own thread, making CPU-bound operations like navmesh and SQL act more like they're IO-bound, etc.), we need to make it easier to put systems onto the world server and have map servers communicate with it. We need this for a new party system, where all the logic happens on the world server as the single source of truth, and the map servers are just in charge of translating packets into internal events or structs.

My original proposal was to retire ZMQ as our transport layer, along with the manual message definitions on top of it, and replace both with gRPC - where we just specify what we want in their IDL and it generates the relevant Server and Stub classes for our use, though we might have to do some customisation on top depending on what we need.

However, upon digging into this, gRPC is absolutely massive. It isn't a small dependency. It's also not a "clone the repo, import it into CMake, link the relevant target, done" kind of dependency. It relies on abseil - Google's C++ support library, akin to boost's headers.

Fine, looking at it, gRPC is a bunch of plumbing and an HTTP-based transport layer on top of Protocol Buffers, so we can just use protobuf and not bring in the complexity of the gRPC build? protobuf docs say that you can bring your own transport to the services and rpc calls you define. Doesn't seem to generate anything extra from my testing 👀

protobuf also needs to build its compiler protoc, which has a hard dependency on abseil too. But Zach, do we need to build protoc? Can't we just ship a pre-built binary? No. Bad dog. We're not in the business of shipping binaries. We're only shipping pre-built DLLs because of legacy and how long those libraries take to build, and how fiddly those builds can be.

Fine! Then why don't we go a step lower and use something like flatbuffers, which has no dependency on abseil? Sure, but that's just a serialization and helpers framework. It doesn't give us an easy-to-use IDL and codegen.

So we've gone through this whole journey of looking for battle-tested replacements and we've discovered that nothing is fit for purpose. So what does that leave us?

In light of the other work I'm planning, we can easily parse proto3 files ourselves with Python, and then output very simple structs, serialization/deserialization routines, helper classes, and plug them into existing transport infrastructure. The part that needs my future work is handling return values and original caller context.

auto returnValue = blocking_zmq_call(...);

What should we do here? ZMQ is fast, all of our other plumbing is fast, but we should never block on the response from a network call of any kind. That's why our current infrastructure is fire and forget. We don't get a return value, but we specify handlers for results that come back during later calls. These aren't usable in your original calling context though. Most likely, they'll come back at the end of your current tick, or during a later tick, and your entity might be gone.
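The fire-and-forget pattern described above can be sketched roughly as follows. This is an illustration only: the `inbox`, `handlers`, and `entities` tables, the message names, and the faked local reply are all hypothetical stand-ins, not the server's real infrastructure.

```python
from collections import deque

# Hypothetical stand-ins: an "inbox" of replies that arrived over the wire,
# a table of handlers keyed by message type, and the live entities.
inbox = deque()
handlers = {}
entities = {1: "Alice"}
log = []


def send(msg_type: str, payload: dict) -> None:
    """Fire and forget: queue the request and return immediately, with no return value."""
    # A real transport would hand this to ZMQ; here the "remote" reply is faked locally.
    inbox.append((msg_type + "_reply", payload))


def on_party_invite_reply(payload: dict) -> None:
    # The handler runs outside the original calling context, so it has to
    # look everything up again and cope with the entity having gone away.
    name = entities.get(payload["entity_id"])
    if name is None:
        log.append("entity gone, dropping reply")
    else:
        log.append(f"reply delivered to {name}")


handlers["party_invite_reply"] = on_party_invite_reply


def end_of_tick() -> None:
    """Drain whatever replies have arrived and dispatch them to their handlers."""
    while inbox:
        msg_type, payload = inbox.popleft()
        handlers[msg_type](payload)
```

The caller of `send` never sees the result; if the entity is removed between the send and `end_of_tick`, the handler can only drop the reply.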

But what if

auto returnValue = co_await blocking_zmq_call(...);

Where we shunt the current execution context and this blocking call off to a worker thread, or a zmq queue, and continue executing something else. Then when this comes back, we keep all the context of the original call, and the person writing the code doesn't have to think about concurrency, it just happens.
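As a loose analogue of that idea, here is the same shape expressed with Python's `asyncio` rather than C++ coroutines. It is a sketch under assumptions: `blocking_call` is a hypothetical stand-in that just sleeps, and `asyncio.to_thread` plays the role of "shunt it off to a worker thread". How this would actually be wired into ZMQ and the server's tick loop is still open.

```python
import asyncio
import time


def blocking_call(entity_id: int) -> str:
    """Stand-in for a blocking network round trip (hypothetical; just sleeps)."""
    time.sleep(0.05)
    return f"result for entity {entity_id}"


async def handle_entity(entity_id: int, log: list) -> None:
    # Local state such as entity_id survives across the await, so the code
    # after it still has the original calling context.
    result = await asyncio.to_thread(blocking_call, entity_id)
    log.append(f"entity {entity_id} got: {result}")


async def other_work(log: list) -> None:
    # Runs while handle_entity is suspended, showing the loop is not blocked.
    log.append("other work ran while the call was in flight")


async def tick() -> list:
    log = []
    await asyncio.gather(handle_entity(7, log), other_work(log))
    return log
```

Running `asyncio.run(tick())` shows the other work completing first and the awaited result arriving afterwards, with the caller's local variables intact.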

This is all still up in the air, but in light of some unsolicited feedback I've had recently, I thought I'd explain my direction with this.

EDIT: Same as with my original post text: Please don't weigh in if you don't know what the problem being solved here is

Appendix

zach2good commented 3 months ago

gRPC comes with a metric buttload of other dependencies, the worst of which is abseil - which is akin to boost. The idea of using protobuf (or flatbuffers) on top of our existing ZMQ plumbing isn't so evil in comparison, and I've already played around with doing this: https://github.com/zach2good/zmq_flatbuff_test

We would just need our own completion queue, and tasty helper functions. Could also be yet another excuse to hook up coroutines.
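A completion queue in this sense could look something like the sketch below. It is only one possible shape, with hypothetical names (`call_async`, `drain_completions`): work runs on a worker thread, and the finished result is queued so that its callback runs on the main thread at a point of our choosing, such as once per tick.

```python
import queue
import threading

# Completed calls land here as (callback, result) pairs; only the main thread drains it.
completions = queue.Queue()


def call_async(work, on_done) -> threading.Thread:
    """Run `work` on a worker thread and queue `on_done(result)` for the main thread."""
    def runner() -> None:
        completions.put((on_done, work()))

    thread = threading.Thread(target=runner, daemon=True)
    thread.start()
    return thread


def drain_completions() -> int:
    """Call once per tick on the main thread; runs every callback that is ready."""
    ran = 0
    while True:
        try:
            on_done, result = completions.get_nowait()
        except queue.Empty:
            return ran
        on_done(result)
        ran += 1
```

Because callbacks only ever run inside `drain_completions`, game state is touched from a single thread even though the work itself happened elsewhere.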

zach2good commented 3 months ago

I am VERY keen on the workflow of:

zach2good commented 3 months ago

Things like defining the service, how completions are executed, etc. can be up to us, but could get very ugly very quickly if we don't consider their use carefully.

zach2good commented 3 months ago

https://protobuf.dev/programming-guides/proto3/#services
https://protobuf.dev/programming-guides/proto2/#services

We can use protobufs and provide our own transport?
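The "bring your own transport" idea can be illustrated with a small sketch in which the service stub knows its methods but not how bytes move. Everything here is hypothetical: the `Transport` interface, the stub class, and the use of JSON as a placeholder encoding (it is not the protobuf wire format, and this is not protobuf's generated API).

```python
import json
from dataclasses import asdict, dataclass
from typing import Protocol


class Transport(Protocol):
    def send(self, method: str, payload: bytes) -> None: ...


@dataclass
class ChatMessage:
    message: str
    sender: str
    recipient: str


class ChatServiceStub:
    """Roughly what a client stub could look like: it names the methods, not the wire."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def send_chat_message(self, msg: ChatMessage) -> None:
        # JSON is a placeholder encoding for the sketch, not the protobuf wire format.
        self._transport.send("SendChatMessage", json.dumps(asdict(msg)).encode("utf-8"))


class InMemoryTransport:
    """Test double; a ZMQ-backed class with the same send() could be swapped in."""

    def __init__(self) -> None:
        self.sent = []

    def send(self, method: str, payload: bytes) -> None:
        self.sent.append((method, payload))
```

Swapping `InMemoryTransport` for a ZMQ-backed class would leave the stub and its callers unchanged, which is the property being asked about.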

zach2good commented 3 months ago

As it turns out, proto ALSO pulls in absl...

So, I've dug out and have been playing with my old flatbuffer+ZMQ experiments: https://github.com/zach2good/zmq_flatbuff_test

zach2good commented 3 months ago

But also, our own IDL and codegen in Python: https://github.com/LandSandBoat/server/pull/6179

zach2good commented 2 months ago

Started playing with proto parsing in Python this morning:

from proto_schema_parser.parser import Parser

text = """
syntax = "proto3";

// Each message (struct) will have to_string, serialization, and deserialization methods
// generated for it.
message ChatMessage
{
    string message = 1;
    string sender = 2;
    string recipient = 3;
}

service IChatService
{
    // virtual void SendChatMessage(const ChatMessage&) = 0;
    rpc SendChatMessage(ChatMessage) returns (void);

    // TODO: Maybe this shouldn't produce a pure virtual function.
    //     : Maybe it should be completely opaque to the user, serialising
    //     : and deserialising the data for them, and sending it over the wire.

    // co_await SendChatMessage(ChatMessage);
}
"""

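# For reference, the parsed AST has this shape (example output for a
# different, search-themed schema, not the ChatMessage text above):
#
# File(syntax='proto3',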
#      file_elements=[Message(name='SearchRequest',
#                             elements=[
#                                 Field(name='query', number=1, type='string', cardinality=None, options=[]),
#                                 Field(name='page_number', number=2, type='int32', cardinality=None, options=[]),
#                                 Field(name='result_per_page', number=3, type='int32', cardinality=None, options=[])]),
#                     Message(name='Result',
#                             elements=[
#                                 Field(name='url', number=1, type='string', cardinality=None, options=[]),
#                                 Field(name='title', number=2, type='string', cardinality=None, options=[]),
#                                 Field(name='snippets', number=3, type='string', cardinality=<FieldCardinality.REPEATED: 'REPEATED'>, options=[])]),
#                     Message(name='SearchResponse',
#                             elements=[
#                                 Field(name='results', number=1, type='Result', cardinality=<FieldCardinality.REPEATED: 'REPEATED'>, options=[])]),
#
#                     Service(name='SearchService', elements=[
#                         Method(name='Search', input_type=MessageType(type='SearchRequest', stream=False), output_type=MessageType(type='SearchResponse', stream=False), elements=[])])])

def generate_cpp(ast):
    """Print C++ structs and abstract service classes for a parsed proto file."""
    for element in ast.file_elements:
        # Dispatch on the AST node's class name (avoids shadowing the builtin `type`)
        kind = type(element).__name__
        if kind == "Message":
            print(f"struct {element.name} final\n{{")
            for field in element.elements:
                if field.type == "string":
                    print(f"    std::string {field.name}; // {field.number}")
                else:
                    # TODO: Handle number, cardinality, and options
                    print(f"    {field.type} {field.name}; // {field.number}")
            print("};\n")
            print(f"// TODO: Implement {element.name} to_string, serialization, and deserialization\n")
        elif kind == "Service":
            print(f"class {element.name}\n{{")
            for method in element.elements:
                # Each rpc becomes a pure virtual taking the request by const reference
                print(f"    virtual {method.output_type.type} {method.name}(const {method.input_type.type}&) = 0;")
            print("};\n")

def main():
    try:
        ast = Parser().parse(text)
        generate_cpp(ast)
    except Exception as e:
        # Exit with a non-zero status and print the error to stderr
        raise SystemExit(f"Failed to generate C++: {e}")

if __name__ == "__main__":
    main()

# Produces:
#
# struct ChatMessage final
# {
#     std::string message; // 1
#     std::string sender; // 2
#     std::string recipient; // 3
# };
#
# // TODO: Implement ChatMessage to_string, serialization, and deserialization
#
# class IChatService
# {
#     virtual void SendChatMessage(const ChatMessage&) = 0;
# };

Still thinking about how manual the underlying implementation should be. Leaning towards "not at all".
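One way the "not at all" direction could look is for the generator to emit a concrete client class that serialises and sends on the caller's behalf, rather than a pure virtual interface. The sketch below is hypothetical throughout: `Method` is a flattened stand-in for the parser's method node, and the `serialize()` and `send()` calls in the emitted C++ are assumed helpers that do not exist yet.

```python
from dataclasses import dataclass


@dataclass
class Method:
    # Minimal stand-in for the parser's method node (hypothetical, flattened fields).
    name: str
    input_type: str


def generate_opaque_stub(service_name: str, methods: list) -> str:
    """Emit a concrete C++ class whose methods serialise and send, instead of pure virtuals."""
    lines = [f"class {service_name}Client final", "{", "public:"]
    for m in methods:
        lines += [
            f"    void {m.name}(const {m.input_type}& msg)",
            "    {",
            # serialize() and send() are assumed helpers that the generator
            # or transport layer would have to provide.
            f'        send("{service_name}.{m.name}", serialize(msg));',
            "    }",
        ]
    lines.append("};")
    return "\n".join(lines)
```

For example, `print(generate_opaque_stub("ChatService", [Method("SendChatMessage", "ChatMessage")]))` emits a `ChatServiceClient` class with a ready-to-call `SendChatMessage` method, so the person using it never writes serialisation code.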

zach2good commented 2 months ago

Anyone interested in seeing just how bad the build for absl and protoc is, I've pushed my test project: https://github.com/zach2good/zmq_protobuf_test

(based off of my flatbuffers test: https://github.com/zach2good/zmq_flatbuff_test)

zach2good commented 2 months ago

Added rambling explanation in the original post, for any interested passers-by

zach2good commented 2 months ago

One direction I haven't yet looked into is an integration with ASIO, but that falls more under my plans with TaskExecutors, coroutines, etc. than this transport/RPC work.