LandSandBoat / server

:sailboat: LandSandBoat - a server emulator for Final Fantasy XI
https://landsandboat.github.io/server/
GNU General Public License v3.0
294 stars 590 forks source link

🔨 [C++] Death to keyword `extern` #4745

Open zach2good opened 10 months ago

zach2good commented 10 months ago

I affirm:

Describe the feature

https://learn.microsoft.com/en-us/cpp/cpp/extern-cpp?view=msvc-170

We have a lot of globals that are accessed from different compilation units, being declared by extern in the secondary units. This feels very 'C', and I presume there's a more 'C++'/standards compliant way to achieve this. I don't know what that would be though...

atom0s commented 10 months ago

C++ hasn't really developed anything to deal with this kind of thing specifically. Instead, its feature set has allowed for more 'unique' ways of dealing with it. Using extern is still fine and proper in C++, but it is an inherited keyword from C. The way C++ treats things, in regards to extern, is also different and extern has grown to mean more things in C++ as well. The differences can be checked out on the docs here:

However, it does offer a few ways of dealing with the problem in a different manner. One of those ways is the use of namespaces while still using extern. It allows things to be defined more specifically and helps avoid name collisions, but still requires the same setup, just with more wordage.

Another solution is to use singletons. (I can already feel the hate towards saying that word haha..) The singleton pattern in C++ has gotten a ton of hate in the past due to the way things had to be implemented. It came with its own set of problems such as thread safety, potential memory leaks (when the singleton object still required a pointer), etc. However, in more modern revisions of the C++ standard, these problems became a thing of the past, but the hate still lived on.

C++11 added to the standard that functions can now define local static variables that will have static storage duration and be guaranteed to only initialize once. This also guaranteed initialization thread safety. (See ref here: https://en.cppreference.com/w/cpp/language/storage_duration#Static_local_variables)

This also changed the way the singleton object could be defined:

The new method of defining a singleton class is known as the Meyers implementation, named after Scott Meyers. That defines an object like this:

class SomeObject
{
    // Deleted Copy and Assignment Operators
    SomeObject(SomeObject const&)            = delete;
    SomeObject(SomeObject&&)                 = delete;
    SomeObject& operator=(SomeObject const&) = delete;
    SomeObject& operator=(SomeObject&&)      = delete;

    // Private Constructor and Destructor
    SomeObject(void);
    ~SomeObject(void);

public:
    static SomeObject& instance(void)
    {
        static SomeObject obj;
        return obj;
    }
};

This singleton object can then be further well-defined by being contained within namespace(s).

As mentioned above, with this kind of setup we are gaining some positives such as:

These are benefits over the extern usage as it:

Sadly, singletons still get a lot of hate for whatever reason. Used properly and when needed, they are a great design pattern that has been much better enhanced by the various changes to the C++ standard over the years.

Other means of implementing a singleton exist as well, such as using the even more modern std::call_once, but that then adds back other downsides depending on how it's used.

zach2good commented 10 months ago

Thanks as always for the detailed and well thought out writeup!

We do have plumbing for CRTP singletons here: https://github.com/LandSandBoat/server/blob/base/src/common/singleton.h

I have no beef with singletons, it's a wildly convenient pattern and can save a lot of time. However, the direction I'd like to go in would not involve singletons. I'd rather head in a direction where we'd have tighter control over object lifetimes and avoid static storage altogether. There's no reason we can't be passing down shared classes by reference in constructors. Tech bro jargon: Dependency Inversion.

I'm in no way influenced by seeing this used well for the first time at my new work, compared to having seen it done very very badly in the past (this issue was made before that anyway). 👀

In our current usage, we only have CVanaTime and CTaskMgr as singletons. I'd be keen to not migrate all of the extern stuff I'm targetting with this issue into singletons, when they're typically just constants, containers, and free standing functions.

zach2good commented 10 months ago

(There are some systems that work crazy well as singletons, I've just remembered that all of our logging through spdlog is probably a singleton)

atom0s commented 10 months ago

We do have plumbing for CRTP singletons here: https://github.com/LandSandBoat/server/blob/base/src/common/singleton.h

There's a lot of ways to accomplish a singleton in C++, this being one of them as well, but this one does come with some negatives that are solved by using the Meyers approach instead.

  1. The CRTP approach makes use of an allocation while the Meyers approach does not. While not the biggest concern, it does leave a means for a leak to happen under certain conditions that the Meyers approach does not have on the base object itself. (Each approach does have its potential for not properly cleaning up the inner objects of the singleton as well though based on how the application exits.)

  2. The CRTP approach is not initialization thread-safe. This means that in the event that LSB begins adding any kind of multi-threading into the core, the current singleton implementation is going to require additional 'bloat' to ensure thread safety through some means of locking. This then adds another object and lifetime into the mix, ie. using a mutex to ensure there is no initialization race condition. The Meyers approach avoids this automatically as the use of a function level static local variables are guaranteed to be thread-safe initialized as of C++11.

  3. The CRTP (and pretty much every other singleton implementation) incurs additional overhead in its getInstance method because of multiple factors:

    • It MUST check for nullptr every time to ensure the instance is created.
    • It MUST implement thread-safe locking to ensure initialization thread-safety. (This then adds the overhead of checking, taking and releasing the lock.)

The Meyers approach does not suffer from any of this because the standards automatically guarantee its safety and there is no required null check because once the object is requested a single time, it will always exist through the lifetime of the application. (Note: There are ways to incorrectly use any singleton and cause a 'use-after-free' error even with the Meyers approach.)

However, the direction I'd like to go in would not involve singletons. I'd rather head in a direction where we'd have tighter control over object lifetimes and avoid static storage altogether. There's no reason we can't be passing down shared classes by reference in constructors. Tech bro jargon: Dependency Inversion.

For sure, I hadn't checked LSB's current codebase before making my comments above to see what all the extent of extern was still in the current codebase. Skimming through the code base now, most of the usages are due to some of the codebase being based on older mixed C/C++ code.

modules\custom\cpp\ah_announcement.cpp modules\custom\cpp\ah_pagination.cpp modules\renamer\cpp\renamer.cpp

The usages in these looks fine and normal for what they are doing. However, it can be cleaned up by changing the way that the PacketSize and PacketParser objects are defined/stored overall. These could also be stored together if desired into a single object container of some sort etc. then having that container passed around by reference for example.

src\common\kernel.h

This, and several other similar files, are using it in a manner that can definitely be overhauled and modernized into proper C++ objects or namespaced properly. These are based on older C code from the original rAthena codebase which was used as the original foundation for the main rewrite of the ProjectXI server project. The way this file, and others, are written can easily get rid of the globals they use as well as confine the various functions being used into a parent namespace or object which will then remove any need for extern in them.

src\common\lua.h src\common\lua.cpp src\common\settings.h src\common\settings.cpp

There are some considerations that can go into how these files are used. As it currently stands, they are 'common' because each server makes use of Lua based settings. A future discussion could be if this is an ideal manner to handle settings for all services. While Lua itself does not add too much bloat/overhead, it is still some. If reducing the memory footprint is any kind of longterm/stretch goal of the project to help it run on lower end hardware, it is still something to consider. On-top of that, the usage of the sol library adds easily measurable overhead to both the memory usage within Lua's state as well as the compiled footprint of the binaries depending on how much binding is being done.

As much as I love and use sol myself in personal projects, including Ashita v4, I am not keen on how much bloat it causes due to the way its coded around std::tuple or how it makes use of the global environment space in the Lua state the way it does.

It may be beneficial to move settings into a different format instead such as json, yaml, xml, ini, etc. I see no reason why this would be a problem and can still exist in a feature-parity manner as these kinds of things can all be runtime editable as well, as well as being reloaded from disk at runtime if needed.

src\common\socket.h

Eh.. As discussed many times over elsewhere, the current socket setup in the codebase is not ideal, not proper to XI and has a lot of room for improvement. This is one that is probably going to continue as-is for a while to come until some major refactoring happens all of it in the future. To ensure cross-platform compatibility isn't damaged as well, care will need to be taken to sure whatever modifications are made do not break the requirements of other platforms like Linux.

In the meantime, some of whats there can be shuffled around and contained within an object and/or namespace to reduce the need for some of the variables and all of the functions to be extern.

<Map Server Files>

The main server files are in that same boat of being written in a manner that followed the older codebases C-like manner while being rewritten in C++. Many of these things can be easily cleaned up by moving to using a more C++ expected design in general. As well as containing the variables within a parent object instead of being globals.

The newer systems and utils inherited from that C-like style which resulted in things being externed, which can also take on similar changes.

<Search Server Files>

Pretty much the same here. The search server is likely to see an entire overhaul in the future anyway to move towards using asio and other modern benefits to the language etc.

In our current usage, we only have CVanaTime and CTaskMgr as singletons. I'd be keen to not migrate all of the extern stuff I'm targetting with this issue into singletons, when they're typically just constants, containers, and free standing functions.

Agreed, not everything using extern now is suitable for singleton usage. A lot of it can be contained into classes/objects but in some cases, just general cleanup, namespacing, and other reworking can remove the extern usages.

zach2good commented 10 months ago

You've touched on a lot of things that are planned, but they require someone to have the time and will to sit down and do them. Death to kernel.h in favour of application.h, networking/tasking rewrite to use ASIO, rewriting packet system to receive and validate every packet and then emit an internal event, splitting logic/networking onto different threads that communicate through lockless queues, etc.

I haven't thought yet about how to rework modules to be less... whatever they are, but I'm expecting to rewrite xi_map to be a wrapper that launches a "Core" and/or "Engine" that owns the packet system, task system, scripting, etc. Then the cpp modules will get access to that engine and can extract what they want. This is also what I need to do to port Eden's unit testing system.

In this new design, all singletons will have to go away in favour of being owned by the parent engine structure, so we can spin up multiple of them in a single executable for parallel test execution (multiple and differing vanatimes, tasks, lua states, etc.)

Sol, settings, Lua: I chose to use Lua for our settings language to keep the number of languages we have in the repo to an absolute minimum. It also removed an additional layer of translation for hot-reloading settings at runtime. I'm not concerned with the added code and memory bloat that comes with it compared to say, a lightweight ini parser and its associated state, because of how much other awful stuff is going on everywhere in the codebase 😆