We-the-People-civ4col-mod / Mod

This is the repository where the mod resides.

Consider using Interprocess communications #1047

Open Nightinggale opened 1 month ago

Nightinggale commented 1 month ago

A new idea on how to get a modern compiler to work has arrived: using interprocess communication (IPC).

The idea is that the DLL will run in the exe process and, on attach, start a new process (process B), which it then connects to. The DLL will then communicate with process B using IPC. Since IPC only supports arguments being transferred by value (no pointers/references), the requirement for using the same compiler is gone. The two processes will not share memory, so it becomes easier to obey the 32-bit memory limit. If needed, process B can even be 64-bit, but that's probably not needed if it doesn't have to share memory with the graphics.

If we use a platform-independent IPC approach (say TCP/IP), it will likely be slower, but we gain that process B doesn't have to be Windows native or even run on the same device. That could be a gain for Linux and would allow native support for, say, ARM for the CPU-heavy tasks. If we try to do anything non-Windows, we should likely have transfer code that supports more than one connection type, or we will lose the performance boost of using Windows internals on Windows.
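To make the "more than one connection type" point concrete, here is a rough sketch of what the transfer code could look like. Everything in it is an assumption for illustration: `ITransport`, `InMemoryTransport`, `sendEndTurn` and the one-byte message tag do not exist in the mod. The in-memory class only stands in for real implementations (one using Windows internals, one using TCP/IP) so the sketch runs anywhere.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical transport interface. Game-side code would only see this,
// so the connection type can be swapped without touching the callers.
class ITransport
{
public:
    virtual ~ITransport() = default;
    virtual bool send(const std::vector<std::uint8_t>& bytes) = 0;
};

// In-memory stand-in so the sketch is runnable. A Windows-native and a
// TCP/IP implementation would implement the same interface.
class InMemoryTransport : public ITransport
{
public:
    bool send(const std::vector<std::uint8_t>& bytes) override
    {
        m_received.insert(m_received.end(), bytes.begin(), bytes.end());
        return true;
    }

    const std::vector<std::uint8_t>& received() const { return m_received; }

private:
    std::vector<std::uint8_t> m_received;
};

// Caller code depends only on ITransport, not on how bytes are delivered.
inline bool sendEndTurn(ITransport& transport, std::uint8_t playerId)
{
    const std::uint8_t kEndTurnTag = 0x01; // hypothetical message tag
    return transport.send({kEndTurnTag, playerId});
}
```

With something like this, picking the fast Windows path or the portable path becomes a decision made once at startup rather than at every call site.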

Using a setup like this will shrink the DLL file into some sort of data interchange between EXE, Python and IPC, possibly with an internal cache to avoid IPC overhead for frequent calls (think of the graphics engine in the exe constantly requesting specific XML data, which never changes).
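The internal cache could be as small as the sketch below. It assumes the cached data is read-only (like the XML values mentioned above) and uses a hypothetical `FetchFn` callback in place of a real IPC round trip; `IpcReadCache` and its members are made-up names, not existing mod code.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cache living in the DLL. FetchFn stands in for a real IPC
// round trip to process B.
class IpcReadCache
{
public:
    using FetchFn = std::function<int(const std::string&)>;

    explicit IpcReadCache(FetchFn fetch) : m_fetch(std::move(fetch))
    {
        assert(m_fetch && "a fetch function is required");
    }

    // Returns the cached value and asks process B only on the first request.
    int get(const std::string& key)
    {
        auto it = m_values.find(key);
        if (it != m_values.end())
            return it->second;

        const int value = m_fetch(key);
        m_values.emplace(key, value);
        return value;
    }

    // Drops all cached values, e.g. if the underlying data were ever reloaded.
    void clear() { m_values.clear(); }

private:
    FetchFn m_fetch;
    std::unordered_map<std::string, int> m_values;
};
```

Repeated requests from the exe for the same key would then cost one hash lookup instead of one IPC call.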

Splitting the two processes into two different code bases means Python and the exe can't call "our main code" directly. As a result, the whole issue with DllExport, Python-exposed functions and virtual functions being called will go away. Sure, those problems still exist in the DLL, but since that is just data transfer and not calculation code, it's not in the headers we include when writing game logic.

Obviously, being able to use any compiler allows us to use modern C++ with modern features and modern optimization. If we are lucky, modern compilers optimizing for modern CPUs will speed things up by more than the extra IPC overhead slows them down.


The implementation might be quite simple in theory. Init process B when the EXE attaches the DLL and kill process B when the EXE detaches the DLL. Other than that, it seems all communication can be done with WM_COPYDATA. This will follow the same approach as network messages, except we can give it any struct to transfer and as such are not stuck with fixed arguments.
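As a rough illustration of the WM_COPYDATA idea, here is a minimal sketch. The message tag `IPC_MSG_SET_GOLD`, the `SetGoldMsg` struct and both functions are hypothetical names invented for this example. The sender is only compiled on Windows and assumes the window handles were obtained elsewhere (for instance during attach); the receiver core is portable so it can be tested on its own. It is written in modern C++ for brevity, so the DLL side would need an equivalent that the old compiler accepts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical message tag and payload shared by both processes.
const std::uint32_t IPC_MSG_SET_GOLD = 1;

struct SetGoldMsg
{
    std::int32_t playerId;
    std::int32_t gold;
};

// Portable core of the receiver: validate tag and size, then copy the
// payload out by value. On Windows the three arguments would come from
// COPYDATASTRUCT::dwData, lpData and cbData inside the window procedure.
inline bool readSetGold(std::uint32_t tag, const void* data, std::size_t size,
                        SetGoldMsg& out)
{
    if (tag != IPC_MSG_SET_GOLD || data == nullptr || size != sizeof(SetGoldMsg))
        return false;
    std::memcpy(&out, data, sizeof(SetGoldMsg));
    return true;
}

#ifdef _WIN32
#include <windows.h>

// Sender side. hReceiver is assumed to be a window owned by process B.
inline bool sendSetGold(HWND hReceiver, HWND hSender, const SetGoldMsg& msg)
{
    COPYDATASTRUCT cds;
    cds.dwData = IPC_MSG_SET_GOLD;                 // application-defined tag
    cds.cbData = static_cast<DWORD>(sizeof(msg));  // payload size in bytes
    cds.lpData = const_cast<SetGoldMsg*>(&msg);    // copied by the system
    // SendMessage blocks until the receiver has handled the message.
    return SendMessage(hReceiver, WM_COPYDATA,
                       reinterpret_cast<WPARAM>(hSender),
                       reinterpret_cast<LPARAM>(&cds)) != 0;
}
#endif
```

Note that WM_COPYDATA has to be sent with SendMessage, which blocks the caller, so every call is a synchronous round trip.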

The structs used for transfers would ideally be in headers shared by the source code for both processes. As such, mixing 32-bit and 64-bit as well as mixing endianness will complicate the code and possibly add overhead. This however isn't a concern if both are 32-bit x86, so it mainly matters if we go with a native Linux/Mac/whatever process B.
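If mixed 32/64-bit or mixed endianness ever becomes relevant, one common way to handle it is fixed-width members plus explicit byte-order encoding. The sketch below assumes a little-endian wire format; `MoveUnitMsg` and its fields are purely illustrative and not taken from the mod.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical message that would live in a header shared by both processes.
struct MoveUnitMsg
{
    std::uint32_t unitId;
    std::int32_t  x;
    std::int32_t  y;
};

// Fixed-width members give the same size on 32-bit and 64-bit builds.
static_assert(sizeof(MoveUnitMsg) == 12, "unexpected padding in MoveUnitMsg");

// Write a 32-bit value as little-endian bytes regardless of host byte order.
inline void putU32(std::vector<std::uint8_t>& out, std::uint32_t v)
{
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFFu));
}

// Read a 32-bit little-endian value starting at offset pos.
inline std::uint32_t getU32(const std::vector<std::uint8_t>& in, std::size_t pos)
{
    assert(pos + 4 <= in.size());
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    return v;
}

inline std::vector<std::uint8_t> encode(const MoveUnitMsg& m)
{
    std::vector<std::uint8_t> out;
    out.reserve(12);
    putU32(out, m.unitId);
    putU32(out, static_cast<std::uint32_t>(m.x));
    putU32(out, static_cast<std::uint32_t>(m.y));
    return out;
}

inline MoveUnitMsg decode(const std::vector<std::uint8_t>& in)
{
    MoveUnitMsg m;
    m.unitId = getU32(in, 0);
    m.x = static_cast<std::int32_t>(getU32(in, 4));
    m.y = static_cast<std::int32_t>(getU32(in, 8));
    return m;
}
```

When both sides are 32-bit x86, the struct could be sent as raw bytes instead and the encode/decode step skipped, which is the overhead trade-off described above.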


Debugging might be an issue. We will need a setup that freezes both processes. How to compile multiple targets using multiple compilers has yet to be determined, though MSVC has tools that can do tasks like this. The main issue would be people who compile by running a bat file.

Atom735 commented 1 week ago

I found something more productive and simpler

https://learn.microsoft.com/en-us/windows/win32/memory/address-windowing-extensions

We can map a part of the virtual address space onto different physical pages and store different types of objects in them, such as cities and units.

For example, we can pre-allocate an area for 1024 entities. To access an object, we add a layer that remaps this area to the block of physical pages numbered id/1024, and inside that block take the entity at index id%1024.

This would also get rid of most of the memory reallocations (many std::map instances will need to be rewritten into a flat structure).
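The id/1024 and id%1024 arithmetic can be sketched as follows. This is not real AWE code: `WindowedPool` is a hypothetical stand-in that indexes separate vectors, where an actual AWE version would remap physical pages into one fixed window. It only shows the addressing scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t kEntitiesPerWindow = 1024; // window size from the comment

struct EntityLocation
{
    std::size_t block; // which set of physical pages to map into the window
    std::size_t slot;  // index inside the mapped window
};

inline EntityLocation locate(std::size_t id)
{
    EntityLocation loc;
    loc.block = id / kEntitiesPerWindow;
    loc.slot  = id % kEntitiesPerWindow;
    return loc;
}

// Stand-in for the remapping layer. T must be default-constructible.
template <typename T>
class WindowedPool
{
public:
    T& at(std::size_t id)
    {
        const EntityLocation loc = locate(id);
        if (loc.block >= m_blocks.size())
            m_blocks.resize(loc.block + 1);

        std::vector<T>& block = m_blocks[loc.block];
        if (block.empty())
            block.resize(kEntitiesPerWindow); // sized once, never reallocated

        m_mappedBlock = loc.block; // "remap" the window to this block
        return block[loc.slot];
    }

    std::size_t mappedBlock() const { return m_mappedBlock; }

private:
    std::vector<std::vector<T> > m_blocks;
    std::size_t m_mappedBlock = 0;
};
```

For example, entity 2500 lands in block 2 at slot 452, since 2500 = 2 * 1024 + 452.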

Nightinggale commented 1 week ago

> I found something more productive and simpler
>
> https://learn.microsoft.com/en-us/windows/win32/memory/address-windowing-extensions

Looks interesting. It will solve the 4 GB memory limit and it's likely easier and faster to implement than multiple processes. However, the benefit of using two processes is that it allows us to use two different compilers and that way use not only modern C++, but also modern optimization. A new process can also read DLL files from a custom location, such as within the mod itself, so we can skip the need for users to copy the TBB DLLs and any future DLL files.

On the other hand, multiple processes come with added complexity, like less trivial debugging. This is something to think about.

> We can map a part of the virtual address space onto different physical pages and store different types of objects in them, such as cities and units.
>
> For example, we can pre-allocate an area for 1024 entities. To access an object, we add a layer that remaps this area to the block of physical pages numbered id/1024, and inside that block take the entity at index id%1024.

Sounds like a good solution, and technically this can be done in either the same process or a new one.

> This would also get rid of most of the memory reallocations (many std::map instances will need to be rewritten into a flat structure).

This should be done regardless of all the other issues because std::map is dead slow. With our current implementation, it's faster to loop through a vector than to look something up in a map.

Also, I plan on eventually rewriting EnumMap (again) to be only specialized versions instead of the rather complex template class inheritance system it currently is. Doing so will reduce the memory usage to just a pointer, so less memory when not in use. However, I also have a plan for a memory manager that EnumMap requests memory from and returns it to when done. This way, if we, say, use EnumMap<YieldTypes, int> in a loop, whenever it goes out of scope, the memory will be stored as a pointer on a stack. The next time an EnumMap happens to request something of the same size, like EnumMap<YieldTypes, int> during the next iteration, the manager will return the same pointer and reuse the memory without spending time on freeing and allocating.
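A minimal sketch of such a memory manager could look like this. `BlockRecycler` and its methods are hypothetical names, and the sketch assumes single-threaded use and that the caller passes the same size on release as on acquire; it is not the planned EnumMap design, just the "stack of pointers per size" idea.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>
#include <vector>

// Hypothetical recycler: released blocks are kept on a per-size stack and
// handed out again instead of going through free and malloc each time.
class BlockRecycler
{
public:
    BlockRecycler() = default;
    BlockRecycler(const BlockRecycler&) = delete;
    BlockRecycler& operator=(const BlockRecycler&) = delete;

    ~BlockRecycler()
    {
        // Only blocks currently parked in the recycler are freed here.
        for (auto& entry : m_free)
            for (void* p : entry.second)
                std::free(p);
    }

    // Reuse the most recently released block of this size if there is one.
    void* acquire(std::size_t bytes)
    {
        std::vector<void*>& stack = m_free[bytes];
        if (!stack.empty())
        {
            void* p = stack.back();
            stack.pop_back();
            return p;
        }
        return std::malloc(bytes);
    }

    // Park the block for later reuse. The memory is not cleared.
    void release(void* p, std::size_t bytes)
    {
        assert(p != nullptr);
        m_free[bytes].push_back(p);
    }

private:
    std::unordered_map<std::size_t, std::vector<void*> > m_free;
};
```

In a loop, an EnumMap-like object would call `acquire` in its constructor and `release` in its destructor, so every iteration after the first gets the same pointer back. Since the memory is reused as-is, the caller would have to reset the contents itself.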