Switching to custom DDraw texture rendering

WizzardMaker commented 4 years ago

@nyfrk I think we should go with the solution you recommended in #4 and hack the game's calls to the DirectDrawSurface to get a better image quality.

Some game images are hardcoded and very hard to upscale in a native way so that the game can understand them later on. This would also allow us to change the main menu graphics, which was a limitation before.

The 4 byte identifier shouldn't be a problem. The only difficult bit would be redirecting the calls to load our own image and we would also need to be able to compress our images, as they can get rather numerous and large in disk space. [72.8MB for 20.gfx decompressed]

What would be needed to achieve this? Are we making a wrapper dll for ddraw, or replacing the calls to CreateSurface and LockSurface in the games exe?

nyfrk commented 4 years ago

@nyfrk I think we should go with the solution you recommended in #4 and hack the game's calls to the DirectDrawSurface to get a better image quality.

What would be needed to achieve this? Are we making a wrapper dll for ddraw, or replacing the calls to CreateSurface and LockSurface in the games exe?

I would start simple. Don't bother wrapping DDRAW.dll. I already made an ASI loader that gets you into process space. Just create an ASI plugin, find the vtable of the IDirectDrawSurface, and hook its Blt method (the entry at index 5, counting from zero). In the hook procedure, get the 4-byte watermark of the source and replace the source image accordingly. The target surfaces should already be RGB24. Just make sure to have DDRAW.dll in your import table (dynamically loading it using LoadLibrary is not good when in the loader lock of DllMain). This should allow you to blt images without palette limitations.

Another aspect is the team colors. We will probably need every unit in 8 versions, each with a different team color. The team colors the game uses are documented here. The team-colored sprites have an annoying side effect: we have to read the team palette to determine which version we must draw.
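A minimal, portable sketch of the vtable swap described above. It uses a mock COM-style object instead of the real IDirectDrawSurface so that it compiles anywhere; `MockSurface`, `HookedBlt` and `HookVtableSlot` are hypothetical names. In the real process the vtable lives in read-only memory (so the write would have to be wrapped in VirtualProtect) and the methods are __stdcall. Slot 5 is where Blt sits after QueryInterface, AddRef, Release, AddAttachedSurface and AddOverlayDirtyRect.

```cpp
#include <cstddef>

// Mock of a COM-style object: the first member points to a table of function pointers.
struct MockSurface;
using BltFn = int (*)(MockSurface* self, int srcId);

struct MockSurface {
    BltFn* vtable;  // real COM objects also store the vtable pointer first
    int lastDrawn;  // records what was "drawn" so the effect is observable
};

constexpr std::size_t kBltIndex = 5;  // zero-based slot of Blt in IDirectDrawSurface

static BltFn g_originalBlt = nullptr;

// Stand-in for the original Blt: it only records the source id.
int OriginalBlt(MockSurface* self, int srcId) {
    self->lastDrawn = srcId;
    return 0;
}

// Hook: swap the source (here: add 1000 to mark a "replacement image"),
// then forward to the original method.
int HookedBlt(MockSurface* self, int srcId) {
    return g_originalBlt(self, srcId + 1000);
}

// Replaces one vtable slot and returns the previous entry.
BltFn HookVtableSlot(BltFn* vtable, std::size_t index, BltFn replacement) {
    BltFn previous = vtable[index];
    vtable[index] = replacement;
    return previous;
}
```

Because every surface of the same interface shares one vtable, patching the slot once affects all such surfaces, which is what makes this approach cheap compared to a full DDRAW.dll wrapper.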

The 4 byte identifier shouldn't be a problem. The only difficult bit would be redirecting the calls to load our own image and we would also need to be able to compress our images, as they can get rather numerous and large in disk space. [72.8MB for 20.gfx decompressed]

I would probably put all bitmaps in a large file and then memory map it. If virtual address space gets scarce I would just keep all files on the hard drive and load them every time a frame needs them. When ~10 consecutive frames did not use an image I would unload it.
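The load-on-demand and unload-after-idle policy could be sketched roughly like this. `SpriteCache` and its loader callback are hypothetical; the loader stands in for whatever reads the image from disk, and the idle limit of 10 frames is the figure suggested above.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical cache: keeps loaded sprites and drops those unused for N frames.
class SpriteCache {
public:
    explicit SpriteCache(std::uint32_t maxIdleFrames) : maxIdleFrames_(maxIdleFrames) {}

    // Returns the sprite pixels, calling the loader only on first use.
    template <typename Loader>
    const std::vector<std::uint8_t>& Get(std::uint32_t id, Loader load) {
        auto it = entries_.find(id);
        if (it == entries_.end()) {
            it = entries_.emplace(id, Entry{load(id), frame_}).first;
        }
        it->second.lastUsedFrame = frame_;
        return it->second.pixels;
    }

    // Call once per rendered frame; evicts sprites idle for more than maxIdleFrames_.
    void EndFrame() {
        ++frame_;
        for (auto it = entries_.begin(); it != entries_.end();) {
            if (frame_ - it->second.lastUsedFrame > maxIdleFrames_) {
                it = entries_.erase(it);
            } else {
                ++it;
            }
        }
    }

    std::size_t Size() const { return entries_.size(); }

private:
    struct Entry {
        std::vector<std::uint8_t> pixels;
        std::uint32_t lastUsedFrame;
    };
    std::unordered_map<std::uint32_t, Entry> entries_;
    std::uint32_t frame_ = 0;
    std::uint32_t maxIdleFrames_;
};
```

A reference returned by `Get` stays valid only until the entry is evicted, so callers should not hold on to it across `EndFrame`.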

Some game images are hardcoded and very hard to upscale in a native way so that the game can understand them later on. This would also allow us to change the main menu graphics, which was a limitation before.

Can you elaborate more on this? What exactly happens when you swap the menu images and what would be the desired outcome?

WizzardMaker commented 4 years ago

Seems like the game doesn't really use the IDirectDrawSurface7 Blt (or BltFast/BltBatch) to draw each building to the screen. It looks like it draws everything by itself onto the locked surface each frame, instead of creating surfaces for each building/floor/unit.

DDWRAPPER: Lock Surface Wrapper called! 
DDWRAPPER: Lock Surface Wrapper called! 
DDWRAPPER: Lock Surface Wrapper called! 
DDWRAPPER: Lock Surface Wrapper called! 
DDWRAPPER: Blit to l:15 t:8 r:255 b:168, from l:0 t:0 r:240 b:160 
DDWRAPPER: Blit to l:15 t:8 r:255 b:168, from l:0 t:0 r:240 b:160 
DDWRAPPER: Blit to l:0 t:0 r:281 b:210, from l:0 t:0 r:281 b:210 
DDWRAPPER: Blit to l:209 t:928 r:984 b:1026, from l:0 t:0 r:775 b:98 
DDWRAPPER: Blit to l:0 t:250 r:210 b:600, from l:0 t:0 r:210 b:350 
DDWRAPPER: Blit to l:0 t:600 r:210 b:768, from l:0 t:0 r:210 b:168 
DDWRAPPER: Blit to l:0 t:768 r:210 b:1024, from l:0 t:0 r:210 b:256 
DDWRAPPER: Blit to l:0 t:210 r:210 b:250, from l:0 t:0 r:210 b:40

This can be seen best in the main menu. It Blts a surface the size of the screen, and as soon as something changes (like the mouse hovering over a button) it locks the surface.

DDWRAPPER: Blit to l:0 t:0 r:1309 b:1027, from l:17 t:0 r:782 b:600 
DDWRAPPER: Blit to l:0 t:0 r:1309 b:1027, from l:17 t:0 r:782 b:600 
DDWRAPPER: Blit to l:0 t:0 r:1309 b:1027, from l:17 t:0 r:782 b:600 
DDWRAPPER: Blit to l:0 t:0 r:1309 b:1027, from l:17 t:0 r:782 b:600 
DDWRAPPER: Lock Wrapper called! 
DDWRAPPER: Lock Wrapper called! 
DDWRAPPER: Lock Wrapper called! 
DDWRAPPER: Blit to l:0 t:0 r:1309 b:1027, from l:17 t:0 r:782 b:600 
DDWRAPPER: Lock Wrapper called! 
DDWRAPPER: Lock Wrapper called! 
DDWRAPPER: Blit to l:0 t:0 r:1309 b:1027, from l:17 t:0 r:782 b:600

I checked the Clipper, attached surfaces, overlay surfaces, nothing..

nyfrk commented 4 years ago

The game uses many GDI functions. Can you check BitBlt? I think GDI is also used for drawing the panels. Maybe it's the same for the buildings.

It looks like it draws everything by itself onto the locked surface each frame, instead of creating surfaces for each building/floor/unit.

The game does not create a surface for each object.

The game uses about 13 layers / surfaces. Each surface is refreshed at different rates. One surface holds the mini map, another one the menus panels, one the terrain/background and one is for the objects, buildings etc. When the game wants to render a frame it just puts all these layers on top of each other.

WizzardMaker commented 4 years ago

How can I patch GDI functions, without creating a proxy DLL?

nyfrk commented 4 years ago

Create a normal DLL/ASI project in Visual Studio. Follow this example. Make sure to have GDI32.dll in your imports table.

In DllMain, just write a short jmp at the beginning of BitBlt, replacing only its first two bytes. Let it jump to 5 bytes in front of the function (jmp BitBlt-5); the 2-byte short jump instruction for this is 0xEB 0xF9. Then create a 5-byte near jmp (0xE9 followed by a 32-bit relative offset) in those 5 bytes in front of the function's entry and let it jump to your hook procedure. Microsoft calls this technique "hotpatching" and it is well defined for WinAPI functions: hotpatchable 32-bit WinAPI functions start with a two-byte nop (mov edi, edi) that you can overwrite. Many libraries can set this hook for you.

For the hook procedure create a function with the same calling convention and arguments as BitBlt. At the end of your hook procedure call the original BitBlt function and offset it by 2 bytes (call BitBlt+2). Make sure to return its return value.

It should look like this:

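BitBlt-5  jmp BitBltHook   ; 5-byte near jmp (0xE9 rel32) placed in the padding
BitBlt+0  jmp BitBlt-5     ; 2-byte short jmp (0xEB 0xF9) replacing the 2-byte nop
BitBlt+2  ...              ; remainder of the original function

// Same calling convention (WINAPI = __stdcall) and arguments as BitBlt.
BOOL WINAPI BitBltHook(HDC hdc, int x, int y, int cx, int cy, HDC hdcSrc, int x1, int y1, DWORD rop) {
    // do your stuff here
    using BitBltFn = BOOL (WINAPI *)(HDC, int, int, int, int, HDC, int, int, DWORD);
    // Enter the original function just past the patched 2 bytes (BitBlt+2).
    auto OrigBitBlt = reinterpret_cast<BitBltFn>(reinterpret_cast<BYTE*>(&BitBlt) + 2);
    return OrigBitBlt(hdc, x, y, cx, cy, hdcSrc, x1, y1, rop);
}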

Many hooking libraries already do this kind of stuff for you. You can also be more intrusive by writing your 5 byte patch directly at BitBlt (thus overwriting the frame setup). If you do this, make sure to repair the frame in your hook procedure.
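To make the byte layout of the two-stage patch concrete, here is a small sketch that writes it into an ordinary byte buffer standing in for the code around a function entry. `WriteHotpatch` is a hypothetical helper, addresses are treated as 32-bit values, and a real patch would additionally need VirtualProtect around the writes.

```cpp
#include <cstddef>
#include <cstdint>

// `code` is a buffer standing in for executable memory, `entry` is the offset
// of the function's first byte inside it (the 5 bytes before it are padding).
// `entryAddress` and `hookAddress` are the 32-bit runtime addresses.
void WriteHotpatch(std::uint8_t* code, std::size_t entry,
                   std::uint32_t entryAddress, std::uint32_t hookAddress) {
    // Stage 1: E9 rel32 at entry-5. rel32 is relative to the end of this
    // 5-byte instruction, which is exactly the function entry.
    const std::uint32_t rel32 = hookAddress - entryAddress;
    code[entry - 5] = 0xE9;
    for (std::size_t i = 0; i < 4; ++i) {
        code[entry - 4 + i] = static_cast<std::uint8_t>(rel32 >> (8 * i));  // little-endian
    }

    // Stage 2: EB F9 at the entry. The 8-bit displacement is relative to
    // entry+2, so 0xF9 (-7) lands on entry-5.
    code[entry + 0] = 0xEB;
    code[entry + 1] = 0xF9;
}
```

Writing the long jump first and the short jump last means the function entry is only redirected once its target is complete.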

WizzardMaker commented 4 years ago

Nope, it's not BitBlt, StretchBlt, PlgBlt or MaskBlt. BitBlt gets called about 10 times during the initialization of the engine, but then never again.

There are a lot of SelectObject calls and a few CreateCompatibleBitmap calls (although no CreateBitmap)

I'm slowly going through every function that would eventually need to call SelectObject..

nyfrk commented 4 years ago

Then I am out of ideas. Maybe BB used an engine that does the blitting manually. I never looked any deeper than the frame composing code that overlays all the UI layers. But I am pretty sure that they were using a Windows function, since I once stepped over each of those layers to visualize what parts of the frame they contain. Maybe I still have some notes somewhere.

Do you still know the address of the vtable you hooked when hooking Blt? Maybe you got a different interface than the game requested and thus your hook was never called simply because of different interfaces.

WizzardMaker commented 4 years ago

I found it! It creates the images with CreateRectRgn! At least it calls the function every time something happens.

WizzardMaker commented 4 years ago

Do you still know the address of the vtable you hooked when hooking Blt? Maybe you got a different interface than the game requested and thus your hook was never called simply because of different interfaces.

Well, I use minhook to hook the function calls, and they definitely get called, some more than others

nyfrk commented 4 years ago

Well, I use minhook to hook the function calls, and they definitely get called, some more than others

Ah okay. Then the game uses IDirectDrawSurface7::Blt only to draw the intermediate layers. Good catch!

I just tested a bit and could not find where the game calls CreateRectRgn. It looks like it is only called by GDI itself when drawing text, never for sprites or other content. The game uses DrawTextA to draw the texts in the panels.

WizzardMaker commented 4 years ago

Yeah, it uses user32.dll and win32u.dll to draw the information! SetDIBits is being called from user32, which in turn is being called from win32u.

The exe imports a lot of functions from USER32, but none from WIN32. The most interesting imported functions are: LoadBitmapA and LoadImageA

WizzardMaker commented 4 years ago

Okay. Seems like they extended the IDirectDrawSurface7 interface.

The vtable function call to v4[1] + 100 goes directly to the IDirectDrawSurface7->Lock function (or rather my patched version)

Also, this function apparently gets called in the main draw function. (There are mentions of "GFX ENGINE: Can't render objects without having a world!" and "GFX ENGINE: DATA ERROR: Illegal value in iDirection", so that's where my conclusion comes from.)

nyfrk commented 4 years ago

Yeah, it uses user32.dll and win32u.dll to draw the information! SetDIBits is being called from user32, which in turn is being called from win32u.

Sorry, I cannot follow along. GDI32.SetDIBits calls gdi32full.SetDIBits. The latter is an internal library. There is no SetDIBits in User32.dll. win32u.dll is an internal library too. I cannot imagine that the game uses win32u.dll directly. If it did, it would dynamically link to it, and I don't see a reason why BlueByte should have done this. It is probably rather imported by one of the system libraries. Furthermore, the game imports GDI32.GetDIBits and not GDI32.SetDIBits. Thus SetDIBits is probably called only by functions from one of the libraries and not by the game itself. LoadBitmapA causes a call to SetDIBits.

The exe imports a lot of functions from USER32, but none from WIN32.

The game uses User32.LoadBitmapA (which redirects to User32.LoadImage) to load images from the resource section. To be precise, it loads the cursor from the resource section. LoadBitmapA does not allow loading images from memory or from the gfx archives. There are only 2 images in the resource section: [image] and [image].

The game uses a streamer to dynamically load and cache images that are needed to render. It unloads them when they are not on screen anymore (with some exceptions and a little delay). It loads the images directly from the gfx archives (hard drive) during gameplay.

Okay. Seems like they extended the IDirectDrawSurface7 interface.

That's impossible ;)

You are right! Looks like they replaced the IDirectDrawSurface7::Blt function with one that reads stuff from the gfx archives (or rather, they hacked the internals of DDRAW.dll with their overlay.dll). They also cache the images so that there are not too many read operations from the hard drive. So basically the DDRAW.dll is not a real DDRAW.dll that was written/released by Microsoft ^^

I always wondered why their Blt method has one argument more than usual haha. Guess that explains it. In that case have a look at the opcode: S4_Main.exe+266BC3. That Blt method is sometimes magically causing read operations to the hard drive ;)

WizzardMaker commented 4 years ago

I think we can get some pretty accurate function information when we take a look at the GfxEngine.dll of the editor, as that exports a lot of gfx-related functions, like BlitFrameToDib, CreateGuiSurface, GetPlayerColor or RenderObject/RenderResource.

I don't think that they changed the functions that much, so we could just byte search the game exe for those functions from the dll. We really only need RenderObject/RenderResource, and we need to find where the ground textures are rendered.

nyfrk commented 4 years ago

So have you made any progress? I realized that I already found the render function when creating the unlimited selection mod. We should start creating a wiki or something so we stop reversing things twice ^^ Anyway, here are the render functions for The Settlers 4 that may help us (the md5 of my S4_Main.exe is C13883CBD796C614365AB2D670EAD561; let me know if your version differs from mine and I will send you aob patterns):

S4_Main.exe+261B90    Render a settler (including selection markers)
S4_Main.exe+263110    Render a border stone
S4_Main.exe+262E80    Render an object (like tree, stones, chickens, geologist signs, good piles (except those attached to buildings) etc)
S4_Main.exe+262090    Render a building. This blits many times e.g. for flags on buildings, settlers working in building, doors on towers, piles on buildings, building effects like rotating rotor of the mill or the anvil of the smiths, selection markers etc
S4_Main.exe+263800    Render the colored bubbles when placing a building or changing the work area of a building.
S4_Main.exe+261FA0    Render the waves on the water.
S4_Main.exe+2631F0    Render a vehicle (ships, war machines etc).

The game is actually coded in a really modular way. I don't see any reason why we could not add and render our completely own units (like spearmen). Note that the game renders on an intermediate surface. It will be blitted onto the backbuffer by the function we already found above. I think I would not follow the "watermarking" approach anymore. Let's just mod the game to use the unpacked gfx files altogether. We could even do alpha blitting. The blitting function the game uses is at S4_Main.exe+25F980. It does palette mixing before blitting (for the team colors). It is a fastcall (with 9 arguments, the first two passed by ecx and edx, the rest on stack). Caller cleans the stack. The team color palette mixing is probably the reason BlueByte created a custom blitting function.

WizzardMaker commented 4 years ago

Lets just mod the game to use the unpacked gfx files altogether

Yeah, that was the plan with this approach. Have every extracted png of a gfx packed together in a custom gfx file and just load these. This would make it much easier to upscale the images, without any quality loss due to engine restrictions. This could also enable us to add real soft shadows to objects, instead of that black pattern they used, like you said.

WizzardMaker commented 4 years ago

Should we hook each "Render..." function, or should we just hook their blit method at S4_Main.exe+25F980 and blit to the backbuffer ourselves if we recognize the texture to be drawn?

WizzardMaker commented 4 years ago

I'm currently facing problems with hooking into the blit function at S4_Main.exe+25F980 and calling the trampoline function. The new function gets called, but the esp run time check gets caught and the game crashes.

Here is my typedef of the blit function I got from Ghidra and my hook + orig function call:

typedef ULONGLONG(__fastcall origBlitToBackbuffer)(int EAX, byte* pEDX, int param_1_00, int param_2_00, int param_5, UINT param_6, int param_7, int param_8, int param_9);

origBlitToBackbuffer *oRO;
ULONGLONG __fastcall BlitToBackbuffer(int EAX, byte* pEDX, int param_1_00, int param_2_00, int param_5, UINT param_6, int param_7, int param_8, int param_9)
{
    return oRO(EAX, pEDX ,param_1_00, param_2_00, param_5, param_6, param_7, param_8, 0);
}

nyfrk commented 4 years ago

Should we hook each "Render..."

All the render functions call the Blt function multiple times (e.g. towers have doors that are painted over the building). Since we have to draw each of these too, we would have to completely reimplement the Render functions. Imho, that's too much work (but not impossible). I would rather reimplement the Blt function. I would do stack hacking in the hook procedure to extract the exact sprite that we have to draw from the stack. All other stuff (like position on screen etc.) is already in the arguments of the Blt function.

Here is my typedef of the blit function I got from Ghidra and my hook + orig function call:

Don't trust Ghidra blindly. It is flawed when it comes to "fairly" recent compiler quirks.

fastcall (as specified for the Microsoft compiler) expects the function to do the stack cleanup (similar to stdcall). However the function you are trying to hook is a "compiler invented calling convention" (Raymond Chen has an article about that). That means the compiler created a new calling convention for optimization purposes. It is almost a fastcall but this one expects the caller to do the stack cleanup. If you implement it like you did above, it will crash since now the caller AND your hook proc will clean the stack. I am not sure if many libraries can handle this for you. You will likely have to use inline assembler for this as i am not aware of any possibility to declare new custom calling conventions.

WizzardMaker commented 4 years ago

Okay, just turns out, that minhook is absolute trash and it completely destroys the instructions of the function with that inserted jmp. Gonna switch to polyhook2 and hope it gets better.

expects the caller to do the stack cleanup

Wouldn't that mean I would need to make the new method __declspec(naked) and just return with something in EDX and EAX? I don't even know what the original blit really returns to the render functions to be honest ^^

nyfrk commented 4 years ago

Okay, just turns out, that minhook is absolute trash and it completely destroys the instructions of the function with that inserted jmp. Gonna switch to polyhook2 and hope it gets better.

Well, that is kind of expected. As long as it repairs the replaced instructions (by adding them to the hook proc's prolog) it's fine. After all, this function is not compiled with /hotpatch and thus has no 2-byte nop at the beginning.

Wouldn't that mean I would need to make the new method __declspec(naked) and just return with something in EDX and EAX?

I would write a naked function that converts it to stdcall. Here is an example, and here is a more complex one. This also allows you to push the return address as an additional argument (useful for stack hacking later). And it allows you to repair the instructions you replaced.

I don't even know what the original blit really returns to the render functions to be honest ^^

It's just a VOID function (the return value is ignored by the game).

WizzardMaker commented 4 years ago

As long as it repairs the replaced instructions (by adding them to the hook procs prolog) its fine

Sadly that isn't the case. An innocent looking "mov eax, dword ptr[]", gets obliterated to an "or" opcode

WizzardMaker commented 4 years ago

Well, I think I am at my limit regarding C++ and hooking into functions. It either crashes, because the stack gets corrupted or it outright fails to hook...

And the fact that this would be a normal S4ModApi plugin suggests that it would be best if you could supply a callback to the blit function in the API itself, where one could decide whether the currently scheduled render job should be handled by the callback or by the original blit function.

If that doesn't bother you @nyfrk of course

nyfrk commented 4 years ago

Yes I can do that.

Edit: See here: https://github.com/nyfrk/S4ModApi/releases/tag/v0.5

You can now use the AddBltListener function. I added a caller parameter that allows you to see from where the Blt function was called (just in case it may be useful). Whenever you return a non-zero value the original Blt function is skipped (thus preventing the game from drawing the current sprite).

Here is an example that skips the drawing whenever the 'U' key is down.

s4->AddBltListener([](DWORD _0, DWORD _1, DWORD _2, DWORD _3, DWORD _4, DWORD _5, DWORD _6, DWORD _7, DWORD _8, BOOL discard, DWORD caller)->BOOL {
     return GetAsyncKeyState('U') < 0;
});

If you are interested: That is the code that implements it. I observed that we must preserve the XMM registers, so I saved them onto the stack too. I also did that for the FPU just to be safe. But I haven't tested yet which XMM registers must be saved, or whether the FPU must be saved at all. I will do that later. Let me know if you need access to the XMM registers as they probably contain useful information. I will then think about a solution.

WizzardMaker commented 4 years ago

Great! That works flawlessly!

Now slowly finding out, what each argument does.. currently 5 out of 9

arg0 - ?
arg1 - the data buffer of the image, like its in the .gfx file
arg2 - probably the id of the object. returning when its under 100 removes stuff like water ripples, units and grass, above 100 removes ships and buildings
arg3 - ?
arg4 - x position to draw on the surface
arg5 - y position to draw on the surface
arg6 - ?
arg7 - the color buffer of the backbuffer
arg8 - is maybe something with the minimap, but I'm not sure

Here is a simple plugin that blocks all rendering when the x value is below 500 or the id is below 100: S4Patch

WizzardMaker commented 4 years ago

Do you know the color format of the backbuffer arg7? And what is the size: screen width * screen height * bytes per pixel, or is there something with the zoom level that we would have to consider?

And oh boy do I not like C++.. I think I am just gonna call a C# library with all that information xD. I mean, type safety and a nice library system, what is there not to like, and it only adds one more .dll to all the other ^^

WizzardMaker commented 4 years ago

And a few notes for me for later:

Needed magic number for identification of upscaled files:

buffer [0] - byte 0xF0
buffer [1] - byte 0x0D

even though the first byte is nearly always a "0", we make sure to make our identification number unique

then the gfx id:

buffer [2] ** - byte 0x02 - 0xFF - though the max is only 0x29 (41)

then the id of the file inside the gfx:

buffer [3] ** - byte 0x02 - 0xFF
buffer [4] ** - byte 0x02 - 0xFF

Remove the offset and then combine the two bytes into an unsigned 16-bit integer. We need 16 bits because some .gfx files contain up to 19,000 images, which is still well below the unsigned 16-bit maximum (65,535).

then the rest of the file, which will be ignored

** - these are offset by 2, to maintain a bit of compatibility with the original renderer, in case of api errors, so that the image could still be displayed.
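A rough sketch of how a reader for this header could look. `UpscaledSpriteRef` and `ParseUpscaledHeader` are hypothetical names, and the byte order of the image id (buffer[3] as the high byte, buffer[4] as the low byte) is an assumption, since the notes above do not specify it.

```cpp
#include <cstddef>
#include <cstdint>

struct UpscaledSpriteRef {
    std::uint8_t gfxId;     // which .gfx archive
    std::uint16_t imageId;  // index of the image inside that archive
};

// Returns true and fills `out` if the buffer starts with the 0xF0 0x0D marker
// and all id bytes are in the valid range; returns false otherwise, so the
// caller can fall back to the original blit.
bool ParseUpscaledHeader(const std::uint8_t* buffer, std::size_t size,
                         UpscaledSpriteRef* out) {
    constexpr std::uint8_t kOffset = 2;  // every id byte is stored shifted by 2
    if (size < 5 || buffer[0] != 0xF0 || buffer[1] != 0x0D) return false;
    if (buffer[2] < kOffset || buffer[3] < kOffset || buffer[4] < kOffset) return false;

    out->gfxId = static_cast<std::uint8_t>(buffer[2] - kOffset);
    // Assumption: buffer[3] is the high byte, buffer[4] the low byte.
    const std::uint16_t hi = static_cast<std::uint16_t>(buffer[3] - kOffset);
    const std::uint16_t lo = static_cast<std::uint16_t>(buffer[4] - kOffset);
    out->imageId = static_cast<std::uint16_t>((hi << 8) | lo);
    return true;
}
```

One thing to keep in mind with the per-byte offset of 2: each stored byte can only carry values 0 to 253, so an encoder has to take that into account when splitting the image id.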

WizzardMaker commented 4 years ago

Do you know the color format of the backbuffer arg7? And what is the size: screen width * screen height * bytes per pixel, or is there something with the zoom level that we would have to consider?

Or we could just create a new surface and draw to that instead of using theirs.

That would be a better solution with regard to making the HD Patch. To make the images truly HD, we would have to upscale the ground buffer, draw our objects with the new HD resolution to the new surface, and blit that to the window.

nyfrk commented 4 years ago

arg0 - ?
arg1 - the data buffer of the image, like its in the .gfx file
arg2 - probably the id of the object. returning when its under 100 removes stuff like water ripples, units and grass, above 100 removes ships and buildings
arg3 - ?
arg4 - x position to draw on the surface
arg5 - y position to draw on the surface
arg6 - ?
arg7 - the color buffer of the backbuffer
arg8 - is maybe something with the minimap, but I'm not sure

Awesome! I will adjust the types of the callback accordingly.

arg0 is probably a pointer to a palette. It is basically a word array. A color in the palette is 16 bit (very likely RGB565). A palette contains 256 colors and may thus be indexed by bytes. Looks like there are some special colors 0 and 1, presumably for transparency and other special purposes. When a byte in the GFX is 1, the next word specifies some mixing argument (probably a reference to the shadow mask in the gfx file). It can be used to blit solid black (hardcoded) using a checkerboard pattern. The blitting function also allows for "fogging" in various steps. For that purpose the blitting function seems to pick offset colors from the color table. Thus the color palette is probably organized in a certain way that produces a "blend to dark" effect. There are 7 fogging steps. The surface it blits onto is presumably RGB565 (16 bit colors), which is not good. We will have to convert it to RGB24 or RGBX32.
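If the palette entries really are RGB565 (still an assumption at this point), expanding them to 24-bit colors is straightforward. A small sketch, with `Rgb24`, `Rgb565ToRgb24` and `ConvertPalette` as illustrative names:

```cpp
#include <cstddef>
#include <cstdint>

struct Rgb24 {
    std::uint8_t r, g, b;
};

// Expands one RGB565 word (rrrrrggg gggbbbbb) to 8 bits per channel.
// The high bits are replicated into the low bits so that the maximum
// 5/6-bit value maps to 255 instead of 248/252.
Rgb24 Rgb565ToRgb24(std::uint16_t c) {
    const unsigned r5 = (c >> 11) & 0x1F;
    const unsigned g6 = (c >> 5) & 0x3F;
    const unsigned b5 = c & 0x1F;
    return Rgb24{static_cast<std::uint8_t>((r5 << 3) | (r5 >> 2)),
                 static_cast<std::uint8_t>((g6 << 2) | (g6 >> 4)),
                 static_cast<std::uint8_t>((b5 << 3) | (b5 >> 2))};
}

// Converts a whole 256-entry palette, as described for arg0.
void ConvertPalette(const std::uint16_t* palette565, Rgb24* out, std::size_t count = 256) {
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = Rgb565ToRgb24(palette565[i]);
    }
}
```

The special palette indices 0 and 1 mentioned above would still need separate handling before the lookup.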

arg3 is of the same enum as arg2. It is used to calculate the height of the sprite to draw. The gfx archive is probably laid out in a certain way that makes this necessary. The width seems to be determined from arg2. The height is necessary to determine whether an object collides with the viewport when its y screen coordinate is outside of the viewport.

arg6 is compared against the y coordinate of the destination on the screen and changes the behavior somehow. I am not yet sure what the purpose of this is. Maybe it is something related with the terrain height?

arg8 is just used to determine the dimension of the sprite to blit. The game probably encodes the width/height unaligned to save space, and this argument is used to rotate accordingly. I have never observed it to be any other value than 0, so it is probably not used.

Do you know the color format of the backbuffer arg7?

Very likely it is RGB565. At least the original Blt function seems to write only words. So no more than 16 bits per pixel.

And what is the size: screen width * screen height * bytes per pixel, or is there something with the zoom level that we would have to consider?

2 bytes per pixel. Screenheight and width (rather surface height and width) is available at S4_Main.exe+1691EC and S4_Main.exe+1691F0. I will probably add them to the argument list of the callback. The zoom level must be considered. I am not yet sure how the game handles that. Looks like we must completely reverse their Blt function to understand how it works in detail. The annoying part is that there are a few global variables that are used within the original Blt function. I think one of them could control the zoom level. It appears that they just copy the last pixel for stretched blitting. There seems to be no interpolation.

Edit: I think we could add the width and height of the sprite as arguments to the callback. That would greatly help you to handle the different zoom levels of objects.

And oh boy do I not like C++.. I think I am just gonna call a C# library with all that information xD. I mean, type safety and a nice library system, what is there not to like, and it only adds one more .dll to all the other ^^

Sure that is possible. After all, the interface of the S4ModApi is just a COM Object. I am pretty sure that Microsoft has ways to marshal them directly into C# so that you do not need a wrapper DLL for the S4ModApi. Type safety is usually a thing in C++ but well, we haven't reversed the types yet. I will properly type them in the future (instead of just using DWORD for each argument).

Edit: I am not sure if it is a good idea to implement a blitting function in C# as this is performance code. Such tasks are usually better implemented in C. Don't underestimate the impact of copying the pixels of several 100 objects onto the screen 60 times a second.

Needed magic number for identification of upscaled files:

I would add the magic number with a checksum. So add 4 bytes, the first two are an identifier that identifies the sprite, the third and fourth are just the identifier xor'ed with a constant key you choose. So when you xor the identifier and the checksum, the result should be your constant key. If that is not the case let the game blt, otherwise do your own blitting. I used a similar technique when hooking the sockets to send additional packets for an automatic map exchange upon joining a lobby. You can change the key until you find one with no collisions (otherwise use a larger key / checksum).
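The identifier-plus-checksum idea could look roughly like this. The key value `0xA5C3` is an arbitrary example, the little-endian byte order is an assumption, and `WriteMarker`/`ReadMarker` are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::uint16_t kKey = 0xA5C3;  // arbitrary example key, pick your own

// Writes 4 bytes: the sprite identifier followed by (identifier XOR key),
// both little-endian.
void WriteMarker(std::uint8_t* out, std::uint16_t spriteId) {
    const std::uint16_t check = static_cast<std::uint16_t>(spriteId ^ kKey);
    out[0] = static_cast<std::uint8_t>(spriteId & 0xFF);
    out[1] = static_cast<std::uint8_t>(spriteId >> 8);
    out[2] = static_cast<std::uint8_t>(check & 0xFF);
    out[3] = static_cast<std::uint8_t>(check >> 8);
}

// Returns true and sets *spriteId if identifier XOR checksum equals the key.
// On false the caller should let the game do its normal blit.
bool ReadMarker(const std::uint8_t* in, std::size_t size, std::uint16_t* spriteId) {
    if (size < 4) return false;
    const std::uint16_t id = static_cast<std::uint16_t>(in[0] | (in[1] << 8));
    const std::uint16_t check = static_cast<std::uint16_t>(in[2] | (in[3] << 8));
    if (static_cast<std::uint16_t>(id ^ check) != kKey) return false;
    *spriteId = id;
    return true;
}
```

With a 16-bit key, a random 4-byte prefix passes the check with a probability of about 1 in 65,536, which is why a larger key or checksum helps if collisions with original sprite data show up.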

That would be a better solution in regards to making the HD Patch. To make the images truly HD, we would have upscale the ground buffer and draw our objects with the new HD resolution to the new surface and blit that to the window

No, I think the buffer contains as many pixels as Windows uses for the client area of the window. To make it HD, we just need to use a proper scaling solution when blitting. Usually one would use mip maps for this.
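For illustration, one mip level can be generated from the previous one with a simple 2x2 box filter. This is only a sketch of the general technique, not what the game does; `Image` and `Downsample2x` are hypothetical, and the image is assumed to be tightly packed RGBA with 8 bits per channel.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Image {
    std::size_t width = 0;
    std::size_t height = 0;
    std::vector<std::uint8_t> rgba;  // width * height * 4 bytes
};

// Produces the next smaller mip level: each output pixel is the rounded
// average of a 2x2 block of the source. A trailing odd row/column is ignored.
Image Downsample2x(const Image& src) {
    Image dst;
    dst.width = src.width / 2;
    dst.height = src.height / 2;
    dst.rgba.resize(dst.width * dst.height * 4);
    for (std::size_t y = 0; y < dst.height; ++y) {
        for (std::size_t x = 0; x < dst.width; ++x) {
            for (std::size_t c = 0; c < 4; ++c) {
                unsigned sum = 0;
                for (std::size_t dy = 0; dy < 2; ++dy) {
                    for (std::size_t dx = 0; dx < 2; ++dx) {
                        sum += src.rgba[((2 * y + dy) * src.width + (2 * x + dx)) * 4 + c];
                    }
                }
                dst.rgba[(y * dst.width + x) * 4 + c] =
                    static_cast<std::uint8_t>((sum + 2) / 4);
            }
        }
    }
    return dst;
}
```

Calling this repeatedly until the image is 1 pixel wide or high gives the full mip chain; when blitting, the level closest to the target size would be picked.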

WizzardMaker commented 4 years ago

We could already draw our images, when we get the zoom level, as that defines, how to scale the image. We also need to find out, how the game does its z sorting

I agree on the performance penalty in C#, we really need every ms we can get.

We could just render to a custom buffer, instead of fiddling with the surface or with how the original function draws to it. That buffer would be drawn every time the game blits to the screen with the original IDirectDrawSurface.

Here is an example, where I draw a rectangle at the position of the skipped textures: [image]

Btw. arg4 and arg5, the x and y positions, are signed integers

nyfrk commented 4 years ago

We could already draw our images, when we get the zoom level, as that defines, how to scale the image.

The original Blt function does determine how to scale the image. I think I can work that out and add a destination rect (and a source rect) to the arguments of the callback. You can use it to calculate a scale but I would not start using floating point operations just for that.

We also need to find out, how the game does its z sorting

I am not sure. The game should execute the Blt function already in the correct order.

I agree on the performance penalty in C#, we really need every ms we can get.

We could just render to a custom buffer, instead of fiddling with the surface, or how the orig. function draws to that. That buffer would be drawn everytime, when the game blits to the screen with the original IDirectDrawSurface

I thought about that. Having a second surface to draw on would mean that we must draw all objects onto that new surface (we cannot have some on the original surface and some on the new one). We cannot blend the old and new surfaces later since it will break the z order of objects.

Converting the game's Blt function to make it render to an RGBA32 surface would probably not be too difficult. I don't think there are more functions that render on that surface. The final blitting to the client area should not need any changes since it uses the Blt method of the DirectDrawSurface, and that should be able to handle RGBA32 sources (at least we can make it work, since I know that alpha blending is possible). The advantage of this is that your mod would not break other mods that, for example, add new units, tribes or buildings to the game (e.g. if someone decides to make a plugin that adds the tribes from Settlers 3 to the game).
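As a point of reference for what blitting an RGBA32 sprite onto such a surface involves per pixel, here is a sketch of a standard "source over" blend. The 0xAARRGGBB layout, the non-premultiplied alpha and the fully opaque destination are assumptions; `BlendChannel` and `BlendOver` are illustrative names.

```cpp
#include <cstdint>

// Blends one channel: result = src * alpha + dst * (1 - alpha), alpha in 0..255,
// with rounding to the nearest integer.
inline std::uint8_t BlendChannel(std::uint8_t src, std::uint8_t dst, std::uint8_t alpha) {
    return static_cast<std::uint8_t>((src * alpha + dst * (255 - alpha) + 127) / 255);
}

// Blends a non-premultiplied 0xAARRGGBB source pixel over an opaque destination
// pixel. The result is always fully opaque.
inline std::uint32_t BlendOver(std::uint32_t src, std::uint32_t dst) {
    const std::uint8_t a = static_cast<std::uint8_t>(src >> 24);
    const std::uint8_t r = BlendChannel(static_cast<std::uint8_t>(src >> 16),
                                        static_cast<std::uint8_t>(dst >> 16), a);
    const std::uint8_t g = BlendChannel(static_cast<std::uint8_t>(src >> 8),
                                        static_cast<std::uint8_t>(dst >> 8), a);
    const std::uint8_t b = BlendChannel(static_cast<std::uint8_t>(src),
                                        static_cast<std::uint8_t>(dst), a);
    return 0xFF000000u | (static_cast<std::uint32_t>(r) << 16) |
           (static_cast<std::uint32_t>(g) << 8) | b;
}
```

Since this runs for every pixel of several hundred sprites per frame, a real implementation would likely process rows at a time and skip fully transparent pixels early.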

I think it would be a good idea to add a Blt function to the S4ModApi. That Blt function must be able to process RGBA32 or palettized RGB565 images. This way other mods can extend the RenderObject method without making it incompatible with your RGBA32 mod. They could coexist or it would even be possible for you to provide RGBA32 sprites of the additional tribe too since it would just be another blit observed by your listener. However if you don't provide RGBA32 sprites for the additional units everything would still work fine.

Having a Blt function in S4ModApi would allow us to make all the necessary implementations that are desired anyway when blitting object sprites. Like fog (objects becoming darker), team colors or that "growing building" animation when erecting a building.

So basically I suggest that we add the following functions to the S4ModApi

Blt, which lets us blit onto the surface with optional fog, optional team color and/or optional building animation. It can handle RGBA32 or the original palettized RGB565 images.
EnableTrueColor, which enables RGBA32 blitting by adding a second surface and patching the Blt function to produce RGBA32 colors.
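
A minimal sketch of what handling "palettized RGB565" input could involve: resolving palette indices to 32-bit pixels before blitting onto a true color surface. The function names, the 8-bit index width and the 0xAARRGGBB layout are assumptions for illustration, not part of the S4ModApi.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Expand one RGB565 pixel to 0xAARRGGBB with full opacity (layout is an assumption).
// The high bits of each channel are replicated into the low bits so 0x1F maps to 0xFF.
inline std::uint32_t Rgb565ToRgba32(std::uint16_t p) {
    const std::uint32_t r5 = (p >> 11) & 0x1F;
    const std::uint32_t g6 = (p >> 5) & 0x3F;
    const std::uint32_t b5 = p & 0x1F;
    const std::uint32_t r8 = (r5 << 3) | (r5 >> 2);
    const std::uint32_t g8 = (g6 << 2) | (g6 >> 4);
    const std::uint32_t b8 = (b5 << 3) | (b5 >> 2);
    return 0xFF000000u | (r8 << 16) | (g8 << 8) | b8;
}

// Resolve a palettized sprite (one palette index per pixel, assumed 8-bit)
// into 32-bit pixels using an RGB565 palette.
inline std::vector<std::uint32_t> ResolvePalettized(
        const std::vector<std::uint8_t>& indices,
        const std::vector<std::uint16_t>& palette) {
    std::vector<std::uint32_t> out;
    out.reserve(indices.size());
    for (std::uint8_t i : indices) {
        assert(i < palette.size());  // index must lie inside the palette
        out.push_back(Rgb565ToRgba32(palette[i]));
    }
    return out;
}
```

With a conversion like this, a single blitter could accept both kinds of input and only the source decoding would differ.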

WizzardMaker commented 4 years ago

I thought about that. Having a second surface to draw on would mean that we must draw all objects onto that new surface (we cannot have some on the original surface and some on the new one). We cannot blend the old and new surfaces later since it will break the z order of objects.

The thing is, the blit function has the gfx data and palette, so it's rather easy to just create that texture ourselves with the exporter methods. So we can just draw all of the images to the surface, both our new ones (marked with the ID at the beginning) and the original images.

Do you know how the black view range/radius is drawn? Is it just another surface, that gets drawn on top of all objects?

nyfrk commented 4 years ago

The thing is, the blit function has the gfx data and palette, so it's rather easy to just create that texture ourselves with the exporter methods. So we can just draw all of the images to the surface, both our new ones (marked with the ID at the beginning) and the original images.

I would not cut that down. My suggestion is that, instead of you doing the blitting yourself in your plugin, we would create a Blt method in the S4ModApi. You would still get the data of the original GFX file (and all the other arguments) but you would call s4->BltObjectRGBA(rgbaImage,...) to get it drawn onto the game's surface. If the library does that, we can ensure compatibility with other mods that add graphics to the game that do not yet exist. The S4ModApi would basically hide the fact that it swaps the game's surface for an RGBA32 surface (thus allowing for some abstraction).

The question is now: What would the argument list of s4->BltObjectRGBA look like? How would we want to add team colors or the different fogged versions of the sprites? I guess that we don't want to create 8 versions of a settler and then 7 darkened variants of each of those. So we must somehow mask them or allow for passing a lambda function that does the darkening. The next question is how we would want to customize the "growing building" animation. That is handled by the original Blt function too.

Do you know how the black view range/radius is drawn? Is it just another surface, that gets drawn on top of all objects?

I assume that the palettes are laid out in a way that allows darkening a color by decrementing its palette index. The game just decrements each pixel by a certain amount to draw a darker image. When you blit your RGBA images, you would probably decrement each channel of each pixel by a certain amount to achieve a similar effect.
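
A minimal sketch of the per-channel darkening described above, assuming 0xAARRGGBB pixels and a fog level from 0 to 7 where higher means darker. The function names and the `stepPerLevel` parameter are hypothetical; the real amounts would have to be matched against the game's fogged palette.

```cpp
#include <cassert>
#include <cstdint>

// Subtract `amount` from one 8-bit channel, clamping at 0 instead of wrapping.
inline std::uint8_t DarkenChannel(std::uint8_t value, unsigned amount) {
    return value > amount ? static_cast<std::uint8_t>(value - amount) : 0;
}

// Darken the color channels of a 0xAARRGGBB pixel; alpha is left untouched.
// fogLevel and stepPerLevel are assumptions, not values taken from the game.
inline std::uint32_t DarkenForFog(std::uint32_t argb, unsigned fogLevel,
                                  unsigned stepPerLevel) {
    assert(fogLevel <= 7);  // a 3-bit fog level is assumed
    const unsigned amount = fogLevel * stepPerLevel;
    const std::uint8_t r = DarkenChannel(static_cast<std::uint8_t>(argb >> 16), amount);
    const std::uint8_t g = DarkenChannel(static_cast<std::uint8_t>(argb >> 8), amount);
    const std::uint8_t b = DarkenChannel(static_cast<std::uint8_t>(argb), amount);
    return (argb & 0xFF000000u) | (static_cast<std::uint32_t>(r) << 16) |
           (static_cast<std::uint32_t>(g) << 8) | b;
}
```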

WizzardMaker commented 4 years ago

You would still get the data of the original GFX file (and all the other arguments) but you would call s4->BltObjectRGBA(rgbaImage,...) to get it drawn onto the game's surface.

But how would you achieve z ordering?

The game should execute the Blt function already in the correct order.

Like you said. The game executes these commands in a specific order.

How would we want to add team colors or the different fogged versions of the sprites?

We could use a two layer method. Have a unit texture without the team coloring and a second one with just the team coloring which will be tinted to the correct team.

When you blit your RGBA images, you would probably decrement each channel of each pixel by a certain amount to achieve a similar effect

The question was rather: how does the game know when to tint a pixel dark? There has to be some sort of array storing that information.

nyfrk commented 4 years ago

But how would you achieve z ordering?

The plugins would still have to register a Blt listener and call the custom blt method from there thus ensuring the correct order. The plugin code could look something like this:

s4->EnableTrueColor(); // enable true color mode for object surface
s4->AddBltListener([](DWORD _0, DWORD _1, DWORD _2, DWORD _3, DWORD _4, DWORD _5, DWORD _6, DWORD _7, DWORD _8, BOOL discard, DWORD caller)->BOOL {
    WORD* gfx = (WORD*)_1; // _1 is used as the pointer to the gfx data
    // Parentheses are required here: == binds tighter than ^ in C++.
    if ((gfx[0] ^ gfx[1]) == 0x1337) { // check if gfx is watermarked
        // std::to_string (from <string>) avoids pointer arithmetic on the string literal
        auto filename = "image" + std::to_string(gfx[0]) + ".bmp";
        auto pixels = LoadARGBFromFile( filename );
        s4->BltObjectRGBA( pixels, ... ); // blit using RGBA image
        return 1; // skip original blitting
    } else {
        return 0; // non-watermarked sprites are drawn by the game
    }
});

The arguments of the callback will be replaced by a single argument that gives a pointer to a struct that contains all the current arguments. This way plugins can alter the arguments and as a bonus the loop can be more efficient since we do not have to repush all arguments onto the stack each time we iterate the observers.
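
A rough sketch of that struct-based callback, assuming a hypothetical `BltArgs` struct and `BltDispatcher` class (neither name comes from the S4ModApi). Every observer receives the same struct by reference, so one observer can alter arguments for the next and nothing is pushed again per observer.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical bundle of the Blt arguments; field names are placeholders.
struct BltArgs {
    std::uint32_t arg[9];   // the raw arguments _0 .. _8
    bool discard;
    std::uint32_t caller;
};

// A listener returns true to request that the original blit is skipped.
using BltListener = std::function<bool(BltArgs&)>;

class BltDispatcher {
public:
    void Add(BltListener listener) {
        assert(listener);  // reject empty std::function objects
        listeners_.push_back(std::move(listener));
    }

    // Calls every observer in registration order with the same struct.
    bool Dispatch(BltArgs& args) const {
        bool skip = false;
        for (const auto& listener : listeners_) {
            skip = listener(args) || skip;  // every listener runs, even after a skip
        }
        return skip;
    }

private:
    std::vector<BltListener> listeners_;
};
```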

We could use a two layer method. Have a unit texture without the team coloring and a second one with just the team coloring which will be tinted to the correct team.

So essentially you would use some kind of mask and change the hue, similar to how one would do it in Photoshop when using a mask to select the area to work with. I am not sure how good this can be quality-wise. It would probably be easier if we just used the S4GFX tool to export 8 versions of each sprite (one per team color). This would allow for maximum customization and would probably be easier for us (at the expense of memory usage, of course).

Edit: Quality problems can be solved when using a palettized mask image and then providing 8 different palettes.
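
A minimal sketch of the palettized mask idea, assuming the mask stores one 8-bit index per pixel, index 0 means "not team colored", and each team has its own palette of 0xAARRGGBB entries. These conventions and the function name are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Combine a neutral base sprite with a palettized team color mask.
// Where the mask is 0 the base pixel is kept; otherwise the pixel is
// replaced by the entry of the selected team's palette.
inline std::vector<std::uint32_t> ApplyTeamColor(
        const std::vector<std::uint32_t>& base,
        const std::vector<std::uint8_t>& mask,
        const std::vector<std::uint32_t>& teamPalette) {
    assert(base.size() == mask.size());
    std::vector<std::uint32_t> out(base.size());
    for (std::size_t i = 0; i < base.size(); ++i) {
        if (mask[i] == 0) {
            out[i] = base[i];
        } else {
            assert(mask[i] < teamPalette.size());
            out[i] = teamPalette[mask[i]];
        }
    }
    return out;
}
```

Only the small palettes would need to exist 8 times, while the base sprite and the mask are stored once.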

The question was rather: how does the game know when to tint a pixel dark? There has to be some sort of array storing that information.

This information is stored in the terrain array, which is an array of DWORDs. The 1st byte is the terrain type (grass, sand, etc.), the 2nd is the height and the 4th is related to the fog of war (fow); I think the fow level was the 3 least significant bits of this 4th byte. I think the fow of the currently blitted world position is already in one of the static variables, so it shouldn't be too hard to figure that out.
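
A small sketch of decoding such a terrain entry, assuming "1st byte" means the least significant byte of the DWORD (the first byte in memory on x86) and a row-major map layout. The struct and function names are made up for illustration, and the fow bits are only "I think" in the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct TerrainCell {
    std::uint8_t type;      // 1st byte: terrain type (grass, sand, ...)
    std::uint8_t height;    // 2nd byte: height
    std::uint8_t fogLevel;  // 3 least significant bits of the 4th byte (unconfirmed)
};

inline TerrainCell DecodeTerrain(std::uint32_t cell) {
    TerrainCell t;
    t.type = static_cast<std::uint8_t>(cell & 0xFF);
    t.height = static_cast<std::uint8_t>((cell >> 8) & 0xFF);
    t.fogLevel = static_cast<std::uint8_t>((cell >> 24) & 0x07);
    return t;
}

// Look up a cell by map position; a row-major layout is an assumption.
inline TerrainCell TerrainAt(const std::vector<std::uint32_t>& terrain,
                             std::size_t mapWidth, std::size_t x, std::size_t y) {
    assert(x < mapWidth && y * mapWidth + x < terrain.size());
    return DecodeTerrain(terrain[y * mapWidth + x]);
}
```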

WizzardMaker commented 4 years ago

The problem with adding new units currently is: how would you add them? There is no really easy way of adding new units to the .gfx files, as the direction and job index list system is rather complicated. So there won't be any way of hacking the first few bytes of the image to point to our external .png file.

The only way I could see is to hack existing units to facilitate custom units and then somehow identify them later in the blit function.

So essentially you would use some kind of mask and change the hue, similar to how one would do it in Photoshop when using a mask to select the area to work with. I am not sure how good this can be quality-wise. It would probably be easier if we just used the S4GFX tool to export 8 versions of each sprite (one per team color). This would allow for maximum customization and would probably be easier for us (at the expense of memory usage, of course).

Yeah, that's what I meant. But I think it would be easier to just have 8 versions of the units, though that would increase our space consumption eightfold. I'd need to check how much more space is really consumed in the end, once we're packing everything.

nyfrk commented 4 years ago

The problem with adding new units currently is: how would you add them?

That is not a problem the HD patch must solve, but I would add new objects by expanding the switch in the game's RenderObject function. If an identifier is unknown, the game will try to blit the first image in the gfx archive, which is usually a black placeholder, so we just hook that and draw the correct image instead. For the game logic it would be just another instance of the CSettler or CBuilding class in the object pool.

Yeah, that's what I meant. But I think it would be easier to just have 8 versions of the units, though that would increase our space consumption eightfold. I'd need to check how much more space is really consumed in the end, once we're packing everything.

To reduce memory usage we could create a blitting function that directly blits from compressed images. After all we have to write our own blitter anyway so this would probably be a reasonable choice.

Could you add an export mode that outputs, for each sprite, 8 colored versions as premultiplied RGBA32 bitmaps with 3 times the resolution of the original? For images that do not use team colors (like trees, animals etc.) there would be just one version, of course. Then we could see how many gigabytes we are talking about. We could multiply this by 4/3 to estimate the cost of mip maps too, since a full mip chain adds 1/4 + 1/16 + ... of the base size, which approaches 1/3.

The reason we should choose 3 times the original resolution is that the game only scales images up to 3 times. More resolution would be pointless unless we hack the game to allow more zoom.

Edit: With 3 times the resolution I mean that a 2x2 pixel large sprite should become 6x6 pixels large.
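
The estimate can be written down as a small helper. This is only back-of-the-envelope arithmetic for uncompressed RGBA32 data (4 bytes per pixel); the function name and parameters are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Uncompressed size of one sprite exported as RGBA32 (4 bytes per pixel),
// scaled by `scale` in both dimensions and stored in `versions` team colors.
// With mip maps the total grows by roughly 4/3.
inline std::uint64_t EstimateSpriteBytes(std::uint32_t width, std::uint32_t height,
                                         std::uint32_t scale, std::uint32_t versions,
                                         bool withMipMaps) {
    assert(scale >= 1 && versions >= 1);
    const std::uint64_t scaledWidth = static_cast<std::uint64_t>(width) * scale;
    const std::uint64_t scaledHeight = static_cast<std::uint64_t>(height) * scale;
    const std::uint64_t bytes = scaledWidth * scaledHeight * 4 * versions;
    return withMipMaps ? bytes * 4 / 3 : bytes;
}
```

For the 2x2 example, 3x scaling gives 6x6 pixels, so 8 team colored versions take 6 * 6 * 4 * 8 = 1152 bytes, or 1536 bytes including mip maps.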

WizzardMaker commented 4 years ago

I'm gonna update the exporter to apply the team colors

3 times the resolution of the original?

Isn't 3x a bit overkill?

2x can already result in big file sizes. You have to consider that we are talking about 18000 sprites per tribe; that x8 x3 is A LOT. The buildings at 2x resolution, saved as PNG, are already at 213 MiB.

nyfrk commented 4 years ago

Isn't 3x a bit overkill?

2x can already result in big file sizes. You have to consider that we are talking about 18000 sprites per tribe; that x8 x3 is A LOT. The buildings at 2x resolution, saved as PNG, are already at 213 MiB.

OK, let's try 2x first.

WizzardMaker commented 4 years ago

I've exported and scaled every texture in 20.gfx. With 2x bilinear scaling we get around 43 MiB, give or take a few MiB for the extra pixel information when AI upscaling. So with a full lobby we would see around 350 MiB (43 x 8, assuming all other tribes have roughly the same space requirements) in memory usage just for the units, plus the requirement for the buildings, but that would only be around 860 MiB at most when loading every tribe and every building at once.

Question is, do we just load the whole texture container once when needed, or do we load only the textures on demand (Probably the whole image group containing all the animation textures) which could result in lag, as reading from files is rather slow, even if we cache the textures for a while.

If we go with just loading the whole file, we should probably make the game large address aware, so that we don't run out of x86 memory

nyfrk commented 4 years ago

I've exported and scaled every texture in 20.gfx. With 2x bilinear scaling we get around 43 MiB, give or take a few MiB for the extra pixel information when AI upscaling. So with a full lobby we would see around 350 MiB (43 x 8, assuming all other tribes have roughly the same space requirements) in memory usage just for the units, plus the requirement for the buildings, but that would only be around 860 MiB at most when loading every tribe and every building at once.

Is this using RGBA32 BMPs or PNGs? PNGs are compressed and must be decompressed when blitting.

Furthermore we should consider whether to use mip maps. Otherwise we could get ugly effects due to the scaled blitting when zooming out.
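
For reference, generating one mip level can be as simple as a 2x2 box filter. The sketch below assumes even image dimensions and four 8-bit channels per pixel; the `Image` struct and `HalveImage` are hypothetical names. Averaging channels like this is well behaved for premultiplied alpha data, because fully transparent pixels then contribute no color.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Image {
    std::size_t width;
    std::size_t height;
    std::vector<std::uint32_t> pixels;  // row-major, four 8-bit channels per pixel
};

// Produce the next mip level by averaging each 2x2 block (with rounding).
inline Image HalveImage(const Image& src) {
    assert(src.width % 2 == 0 && src.height % 2 == 0);
    assert(src.pixels.size() == src.width * src.height);
    Image dst{src.width / 2, src.height / 2, {}};
    dst.pixels.resize(dst.width * dst.height);
    for (std::size_t y = 0; y < dst.height; ++y) {
        for (std::size_t x = 0; x < dst.width; ++x) {
            std::uint32_t sum[4] = {0, 0, 0, 0};
            for (std::size_t dy = 0; dy < 2; ++dy) {
                for (std::size_t dx = 0; dx < 2; ++dx) {
                    const std::uint32_t p =
                        src.pixels[(2 * y + dy) * src.width + 2 * x + dx];
                    for (int c = 0; c < 4; ++c) sum[c] += (p >> (8 * c)) & 0xFF;
                }
            }
            std::uint32_t out = 0;
            for (int c = 0; c < 4; ++c) out |= ((sum[c] + 2) / 4) << (8 * c);
            dst.pixels[y * dst.width + x] = out;
        }
    }
    return dst;
}
```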

Question is, do we just load the whole texture container once when needed, or do we load only the textures on demand (Probably the whole image group containing all the animation textures) which could result in lag, as reading from files is rather slow, even if we cache the textures for a while.

Memory mapping the container would probably be the easiest and fastest solution. At least that lets Windows handle all the page swapping for us.

If we go with just loading the whole file, we should probably make the game large address aware, so that we don't run out of x86 memory

True, but we cannot set the LAA flag for DLLs. (Well, we can, but it has no effect, since the flag on the main executable determines whether the entire process is LAA.) So we would have to set the flag for S4_Main.exe. I am not sure how well the game copes with pointers above 2 GiB, which look negative when read as signed 32-bit values. I think the game sometimes uses the sign bit to check whether a pointer is valid, so this must be tested carefully.

WizzardMaker commented 4 years ago

Is this using RGBA32 BMPs or PNGs? PNGs are compressed and must be decompressed when blitting.

Using PNGs. But the compression isn't that high, as they are rather small files to begin with. We could save a few bytes if we saved the files as raw color streams with run length encoding. RLE is rather fast to decode and can save a bit of space.
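
A minimal sketch of such a run length encoding on 32-bit pixels, assuming runs are capped at 255 pixels so the count fits in one byte. This is an illustrative format, not the one used by the .gfx files or by any existing tool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One run: `count` repetitions of the same 32-bit pixel (count is 1..255).
struct Run {
    std::uint8_t count;
    std::uint32_t pixel;
};

inline std::vector<Run> RleEncode(const std::vector<std::uint32_t>& pixels) {
    std::vector<Run> runs;
    for (std::uint32_t p : pixels) {
        if (!runs.empty() && runs.back().pixel == p && runs.back().count < 255) {
            ++runs.back().count;       // extend the current run
        } else {
            runs.push_back({1, p});    // start a new run
        }
    }
    return runs;
}

inline std::vector<std::uint32_t> RleDecode(const std::vector<Run>& runs) {
    std::vector<std::uint32_t> pixels;
    for (const Run& r : runs) {
        assert(r.count > 0);  // empty runs are never produced by RleEncode
        pixels.insert(pixels.end(), static_cast<std::size_t>(r.count), r.pixel);
    }
    return pixels;
}
```

Sprites with large transparent areas compress well this way, and a blitter could skip a transparent run in one step instead of testing every pixel.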

Furthermore we should consider whether to use mip maps. Otherwise we could get ugly effects due to the scaled blitting when zooming out.

I don't really know if mip maps are that necessary if we use a scaling algorithm like bilinear or trilinear when blitting. If we use a custom surface, we could just use hardware accelerated Direct2D, which gives us these options when blitting.

nyfrk commented 4 years ago

Using PNGs. But the compression isn't that high, as they are rather small files to begin with. We could save a few bytes if we saved the files as raw color streams with run length encoding. RLE is rather fast to decode and can save a bit of space.

Let's start simple. I propose that we just concatenate all PNGs into one big file. Then we will use libpng to load and blit them onto the surface and hope that it will be fast enough. Converting them to RGBA32 bitmaps will definitely blow up our memory usage.

I don't really know if mip maps are that necessary if we use a scaling algorithm like bilinear or trilinear when blitting. If we use a custom surface, we could just use hardware accelerated Direct2D, which gives us these options when blitting.

I am not sure about that. Hardware accelerated blitting will probably be difficult since we need extra features like "growing buildings" or fog of war fading. I would create a software solution first (similar to the original function).

WizzardMaker commented 4 years ago

How is the progress on the renderer @nyfrk?

nyfrk commented 4 years ago

I am sorry. I currently do not have much free time. I will continue working on open source projects after the 5th of November.

nyfrk commented 4 years ago

I did some work today.

I ditched the idea of using libpng and rather used Gdiplus. Gdiplus comes with rgb565 support by default, does support more formats (png, jpeg, etc) and provides nice blitting features (like polyline clipping for the erecting building animation).

The width and height of an image to render is determined by a table lookup. The game manages a table that is modified whenever the player changes the zoom. It maps widths/heights of a gfx to a width/height to render. I will add the table to the API too. If you don't want to wait for the API, you can use INT32* GfxZoomTable = *(INT32**)(S4_Main + 0x10587A8); for now.

Looks like some graphics are already rendered in high resolution. When fully zoomed in, thin objects like grass are displayed almost 1:1 (high resolution). Whereas the chicken is upscaled (low resolution).

We could use the flag to feed the game higher resolution images. Though that would still limit us to the palette.

One annoyance with hooking the Blt function is that we have no way to determine the fog of war level at the currently rendered object, since the fog is already computed beforehand by passing an adjusted (darkened) palette. We cannot make a quick terrain lookup since we do not know the x and y position of the object on the map (at least we don't know it in the hook procedure). Fortunately the game is very optimized and only knows one palette for any fogged object (despite there being multiple fow levels and different objects/team objects). So we can simply check whether the palette is the fow palette and, if it is, draw the sprite slightly darkened.

Here is a demo that renders colored boxes behind objects. (Later we would remove them and blit real images). Fogged sprites are rendered with a dark red box.

https://youtu.be/Sw5nKBgsZ54

It looks good so far but there are still some issues we must fix:

The erecting building animation is not implemented yet
The tiny camera in the side panel apparently needs a scanline fix
And of course we need to incorporate a rgb888 surface

WizzardMaker commented 4 years ago

That is good news!

I implemented the identification of the gfx files today as well. It's still experimental, so if you need the feature I'd send you the application to apply it to a gfx file.

How does the game handle the building animation? There has to be some kind of info tied to the object, indicating the progress, isn't there?

nyfrk commented 4 years ago

The game uses argument _6 for this. _6 is the current y position of the zigzag clipper.

WizzardMaker commented 4 years ago

Oh

Well, I don't know how we could create that zig zag procedurally in code, but we could just create a mask texture and draw the building texture with the mask applied. That would fix that
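
For what it's worth, a zigzag edge can be generated procedurally with a triangle wave. The sketch below assumes the building is revealed from the bottom up, that `clipY` plays the role of the clipper's current y position, and that `tooth` is a made-up tooth size in pixels. The real shape of the game's zigzag is not documented here.

```cpp
#include <cassert>

// Triangle wave: rises from 0 to `tooth` and falls back to 0
// over a period of 2 * tooth pixels.
inline int ZigzagOffset(int x, int tooth) {
    assert(x >= 0 && tooth > 0);
    const int phase = x % (2 * tooth);
    return phase < tooth ? phase : 2 * tooth - phase;
}

// A pixel is drawn if it lies on or below the zigzag edge
// (y grows downwards, so "below" means a larger y).
inline bool IsVisible(int x, int y, int clipY, int tooth) {
    return y >= clipY + ZigzagOffset(x, tooth);
}
```

The same predicate could be used to fill a mask texture once per clipY value instead of evaluating it per pixel during the blit.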

nyfrk commented 4 years ago

I think I would first try the region clipping method of Gdiplus.

WizzardMaker / S4GFX

Switching to custom DDraw texture rendering #11