DarkStarSword / 3d-fixes

Stereoscopic 3D fixes using Helix mod & 3DMigoto

Questions about transparency #21

Open SilentNightSound opened 11 months ago

SilentNightSound commented 11 months ago

Hello! Apologies in advance if this isn't the correct place to ask about it, but I have been struggling with getting transparency functional in 3dmigoto for nearly a year now and was wondering if you had any insights.

Broadly speaking, I have had two major issues when using transparency with 3dmigoto:

1) Being able to control the amount and location of transparency on an object
2) Handling how draw call order affects what parts of the object are drawn first/last


For 1), as I understand it there are two main ways to create a custom shader to make an object transparent - you can use blend_factor to make the entire part a uniform transparency, or you can use one of the other blend modes to make it dependent on the state of the render target outputs (such as SRC_ALPHA, SRC1_ALPHA, etc).

For the blend_factor method, as far as I know there is no way to either set the factor through a texture or have it span a range of values, so you have to "sacrifice" an entire part to a uniform transparency (which can be a problem in games like Genshin, where a significant number of models are drawn with only 1-2 parts) and can't do things like gradients.

For the method that uses the render target state, I have found that a large number of games I mod use all of the o0 and o1 channels already (x,y,z,w), so any attempt to also use them for transparency results in it either not working or a clash between the "meaning" of the channel (e.g. it will become transparent but also glow).

The only general method I have found so far that works everywhere is to pass the current render target into the shader and manually blend in hlsl using something like lerp, but that adds a bunch of extra complexity and also loses some of the benefits of doing it in OM as well.
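For illustration, a minimal sketch of what I mean by blending manually (the slot number and names here are hypothetical - in practice the ini would first copy the current render target into a spare texture slot, e.g. ResourceBackground = copy o0 followed by ps-t100 = ResourceBackground):

// Hypothetical input slot holding a copy of the current render target:
Texture2D<float4> Background : register(t100);

float4 blend_in_shader(float4 src, float alpha, float2 screen_pos)
{
    // Emulate ADD SRC_ALPHA INV_SRC_ALPHA by hand:
    float4 dst = Background.Load(int3(screen_pos, 0));
    return float4(lerp(dst.rgb, src.rgb, alpha), src.a);
}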

So, I was wondering if either a) there is a way to set the blend_factor dynamically based on something like a texture or equation or b) if it is possible to use things like SRC_ALPHA even when all the channels are full (maybe by moving the channel? as far as I can tell from the docs only o0 and o1 are supported for blend). Or perhaps c), an easier way that I might be overlooking?


For 2), the issue lies in the order parts are drawn. First, I am running under the assumption that 3dmigoto is unable to change the order objects are drawn in the frame, or delay certain calls until others have completed (I haven't seen anything in the code or all my searching in the documentation/forums, though I may have missed it). This means we are "stuck" with the order the game decides to draw objects in, which becomes an issue when trying to implement transparency.

The key issue I am running into is that because I do not have control of the order things are drawn in, there is no guarantee that a transparent object will be drawn after an opaque object that lies behind it. So when it comes time to "blend" the transparent object with the background, the background is all-black and it becomes transparent to the wrong thing (e.g. a character's clothing being drawn before their base model so you "see-through" the model to the background, or the character being drawn before the scenery behind them is drawn).

In some cases it is possible to shift around the transparent part to another portion of the model which is drawn after the rest has been drawn, but not always - sometimes different parts of the model have different properties (glow, skin shader, etc), sometimes the draw order changes depending on the scene, sometimes the vertex groups you need are only on a single part, or sometimes the model only has a small number of parts (and as far as I know you can't blend a transparent part with an opaque one on the same call?). And it usually doesn't fix blending with the background at all.

I know there are some methods of draw-order independent transparency, is 3dmigoto able to use/activate any of them?

This is the issue I am struggling most with, and have come up with 3 possible solutions but have run into difficulties with each of them:

a) Draw a part in two passes - basically, draw the opaque portion first in a PS, then pass that PS output to a second PS which manually blends the transparent part with the opaque one. This would let you have both the opaque and transparent portion on the same draw call and guarantee no draw order issues within that call, though it wouldn't fix order issues with other parts or the background. I have not yet found a way to pass the output from one PS into a second one though - from all my testing, running multiple PS has them all execute simultaneously and not in sequence (e.g. no way to make calculations of some vertices dependent on the results from other ones)

b) Output the opaque and transparent data separately, and then do the blending of the transparent part later in the frame once more things have been drawn (or even 1 frame later with the result of the previous frame, though that would create a 1 frame lag for transparency). This seems to be the most hopeful, but I have been running into issues with adding more render targets (games either ignore the added ones, or the added ones don't clear properly between frames and accumulate junk data). It also isn't a very general method since you need to identify when in the frame to actually do the blending which would vary from game to game (since many games flip the screen at some point when drawing). This method also has issues since you need to find a way to distinguish between objects that are in front of or behind the character yourself since you are doing the blend manually and can't rely on the depth data unless you store it somehow.

c) Take all the data (vb/ib/cb/shader/etc) from the draw call and move the entire call later in the frame by creating a new custom shader/call. I haven't found a way to "insert" new calls, and even if we override later calls with all the relevant data, parts still seem to be missing (e.g. I can't find any way to turn on glow if the original shader had it and the new one does not). This also has the issue that often later frames depend on the output from earlier ones, so by moving the entire call we potentially deprive later shaders of the information they need to function.

Do you think any of these methods are viable? Or is there a simpler method that I am overlooking?


Apologies for the long wall of text, but any insights you have would be appreciated!

DarkStarSword commented 10 months ago

I haven't properly proofread this yet, and I may have some more thoughts to add later... but I need to get some sleep, so here's what I've got so far:

(such as SRC_ALPHA, SRC1_ALPHA, etc). ... For the method that uses the render target state, I have found that a large number of games I mod use all of the o0 and o1 channels already (x,y,z,w) ... as far as I can tell from the docs only o0 and o1 are supported for blend).

You probably don't want to use any of the SRC1 blend options - those are pretty restrictive since they place special meaning on o0 and o1 and aren't generally useful outside of some pretty specific techniques. To specify the blend state for all 8 render targets independently you would use syntax like the following:

blend[0] = ADD SRC_ALPHA INV_SRC_ALPHA
blend[1] = ADD SRC_ALPHA INV_SRC_ALPHA
blend[2] = ADD SRC_ALPHA INV_SRC_ALPHA
blend[3] = ADD SRC_ALPHA INV_SRC_ALPHA
blend[4] = ADD SRC_ALPHA INV_SRC_ALPHA
blend[5] = ADD SRC_ALPHA INV_SRC_ALPHA
blend[6] = ADD SRC_ALPHA INV_SRC_ALPHA
blend[7] = ADD SRC_ALPHA INV_SRC_ALPHA

If you just use blend = ... without specifying the render target it will pass IndependentBlendEnable=false to DirectX, i.e. all render targets will use the same blend mode.

Also note that if you are trying to modify the blend state for an existing draw call, specifying any blend option will set every other blend option that you didn't specify back to whatever is the default in DirectX. You can alternatively specify blend_state_merge=true to tell 3DMigoto to merge your options with whatever the game is using instead of the defaults, though this comes with a slight additional performance cost at runtime since 3DMigoto can't know the options the game will use in advance and has to recreate the merged blend state every time the CustomShader section is run.
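For example, a minimal sketch (section name hypothetical) that overrides only the first render target's blend mode and merges everything else from the game's state:

[CustomShaderBlendOverrideExample]
; Merge with the state the game set rather than the DirectX defaults, so
; only the option below is changed:
blend_state_merge = true
blend[0] = ADD SRC_ALPHA INV_SRC_ALPHA
; Re-issue the original draw with the game's own shaders and buffers:
drawindexed = auto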

Not sure if you've already seen it, but the 3DMigoto options are essentially exposing the DirectX options pretty much 1 for 1, so relevant documentation for these can be found here, and any DirectX 11 rendering tutorials on the subject may also be of use:

https://learn.microsoft.com/en-us/windows/win32/direct3d11/d3d10-graphics-programming-guide-blend-state
https://learn.microsoft.com/en-us/windows/win32/api/d3d11/ns-d3d11-d3d11_blend_desc
https://learn.microsoft.com/en-us/windows/win32/api/d3d11/ns-d3d11-d3d11_render_target_blend_desc
https://learn.microsoft.com/en-us/windows/win32/api/d3d11/ne-d3d11-d3d11_blend_op
https://learn.microsoft.com/en-us/windows/win32/api/d3d11/ne-d3d11-d3d11_blend
https://learn.microsoft.com/en-us/windows/win32/api/d3d11/nf-d3d11-id3d11devicecontext-omsetblendstate

So, I was wondering if either a) there is a way to set the blend_factor dynamically based on something like a texture or equation

It certainly can't be a texture - blend_factor is passed to OMSetBlendState API and is a single value that will be used across the entire draw call and every render target that uses it in the blend mode.

We could potentially look into allowing it to be evaluated as part of the command list if you think that might be useful - it would still be a single value for the entire draw call, but this would allow it to be changed between draw calls. I think anything that could be done with this could also be done by passing the value into the shader and blending there, and from what you describe I'm not sure if it would help much in GI - but if you think you may have a use case where it would be more convenient to do this in the ini file we could certainly look into supporting this.

or b) if it is possible to use things like SRC_ALPHA even when all the channels are full (maybe by moving the channel?

On the one hand the restriction that the w channel is alpha for the purposes of OM blending kind of comes from DirectX, and the meaning of the channels outside of that comes from the game, but 3DMigoto does give you a lot of power here and if you really wanted to move the channel, do the blending, then run a custom shader to fix the channels back up to what the game is expecting before passing it on to the next draw call you probably could.

For 2), the issue lies in the order parts are drawn. First, I am running under the assumption that 3dmigoto is unable to change the order objects are drawn in the frame, or delay certain calls until others have completed (I haven't seen anything in the code or all my searching in the documentation/forums, though I may have missed it). This means we are "stuck" with the order the game decides to draw objects in, which becomes an issue when trying to implement transparency.

Yes, this is correct, and something I would like to find a good way to solve, but it's a difficult problem and I don't have a particularly clear idea of how a solution would work in practice that doesn't have its own set of problems - e.g. say we added some feature to record draw calls of interest from an opaque rendering pass of the game and replay them during a later transparency pass (which would definitely help with some of the issues you have encountered). In the opaque pass objects are often drawn in a pretty arbitrary order, and that order is often not even consistent between subsequent frames, but in the transparency pass (depending on the technique the game is using) objects may be required to be sorted back to front (or front to back) by their distance to the camera. 3DMigoto is fundamentally unaware of those distances, so it cannot meaningfully sort the objects and would not be able to replay the draw calls in a suitable order - and if the transparent effects we are injecting overlap with other transparent effects in the game, we would really need to interleave the injected draw calls with the game's own draw calls to render it all correctly... which makes a truly correct solution here even harder.

If you have any ideas on how we might be able to solve this I'm open to suggestions. We may end up having to settle for something that is less than perfect, or solutions that are somewhat specific to games using certain transparency techniques (these can't even be engine specific, because e.g. Unity and Unreal both have a number of different render pipelines and options that developers can choose between).

The key issue I am running into is that because I do not have control of the order things are drawn in, there is no guarantee that a transparent object will be drawn after an opaque object that lies behind it. So when it comes time to "blend" the transparent object with the background, the background is all-black and it becomes transparent to the wrong thing (e.g. a character's clothing being drawn before their base model so you "see-through" the model to the background, or the character being drawn before the scenery behind them is drawn).

Right, so ideally we wouldn't be trying to render any transparent geometry during the game's opaque pass at all - ideally we want all opaque geometry to be on the depth buffer before rendering any transparent geometry. I think that's probably about the minimum problem we would need to solve (and I think this part is doable), even if we can't get order between overlapping transparent objects quite right after that.

How many overlapping transparent objects are there in GI (modded and vanilla)? How much would solving just this part of the problem help?

I know there are some methods of draw-order independent transparency, is 3dmigoto able to use/activate any of them?

Aside from purely additive transparency (i.e. blend = ADD ONE ONE) these aren't done at the DirectX API level, but rather e.g. each transparent item would be rendered to a separate render and depth target (or to an array of targets or slices of a 3D target) and a shader used to sort them by depth and blend them together at the end (at least that's my understanding - I haven't personally implemented these techniques, so my understanding is limited and may be completely wrong).

Rendering each transparent object to its own render + depth targets could potentially be part of a final solution since the blending shader would be injected at a later point in the frame, however these extra render targets would consume quite a bit of VRAM (if the game is rendering at 4K with a single 32bit render target and 32bit depth buffer you are looking at around 64MB per object), which could be a problem, and still might have issues overlapping with transparent objects from the game. It's also worth noting that AFAIK true order independent transparency is very rarely used, and when it is used it's usually limited to specific situations that actually require it, because its performance sucks.

I have not yet found a way to pass the output from one PS into a second one though - from all my testing, running multiple PS has them all execute simultaneously and not in sequence (e.g. no way to make calculations of some vertices dependent on the results from other ones)

I think I might need you to clarify the issue you are having here - pixel shaders run in sequence, not simultaneously (at least not from our point of view - the driver may well be able to run some shaders in parallel as an optimisation in some cases, but not in a way that would change the result), though the individual vertices + pixels within a single draw call are processed simultaneously across GPU cores/threads.

The issue you might be running into is a rule that resources cannot be bound to both an output and input simultaneously. For example, if you bind a custom resource to a render target (e.g. o0) of one pixel shader, you should unbind it from that output slot before binding it to an input (e.g. ps-t100) of the next pixel shader. In some cases DirectX may automatically unbind a resource from a slot when such an input/output conflict occurs, but you should not rely on this, as I've observed some inconsistencies in this behaviour.
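In ini terms, something along these lines (resource and shader names hypothetical):

; First pass renders into the custom resource:
o0 = ResourceScratch
run = CustomShaderFirstPass
; Unbind it from the output slot before using it as an input to the next pass:
o0 = null
ps-t100 = ResourceScratch
run = CustomShaderSecondPass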

Resources also must have appropriate bind flags set to allow them to be bound to different slots on the pipeline. 3DMigoto should set these automatically for custom [Resource]s based on how it sees the resource being used in the ini file, but this will not be the case for resources created by the game. e.g. if the game creates a render target that is not intended to be bound to a shader input, you will not be able to bind it to an input slot and must either substitute it with a resource 3DMigoto has created, or pay the performance cost of copying it into a new resource that can be bound to an input at the time you bind it (and this performance cost can add up quickly, so you should avoid doing this more than once or twice per frame).
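e.g. a rough sketch of the copy approach (names hypothetical), bearing in mind the cost just mentioned:

[ResourceBindableCopy]

[ShaderOverrideExample]
hash = ...
; The game's render target may lack the bind flags for an input slot, so
; copy it (by value) into a resource 3DMigoto created with suitable flags:
ResourceBindableCopy = copy o1
ps-t100 = ResourceBindableCopy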

b) Output the opaque and transparent data separately, and then do the blending of the transparent part later in the frame once more things have been drawn

Yes, this is on the right path for the goal of deferring injected transparent geometry to the transparent pass.

(or even 1 frame later with the result of the previous frame, though that would create a 1 frame lag for transparency).

I'd try to avoid that if possible. 3DMigoto doesn't have a way to explicitly identify when a game starts/stops drawing a given render pass (hmmm... maybe I can add some detection based on e.g. blend state...), though you can usually identify a shader that signifies this. I often use the below script to analyse the frame analysis log file and identify shaders that might signify that the post processing render pass has started, to use as frame analysis triggers. You would almost definitely want the transparent objects drawn before post processing, but it might turn out that the start of post processing could be a reasonable time to inject transparent objects (you may need to have already obtained a reference to the transparent render targets, since the game may have unbound them before starting post processing).

The script doesn't specifically detect that post processing has started (games don't announce this to DirectX), rather I look for the point where the number of render targets drops down from however many GBuffers the game is using (so, 4+) down to 1 or 2.

https://github.com/DarkStarSword/3d-fixes/blob/master/find_frame_analysis_candidates.py

Edit: Forgot to mention, obviously just doing a frame analysis with dump_rt and looking through the results is a good way to identify these as well. I use the above script as it's usually a lot faster than dumping the full frame and usually finds a suitable shader to use. Occasionally something won't make sense from the script alone if the game is doing something unusual (e.g. extra render passes or unusual render pass orders), which looking at a dump_rt will usually clear up.

This seems to be the most hopeful, but I have been running into issues with adding more render targets (games either ignore the added ones,

or the added ones don't clear properly between frames and accumulate junk data).

You can clear these yourself at the present call:

[Present]
clear = ResourceFoo

It also isn't a very general method since you need to identify when in the frame to actually do the blending which would vary from game to game (since many games flip the screen at some point when drawing).

This is always going to be the case to at least some extent, because games don't announce when they change render passes to DirectX so we can only ever look for clues as to when this happens, and that is always going to be at least somewhat game/engine specific. I'm not opposed to adding heuristics into 3DMigoto to detect this for certain engines that are known to exhibit certain behaviour we could look for, but my philosophy is generally to try to provide the low level tools that can be adapted to work with any game/engine (ideally if we were to add a heuristic to detect this in e.g. Unity it would be a shortcut to configure the low level tools in a certain way, though if it comes to it it should be possible to implement a feature that knows how to call into Unity and interrogate it to find out exactly which render pass it is on - Unity should be doable since that is managed C# code with reflection, but forget about doing something like that for Unreal or most other custom C++ engines).

At the moment detecting this is done by matching shaders and maybe render target hashes - but I think there are more possibilities we could add here, such as matching on the number of bound render targets, or some aspect of the blend/render state, etc. There might be some performance costs associated with adding more ways to test, but I think it might be worth it. If you have any specific tests in mind you think might be helpful, let me know.

This method also has issues since you need to find a way to distinguish between objects that are in front of or behind the character yourself since you are doing the blend manually and can't rely on the depth data unless you store it somehow.

Right. Most games will only have depth buffer writes enabled during the opaque pass, and during the transparent pass will have the depth buffer set to test only, but won't be writing to it.

c) Take all the data (vb/ib/cb/shader/etc) from the draw call and move the entire call later in the frame by creating a new custom shader/call. I haven't found a way to "insert" new calls, and even if we override later calls with all the relevant data, parts still seem to be missing (e.g. I can't find any way to turn on glow if the original shader had it and the new one does not).

You absolutely can insert new draw calls - this is really what the [CustomShader] sections were originally for (the idea of using them to replace an existing draw call was just the next step that naturally followed from that). But you are right, the problem here is not so much injecting a draw call, as transferring any required state from the original draw call (which is doable for one or two of these since we can grab references to all bound resources, but we don't have a good way to store these in a list of arbitrary length if we don't know how many of these we need in advance), especially if some of the output buffers have already been consumed and are no longer useful to write to.

This also has the issue that often later frames depend on the output from earlier ones, so by moving the entire call we potentially deprive later shaders of the information they need to function.

Try to avoid creating new inter-frame data dependencies if possible. Even if it renders correctly for you, these dependencies murder performance in multi-GPU SLI configurations, as they create stalls while the GPUs wait for data to be copied from the other GPU. A lot of games end up accidentally creating these dependencies without realising it, because any render target that isn't cleared between frames will create such a dependency (and half the NVIDIA profiles are hacks to tell the driver to ignore these false data dependencies to speed up SLI).

SilentNightSound commented 10 months ago

Thank you so much for the detailed response! This is something I (and a chunk of our modding community XD) have been working on for a while without making much progress, so it is super helpful to get so much information on how it works and some more avenues we can use to tackle the problem.

You probably don't want to use any of the SRC1 blend options - those are pretty restrictive since they place special meaning on o0 and o1 and aren't generally useful outside of some pretty specific techniques. To specify the blend state for all 8 render targets independently you would use syntax like the following:

blend[0] = ADD SRC_ALPHA INV_SRC_ALPHA
blend[1] = ADD SRC_ALPHA INV_SRC_ALPHA
blend[2] = ADD SRC_ALPHA INV_SRC_ALPHA
blend[3] = ADD SRC_ALPHA INV_SRC_ALPHA
blend[4] = ADD SRC_ALPHA INV_SRC_ALPHA
blend[5] = ADD SRC_ALPHA INV_SRC_ALPHA
blend[6] = ADD SRC_ALPHA INV_SRC_ALPHA
blend[7] = ADD SRC_ALPHA INV_SRC_ALPHA

If you just use blend = ... without specifying the render target it will pass IndependentBlendEnable=false to DirectX, i.e. all render targets will use the same blend mode.

Ah, good to know - after reading your comments and the documentation, I realized I had a misunderstanding about how SRC_ALPHA worked. I thought it specifically read the alpha value from o0 and SRC1_ALPHA specifically read from o1, but now I see that it actually reads the value from the render target it is operating on (though it still has an issue if that target is using the alpha channel for something besides transparency, which is sadly the case in Genshin).

Also note that if you are trying to modify the blend state for an existing draw call, specifying any blend option will set every other blend option that you didn't specify back to whatever is the default in DirectX. You can alternatively specify blend_state_merge=true to tell 3DMigoto to merge your options with whatever the game is using instead of the defaults, though this comes with a slight additional performance cost at runtime since 3DMigoto can't know the options the game will use in advance and has to recreate the merged blend state every time the CustomShader section is run.

This is also super useful, thank you - I did not know about blend_state_merge=true, and that explains why sometimes when I changed the OM state it seemed to mess up unrelated parts of the call.

So, I was wondering if either a) there is a way to set the blend_factor dynamically based on something like a texture or equation

It certainly can't be a texture - blend_factor is passed to OMSetBlendState API and is a single value that will be used across the entire draw call and every render target that uses it in the blend mode.

We could potentially look into allowing it to be evaluated as part of the command list if you think that might be useful - it would still be a single value for the entire draw call, but this would allow it to be changed between draw calls. I think anything that could be done with this could also be done by passing the value into the shader and blending there, and from what you describe I'm not sure if it would help much in GI - but if you think you may have a use case where it would be more convenient to do this in the ini file we could certainly look into supporting this.

Hmm, for setting the specific value I don't think there would be much benefit to being able to evaluate in the ini if it is still forced to be a constant value for the entire call - it is already possible to mimic that behaviour by choosing between different custom shaders with different transparencies based on a variable, or by passing in the factor to the shader. It was more that if the value could be set depending on location it could be used for gradient transparency, but that seems to be impossible from what you have said.

or b) if it is possible to use things like SRC_ALPHA even when all the channels are full (maybe by moving the channel?

On the one hand the restriction that the w channel is alpha for the purposes of OM blending kind of comes from DirectX, and the meaning of the channels outside of that comes from the game, but 3DMigoto does give you a lot of power here and if you really wanted to move the channel, do the blending, then run a custom shader to fix the channels back up to what the game is expecting before passing it on to the next draw call you probably could.

Interesting, I will look into this. My previous attempts to move the channels (either by changing their type or setting them somewhere else) around seemed to result in the game ignoring the changes and still using the original definitions, but it is possible I made a mistake when testing.

For 2), the issue lies in the order parts are drawn. First, I am running under the assumption that 3dmigoto is unable to change the order objects are drawn in the frame, or delay certain calls until others have completed (I haven't seen anything in the code or all my searching in the documentation/forums, though I may have missed it). This means we are "stuck" with the order the game decides to draw objects in, which becomes an issue when trying to implement transparency.

Yes, this is correct, and something I would like to find a good way to solve, but it's a difficult problem and I don't have a particularly clear idea of how a solution would work in practice that doesn't have its own set of problems - e.g. say we added some feature to record draw calls of interest from an opaque rendering pass of the game and replay them during a later transparency pass (which would definitely help with some of the issues you have encountered). In the opaque pass objects are often drawn in a pretty arbitrary order, and that order is often not even consistent between subsequent frames, but in the transparency pass (depending on the technique the game is using) objects may be required to be sorted back to front (or front to back) by their distance to the camera. 3DMigoto is fundamentally unaware of those distances, so it cannot meaningfully sort the objects and would not be able to replay the draw calls in a suitable order - and if the transparent effects we are injecting overlap with other transparent effects in the game, we would really need to interleave the injected draw calls with the game's own draw calls to render it all correctly... which makes a truly correct solution here even harder.

If you have any ideas on how we might be able to solve this I'm open to suggestions. We may end up having to settle for something that is less than perfect, or solutions that are somewhat specific to games using certain transparency techniques (these can't even be engine specific, because e.g. Unity and Unreal both have a number of different render pipelines and options that developers can choose between).

My current best solution to handling the depth issue was to split the output into 3 different parts - transparent, opaque, and both combined. Then, continue drawing the rest of the scene on top of the combined output target. By comparing the final output with the original transparent and opaque, this will give us information about what objects are in front of and behind the transparent portion, though it won't solve overlapping transparency. Even just getting "basic" transparency working without it being impacted by draw order would be a big improvement, even if it is not feasible to get the overlap functional.
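Roughly the kind of thing I have been trying in the ini to capture those intermediate outputs (names hypothetical, and where to take the copies would need tuning per game):

[ResourceOpaqueOnly]
[ResourceCombined]

[ShaderOverrideTransparentPart]
hash = ...
; Snapshot the target before the transparent part draws (opaque only),
; and again afterwards (opaque + transparent combined):
ResourceOpaqueOnly = copy o0
post ResourceCombined = copy o0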

How many overlapping transparent objects are there in GI (modded and vanilla)? How much would solving just this part of the problem help?

Not much - very few objects have transparency at all. People have modded in transparent objects/clothing, but it is still not common due to the complexity of getting it working (and usually pretty limited in scope)

I have not yet found a way to pass the output from one PS into a second one though - from all my testing, running multiple PS has them all execute simultaneously and not in sequence (e.g. no way to make calculations of some vertices dependent on the results from other ones)

I think I might need you to clarify the issue you are having here - pixel shaders run in sequence, not simultaneously (at least not from our point of view - the driver may well be able to run some shaders in parallel as an optimisation in some cases, but not in a way that would change the result), though the individual vertices + pixels within a single draw call are processed simultaneously across GPU cores/threads.

I meant PS in a single call yes - my idea was to have the opaque portion drawn by one shader, then pass the output of that shader to a second shader in the same call that would draw the transparent portion on top. That way, we only have to deal with a single call and don't need to store and pass the information to a later shader (and also don't need to worry about draw order, at least for that specific part).

This seems to be the most hopeful, but I have been running into issues with adding more render targets (games either ignore the added ones,

or the added ones don't clear properly between frames and accumulate junk data).

You can clear these yourself at the present call:

[Present]
clear = ResourceFoo

This makes sense - I wasn't aware of this command previously. I had been setting the resources to null, but looking back it may not have actually been clearing them properly which is why I was observing the junk data.

It also isn't a very general method since you need to identify when in the frame to actually do the blending which would vary from game to game (since many games flip the screen at some point when drawing).

This is always going to be the case to at least some extent, because games don't announce when they change render passes to DirectX so we can only ever look for clues as to when this happens, and that is always going to be at least somewhat game/engine specific. I'm not opposed to adding heuristics into 3DMigoto to detect this for certain engines that are known to exhibit certain behaviour we could look for, but my philosophy is generally to try to provide the low level tools that can be adapted to work with any game/engine (ideally if we were to add a heuristic to detect this in e.g. Unity it would be a shortcut to configure the low level tools in a certain way, though if it comes to it it should be possible to implement a feature that knows how to call into Unity and interrogate it to find out exactly which render pass it is on - Unity should be doable since that is managed C# code with reflection, but forget about doing something like that for Unreal or most other custom C++ engines).

At the moment detecting this is done by matching shaders and maybe render target hashes - but I think there are more possibilities we could add here, such as matching on the number of bound render targets, or some aspect of the blend/render state, etc. There might be some performance costs associated with adding more ways to test, but I think it might be worth it. If you have any specific tests in mind you think might be helpful, let me know.

This makes sense - even if it has to be on a per-game basis, as long as there is a method that can apply the work would only need to be done once to find where to do the blending. I can't think of a way to generalize it except maybe by looking for number of render targets, but I'm not sure if that would work across different games.

c) Take all the data (vb/ib/cb/shader/etc) from the draw call and move the entire call later in the frame by creating a new custom shader/call. I haven't found a way to "insert" new calls, and even if we override later calls parts with all the relevant data parts still seem to be missing (e.g. I can't find any way to turn on glow if the original shader had it and the new one does not).

You absolutely can insert new draw calls - this is really what the [CustomShader] sections were originally for (the idea of using them to replace an existing draw call was just the next step that naturally followed from that). But you are right, the problem here is not so much injecting a draw call, as transferring any required state from the original draw call (which is doable for one or two of these since we can grab references to all bound resources, but we don't have a good way to store these in a list of arbitrary length if we don't know how many of these we need in advance), especially if some of the output buffers have already been consumed and are no longer useful to write to.

I will have to look into this more. I tried a few times to move data from earlier calls to later ones, but it was never really successful - I wasn't able to move things like glow effects, and could not find a way to identify and store all the data reliably. Some members of the community were experimenting with implementing arrays in the ini, but I believe that method still required you to know what the values were ahead of time.


I will continue experimenting with different methods as well as the things you mentioned here. We have actually had some good success with porting parts of the transparent models onto the outlines since those are drawn so late, but that still comes with drawbacks - I am still looking for a general way to do this that could apply to other games as well.

DarkStarSword commented 10 months ago

Ah, good to know - after reading your comments and the documentation, I realized I had a misunderstanding about how SRC_ALPHA worked. I thought it specifically read the alpha value from o0 and SRC1_ALPHA specifically read from o1, but now I see that it actually reads the value from the render target it is operating on

No worries, the documentation isn't all that clear on how this works and I remember coming to the same misconception when I first went through it.

This is also super useful, thank you - I did not know about blend_state_merge=true, and that explains why sometimes when I changed the OM state it seemed to mess up unrelated parts of the call.

Also worth knowing about two other options that do the same thing for other parts of the render state:

depth_stencil_state_merge=true
rasterizer_state_merge=true

Hmm, for setting the specific value I don't think there would be much benefit to being able to evaluate in the ini if it is still forced to be a constant value for the entire call - it is already possible to mimic that behaviour by choosing between different custom shaders with different transparencies based on a variable, or by passing in the factor to the shader. It was more that if the value could be set depending on location it could be used for gradient transparency, but that seems to be impossible from what you have said.

Righto. I may still end up implementing this now that my mind has considered the possibilities (e.g. I could imagine it being used in something like the help text shader to add a simple fade in/out), but as a low priority task since the same result can be achieved through other means.

Interesting, I will look into this. My previous attempts to move the channels (either by changing their type or setting them somewhere else) around seemed to result in the game ignoring the changes and still using the original definitions, but it is possible I made a mistake when testing.

Definitely worth experimenting with, but I'm not sure that this would be the best option for a final solution - it will probably be fine if it only happens a couple of times in a single frame, but I'd have some concerns about potential performance impacts if this happens much more than that.

Just for an idea of what sort of things to look out for performance wise - if a render target has been written to by one shader (or another write operation such as a copy or clear), and is then bound to the input of another shader, the GPU will have to stall until the writes have completed before it can start the next shader. Properly optimised games would typically try to minimise these stalls by avoiding interleaving read and write operations on a single resource any more than necessary (e.g. a resource would only be written to throughout an entire render pass, then some unrelated render pass could optionally be added to do some useful work while the writes are still in flight, and only then would a render pass that reads from the buffer run)... though many games are not that well optimised in practice (e.g. Unreal's DX12 implementation pretty much guarantees a stall when changing render passes, as it inserts explicit barriers that wait for writes to all render targets of the previous pass to complete, even if the following pass doesn't actually use them as inputs / at all. Notably Unreal's DX11 renderer won't suffer from this same guaranteed stall, because in DX11 DirectX is responsible for inserting the barriers rather than the game engine, and Microsoft is smart enough to avoid inserting unnecessary barriers).

These types of stalls won't show up in 3DMigoto's built in profiler (which only profiles 3DMigoto's added CPU overhead), and can only really be seen with GPU profiling tools such as Pix for Windows (and consoles all have their own similar GPU profiling tools they make available to authorised developers).

My current best solution to handling the depth issue was to split the output into 3 different parts - transparent, opaque, and both combined. Then, continue drawing the rest of the scene on top of the combined output target. By comparing the final output with the original transparent and opaque, this will give us information about what objects are in front of and behind the transparent portion, though it won't solve overlapping transparency. Even just getting "basic" transparency working without it being impacted by draw order would be a big improvement, even if it is not feasible to get the overlap functional.

An alternative method I think might be worth trying is rendering the transparent objects out to separate render + depth targets, with the depth test disabled but depth writes enabled. I'll need to double check how to configure this properly when I get home, but off the top of my head it should be something like:

depth_enable = true
depth_write_mask = all
depth_func = always

Then later in the frame when you go to blend these back with the opaque buffer you should then be able to compare that depth buffer with the one from the game, either just doing the comparison in the shader by binding both depth buffers as inputs and comparing them yourself (perhaps using the discard instruction in the pixel shader if you determine that the transparency is not needed on a given pixel). Or you could configure DirectX to do the depth test for you by binding the game's depth buffer to oD (if it isn't already), setting up a typical transparency depth test with writes disabled (e.g. depth_enable=true, depth_write_mask=zero, depth_func=less) and using the SV_Depth semantic to output the transparency depth from the pixel shader forcing a late Z test (though I have had some trouble getting SV_Depth to actually work in the past and was never sure if I was using it correctly).
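A rough sketch of the shader-side comparison variant (slot numbers and resource names are hypothetical, and assume the ini has bound both depth buffers and the transparency colour target as inputs - depth buffers may need copying to a readable format first):

Texture2D<float> GameDepth : register(t100);         // game's opaque depth buffer
Texture2D<float> TransparentDepth : register(t101);  // depth from our transparency pass
Texture2D<float4> TransparentColour : register(t102);

float4 main(float4 pos : SV_Position) : SV_Target0
{
    int3 p = int3(pos.xy, 0);
    // With a standard depth buffer, smaller = closer (a game using
    // reversed-Z would need this comparison flipped). Skip pixels where
    // the game's opaque geometry ended up in front of our transparency:
    if (TransparentDepth.Load(p) >= GameDepth.Load(p))
        discard;
    return TransparentColour.Load(p);
}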

I meant PS in a single call yes - my idea was to have the opaque portion drawn by one shader, then pass the output of that shader to a second shader in the same call that would draw the transparent portion on top. That way, we only have to deal with a single call and don't need to store and pass the information to a later shader (and also don't need to worry about draw order, at least for that specific part).

I'm still not entirely clear on what you mean here - you can't have two different pixel shaders bound in the same draw call...? Unless you mean a single draw call from the game but multiple draw calls that we are injecting with 3DMigoto?

If you do mean a single draw call, you can do something like this with Unordered Access Views (UAVs), which allow a resource to be used for both input and output simultaneously. But these are typically only used with compute shaders (bind them with cs-u0 = ResourceFoo) and the shader may become responsible for adding any required synchronisation between IO (e.g. via InterlockedAdd() and similar functions if accessing the UAV as a RWByteAddressBuffer (these are not available to RWTexture2D UAVs), or with explicit memory barriers like AllMemoryBarrierWithGroupSync() and related functions).

UAVs can be used in pixel shaders (and potentially other shader types if the 11.1 feature level is available, but you shouldn't rely on that), but there are a bunch more restrictions and caveats compared to using them in compute shaders - the syntax to bind them is ps-u1 = ResourceFoo, but they share the same slots as render targets and there are some issues with the DirectX API wanting information that 3DMigoto doesn't know... I don't remember offhand what the gotchas are here as a result, but looking through places I've used these it looks like I bound all render targets first (o0=..., o1=..., o2=...), then bound a UAV in the following slot (ps-u3=...), and I doubt binding a second UAV to a pixel shader will work in the current version. Also, some of the synchronisation functions aren't available in pixel shaders, though this might not be such an issue if a given invocation is only accessing the data in a single pixel.
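For the compute shader path, a minimal sketch of the kind of synchronisation mentioned above (the buffer layout and names are hypothetical; the ini would bind it with something like cs-u0 = ResourceCounters):

RWByteAddressBuffer Counters : register(u0);

[numthreads(64, 1, 1)]
void main(uint3 id : SV_DispatchThreadID)
{
    // InterlockedAdd makes concurrent increments from different threads
    // safe without needing an explicit memory barrier:
    uint prev;
    Counters.InterlockedAdd(0, 1, prev);
}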

SinsOfSeven commented 10 months ago

Hello! After going over this discussion I've had some success in tackling this issue, though it's come with many of its own challenges. I want to walk through what I've come up with, what I think could be improved but don't know where to start, as well as some things that I've been having trouble with that I don't really know how to debug.

I'll break down my approach.


  1. Regex Match Diffuse: Using ShaderRegex, I match all the Diffuse Textures for characters (exploiting a censorship feature to avoid matching the environment without having to be too specific). This censorship feature is something people often mod out, so to avoid conflicts I went ahead and built in a toggle for it as well.

  2. Discard by Texture Mask: After matching, I add a simple filter at the bottom of the regex: if there is a texture in my extra slot (or rather, if there is color data), I discard - see the sketch after this list. This is similar to what silent suggested, but I abandoned the idea of trying to pass this data to another PS in order to blend it.

  3. Regex Match Outlines: The game we are modding does a second pass later in the frame to draw the characters' outlines. At this stage more of the environment has been drawn and the vertex data matches the entities, so I've decided to seize this pass to do a second color pass rather than drawing the outlines - though again only on the condition of the texture slot having color data. If the slot has no color data, it will still draw the original shader's color data, to preserve the original use of the shader.

  4. "Replace" by Texture mask (Incomplete step) continuing from the last statement, we repeat the previous condition but instead of discarding, we supply our own color data. We use the diffuse and lightmaps for this step to emulate the originals to a close degree, but we do need to change the interpretation of some of the data. Not every single character uses exactly the same shader conditions so to get a reasonable result we ignore pretty much everything that might not be applicable globally.

Remarks: I want to make kind of a framework, where a user can supply a texture and some params to achieve things like animated effects, taking advantage of the outline's properties with illumination/shadow/emissives/bloom, and with our custom blended transparency. For example, they could feed in a texture where only the right-side quad/half is applied as the texture walk, so they can target parts of their model with clever UV mapping and stay within the confines of the game's expectations while still getting some great results.


I haven't been able to fully succeed with the above method - it works, but I'm clearly struggling with editing the shader. As a side note, I'm very appreciative of the ShaderRegex implementation. I had to read the source to try and suss out the behavior of having multiple matches for the same shaders, and in doing so discovered a cute easter egg when the regex returns no_memory.

Because I've effectively deferred the draw step to a later part of the frame, but in a hacky way, I was wondering what it would take to do this even further along, after all but the UI has been blended, so that we could achieve perfect environmental shading. It already gets global illumination and shadows, but things like bloom from a candelabra will not be part of the blend (though I might just need to copy their render target as well, I just thought about that). I also have no idea how to fix the game discarding geometry from behind what it believes to be opaque. I don't really want to match all the env shaders to replace that function, so the modders might just need to find a way to deal with it. Back to my point: using this outline shader means we are using the same vertex data, and the outlines also blend with the rim lighting and bloom - even though those are giving me grief now, I think they will greatly enhance the final product.

Onto something else that's giving me grief: resource management. Despite my best attempts, I cannot seem to properly clear my custom texture so it does not contaminate other objects. I succeeded in mitigating the effect quite a lot by using active conditions, but I later discovered that even with those conditions I wasn't able to clear them as intended. I'm using reference resources and clearing them by doing something like this.

Anything commented out is something I've tried at some point to help, and anything not commented is also tentative. Because these shaders can run potentially hundreds of times per frame, I wanted to only bind and release the resources immediately before and after their use, but I cannot seem to properly do either.

[ShaderRegexOutlineTransparency]
;Runs the reset command list defined below
pre run = CommandListResetCustomResources
if $active == 1
    shader_model = ps_5_0
    temps = ini
    if ps-t69 == null
        pre ResourceRefo1 = o1
        ;Runs the command list from the next section
        run = CommandListSetResources
        pre x = rt_width
        pre y = rt_height
    endif
    if $use_default_shader == 1
    ;Set some OM blend state stuff (or invoke a diff PS)
        run = CustomShaderTransparencyPlus
    endif
    $use_default_shader = 0
    $active = 0
endif

[CommandListSetResources]
if $switch == 0
elif $switch == 1
    ResourceRefDiffuse = reference ps-t0
    ResourceRefLightmap = reference ps-t1
elif $switch == 2
    ResourceRefDiffuse = reference ps-t1
    ResourceRefLightmap = reference ps-t2
endif
if $switch > 0 && $active == 1
    run = CommandlistsSetCustomResources
endif
$switch = 0
ps-t0 = ResourceRefDiffuse
ps-t1 = ResourceRefLightmap
;checktextureoverride = ps-t0
;checktextureoverride = ps-t1
;checktextureoverride = ps-t2

[CommandlistsSetCustomResources]
ResourceRefMask = reference ps-t69
ps-t26 = ResourceRefo1
ps-t69 = ResourceRefMask
;checktextureoverride = ps-t26
;checktextureoverride = ps-t69

[CommandListResetResources]
;clear = ResourceRefDiffuse
;clear = ResourceRefLightmap
ResourceRefDiffuse = null
ResourceRefLightmap = null
run = CommandListResetCustomResources

[CommandListResetCustomResources]
;clear = ResourceRefMask
;clear = ResourceRefo1
ResourceRefMask = null
ResourceRefo1 = null
ps-t69 = null
ps-t26 = null

[CustomShaderTransparencyPlus]
drawindexed = auto
;Outlines are inverted hulls, so cull restores the front faces.
;I would prefer to do this in the shader, but I'm actually not sure where to start.
;It's probably something to do with normals, but that's a problem for future me.
cull = none
run = CommandListResetResources
;ps = OutlineWithDiffuseColor.hlsl

If there are any obvious logical errors, I apologize - I've been working on this for days, swinging between states of "This is working, it's nearly done" and "This broken mess will never work", and I'm having trouble keeping track of what seems like it's helping and what changes are causing pain, even with version control.

Anyways, this isn't really a plea for help or anything. I'm pretty committed to making this a reality, but I wanted to share what I've accomplished, and some insight on how. I've included a gutted version of my work (removed Regex matches). I was so happy when I realized I could use namespaces like this! TexFx_Gutted.zip

One of the best ways to break through your own issues is to take the time to explain it, and because this discussion was a major contribution to my own progress on this issue, I thought I would contribute to the discussion as well. This is something that many people in our community have tried to tackle, and I've drawn on so many past experiments we've done, every time gaining a better understanding of 3dmigoto and dx11. I also got a lot of help from This document, which outlines more of the technical specs of DX11 that are left out of the Learn Microsoft pages.

SinsOfSeven commented 10 months ago

Quick update: apparently I over-engineered the resources. There wasn't actually a reason to manually manage them the way I did - just resetting my custom texture to null in the ShaderRegex section is fine, since the resource is already bound for that call.