Various Assembler Bugs - Githubissues

Trying to keep track of some of the various bugs in the assembler

1. ~~This line causes the assembler to enter an infinite loop and fill up memory:~~
   ```
    // this (e.g. small monitors where the large tiles are smaller than the max
   ```
   Possibly processing brackets before comments? Note that this only happens if the line is indented - possibly the assembler only recognises comments at the start of a line (and we are just lucky that it otherwise drops the bad "//" instruction?)

   We now handle comments starting at any point in the line.
2. ~~Assembler will crash if an instruction has fewer than expected arguments. Need to double check, but I think it was something like this:~~
   ```
   add r1.x, r2.x
   ```
   The bug still exists, but we now catch the bad allocation exception.
3. ~~Assembler does not currently alter the ISGN and OSGN sections, which will cause issues for adding additional inputs/outputs to shaders~~ ~~Code is now implemented in cmd_Decompiler. Once tested we can use it in game.~~ Code is now fully integrated into 3DMigoto.
4. ~~Assembler ignores lines indented with a tab instead of spaces~~
5. ~~Assembler does not fail on bad instructions, instead it silently drops them~~
6. ~~Assembler does not fail if an instruction had more arguments than expected, instead it ignores the extra arguments~~
7. ~~Assembler does not handle SV_GSInstanceID semantic~~
8. ~~Assembler does not handle enableMinimumPrecision global flag. Edit: Implemented, but see 9.~~
9. ~~Assembler does not handle loading minimum precision types from resources.~~
10. ~~Assembler does not handle double precision literals.~~
11. Assembler can produce a corrupt binary (cannot be disassembled) if the 'r' is missing from a temporary register. Edit: Can't seem to reproduce the crash with the latest batch of error handling, however I didn't fix this one explicitly so not crossing it off yet until I've verified it.
12. ~~Assembler produces bad shader and crashes game on incorrect capitalisation for special purpose registers, e.g. vThreadIdInGroupFlattened instead of vThreadIDInGroupFlattened~~
13. ~~We do not generate an SFI0 section ("SubtargetFeatureInfo") in shaders that fxc does (unknown severity)~~
14. ~~Signature parser does not always use the same version signature sections as fxc (unknown severity)~~
15. TODO: Implement support for functions, labels & interfaces (Low priority as no known real world use, but we do now have test cases)

Not strictly bugs:

- Would be nice to add some validation that dcl_temps is exactly 1 larger than the largest temporary register used in the shader
- Would be nice to validate that instructions and declarations are not intermixed
- Would be nice to validate that anything referenced in the shader that requires a declaration has been declared
- Would be nice to validate the correct use of swizzles (one or four characters, not two or three).
- Assembler produces multiple warnings from the compiler when built. If these are bugs they should be fixed, if they are not they should be silenced (ideally without using a pragma unless very certain they are harmless). IIRC 64bit builds produce more warnings than 32bit.
- Assembler produces many warnings during static code analysis. These each should be investigated to determine if they signify a bug or a false positive. This should be done on both 32bit and 64bit as the code analysis can miss some bugs in printf like functions when the size depends on the architecture.
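
For the dcl_temps item, a check along these lines might be enough. This is only a rough Python sketch working on the assembly text, not code from the assembler: `check_dcl_temps` is a made-up name, it assumes declarations other than dcl_temps never reference r# registers, and it ignores indexable temps entirely.

```python
import re

# Hypothetical helper, not part of 3DMigoto. Matches a temporary register
# such as r0 or r15, but not identifiers that merely contain "r<digit>".
TEMP_REG = re.compile(r"(?<![A-Za-z0-9_])r(\d+)\b")

def check_dcl_temps(asm):
    """Return a list of problems; empty if dcl_temps matches register usage."""
    declared = None
    highest = -1
    for raw in asm.splitlines():
        line = raw.split("//", 1)[0].strip()   # drop comments
        if not line:
            continue
        match = re.match(r"dcl_temps\s+(\d+)$", line)
        if match:
            declared = int(match.group(1))
            continue
        if line.startswith("dcl_"):
            continue   # assumption: other declarations do not reference r#
        for number in TEMP_REG.findall(line):
            highest = max(highest, int(number))
    expected = highest + 1
    problems = []
    if declared is None:
        if highest >= 0:
            problems.append("temporaries used but dcl_temps is missing")
    elif declared != expected:
        problems.append("dcl_temps is %d but %d expected" % (declared, expected))
    return problems
```

An empty list means the declaration and the highest register agree. The other wishlist items (declaration ordering, missing declarations) would need more of a real parser than this.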

Flugan - if you haven't already seen it I made some minor tweaks to the assembler a while back that you might like to merge. See commit https://github.com/bo3b/3Dmigoto/commit/45e0d9fbaae1dbca829b89d81568ef9980ec99a0

To go off topic for a moment: I don't know if the Flugan wrapper is needed anywhere on Win7 or Win10; I've left Win8.x behind personally. I currently have four versions: two different ones for dx11, one for dx10, and I finally got a dx9 wrapper running. All of these hook DirectX rather than wrap it, which makes it hard to do very sophisticated stuff.

Back on topic. Comments are only supported at the beginning of a line, as that is what fxc produces. fxc only produces correct code, which is handled correctly. The assembler does not alter ISGN and OSGN, and doing so would require comparing the original code with the new code to find changes, which is fairly complex. As far as I can tell, not changing ISGN and OSGN is not even a bug, as we have been able to insert signals already with the current assembler. I'm not the expert on this. The assembler has never encountered a tab. The assembler has basically no error checking. Basically, don't write bad code.

My perspective is tilted, having run more than 100 000 decompiled shaders through the assembler and fixed problems along the way, ending up with lots of shaders assembling binary perfect without a hitch. From that perspective it is rock solid, but not really intended for humans.

The assembler is sensitive and I'm reluctant to change it. I don't want a lot of tabs turning up complicating whitespace handling all over the place.

Before continuing I will touch on warnings. I'm not sure how to suppress them. I haven't looked at them in ages, but it mostly boils down to datatypes. All 32-bit integers are segmented into small integers with a given number of bits for things like opcode etc. When interacting with these tiny integers using standard integers you get a huge amount of overflow warnings. There might be some insignificant signed/unsigned warnings as well. The code works perfectly and the warnings are about things that can't occur.

Finally, the most important section. We really need a preprocessor that is aware that the code is made by humans who can fail: one that accepts less strict code (handling whitespace just like a normal parser), checks that each instruction has the right number of operands, checks that only valid instructions are used, looks at declarations, etc. The code is normally assembled during the game, so where should the error go?

If needed: I don't know off the top of my head how complex ISGN and OSGN handling would be if the only change is added signals and no removals. I remember being a bit perplexed when looking at the binary code for those sections two years ago.

I've read that some games strip the headers, making me believe that OSGN and ISGN are not really what matters in the end.

Over to my wrapper: when I finally managed to run Crysis 3, you decided to change the hashing function while also integrating HW based crc, at which point I stopped. Playing catch-up was no longer fun. When the automatic crosshair was presented I considered it beyond what I could manage using hooks.

The assembler works as intended, so I can't consider any of these to be a bug. It is limited to the formatting style of fxc, where there are no indented comments or tabs. Most compilers have multiple stages. A C++ compiler will not start producing binary unless all the source code is valid. Most of the suggestions fit into the validation category, so my suggestion is to write such a stage which returns either an error or code that can successfully assemble.

ok, thanks for your feedback (to be clear, I'm not asking you to fix anything, I'm just trying to keep track of the issues we have).

=== Whitespace & comments ===

Agreed, we can add a pre-processor to strip comments and replace tabs with spaces, so no need to alter the assembler for these. A pre-processor may also be a nice bonus to add things like #define so we can use names in place of registers (especially useful for copy & pasted patterns where we need to adjust the register numbers).
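
To make the pre-processor idea concrete, here is a rough Python sketch of what such a pass could look like. Nothing here is existing 3DMigoto code: the `preprocess` name, the single-token `#define name value` syntax and the choice to leave fxc-style comments at column 0 untouched are all assumptions for illustration.

```python
import re

def preprocess(source):
    """Hypothetical pre-pass: strip indented/trailing comments, turn tabs
    into spaces, and expand simple '#define name value' substitutions."""
    defines = {}
    out = []
    for raw in source.splitlines():
        if raw.startswith("//"):
            out.append(raw)            # fxc-style comment line, already handled
            continue
        line = raw.split("//", 1)[0]   # drop a comment starting anywhere else
        line = line.replace("\t", " ").rstrip()
        match = re.match(r"\s*#define\s+(\w+)\s+(\S+)\s*$", line)
        if match:
            defines[match.group(1)] = match.group(2)
            continue
        if not line.strip():
            continue                   # nothing left once the comment is gone
        for name, value in defines.items():
            line = re.sub(r"\b%s\b" % re.escape(name), value, line)
        out.append(line)
    return "\n".join(out) + "\n"
```

With this, a line such as `mov depth.x, l(0) // zero it` preceded by `#define depth r5` would reach the assembler as `mov r5.x, l(0)`. Keeping the column 0 comments matters if the signature comment block is ever going to be parsed later.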

=== Number of operands for an instruction ===

This cannot really be done in a pre-processor unless we add an assembly parser to it, at which point we might as well go the full mile and replace the assembler. We might be able to use a try / catch block to prevent crashes in the assembler taking down the whole game, but only if they are not stack corruption (don't know, haven't checked). Ultimately I'd prefer this be fixed in the assembler, but if we can at least stop the crashes that would be enough.
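
To show how much parsing even a basic operand count check needs, here is a small Python sketch. It is purely illustrative: the `EXPECTED` table only lists a handful of opcodes, the function names are invented, and handling of the `_sat` suffix and parenthesised opcode options is a guess at the minimum required.

```python
# Hypothetical table: opcode -> number of operands. Only a few entries shown.
EXPECTED = {"mov": 2, "add": 3, "mul": 3, "mad": 4, "ret": 0}

def split_operands(text):
    """Split on commas that are not inside (...) or [...], so that literals
    like l(0, 0, 0, 0) count as one operand."""
    operands, depth, current = [], 0, ""
    for ch in text:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        if ch == "," and depth == 0:
            operands.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        operands.append(current.strip())
    return operands

def check_operand_count(line):
    """Return an error string, or None if the count is right or unknown."""
    parts = line.strip().split(None, 1)
    if not parts:
        return None
    opcode = parts[0].split("(")[0]          # drop options like (texture2d)
    base = opcode[:-4] if opcode.endswith("_sat") else opcode
    if base not in EXPECTED:
        return None                          # unknown opcodes are not judged here
    got = len(split_operands(parts[1])) if len(parts) > 1 else 0
    want = EXPECTED[base]
    if got == want:
        return None
    return "%s: expected %d operands, got %d" % (base, want, got)
```

Running it on `add r1.x, r2.x` reports two operands where three are expected. Keeping a table like this in sync with the assembler is exactly the duplication the paragraph above is worried about.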

=== Invalid instruction ===

Again, not something we can check for in a pre-processor without adding a full assembly parser to it. We probably do need to do something about this as it can easily lead to a runtime GPU crash due to using uninitialised data in further calculations, for loops, etc.

=== Infinite loop filling memory on unclosed bracket ===

This occurred in a comment, but I suspect this is probably a wider issue in the assembler and should be fixed. We could probably check for this in a pre-processor, though it is not the ideal place to do so.
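
A pre-processor check for this could be as simple as the following Python sketch. The `unbalanced_lines` name is invented, and it assumes brackets never legitimately span lines in shader assembly.

```python
CLOSERS = {")": "(", "]": "["}

def unbalanced_lines(source):
    """Return the 1-based numbers of lines whose () or [] do not pair up.
    Anything after // is ignored, so brackets in comments are harmless."""
    bad = []
    for number, raw in enumerate(source.splitlines(), start=1):
        code = raw.split("//", 1)[0]
        stack = []
        ok = True
        for ch in code:
            if ch in "([":
                stack.append(ch)
            elif ch in CLOSERS:
                if not stack or stack.pop() != CLOSERS[ch]:
                    ok = False
                    break
        if not ok or stack:
            bad.append(number)
    return bad
```

Rejecting the shader when this returns anything would at least turn the memory exhaustion into an ordinary error.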

=== ISGN & OSGN ===

I believe these stand for Input Signature / Output Signature (not signal) based on naming used elsewhere in DX.

These sections are always present as they are necessary to map input and output registers to their semantics (e.g. TEXCOORD) and semantic index - this is used by DirectX and the hardware to map outputs from one stage in the pipeline with inputs in the next stage, so without being able to modify this we can not add extra outputs as they will not be passed to anything (or if they are, we will not be able to control what).

When we refer to a shader having its headers stripped we are referring to the sections containing debug and reflection information that we use to determine what name the game used for resources passed to the shader (this is exactly the same as running the strip command on an ELF binary and I assume there would be an equivalent for PE executables as well). The ISGN/OSGN are mandatory sections and will never be stripped.
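
To see which sections a given shader binary actually carries, something like this Python sketch can list them. It assumes the commonly documented DXBC container layout (4-byte "DXBC" magic, 16-byte checksum, a version word, total size, chunk count, then a table of chunk offsets, each chunk starting with a four character code and a size); `list_chunks` is an invented name and this is not taken from the assembler source.

```python
import struct

def list_chunks(blob):
    """Return [(fourcc, size), ...] for each chunk in a DXBC container."""
    if blob[:4] != b"DXBC":
        raise ValueError("not a DXBC container")
    # Bytes 4..19 hold the checksum; version, total size and count follow.
    _version, total_size, count = struct.unpack_from("<III", blob, 20)
    if total_size != len(blob):
        raise ValueError("size in header does not match the data")
    offsets = struct.unpack_from("<%dI" % count, blob, 32)
    chunks = []
    for offset in offsets:
        fourcc, size = struct.unpack_from("<4sI", blob, offset)
        chunks.append((fourcc.decode("ascii"), size))
    return chunks
```

On a stripped shader I would expect the debug and reflection chunks to be missing from the result while ISGN, OSGN and SHDR/SHEX remain.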

I did have a look at these myself some time ago and deduced the meaning of the data in these sections, but the unfortunate fact is that the assembly language does not have enough information to reproduce them (the pitfalls of MS designing an assembly language that was never intended to be assembled I guess), but it looked like the comment in the shader does. I hate to make a comment part of the language, but our only choices are either to do so, or to change the language to include the necessary information (like DX9 shaders do).

Parsing the comment to produce the ISGN/OSGN sections is probably something that I can add externally to the assembler, but doing so does not really layer right at the moment since the assembler takes in all sections, not just the SHEX/SHDR sections, and the MD5-like checksum is currently calculated in the assembler which would need to be calculated with the new ISGN/OSGN sections as well.

With some refactoring I should be able to make this work, but I had been wary of making anything more than trivial changes to the assembler since you were working on it in your own wrapper and merging the changes was complicated as a result.
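
As a sketch of what parsing the comment might involve, the following Python reads the signature tables from the disassembly text into plain dictionaries. It is an illustration only: `parse_signature` is an invented name, it assumes the column order fxc prints (Name, Index, Mask, Register, SysValue, Format, Used), and it keeps every field as text apart from the index.

```python
def parse_signature(asm, which):
    """Collect the entries listed under '// <which> signature:' where
    which is 'Input' or 'Output'."""
    entries = []
    active = False
    for line in asm.splitlines():
        if not line.startswith("//"):
            if active:
                break                      # the comment block has ended
            continue
        text = line[2:].strip()
        if text.endswith("signature:"):
            if active:
                break                      # next table starts here
            active = (text == which + " signature:")
            continue
        if not active or not text or text.startswith("-"):
            continue
        fields = text.split()
        if fields[:2] == ["Name", "Index"] or len(fields) < 6:
            continue
        entries.append({
            "name": fields[0],
            "index": int(fields[1]),
            "mask": fields[2],
            "register": fields[3],         # may be a name such as oDepthLE
            "sysvalue": fields[4],
            "format": fields[5],
            "used": fields[6] if len(fields) > 6 else "",
        })
    return entries
```

Turning these entries into the binary ISGN/OSGN layout, and recomputing the checksum afterwards, is the part that still needs the refactoring described above.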

=== Warnings ===

Ok, I've fixed these since they were all fairly trivial - please do a code review of da9e9039a64020878b8616b416e441c8bd400ca6 when you have some time.

=== Static code analysis failures ===

There weren't many issues identified, and it was mostly a matter of checking whether certain IO functions succeeded. It did identify one issue which is indicative of a larger issue, namely that there are no checks to make sure the shader binary looks ok - a bad / missing header could easily wreak havoc. I added one check for the specific issue it identified, but we could do better here. Please do a code review of 43fad33d5d34edc0541c6e5bd6a0eb687749cb22 when you have time.

I did a "quick" experiment to confirm. Choosing Batman Knights with 35000 shaders wasn't the best idea, but it is done. I stripped away the ISGN and OSGN segments and only kept the SHDR, and either I made a mistake or those are vital: I got a driver crash.

As Migoto evolved way beyond where I could follow, I pretty much stopped and didn't really code for almost 6 months. As far as I know 3DMigoto has abandoned the Win7 only stance. I've left Win8.x behind and could finally have a working Win7 to fall back on if Win10 fails. Metal Gear Solid 5 has no problem running on my code, which might be troublesome for 3DMigoto unless you managed to solve that.

Back on topic: parsing input and output signatures needs to be added to the assembler. There are different options, of which the most ambitious is to create code that can parse the signatures of all shaders in all dx11 games we have access to.

Another option is to restrict additions to the bottom of the signature, reusing and restructuring the current signature to add what I expect to be a fairly straightforward entry. That is a fairly good way to handle signatures you don't actually understand without breaking them.

The bgfx project on GitHub contains code to read and write signatures from DXBC shader files. It also confirms that the hash is a modified md5 hash, so my reverse engineering effort was not wasted. I believe I got the initial datatypes for the assembler from somewhere in the wine code two years ago, but I can't remember.

While testing the new signature parsing I noticed a small issue in one of my test cases.

Compile this shader with fxc.exe /T gs_5_0 test.gs /Fc test.asm:

```
struct PSSceneIn {
        float4 pos : SV_Position;
        float4 coord : TEXCOORD0;
};

[maxvertexcount(3)]
void main(
        triangle float4 ipos[3] : SV_Position,
        inout TriangleStream<PSSceneIn> OutputStream,
        uint id : SV_GSInstanceID
        )
{
        PSSceneIn o;
        o.pos = 0;
        o.coord = id;
        OutputStream.Append(o);
}
```

That produces:

```
//
// Generated by Microsoft (R) HLSL Shader Compiler 10.0.10011.16384
//
//
//
// Input signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_Position              0   xyzw        0      POS   float
//
//
// Output signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_Position              0   xyzw        0      POS   float   xyzw
// TEXCOORD                 0   xyzw        1     NONE   float   xyzw
//
gs_5_0
dcl_globalFlags refactoringAllowed
dcl_input_siv v[3][0].xyzw, position
dcl_input vGSInstanceID
dcl_temps 1
dcl_inputprimitive triangle
dcl_stream m0
dcl_outputtopology trianglestrip
dcl_output_siv o0.xyzw, position
dcl_output o1.xyzw
dcl_maxout 3
mov o0.xyzw, l(0,0,0,0)
utof r0.x, vGSInstanceID.x
mov o1.xyzw, r0.xxxx
emit_stream m0
ret
// Approximately 5 instruction slots used
```

Assemble then disassemble that using:

```
cmd_Decompiler.exe -a test.asm
cmd_Decompiler.exe -d test.shdr
```

That produces:

```
//
// Generated by Microsoft (R) D3D Shader Disassembler
//
//   using 3Dmigoto v1.2.29 on Wed Feb 17 03:15:22 2016
//
//
// Input signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_Position              0   xyzw        0      POS   float
//
//
// Output signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_Position              0   xyzw        0      POS   float   xyzw
// TEXCOORD                 0   xyzw        1     NONE   float   xyzw
//
gs_5_0
dcl_globalFlags refactoringAllowed
dcl_input_siv v[3][0].xyzw, position
dcl_input v0
dcl_temps 1
dcl_inputprimitive triangle
dcl_stream m0
dcl_outputtopology trianglestrip
dcl_output_siv o0.xyzw, position
dcl_output o1.xyzw
dcl_maxout 3
mov o0.xyzw, l(0,0,0,0)
utof r0.x, v0.x
mov o1.xyzw, r0.xxxx
emit_stream m0
ret
// Approximately 0 instruction slots used
```

~~Note that the vGSInstanceID special purpose register has changed into the v0 general input register. This appears to be an issue with the assembly text, not the new signature sections.~~

Update: I've added support for this SPR, and the related dcl_gsinstances
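
The round trip comparison above was done by eye. A small Python sketch like this could automate it for test cases; `asm_diff` and `instructions` are invented names, and ignoring comment lines and whitespace differences is an assumption about what should count as "the same".

```python
import difflib

def instructions(asm):
    """Non-comment lines with runs of whitespace collapsed."""
    return [" ".join(line.split())
            for line in asm.splitlines()
            if line.strip() and not line.lstrip().startswith("//")]

def asm_diff(original, roundtrip):
    """Unified diff of the instruction streams; empty list if they match."""
    return list(difflib.unified_diff(
        instructions(original), instructions(roundtrip),
        "original", "roundtrip", lineterm=""))
```

For the listings above it would flag `dcl_input vGSInstanceID` against `dcl_input v0`. A text comparison like this only sees what the disassembler prints, so it cannot prove the two binaries are equivalent.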

I also noticed another issue with minimum precision registers. This should be considered minor as these are not supported before Windows 8.0 and it is unlikely we will encounter them in the wild for some time.

To reproduce, compile this shader using fxc from the Windows 10 SDK:

```
fxc.exe /T vs_5_1 test.vs /Fc test.asm
```

```
void main(
        float4 ipos : SV_Position,
        out float4 opos : SV_Position,
        out float clip0 : SV_ClipDistance0,
        out float cull0 : SV_CullDistance0,
        out min16float v0 : TEXCOORD0,
        out min10float v1 : TEXCOORD1,
        out min16int v2 : TEXCOORD2,
        out min12int v3 : TEXCOORD3,
        out min16uint v4 : TEXCOORD4
        )
{
        opos = ipos;
        clip0 = 0;
        cull0 = 0;
}
```

That results in:

```
//
// Generated by Microsoft (R) HLSL Shader Compiler 10.0.10011.16384
//
//
// Note: shader requires additional functionality:
//       Minimum-precision data types
//
//
//
// Input signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_Position              0   xyzw        0     NONE   float   xyzw
//
//
// Output signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_Position              0   xyzw        0      POS   float   xyzw
// SV_ClipDistance          0   x           1  CLIPDST   float   x
// SV_CullDistance          0    y          1  CULLDST   float    y
// TEXCOORD                 0   x           2     NONE  min16f
// TEXCOORD                 1    y          2     NONE min2_8f
// TEXCOORD                 2   x           3     NONE  min16i
// TEXCOORD                 3    y          3     NONE  min16i
// TEXCOORD                 4     z         3     NONE  min16u
//
vs_5_1
dcl_globalFlags refactoringAllowed | enableMinimumPrecision
dcl_input v0.xyzw
dcl_output_siv o0.xyzw, position
dcl_output_siv o1.x, clip_distance
dcl_output_siv o1.y, cull_distance
mov o0.xyzw, v0.xyzw
mov o1.x, l(0)
mov o1.y, l(0)
ret
// Approximately 4 instruction slots used
```

Assemble and disassemble that using:

```
cmd_Decompiler.exe -a test.asm
cmd_Decompiler.exe -d test.shdr
```

That produces:

```
//
// Generated by Microsoft (R) D3D Shader Disassembler
//
//   using 3Dmigoto v1.2.29 on Wed Feb 17 03:33:25 2016
//
//
// Input signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_Position              0   xyzw        0     NONE   float   xyzw
//
//
// Output signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_Position              0   xyzw        0      POS   float   xyzw
// SV_ClipDistance          0   x           1  CLIPDST   float   x
// SV_CullDistance          0    y          1  CULLDST   float    y
// TEXCOORD                 0   x           2     NONE  min16f
// TEXCOORD                 1    y          2     NONE min2_8f
// TEXCOORD                 2   x           3     NONE  min16i
// TEXCOORD                 3    y          3     NONE  min16i
// TEXCOORD                 4     z         3     NONE  min16u
//
vs_5_1
dcl_globalFlags refactoringAllowed
dcl_input v0.xyzw
dcl_output_siv o0.xyzw, position
dcl_output_siv o1.x, clip_distance
dcl_output_siv o1.y, cull_distance
mov o0.xyzw, v0.xyzw
mov o1.x, l(0)
mov o1.y, l(0)
ret
// Approximately 0 instruction slots used
```

~~Note that the enableMinimumPrecision global flag is missing.~~

Update: I've fixed that missing flag, but these still will need more work since loading them from a resource uses different syntax that the assembler does not recognise. I've added a min_precision.hlsl test case for this.

It is also worth noting that DX12 / shader model 5.1 / d3dcompiler47 introduces two new semantics SV_StencilRef and SV_InnerCoverage. I am handling these in the signature parsing code, but have not tested if the assembler is coping with them. SV_StencilRef uses the oStencilRef special purpose register, but I'm not sure what SV_InnerCoverage is using yet. Low priority for now.

I'm not yet certain if this is an assembler bug, or just a quirk of DX11 assembly, but this does not do what I expect:

```
mov r15.xyz, r6.yzw
```

I believe this is instead doing:

```
mov r15.xyz, r6.xyz
```

If I use a four component swizzle it works:

```
mov r15.xyz, r6.yzwx
```

Disassembling it does show .yzw, so this might just be a quirk of DX11 assembly - does three component swizzle ever get created by the compiler? 3 component masks are common, but I can only see 1 and 4 component swizzles in a couple of shaders I have handy, but then again I haven't checked a large sample size.

FWIW in DX9 assembly the final component in the swizzle was implicitly repeated if not specified, which would have made that:

```
mov r15.xyz, r6.yzww
```
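
For reference, the DX9 style rule described above (repeat the final component) is easy to state in code. This Python sketch is only an illustration of that rule, with invented function names; whether the DX11 assembler should apply it to two and three component swizzles, or reject them outright, is still the open question here.

```python
def expand_swizzle(swizzle):
    """Pad a 1 to 4 component swizzle to four components by repeating the
    last one, as DX9 assembly did implicitly."""
    if not 1 <= len(swizzle) <= 4 or any(c not in "xyzw" for c in swizzle):
        raise ValueError("bad swizzle: %r" % swizzle)
    return swizzle + swizzle[-1] * (4 - len(swizzle))

def is_fxc_style(swizzle):
    """True for the only lengths seen so far in fxc output: one or four."""
    return len(swizzle) in (1, 4) and all(c in "xyzw" for c in swizzle)
```

Under this rule `yzw` becomes `yzww`, which matches the DX9 reading, while `is_fxc_style("yzw")` is false and could drive a warning.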

Pretty sure I've never seen a 3 component swizzle as the starting ASM. As far as I know fxc always generates a 4 component swizzle, with repeated final components (for opcodes like mov). Could be part of the disassembler too though.

I know I've seen some examples of fxc reordering the swizzle in ways that I am nearly certain would introduce bugs, but if there is some sort of magic happening at the assembly level, that might be why it works.

Might be related to: https://github.com/bo3b/3Dmigoto/issues/10

3DMigoto 1.2.52 goes a long way to addressing these pain points:

1. Comments are now stripped
2. Underlying bugs not fixed, but it looks like Bo3b added a try / catch block around the assembler a while back to stop these crashing the whole game.
3. Still not hooked up in 3DMigoto (only cmd_Decompiler)
4. Tabs are now supported for indentation
5. This is still a major issue - the biggest remaining pain point by far
6. Still an issue
7. SV_GSInstanceID is now implemented, along with the related dcl_gsinstances declaration
8. This global flag is now supported, as are all the other missing flags
9. Discovered this problem and noted it here
10. Discovered this problem and noted it here.

I did a fairly comprehensive audit of the instructions the assembler implements, and implemented dozens and dozens of missing instructions, as well as missing options to various other instructions. There are still a few missing instructions (function calls, debug layer, hull shader), but they should be quite rare.

I have noticed a few shaders that fail validation, yet produce identical assembly to the original - I'm not positive if these represent a real issue or not yet.

3 is now fully implemented in 3DMigoto 1.2.65

Helifax discovered a previously unknown issue where blank lines prior to the shader model would result in a corrupt shader, which is now fixed by this commit: https://github.com/bo3b/3Dmigoto/commit/1113fdd955ea1cfa56a8124cab5d450093810ded

Bah! Last week I wrote up an entire discussion for another assembler bug that DHR and I would like you to take a look at, but it got lost somehow. Once more, with feeling.

I have noticed a few shaders that fail validation, yet produce identical assembly to the original - I'm not positive if these represent a real issue or not yet.

Likely related, DHR ran into a weird problem with swizzle not working correctly, probably related to the problem referenced above. We have a good example and test case now. The ASM shader uses a .yzw swizzle. We worked through a lot of the possibilities, and as near as I can tell this is definitely an assembler bug.

The reason this one is important is because it's part of DHR's universal UE4 regex fix. These shaders cannot presently be fixed with regex because it silently fails.

Our scenario is in two games, RadRogers, and ECHO.

Testing back to back, we took the shader referenced below, and tried to make sure we did not have any syntax or other errors. No errors are reported in the log or when using cmd_decompiler. The shader seems to be dropped or to fail on hardware, as the code changes are not active and the image stays broken.

1. We took the original shader and dropped it in unchanged, and it seems to work. Image is broken, but no errors are reported.
2. We added output zeroing, to be sure we could actually make changes to the ASM shader. No fix applied. This worked: output is zeroed, so the effect disappears, and reappears with F9. ASM clearly assembled and working.
3. Added the fix to the original ASM code. Fails. No errors, but the effect does not change, and there is no change with F9.
4. Trying to minimize code changes, we changed the fix to save and restore the active .yzw swizzled register over the fix. Still broken.
5. Switching to HLSL Decompile for the same shader, we can add the same fix code in HLSL. Works. Effect is fully fixed, F9 shows broken original. The ASM generated is wildly different, including no .yzw swizzle. But this proves the fix itself is correct.
6. Dropped the disassembly of that HLSL code in as an ASM shader. This also works to fix the effect, even though the ASM is wildly different.
7. Using cmd_decompiler, I took the original ASM and added the ASM fix to the code, then assembled it manually with the Flugan assembler, then took that output and re-disassembled it. The code in the disassembly is identical to the original, including the fix.

Test shader that fails in ASM, with fix added.

// 5a2c8678cc6226e0-ps_replace.txt
//
// Generated by Microsoft (R) D3D Shader Disassembler
//
//   using 3Dmigoto v1.2.67 on Sat Dec 23 22:42:21 2017
//
//
// Input signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// TEXCOORD                10   xyzw        0     NONE   float   xyz 
// TEXCOORD                11   xyzw        1     NONE   float   xyzw
// COLOR                    0   xyzw        2     NONE   float   x   
// TEXCOORD                 4   xyzw        3     NONE   float   xy  
// TEXCOORD                 9   xyz         4     NONE   float   xyz 
// SV_Position              0   xyzw        5      POS   float   xyzw
// SV_IsFrontFace           0   x           6    FFACE    uint       
//
//
// Output signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_Target                0   xyzw        0   TARGET   float   xyzw
// SV_Target                1   xyzw        1   TARGET   float   xyzw
// SV_Target                2   xyzw        2   TARGET   float   xyzw
// SV_Target                3   xyzw        3   TARGET   float   xyzw
// SV_Target                4   xyzw        4   TARGET   float   xyzw
// SV_Target                5   xyzw        5   TARGET   float   xyzw
// SV_DepthLessEqual        0    N/A oDepthLE  DEPTHLE   float    YES
//
ps_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb0[57], immediateIndexed
dcl_constantbuffer cb1[55], immediateIndexed
dcl_constantbuffer cb2[12], immediateIndexed
dcl_constantbuffer cb3[22], immediateIndexed
dcl_constantbuffer cb4[3], immediateIndexed
dcl_constantbuffer cb5[18], immediateIndexed
dcl_sampler s0, mode_default
dcl_sampler s1, mode_default
dcl_sampler s2, mode_default
dcl_sampler s3, mode_default
dcl_sampler s4, mode_default
dcl_sampler s5, mode_default
dcl_sampler s6, mode_default
dcl_resource_texture2d (float,float,float,float) t0
dcl_resource_texture2d (float,float,float,float) t1
dcl_resource_texture2d (float,float,float,float) t2
dcl_resource_texture2d (float,float,float,float) t3
dcl_resource_texture2d (float,float,float,float) t4
dcl_resource_texture2d (float,float,float,float) t5
dcl_resource_texture2d (float,float,float,float) t6
dcl_resource_texture2d (float,float,float,float) t7
dcl_input_ps linear centroid v0.xyz
dcl_input_ps linear centroid v1.xyzw
dcl_input_ps linear v2.x
dcl_input_ps linear v3.xy
dcl_input_ps linear v4.xyz
dcl_input_ps_siv linear noperspective centroid v5.xyzw, position
dcl_output o0.xyzw
dcl_output o1.xyzw
dcl_output o2.xyzw
dcl_output o3.xyzw
dcl_output o4.xyzw
dcl_output o5.xyzw
dcl_output oDepthLE

dcl_temps 31
ld_indexable(texture2d)(float,float,float,float) r25.xy, l(0, 0, 0, 0), t125.xyzw

mul r0.xyz, v0.zxyz, v1.xyzx
mad r0.xyz, v1.zxyz, v0.xyzx, -r0.xyzx
mul r0.xyz, r0.xyzx, v1.wwww
mov r0.w, v5.z
mov r1.x, l(1.000000)
mul r0.w, r0.w, v5.w
mul r2.xyzw, v5.yyyy, cb0[37].xyzw
mad r2.xyzw, v5.xxxx, cb0[36].xyzw, r2.xyzw
mad r2.xyzw, v5.zzzz, cb0[38].xyzw, r2.xyzw
add r2.xyzw, r2.xyzw, cb0[39].xyzw
div r1.yzw, r2.xxyz, r2.wwww

// World coordinate to fix register
mov r30.xyzw, r1.yzww

//Fix
//Translate r30.xyz (world) to clip cb1[0,1,2,3]
mul r26.xyzw, r30.yyyy, cb1[1].xyzw
mad r26.xyzw, r30.xxxx, cb1[0].xyzw, r26.xyzw
mad r26.xyzw, r30.zzzz, cb1[2].xyzw, r26.xyzw
add r26.xyzw, r26.xyzw, cb1[3].xyzw
//Fix Clip
add r25.w, r26.w, -r25.y
mul r25.w, r25.x, r25.w
add r26.x, r26.x, -r25.w
//Translate r26.xyz (Clip) to world cb1[32,33,34,35]
mul r30.xyzw, r26.yyyy, cb1[33].xyzw
mad r30.xyzw, r26.xxxx, cb1[32].xyzw, r30.xyzw
mad r30.xyzw, r26.zzzz, cb1[34].xyzw, r30.xyzw
mad r30.xyzw, r26.wwww, cb1[35].xyzw, r30.xyzw

// Update translated coordinate
mov r1.yzw, r30.xyzz

add r2.xyz, r1.yzwy, -cb0[56].xyzx
add r3.xyz, v4.xyzx, -cb0[56].xyzx
dp3 r2.w, -r1.yzwy, -r1.yzwy
rsq r2.w, r2.w
mul r1.yzw, -r1.yyzw, r2.wwww
div r4.xyzw, r2.xyyz, cb5[6].xxxx
sample_indexable(texture2d)(float,float,float,float) r5.xy, r4.xyxx, t2.xyzw, s1
mad r5.xy, r5.xyxx, l(2.000000, 2.000000, 0.000000, 0.000000), v1.xyxx
add r5.xy, r5.xyxx, l(-1.000000, -1.000000, 0.000000, 0.000000)
sample_indexable(texture2d)(float,float,float,float) r4.zw, r4.zwzz, t2.zwxy, s1
mad r4.zw, r4.zzzw, l(0.000000, 0.000000, 2.000000, 2.000000), v1.yyyz
add r6.yz, r4.zzwz, l(0.000000, -1.000000, -1.000000, 0.000000)
div r7.xyzw, r2.xzxy, cb5[6].xxww
sample_indexable(texture2d)(float,float,float,float) r4.zw, r7.xyxx, t2.zwxy, s1
mad r4.zw, r4.zzzw, l(0.000000, 0.000000, 2.000000, 2.000000), v1.xxxz
add r8.xz, r4.zzwz, l(-1.000000, 0.000000, -1.000000, 0.000000)
mul r4.zw, |v1.xxxz|, cb5[6].yyyy
max r4.zw, |r4.zzzw|, l(0.000000, 0.000000, 0.000001, 0.000001)
log r4.zw, r4.zzzw
mul r4.zw, r4.zzzw, cb5[6].zzzz
exp r4.zw, r4.zzzw
add r4.zw, -r4.zzzw, l(0.000000, 0.000000, 1.000000, 1.000000)
max r4.zw, r4.zzzw, l(0.000000, 0.000000, 0.000000, 0.000000)
mov r6.x, v1.x
mov r8.y, v1.y
add r8.xyz, -r6.xyzx, r8.xyzx
mad r6.xyz, r4.zzzz, r8.xyzx, r6.xyzx
mov r5.z, v1.z
add r6.xyz, -r5.xyzx, r6.xyzx
mad r5.xyz, r4.wwww, r6.xyzx, r5.xyzx
dp3 r6.x, v0.xyzx, r5.xyzx
dp3 r6.y, r0.zxyz, r5.xyzx
dp3 r6.z, v1.xyzx, r5.xyzx
sample_indexable(texture2d)(float,float,float,float) r5.xy, r7.zwzz, t3.xyzw, s2
mad r5.xy, r5.xyxx, l(2.000000, 2.000000, 0.000000, 0.000000), v1.xyxx
add r5.xy, r5.xyxx, l(-1.000000, -1.000000, 0.000000, 0.000000)
div r8.xyzw, r2.yzxz, cb5[6].wwww
sample_indexable(texture2d)(float,float,float,float) r7.xy, r8.xyxx, t3.xyzw, s2
mad r7.xy, r7.xyxx, l(2.000000, 2.000000, 0.000000, 0.000000), v1.yzyy
add r9.yz, r7.xxyx, l(0.000000, -1.000000, -1.000000, 0.000000)
sample_indexable(texture2d)(float,float,float,float) r7.xy, r8.zwzz, t3.xyzw, s2
mad r7.xy, r7.xyxx, l(2.000000, 2.000000, 0.000000, 0.000000), v1.xzxx
add r8.xz, r7.xxyx, l(-1.000000, 0.000000, -1.000000, 0.000000)
mul r7.xy, |v1.xzxx|, cb5[7].xxxx
max r7.xy, |r7.xyxx|, l(0.000001, 0.000001, 0.000000, 0.000000)
log r7.xy, r7.xyxx
mul r7.xy, r7.xyxx, cb5[7].yyyy
exp r7.xy, r7.xyxx
add r7.xy, -r7.xyxx, l(1.000000, 1.000000, 0.000000, 0.000000)
max r7.xy, r7.xyxx, l(0.000000, 0.000000, 0.000000, 0.000000)
mov r9.x, v1.x
mov r8.y, v1.y
add r8.xyz, -r9.xyzx, r8.xyzx
mad r8.xyz, r7.xxxx, r8.xyzx, r9.xyzx
mov r5.z, v1.z
add r8.xyz, -r5.xyzx, r8.xyzx
mad r5.xyz, r7.yyyy, r8.xyzx, r5.xyzx
dp3 r8.x, v0.xyzx, r5.xyzx
dp3 r8.y, r0.zxyz, r5.xyzx
dp3 r8.z, v1.xyzx, r5.xyzx
sample_indexable(texture2d)(float,float,float,float) r5.xyz, r7.zwzz, t4.xyzw, s3
div r9.xyz, r3.xyzx, cb5[6].wwww
sample_indexable(texture2d)(float,float,float,float) r10.xyz, r9.yzyy, t4.xyzw, s3
sample_indexable(texture2d)(float,float,float,float) r11.xyz, r9.xzxx, t4.xyzw, s3
add r11.xyz, -r10.yxzy, r11.yxzy
mad r10.xyz, r7.xxxx, r11.xyzx, r10.yxzy
add r10.xyz, -r5.yxzy, r10.xyzx
mad r5.xyz, r7.yyyy, r10.xyzx, r5.yxzy
max r2.w, |v2.x|, l(0.000001)
log r2.w, r2.w
mul r2.w, r2.w, cb5[8].y
exp r2.w, r2.w
add r3.w, -cb5[8].w, cb5[8].z
mad r2.w, r2.w, r3.w, cb5[8].w
add r3.w, v1.z, l(1.000000)
mad r3.w, -r3.w, l(0.500000), l(1.000000)
mad_sat r3.w, -cb5[9].y, r3.w, l(1.000000)
max r3.w, r3.w, l(0.000001)
log r3.w, r3.w
mul r3.w, r3.w, cb5[9].z
exp r3.w, r3.w
add r5.w, cb5[9].w, -cb5[10].x
mad r3.w, r3.w, r5.w, cb5[10].x
mul r2.w, r2.w, r3.w
add r3.w, |r2.y|, -cb5[10].y
max r3.w, r3.w, l(0.000000)
div_sat r3.w, r3.w, cb5[10].z
max r3.w, r3.w, l(0.000001)
log r3.w, r3.w
mul r3.w, r3.w, cb5[10].w
exp r3.w, r3.w
add r5.w, -cb5[11].y, cb5[11].x
mad r3.w, r3.w, r5.w, cb5[11].y
mul_sat r2.w, r2.w, r3.w
add r2.w, -r2.w, l(1.000000)
mad r3.w, r5.x, cb5[8].x, -r2.w
add r3.w, r3.w, l(2.000000)
add r2.w, r2.w, r2.w
sample_indexable(texture2d)(float,float,float,float) r10.xyz, r4.xyxx, t5.xyzw, s4
div r3.xyz, r3.xyzx, cb5[6].xxxx
sample_indexable(texture2d)(float,float,float,float) r11.xyz, r3.yzyy, t5.xyzw, s4
sample_indexable(texture2d)(float,float,float,float) r12.xyz, r3.xzxx, t5.xyzw, s4
add r12.xyz, -r11.yxzy, r12.yxzy
mad r11.xyz, r4.zzzz, r12.xyzx, r11.yxzy
add r11.xyz, -r10.yxzy, r11.xyzx
mad r10.xyz, r4.wwww, r11.xyzx, r10.yxzy
mad r2.w, r10.x, cb5[11].z, r2.w
add_sat r2.w, -r2.w, r3.w
add r3.w, -cb5[2].x, cb5[11].w
mad_sat r2.w, r2.w, r3.w, cb5[2].x
add r8.xyz, -r6.xyzx, r8.xyzx
mad r6.xyz, r2.wwww, r8.xyzx, r6.xyzx
mad r6.xyz, r6.xyzx, cb1[7].wwww, cb1[7].xyzx
dp3 r3.w, r6.xyzx, r6.xyzx
rsq r3.w, r3.w
mul r6.xyz, r3.wwww, r6.xyzx
mul r0.xyz, r0.xyzx, r6.yyyy
mad r0.xyz, r6.xxxx, v0.yzxy, r0.xyzx
mad r0.xyz, r6.zzzz, v1.yzxy, r0.xyzx
dp3 r3.w, r0.xyzx, r0.xyzx
rsq r3.w, r3.w
mul r6.xyz, r0.xyzx, r3.wwww
sample_indexable(texture2d)(float,float,float,float) r8.xyzw, r4.xyxx, t6.xyzw, s5
sample_indexable(texture2d)(float,float,float,float) r11.xyzw, r3.yzyy, t6.xyzw, s5
sample_indexable(texture2d)(float,float,float,float) r12.xyzw, r3.xzxx, t6.xyzw, s5
add r12.xyzw, -r11.xyzw, r12.xyzw
mad r11.xyzw, r4.zzzz, r12.xyzw, r11.xyzw
add r11.xyzw, -r8.xyzw, r11.xyzw
mad r4.xyzw, r4.wwww, r11.xyzw, r8.xyzw
mul r3.xyz, r4.xyzx, cb5[12].xxxx
dp3 r5.w, r6.zxyz, r1.yzwy
max r5.w, r5.w, l(0.000000)
add r5.w, -r5.w, l(1.000000)
max r5.w, |r5.w|, l(0.000001)
log r5.w, r5.w
mul r5.w, r5.w, cb5[12].y
exp r5.w, r5.w
mad r5.w, r5.w, l(0.960000), l(0.040000)
add r8.x, |r2.y|, -cb5[12].z
max r8.x, r8.x, l(0.000000)
div_sat r8.x, r8.x, cb5[12].w
max r8.x, r8.x, l(0.000001)
log r8.x, r8.x
mul r8.x, r8.x, cb5[13].x
exp r8.x, r8.x
add r8.x, -r8.x, l(1.000000)
mul r5.w, r5.w, r8.x
mul r8.xyz, r3.xyzx, r5.wwww
mul r8.xyz, r8.xyzx, cb5[13].yyyy
sample_indexable(texture2d)(float,float,float,float) r11.xyzw, r7.zwzz, t7.xyzw, s6
sample_indexable(texture2d)(float,float,float,float) r12.xyzw, r9.yzyy, t7.xyzw, s6
sample_indexable(texture2d)(float,float,float,float) r9.xyzw, r9.xzxx, t7.xyzw, s6
add r9.xyzw, -r12.xyzw, r9.xyzw
mad r9.xyzw, r7.xxxx, r9.xyzw, r12.xyzw
add r9.xyzw, -r11.xyzw, r9.xyzw
mad r7.xyzw, r7.yyyy, r9.xyzw, r11.xyzw
mul r9.xyz, r7.xyzx, cb5[12].xxxx
mul r11.xyz, r5.wwww, r9.xyzx
mad r11.xyz, r11.xyzx, cb5[13].yyyy, -r8.xyzx
mad r8.xyz, r2.wwww, r11.xyzx, r8.xyzx
add r8.xyz, r8.xyzx, cb5[3].xyzx
mad r3.xyz, r3.xyzx, r5.wwww, r4.xyzx
mad r4.xyz, r9.xyzx, r5.wwww, r7.xyzx
add r4.xyz, -r3.xyzx, r4.xyzx
mad r3.xyz, r2.wwww, r4.xyzx, r3.xyzx
mul_sat r3.xyz, r3.xyzx, cb5[4].xyzx
mul r4.x, r10.x, cb5[13].z
mad r4.y, r5.x, cb5[13].w, -r4.x
mad_sat r4.x, r2.w, r4.y, r4.x
add r7.xy, -cb5[14].ywyy, cb5[14].xzxx
mad r4.z, r10.y, r7.x, cb5[14].y
mad r5.y, r5.y, r7.y, cb5[14].w
add r5.y, -r4.z, r5.y
mad_sat r4.z, r2.w, r5.y, r4.z
add r5.y, -r4.w, r7.w
mad r4.w, r2.w, r5.y, r4.w
add r5.y, cb4[2].x, cb5[15].x
add r5.y, r5.y, l(1000000.000000)
add r5.w, r2.y, -r5.y
lt r5.w, l(0.000010), |r5.w|
ge r5.y, r2.y, r5.y
movc r5.y, r5.y, l(0), l(1.000000)
and r5.y, r5.y, r5.w
add r5.w, cb4[2].z, l(-0.500000)
lt r5.w, l(0.000010), |r5.w|
ge r7.x, cb4[2].z, l(0.500000)
movc r5.y, r7.x, l(1.000000), r5.y
movc r5.y, r5.w, r5.y, l(1.000000)
add r5.w, -cb5[15].z, cb5[15].y
mad_sat r5.w, r10.z, r5.w, cb5[15].z
add r7.x, cb5[15].w, -cb5[16].x
mad_sat r5.z, r5.z, r7.x, cb5[16].x
add r5.z, -r5.w, r5.z
mad_sat r5.z, r2.w, r5.z, r5.w
add r5.w, -r10.x, l(1.000000)
add r7.x, -cb5[16].z, cb5[16].y
mad r5.w, r5.w, r7.x, cb5[16].z
add r5.x, -r5.x, l(1.000000)
add r7.x, cb5[16].w, -cb5[17].x
mad r5.x, r5.x, r7.x, cb5[17].x
add r5.x, -r5.w, r5.x
mad r2.w, r2.w, r5.x, r5.w
max r2.w, r2.w, l(0.000000)
mad r1.x, r1.x, v5.w, r2.w
div oDepthLE, r0.w, r1.x
mad r0.w, r4.w, r5.y, l(-0.333300)
lt r0.w, r0.w, l(0.000000)
discard_nz r0.w
mov_sat r4.y, cb4[0].z
mad o2.z, r4.z, cb1[8].y, cb1[8].x
mad r5.xyw, -r3.xyxz, r4.xxxx, r3.xyxz
mul r0.w, r4.y, l(0.080000)
mad r7.xyz, -r4.yyyy, l(0.080000, 0.080000, 0.080000, 0.000000), r3.xyzx
mad r7.xyz, r4.xxxx, r7.xyzx, r0.wwww
mad r5.xyw, r5.xyxw, cb1[5].wwww, cb1[5].xyxz
mad r7.xyz, r7.xyzx, cb1[6].wwww, cb1[6].xyzx
mul r4.zw, v3.xxxy, l(0.000000, 0.000000, 1.000000, 0.500000)
mad r9.xy, v3.xyxx, l(1.000000, 0.500000, 0.000000, 0.000000), l(0.000000, 0.500000, 0.000000, 0.000000)
sample_indexable(texture2d)(float,float,float,float) r10.xyzw, r4.zwzz, t0.xyzw, s0
sample_indexable(texture2d)(float,float,float,float) r9.xyzw, r9.xyxx, t0.xyzw, s0
mad r0.w, r9.w, l(0.00392156886), r10.w
add r0.w, r0.w, l(-0.00196078443)
mad r0.w, r0.w, cb3[18].w, cb3[20].w
mul r10.xyz, r10.xyzx, r10.xyzx
mad r10.xyz, r10.xyzx, cb3[18].xyzx, cb3[20].xyzx
exp r0.w, r0.w
add r0.w, r0.w, l(-0.0185813606)
mad r9.xyzw, r9.xyzw, cb3[19].xyzw, cb3[21].xyzw
mov r6.w, l(1.000000)
dp4 r1.x, r9.xyzw, r6.xyzw
max r1.x, r1.x, l(0.000000)
mul r0.w, r0.w, r1.x
mul r9.xyz, r10.xyzx, r0.wwww
mul r9.xyz, r9.xyzx, cb1[37].xyzx
lt r0.w, l(0.000000), cb1[45].x
if_nz r0.w
  sample_indexable(texture2d)(float,float,float,float) r10.xyzw, v3.xyxx, t1.xyzw, s0
  mad r10.xyz, r10.xyzx, l(2.000000, 2.000000, 2.000000, 0.000000), l(-1.000000, -1.000000, -1.000000, 0.000000)
  mul r11.x, r10.w, r10.w
  dp3 r0.w, r10.xyzx, r10.xyzx
  rsq r0.w, r0.w
  mul r10.xyz, r0.wwww, r10.xyzx
  mad r0.w, -r10.w, r10.w, l(1.000000)
  mad r0.w, -r0.w, r0.w, l(1.000000)
  mad r0.xyz, r0.zxyz, r3.wwww, -r10.xyzx
  mad r12.xyz, r0.wwww, r0.xyzx, r10.xyzx
  dp3_sat r0.x, r10.yzxy, r6.xyzx
  add r0.y, -r0.x, l(1.000000)
  mad r11.y, r0.w, r0.y, r0.x
else 
  mov r12.xyz, r6.zxyz
  mov r11.xy, l(1.000000,1.000000,0,0)
endif 
lt r0.x, l(0.000000), cb1[12].w
if_nz r0.x
  mov r12.w, l(1.000000)
  dp4 r0.x, cb1[48].xyzw, r12.xyzw
  dp4 r0.y, cb1[49].xyzw, r12.xyzw
  dp4 r0.z, cb1[50].xyzw, r12.xyzw
  mul r10.xyzw, r12.yzzx, r12.xyzz
  dp4 r13.x, cb1[51].xyzw, r10.xyzw
  dp4 r13.y, cb1[52].xyzw, r10.xyzw
  dp4 r13.z, cb1[53].xyzw, r10.xyzw
  mul r0.w, r12.y, r12.y
  mad r0.w, r12.x, r12.x, -r0.w
  add r0.xyz, r0.xyzx, r13.xyzx
  mad r0.xyz, cb1[54].xyzx, r0.wwww, r0.xyzx
  max r0.xyz, r0.xyzx, l(0.000000, 0.000000, 0.000000, 0.000000)
  mul r0.xyz, r0.xyzx, cb1[47].xyzx
  mul r0.w, r11.y, r11.x
  mad r9.xyz, r0.xyzx, r0.wwww, r9.xyzx
endif 
dp3 r0.x, r9.xyzx, l(0.300000, 0.590000, 0.110000, 0.000000)
mul r0.yzw, r5.xxyw, r9.xxyz
mul r9.xyz, r5.zzzz, r0.yzwy
mad r5.xyw, r7.xyxz, l(0.450000, 0.450000, 0.000000, 0.450000), r5.xyxw
mad r0.yzw, -r0.yyzw, r5.zzzz, r5.xxyw
mad r0.yzw, cb1[13].xxxx, r0.yyzw, r9.xxyz
max r5.xyw, r8.xyxz, l(0.000000, 0.000000, 0.000000, 0.000000)
lt r1.x, l(0.000000), cb1[9].x
if_nz r1.x
  mad r1.xyz, r1.yzwy, r2.wwww, r2.xyzx
  add r2.xyz, r1.xyzx, -cb2[8].xyzx
  add r7.xyz, cb2[9].xyzx, l(1.000000, 1.000000, 1.000000, 0.000000)
  lt r2.xyz, r7.xyzx, |r2.xyzx|
  or r1.w, r2.y, r2.x
  or r1.w, r2.z, r1.w
  dp3 r1.x, r1.xyzx, l(0.577000, 0.577000, 0.577000, 0.000000)
  mul r1.x, r1.x, l(0.002000)
  frc r1.x, r1.x
  lt r1.x, l(0.500000), r1.x
  movc r1.xyz, r1.xxxx, l(0,1.000000,1.000000,0), l(1.000000,1.000000,0,0)
  movc r5.xyw, r1.wwww, r1.xyxz, r5.xyxw
endif 
add o0.xyz, r0.yzwy, r5.xywx
mul r0.yzw, v5.xxyx, l(0.000000, 0.00781250000, 0.00781250000, 0.00781250000)
frc r0.yzw, r0.yyzw
mad r0.yzw, r0.yyzw, l(0.000000, 128.000000, 128.000000, 128.000000), l(0.000000, -64.340622, -72.465622, -64.340622)
mul r0.yzw, r0.yyzz, r0.yyzw
dp3 r0.y, r0.yzwy, l(20.390625, 60.703125, 2.42812085, 0.000000)
frc r0.y, r0.y
add r0.y, r0.y, l(-0.500000)
mad o1.xyz, r6.zxyz, l(0.500000, 0.500000, 0.500000, 0.000000), l(0.500000, 0.500000, 0.500000, 0.000000)
mad r0.x, r0.x, r5.z, l(0.00390625000)
log r0.x, r0.x
mad r0.x, r0.x, l(0.062500), l(0.500000)
mad o3.w, r0.y, l(0.00392156886), r0.x
mov o0.w, l(0)
mov o1.w, cb2[11].x
mov o2.w, l(0.756862760)
mov o2.xy, r4.xyxx
mov o3.xyz, r3.xyzx
mov o4.xyzw, l(0,0,0,0)
mov o5.xyzw, l(0,0,0,0)
ret 
// Approximately 0 instruction slots used

The ASM shader uses a .yzw swizzle.

That's not a valid swizzle in DX11 (it is a valid mask) - valid swizzles have either exactly one character (if all swizzles and masks in the instruction have exactly one character) or exactly four characters. The correct swizzle in this case would be .yzwx (.yzwy, .yzwz and .yzww would also work - the latter would replicate DX9 assembly behaviour).

Any two or three character swizzle seems to be treated as though it was .xyzw by DirectX / nvidia, no matter what swizzle was actually used (this is why .xyz swizzles seem to work, but they are technically wrong as well).

This is annoying, but I don't think it's technically an assembler bug. That's not to say we shouldn't do something about it - we could either have the assembler fail to assemble these, issue a warning or silently patch them to what we think the user meant. We need to be careful of breaking existing fixes that may be inadvertently relying on this weird behaviour.
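
To make the "issue a warning" option concrete, here is a minimal Python sketch of the one-or-four-character rule described above. It is not the assembler's code; the function name `is_valid_source_swizzle` is made up for illustration, and it only checks a swizzle string that has already been separated from its operand.

```python
VALID_COMPONENTS = set("xyzw")


def is_valid_source_swizzle(swizzle: str) -> bool:
    """Return True if `swizzle` (without the leading '.') has 1 or 4 components.

    Hypothetical helper: it encodes only the rule stated in this thread and
    does not try to tell a source swizzle apart from a destination mask.
    """
    if not swizzle or not set(swizzle) <= VALID_COMPONENTS:
        return False
    return len(swizzle) in (1, 4)


# Two and three character swizzles get flagged, the rest pass.
for candidate in ("yzw", "xyz", "yzwx", "yzww", "x"):
    status = "ok" if is_valid_source_swizzle(candidate) else "suspicious"
    print(f".{candidate}: {status}")
```

A real check would have to run only on source operands, since .yzw and .xyz are perfectly good destination masks.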

I should rephrase that- the ASM shader uses a .yzw mask as the destination. The salient piece of code is:

...
div r1.yzw, r2.xxyz, r2.wwww   // original shader instruction.

// World coordinate to fix register
mov r30.xyzw, r1.yzww

//Fix code... uses only r30, r26. 
<snip>

// Update translated coordinate
mov r1.yzw, r30.xyzz

add r2.xyz, r1.yzwy, -cb0[56].xyzx   // original shader instruction
...

Are these r30 source swizzles wrong?

Edit: Ah, seems like it ought to be mov r1.yzw, r30.xxyz. I'll have DHR try that.

> I should rephrase that- the ASM shader uses a .yzw mask as the destination.

Ah, ok - yeah, that's a mask, not a swizzle.

> Edit: Ah, seems like it ought to be mov r1.yzw, r30.xxyz I'll have DHR try that.

Yep, that's it - the .yzw mask means the first character in the swizzle will effectively be ignored.
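
For anyone else who trips over this, a small Python model of how the mask and swizzle line up may help. It is only a sketch of the behaviour described above (each destination component reads the swizzle character in the same position), with registers modelled as plain dicts; the `mov` function is a stand-in, not anything from 3DMigoto.

```python
COMPONENTS = "xyzw"


def mov(dest, mask, src, swizzle):
    """Model `mov dest.mask, src.swizzle` for a four-character swizzle.

    Destination component i is written only if it is in the mask, and it
    reads the source component named by swizzle[i].
    """
    if len(swizzle) != 4:
        raise ValueError("this sketch only models four-character swizzles")
    result = dict(dest)
    for position, component in enumerate(COMPONENTS):
        if component in mask:
            result[component] = src[swizzle[position]]
    return result


r30 = {"x": 1.0, "y": 2.0, "z": 3.0, "w": 4.0}
r1 = {"x": 0.0, "y": 0.0, "z": 0.0, "w": 0.0}

# mov r1.yzw, r30.xyzz : r30.x is never read, r30.z lands in both z and w.
print(mov(r1, "yzw", r30, "xyzz"))
# mov r1.yzw, r30.xxyz : r30.x, r30.y, r30.z land in r1.y, r1.z, r1.w.
print(mov(r1, "yzw", r30, "xxyz"))
```

The first call shows why the original fix lost the x coordinate: with a .yzw mask, position 0 of the swizzle has nowhere to go.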

Ah, geez! I'm having flashbacks of trying to get HelixMod to do something, and getting zero results. Hated that part of HelixModding. F*n' assembly language. That's 15 hours of debugging I'll not get back. I'll just have to accept that it was more education from the school of hard knocks.

DHR tried the latest version switching that instruction, and using the cb0 register for this shader (wrong above), and confirmed that it works.

Edit: Added to the wiki.bo3b.net Gotchas list.

The test suite I've been adding located an issue with the assembler, where mixed type resource declarations would produce a corrupt binary and cause a hang - the fix for this is now in master and will be part of 3DMigoto 1.3.12:

dcl_resource_buffer (mixed,mixed,mixed,mixed) t114

Adding another issue to the list: Incorrect capitalisation on SPRs causes a crash (probably due to a bad shader binary). I hit this by trying to use "vThreadIdInGroupFlattened.x" instead of "vThreadIDInGroupFlattened.x"
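
One possible way to catch this class of typo, sketched in Python rather than the assembler's own code: compare against a list of known names and, on a miss, look for a match that differs only in case. The list below is a partial, hand-written one for illustration, and `check_spr` is a hypothetical helper.

```python
# Partial list of compute shader special purpose registers, for illustration only.
KNOWN_SPRS = [
    "vThreadID",
    "vThreadGroupID",
    "vThreadIDInGroup",
    "vThreadIDInGroupFlattened",
]
_BY_LOWERCASE = {name.lower(): name for name in KNOWN_SPRS}


def check_spr(name):
    """Return (ok, suggestion) for a register name without its swizzle.

    `suggestion` is the correctly capitalised spelling if the name matches a
    known register apart from case, otherwise None.
    """
    if name in KNOWN_SPRS:
        return True, name
    return False, _BY_LOWERCASE.get(name.lower())


print(check_spr("vThreadIdInGroupFlattened"))  # (False, 'vThreadIDInGroupFlattened')
```

Whether the assembler should reject the bad spelling or quietly accept it is a separate question; the sketch only shows that the mistake is cheap to detect.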

I also noticed another issue with minimum precision registers. This should be considered minor as these are not supported before Windows 8.0 and it is unlikely we will encounter them in the wild for some time.

Resident Evil 2 is the first known game to use minimum precision types, though it's not clear to me what they do on Windows 7 where minimum precision is not supported - perhaps it uses OS specific shaders?

Support for minimum precision types is now in master, however I notice that shaders that use these also have an extra "SFI0" section ("SubtargetFeatureInfo?") that we omit - I don't yet know how serious that is. Additionally, the signature parser generates sections whose versions don't match the original shader - presumably it needs to detect this situation and upgrade its signature sections to allow them to contain the minimum precision types.

Double precision support is now in master (RE2 uses this as well), including precision fixup.

SFI0 is now implemented, and the rules for upgrading signature versions when minimum precision is in use have been adjusted to match fxc.

I've pushed up support for detecting certain types of parse errors: unrecognised instructions, operands, and too few/many operands passed to an instruction. Parse errors go to the overlay / cmd_Decompiler's stderr and include a short description, the failing line number and the string that it failed to parse.

There are probably more situations where we should throw parse errors (e.g. extra junk at the end of an operand, omitted register type/index, etc), but this low-hanging fruit already covers some of the biggest pain points we've had with the assembler.
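
As a rough illustration of the kind of check involved (not the actual implementation, which lives in the C++ assembler), here is a Python sketch that rejects unrecognised instructions and wrong operand counts, reporting a description, the line number and the offending text. The `EXPECTED_OPERANDS` table, `ParseError` class and `check_line` function are all made up for this example, and the table covers only four instructions.

```python
# Hypothetical operand counts (destination + sources) for a few instructions.
EXPECTED_OPERANDS = {"mov": 2, "add": 3, "mul": 3, "mad": 4}


class ParseError(Exception):
    def __init__(self, description, line_number, text):
        super().__init__(f"line {line_number}: {description}: {text!r}")
        self.description = description
        self.line_number = line_number
        self.text = text


def split_operands(text):
    """Split on commas that are not inside l(...) literals or [...] indices."""
    operands, depth, current = [], 0, ""
    for ch in text:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        if ch == "," and depth == 0:
            operands.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        operands.append(current.strip())
    return operands


def check_line(line, line_number):
    """Raise ParseError for an unknown opcode or a wrong operand count."""
    code = line.split("//", 1)[0].strip()  # comments may start anywhere
    if not code:
        return
    opcode, _, rest = code.partition(" ")
    base = opcode[:-4] if opcode.endswith("_sat") else opcode
    if base not in EXPECTED_OPERANDS:
        raise ParseError("unrecognised instruction", line_number, code)
    count = len(split_operands(rest))
    expected = EXPECTED_OPERANDS[base]
    if count < expected:
        raise ParseError("too few operands", line_number, code)
    if count > expected:
        raise ParseError("too many operands", line_number, code)


lines = ["add r1.x, r2.x", "  // just a comment", "mad_sat r2.w, r2.w, r3.w, cb5[2].x"]
for number, line in enumerate(lines, start=1):
    try:
        check_line(line, number)
    except ParseError as err:
        print(err)
```

The first sample line is the `add r1.x, r2.x` case from the top of this issue; the other two pass without complaint.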

bo3b / 3Dmigoto

Various Assembler Bugs #36