bo3b / 3Dmigoto

Chiri's DX11 wrapper to enable fixing broken stereoscopic effects.

Refactoring to be done #78

Open bo3b opened 6 years ago

bo3b commented 6 years ago

I want to start a list of some things we'd like to refactor, that we can't really justify doing piecemeal, because of the impact on too many files. When we refactor some things, we touch so many files, and so many lines of code that we essentially invalidate any prior diff style comparisons by generating too much line-noise. But if we never refactor, things keep getting cruftier and cruftier.

I expect to be able to do a big workover sometime late this year. The idea will be to make a refactoring branch that will be used for all the changes, and when the time is right becomes a new 1.3.x variant. Working in a branch will avoid impacting WIP, and allow us to choose a 'least-bad' time to switch. I won't start working on this until there is a 'lull' in feature improvements, because merging while refactoring is tedious.

I'm just writing these down as a way to remember and keep some notes. Please feel free to note more in the discussion, and I'll add them to this list.

List of desirables

  1. Add a logging library (most likely g4log), and refactor all our DebugLog/DebugInfo calls.
    This is interesting because it will remove the constant pain of \n line endings, allow more levels, and give us a lot more flexibility for log output and types.

  2. ~~Move out all game fixes from 3Dmigoto to a new repo. There are a lot of game fix folders in the project, from when the idea was that we would ship game fixes directly out of 3Dmigoto. Even though they are still working fixes, these make no real sense as part of the 3Dmigoto project, and it would dramatically clean up the cruft.~~

  3. Unify variable naming formats, especially in globals.h, but also elsewhere.
    This is worth doing because our current usage is a random mish-mash of ALL_CAPS, and CamelCase, and under_bar, and gInitial_Var styles. It would make sense to decide upon a strict format for future use.

  4. Complete the move of FrameAnalysis to become a subclass of HackerContext. This is worth considering because performance analysis consistently shows FrameAnalysisLog using a measurable 0.2% to 0.3% of CPU. It would be nice to not pay that price unless FrameAnalysis is actually enabled in d3dx.ini, and to create a specific FrameAnalysisContext object.

  5. Change any 'orig', or 'real', or 'new', or 'old' naming to something more clear and specific. These are interesting because it's high bang-for-the-buck to clarify variable names.

  6. Remove nullptr checks on C++ object creation. As noted, these are useless and actually misleading, because any C++ instantiation that fails will throw an exception rather than return nullptr. As part of this, it would be worth considering or codifying our exception policy.

  7. Since we are C++, just use std::string. We've got a lot of tweaky variants, passing around pointers, and wide strings, and wchar and char and probably every variant possible. It would be a nice-to-have to just use std::string everywhere, and do last second conversions as needed.

  8. Switch our pointer use to ComPtr. This is another nice-to-have. Would be moderately risky in terms of introducing bugs, but would give us a dramatic bump in readability and ensure we are not leaking memory.

  9. Update the Nektra Deviare library to the latest version, and update to the latest version of DirectXTK. It might make sense to use git submodules for these.
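Item 8's leak-proofing can be illustrated without pulling in Windows headers. This is a hedged sketch: `FakeUnknown`, `FakeResource`, `make_com`, and `leak_free_demo` are all invented stand-ins for `Microsoft::WRL::ComPtr` wrapping real COM objects, and only show why scope-bound `Release()` prevents leaks:

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a COM object: anything with AddRef/Release.
// Real code would use Microsoft::WRL::ComPtr with ID3D11Device and friends.
struct FakeUnknown {
    int refs = 1;
    virtual ~FakeUnknown() = default;
    void AddRef()  { ++refs; }
    void Release() { if (--refs == 0) delete this; }
};

// Track live objects so we can observe that nothing leaks.
inline int g_live_objects = 0;
struct FakeResource : FakeUnknown {
    FakeResource()  { ++g_live_objects; }
    ~FakeResource() { --g_live_objects; }
};

// Minimal ComPtr-like ownership: Release() runs automatically on scope exit.
template <typename T>
using com_ptr = std::unique_ptr<T, void (*)(T*)>;

template <typename T>
com_ptr<T> make_com(T* raw) {
    return com_ptr<T>(raw, [](T* p) { p->Release(); });
}

inline bool leak_free_demo() {
    {
        com_ptr<FakeResource> res = make_com(new FakeResource());
        if (g_live_objects != 1)
            return false;            // alive inside the scope
    }                                // scope exit -> Release() -> delete
    return g_live_objects == 0;      // nothing leaked
}
```

The real ComPtr adds AddRef on copy and address-of operators for out-parameters, but the scope-exit Release above is the part that removes the manual-release bug class.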

bo3b commented 6 years ago

For 3, we should collectively decide what we think is best going forward.

Specific choices:
  1. Constants, and only constants, are to be in ALL_CAPS_UNDER_BAR style.

For me, I have a medium preference for CamelCase style, because I'm more accustomed to it from years of Mac programming. It's also worth noting that it is the Microsoft API convention, and since we are stuck in Microsoft land, it makes some sense to use their conventions.

We do know that all else being equal, the lowercase_under_bar style is easier to read.


There is possibly some room for variance. It might make sense to use under_bar style for any of our defined variables, as a way to call out visually that they are ours, as distinct from parameters passed in to subroutines, or wrapped object parameters from DX11 calls.

I think it's fairly clear that we should at least avoid using under_bar style for parameters either from or to Windows API calls, as that then mixes style in a single call.

It probably makes sense to use under_bar style for anything that we build completely, like the IniHandling, or global states that we store as part of that.
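To make the proposal concrete, here is a hypothetical sketch of that split (the names `MAX_SHADER_CACHE_ENTRIES` and `clamp_cache_size` are invented for illustration, not existing 3Dmigoto code): constants in ALL_CAPS, our own code in under_bar, CamelCase left to the API-facing side.

```cpp
#include <cassert>

// Constants, and only constants, in ALL_CAPS_UNDER_BAR style:
const int MAX_SHADER_CACHE_ENTRIES = 512;

// CamelCase parameter as it would arrive from a Windows/DX11-style caller;
// everything we define ourselves uses lowercase under_bar:
inline int clamp_cache_size(int RequestedSize)
{
    int clamped_size = RequestedSize;
    if (clamped_size > MAX_SHADER_CACHE_ENTRIES)
        clamped_size = MAX_SHADER_CACHE_ENTRIES;
    return clamped_size;
}
```

The visual payoff is that under_bar names immediately read as "ours" next to CamelCase API names in the same call.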


Please feel free to weigh in with your preferences and suggestions.

bo3b commented 6 years ago

For #7, this is definitely nice-to-have, but maybe not worth the effort. It's also worth noting that unlike the other refactorings, this type of change would be prone to introducing bugs, and is quite a bit more risky than the others.

DarkStarSword commented 6 years ago

For 4 we should look into the possible gains of using inlined versions of the logging calls rather than making C++ OO spaghetti code with a new class (I've been hacking on UE4, don't talk to me about how "great" OO is ;-). All those calls return immediately if frame analysis is not enabled, so if we're seeing any measurable cost it is most likely down to the function call and associated branch prediction and instruction cache effects on the CPU.

We should also decouple all the code in FrameAnalysis.cpp from the context - it currently makes use of some of the private members of HackerContext, but with a bit of work it should be able to operate entirely on a regular ID3D11DeviceContext. I wouldn't make this part a class though - I'd rather see it as a bunch of static functions contained in one file, only exposing the high level calls that the context needs to call.
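A minimal sketch of that suggestion, with hypothetical names (`g_analyse_frame`, `frame_analysis_log`, and the string log are stand-ins for the d3dx.ini flag and log file): a free function whose enabled-check inlines at the call site, so the disabled case costs one predicted branch rather than a virtual call into a subclass.

```cpp
#include <cassert>
#include <string>

inline bool g_analyse_frame = false;   // stand-in for the d3dx.ini setting
inline std::string g_fa_log;           // stand-in for the frame analysis log file

// Inlined early-return: when frame analysis is off, the call compiles down
// to a single test-and-branch at the call site.
inline void frame_analysis_log(const char* msg)
{
    if (!g_analyse_frame)
        return;
    g_fa_log += msg;
    g_fa_log += '\n';
}
```

Keeping these as static/free functions in one file, as suggested above, also means they only need a plain `ID3D11DeviceContext` rather than HackerContext's private members.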

DarkStarSword commented 6 years ago

2 - yes, I'd like to get rid of those. Happy to move FC4, Witcher3 and Batman into my own repos since I worked on those, but not sure where to put the others - maybe we could just create one repository for all the remaining ones? I know enough git foo to recreate these in a separate repo with history preserved.

Some of those game directories have example shaders in them - those should probably be moved into something like TestShaders/GameExamples/xxx/ to clean up our root a little.

DarkStarSword commented 6 years ago

7 - I'd like to eliminate a lot of our wchar_t / wstring usage really - there's very little code that needs to deal with anything other than ASCII (maybe filesystem paths), and if any unicode characters do manage to get into 3DMigoto I would be very surprised if we handle them correctly.

I'd feel better about switching to std::string if it provided decent case insensitive matching and format conversions - these are too clunky in standard C++ IMO, and where they have made concessions you can tell that they haven't bothered thinking about how people will actually want to use them (I notice that UE4 implements their own string class with, surprise surprise, a printf method). I have been playing around a little with this more lately though, so I reserve the right to change my mind.
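The two missing conveniences can be filled in with a handful of lines; the helper names `iequals` and `string_format` below are ours for illustration, not an existing 3Dmigoto or standard API:

```cpp
#include <cassert>
#include <cctype>
#include <cstdarg>
#include <cstdio>
#include <string>

// Case insensitive comparison, which std::string lacks:
inline bool iequals(const std::string& a, const std::string& b)
{
    if (a.size() != b.size())
        return false;
    for (size_t i = 0; i < a.size(); i++)
        if (std::tolower((unsigned char)a[i]) != std::tolower((unsigned char)b[i]))
            return false;
    return true;
}

// printf-style formatting into a std::string (the method UE4's string
// class provides and std::string does not):
inline std::string string_format(const char* fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = std::vsnprintf(nullptr, 0, fmt, ap);   // measure required length
    va_end(ap);
    std::string out(n, '\0');
    va_start(ap, fmt);
    std::vsnprintf(&out[0], n + 1, fmt, ap);       // format into the string
    va_end(ap);
    return out;
}
```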

bo3b commented 6 years ago

For 4 we should look into the possible gains of using inlined versions of the logging calls rather than making C++ OO spaghetti code with a new class (I've been hacking on UE4, don't talk to me about how "great" OO is ;-). All those calls return immediately if frame analysis is not enabled, so if we're seeing any measurable cost it is most likely down to the function call and associated branch prediction and instruction cache effects on the CPU.

We'll have to decide at some point whether we want to be OO or not. Our current setup is a hybrid, and it's not very good.

In my experience, I always get much worse spaghetti code from not using OO approaches, but it is certainly possible to blow the design and make a stupid or poorly thought out set of objects. I haven't looked at UE4, but I have seen that there are a lot of programmers who have no idea how to do OO design, even at the level of has-a and is-a object thinking.

In our case, as an example, we have that lame global object that pretends to be an object, but has nothing whatsoever to do with OO programming. G->this, that, and the other is not proper encapsulation, of course.


In this FA case, I think that a subclass makes sense and is a clean design, because a FA_Context does everything a Context does, but a little more. We also get real C++, not the phony DX11 variant, so we can use the superclass properly instead of pointers. This doesn't require cross sharing of objects and functionality, or anything where it goes off the rails, like multiple inheritance.

The flip side is that we presently have little bits of FA sprinkled all throughout our current Context, which distracts from the job it is trying to do, even in the 99% of the time that we are not running FA. This is the essence of an OO abstraction, removing the distractions. But at the cost of some complexity. Proper OO can remove a lot of 'if's.


If we use OO for real, instead of global structures, we can remove nearly all of our globals, that would then be attached to specific objects. So for example, a HackerContext includes all of the fields it needs, and does not fetch G-> for any reason, because presumably it was created properly.

That would then allow us to remove the use of key:value lookups altogether, because a specific instance of something like a ShaderObject would already have our override version of the code, and not need a lookup based on hashes. And I think this would clarify the code because it's a logical OO override, to enhance the functionality of a base Shader into one that is smarter for our purposes. Just my opinion, but I think this style of override would be better than our key:value lookup approach with the attendant management.
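A sketch of that idea, with hypothetical class names (the real code keys replacements by shader hash in a map): the override lives on the instance itself, so plain virtual dispatch replaces the hash lookup.

```cpp
#include <cassert>
#include <string>

// Base shader: knows its own (original) bytecode.
class Shader {
public:
    virtual ~Shader() = default;
    virtual std::string bytecode() const { return "original"; }
};

// "Smarter" shader for our purposes: carries the fixed replacement code
// directly, so no hash -> replacement map lookup is ever needed.
class OverriddenShader : public Shader {
    std::string replacement_;
public:
    explicit OverriddenShader(std::string r) : replacement_(std::move(r)) {}
    std::string bytecode() const override { return replacement_; }
};
```

Anything holding a `Shader*` automatically gets the override when one exists, which is the "logical OO override" described above.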


On the other hand, we can skip OO altogether, and just do straight C. C++ is a bit of a whack-job, and mostly has caused things to go off the rails like we've got.

I'm not a huge fan of C either, because it puts way too much burden on the programmer for details that really ought to be managed by the language. But having a consistent goal, either OO or not at all would be superior to what we've got.


Q: Do you use an IDE? VisualStudio is a bit weird and weaker than a lot of other IDEs, but works to remove the problem of object hopping for stuff like superclass code. If you only ever use vi, vim or emacs, then OO becomes a real burden because the editor doesn't provide good support.


I'll probably go ahead and dummy up an object approach for this in a refactoring branch, and have you take a look to see if it seems OK. That would also allow me to better see what the end result might look like.

bo3b commented 6 years ago

I'd feel better about switching to std::string if it provided decent case insensitive matching and format conversions - these are too clunky in standard C++ IMO, and where they have made concessions you can tell that they haven't bothered thinking about how people will actually want to use them (I notice that UE4 implements their own string class with, surprise surprise, a printf method). I have been playing around a little with this more lately though, so I reserve the right to change my mind.

Yes, I'm also of at least two or three minds on these.

I'll defer to your opinion of std::string, I haven't used it enough to see, and just assumed (!) that it would be well thought out. Sigh. It seems like Boost is the real version of everything and the standard libraries are a hack job.

I was thinking that we'd want to use wchar because that's what the OS uses internally for everything. And we are at least partly tied to that because of API calls that we make.

My real goal here is to avoid our incessant string conversions. Might not be possible, but every time I go to use an API it seems like I've got the wrong format on hand.
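If the codebase settles on one internal string type, the conversions collapse to a pair of boundary helpers. A hedged sketch, assuming (as noted above) that nearly all of our data is ASCII; `widen`/`narrow` are invented names, and the narrow direction is deliberately lossy for non-ASCII:

```cpp
#include <cassert>
#include <string>

// ASCII-only widening for the wchar_t Windows APIs:
inline std::wstring widen(const std::string& s)
{
    return std::wstring(s.begin(), s.end());
}

// Lossy narrowing back: anything outside ASCII is marked, which also makes
// stray unicode visible instead of silently mishandled.
inline std::string narrow(const std::wstring& w)
{
    std::string out;
    out.reserve(w.size());
    for (wchar_t c : w)
        out += (c < 128) ? (char)c : '?';
    return out;
}
```

Real filesystem paths would need proper UTF-8/UTF-16 conversion (e.g. WideCharToMultiByte on Windows), so this sketch only covers the ASCII-internal case.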

bo3b commented 6 years ago

2 - yes, I'd like to get rid of those. Happy to move FC4, Witcher3 and Batman into my own repos since I worked on those, but not sure where to put the others - maybe we could just create one repository for all the remaining ones? I know enough git foo to recreate these in a separate repo with history preserved.

Some of those game directories have example shaders in them - those should probably be moved into something like TestShaders/GameExamples/xxx/ to clean up our root a little.

That sounds good to me. If you'd prefer to 'own' them, please go ahead and move FC4, Witcher3, and Batman to your repo at your convenience. At a minimum, FC4 is really your showcase.

My plan was to move all of them to a new 3d-fix repo in my account, where I'd provide the same access rights to everyone working on them. mike_ar69, DHR, and Helifax have access, but typically don't use it, so maybe that doesn't matter.

The random broken shaders in the game fixes would be good in a TestShaders folder. I use those as examples of bugs and good test cases for the Decompiler. Bit of a mess though, I haven't cleaned up ones that have since been fixed.

DarkStarSword commented 6 years ago

Q: Do you use an IDE? VisualStudio is a bit weird and weaker than a lot of other IDEs, but works to remove the problem of object hopping for stuff like superclass code. If you only ever use vi, vim or emacs, then OO becomes a real burden because the editor doesn't provide good support.

No, I don't use an IDE, and yeah, that's exactly what I mean by OO spaghetti code - large chunks of what the code is doing aren't merely packed off in a helper routine with a nice descriptive name, they are completely hidden from view, and the more levels of OO that get added the harder it becomes to follow.

Just for frame analysis logging it might make some sense, but at that point we already will have 5 permutations of the context object alone:

  • ID3D11DeviceContext
  • HackerContext+ID3D11DeviceContext
  • FrameAnalysisContext+HackerContext+ID3D11DeviceContext
  • HookedContext+HackerContext+ID3D11DeviceContext
  • HookedContext+FrameAnalysisContext+HackerContext+ID3D11DeviceContext

and once we add Context1 variants there will be even more. We kept the design of HookedContext fairly clean - its functionality is logically separate enough from everything else the context does that it doesn't add significantly to the spaghetti complexity factor. But while the frame analysis logging seems similar at first glance, it is not quite so cleanly separated - e.g. sometimes the logging needs to run before HackerContext (set calls), sometimes it needs to run after HackerContext (get calls), which are easy enough to do with OO, but sometimes it needs to run in the middle of HackerContext (between BeforeDraw and AfterDraw) for the log to make any sense once the Command List actions are added in, which we can't easily do if it is separate from HackerContext. Maybe it could go between HackerContext and ID3D11DeviceContext, but now we are designing OO inheritance to describe sequential logic, which is how we get to OO spaghetti code.

We would also want to be careful that we don't lose the ability to turn frame analysis on at runtime - we could mostly avoid that by tying it into the hunting toggle - it would mean a game restart is required to turn on frame analysis if hunting was set to 0 on launch, but at least wouldn't need a restart to go from hunting=2, frame_analysis off to hunting=1, frame_analysis on.

I'll defer to your opinion of std::string, I haven't used it enough to see, and just assumed (!) that it would be well thought out. Sigh. It seems like Boost is the real version of everything and the standard libraries are a hack job.

Yeah, I see Boost mentioned a lot - in a way I'd rather avoid bringing in unnecessary dependencies, but it sounds like this one might be worthwhile. Having never used it (and not necessarily trusting the opinions of devoted C++ fanatics) I can't really be sure. Have you looked into it in any detail?

C++ is a bit of a whack-job, and mostly has caused things to go off the rails like we've got.

Well said ;-)

I'm not a huge fan of C either, because it puts way too much burden on the programmer for details that really ought to be managed by the language. But having a consistent goal, either OO or not at all would be superior to what we've got.

Yeah, I agree that C can get quite verbose - the Linux kernel is pretty good to work with because it already has so much infrastructure built in that you get away from a lot of those details, but without that there is an awful lot of wheel reinvention (free plug for Rusty's ccan which helps a lot: https://ccodearchive.net/list.html ). C has its place (and I do not think we would be well served by switching to pure C in 3DMigoto), and high level languages like Python have their place, but C++ has always felt like a language that doesn't quite know where to fit in - it tries to provide some higher level constructs, but just doesn't quite get it right, and in some cases makes things harder than they were in pure C. Despite supposedly being a superset of C it is still quite separated from it, and using any C++ conveniences tends to start pulling in more and more C++ and restrict what C conveniences are left.

bo3b commented 6 years ago

Q: Do you use an IDE? VisualStudio is a bit weird and weaker than a lot of other IDEs, but works to remove the problem of object hopping for stuff like superclass code. If you only ever use vi, vim or emacs, then OO becomes a real burden because the editor doesn't provide good support.

No, I don't use an IDE, and yeah, that's exactly what I mean by OO spaghetti code - large chunks of what the code is doing aren't merely packed off in a helper routine with a nice descriptive name, they are completely hidden from view, and the more levels of OO that get added the harder it becomes to follow.

Yeah, without support from your editor or tool environment, OO can become very abstract. This is the essence of OO programming however, that you deliberately make some things more abstract in order to keep the specific pieces simpler. If it's done well, I think it makes a huge difference in terms of managing complexity. Mostly however it's not done well. :->

The way I try to structure objects is like an API, where you have a clear contract for any given method. So a method should not just be some random function, it should actually be an object-level API that has a more strict and clear contract. That reduces the need to keep driving through hierarchy, once you trust your underlying objects.


For VisualStudio, navigation is made much simpler by using F12 for Go To Definition. Using this and the forward/back editor keys, it's easy to go up and down a given object hierarchy and see how the inheritance chain is working.

Without this sort of support with just a raw editor, you have to keep all that in your head, which makes it a lot harder, and a lot more reading. The flip side is that you know what file all the functions are in, and I literally never know where anything is stored, because I don't use the flat file hierarchy at all.


JetBrains makes the best IDEs by far. I'd really like to use CLion for our development, but haven't yet looked to see if we can make it work with our heavily Microsoft oriented stuff.

If you are interested, their IDE for Python is unbelievably terrific. You are a truly stellar developer, and this IDE for Python would take you to another level for any Python work.

bo3b commented 6 years ago

Just for frame analysis logging it might make some sense, but at that point we already will have 5 permutations of the context object alone:

  • ID3D11DeviceContext
  • HackerContext+ID3D11DeviceContext
  • FrameAnalysisContext+HackerContext+ID3D11DeviceContext
  • HookedContext+HackerContext+ID3D11DeviceContext
  • HookedContext+FrameAnalysisContext+HackerContext+ID3D11DeviceContext

and once we add Context1 variants there will be even more. We kept the design of HookedContext fairly clean - its functionality is logically separate enough from everything else the context does that it doesn't add significantly to the spaghetti complexity factor. But while the frame analysis logging seems similar at first glance, it is not quite so cleanly separated - e.g. sometimes the logging needs to run before HackerContext (set calls), sometimes it needs to run after HackerContext (get calls), which are easy enough to do with OO, but sometimes it needs to run in the middle of HackerContext (between BeforeDraw and AfterDraw) for the log to make any sense once the Command List actions are added in, which we can't easily do if it is separate from HackerContext. Maybe it could go between HackerContext and ID3D11DeviceContext, but now we are designing OO inheritance to describe sequential logic, which is how we get to OO spaghetti code.

Actually the ID3D11DeviceContext object is not a true C++ object, because of that tweaky Microsoft QueryInterface stuff. This is why it's in a has-a relationship to HackerContext.

I really wanted to actually directly descend from that object, but it doesn't work. We can't use inheritance properly, and because of that can't call out to the super class properly. So that's why we use the has-a pointer to the real game version ID3D11DeviceContext. Disappointing, but I could not get it to work.

We are presently using the ID3D11DeviceContext inheritance as a way to just allow us to return our HackerContext to the game, and hopefully have it call the corresponding methods. This seems to mostly work, but maybe this is why it doesn't work in MGS.

I did not try to return our HackerContext object while not being a descendant of ID3D11DeviceContext, but I'll try that when looking into restructuring. That would clarify the ownership rules, and make it more clear that the ID3D11DeviceContext is a fake object.


For the example of midstream variance like needing to do work part way through the super class, the usual way to handle this would be to restructure the parent (HackerContext) to split up that work into more clearly defined methods. So that FrameAnalysisContext could then call out as needed, and the HackerContext would just call those primary methods instead of inlining the work.

Depending upon the scenario, it might make more sense to simply override the function and do all the work in the subclass FrameAnalysisContext method, and not call the super class.
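The restructuring described above is the classic template-method shape. A hedged sketch with hypothetical names (`*Sketch` classes, the `trace` string, and `MidDrawHook` are invented; the real BeforeDraw/AfterDraw do actual work): the parent splits its draw path into named steps, so the subclass can wedge logging mid-stream without duplicating any work.

```cpp
#include <cassert>
#include <string>

class HackerContextSketch {
public:
    virtual ~HackerContextSketch() = default;
    std::string trace;               // records call order so we can verify it

    // Template method: the fixed sequence lives here, in one place.
    void Draw() {
        BeforeDraw();
        MidDrawHook();               // subclass insertion point
        IssueDraw();
        AfterDraw();
    }

protected:
    virtual void MidDrawHook() {}    // no-op unless a subclass needs it
    void BeforeDraw() { trace += "before;"; }
    void IssueDraw()  { trace += "draw;"; }
    void AfterDraw()  { trace += "after;"; }
};

// Frame analysis only overrides the hook, never re-implements the draw path.
class FrameAnalysisContextSketch : public HackerContextSketch {
protected:
    void MidDrawHook() override { trace += "fa-log;"; }
};
```

This keeps the "run in the middle of HackerContext" case from the earlier discussion expressible without OO inheritance standing in for sequential logic.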

I think that spaghetti code comes from not improving code structure and architecture, and just doing the one-off "I need just this" type of programming. If it's always one-offs without ever revisiting architecture, it doesn't matter whether it's structured code or objects, it will spaghettify.

Please note that I really appreciate that you spend time revisiting some of our architectures and do restructuring work. A lot of developers only do the 'fun' parts of adding features, and skip the tedious and unglamorous stuff.


For the Hooked variants of these objects, I'm not really sure they fit into the OO model at all. Wrapping an object is conceptually solid, if a bit boilerplate and verbose. Hooking would typically have one-off subroutines that are overridden and I don't have a good mental model of how that would tie back to an object framework.

It's almost like a small object variant subclass, where a given method is specifically overridden, and everything else is left in the superclass. I know our variant does all routines, but it seems like we could do just the subsets we care about.

That line of thought would suggest that we do have some five levels of objects, but it would be a straight hierarchy not a branched/mixin/multiple inheritance. Could possibly be done with a virtual interface definition, but I try to always avoid multiple inheritance as it's not worth the complexity and debugging pain, and there is almost always a different choice that will work.

In this case, the hooking is fairly logically a different inheritance, but still for root class methods, so I'll have to think about this some more.

  • HackerContext
  • HackerContext < FrameAnalysisContext
  • HackerContext < HookedContext
  • HackerContext < FrameAnalysisContext < HookedFrameAnalysisContext
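That straight-line chain can be sketched with plain single inheritance (class names abbreviated and hypothetical; each level only adds behaviour on top of the one below, with no mixins or multiple inheritance):

```cpp
#include <cassert>
#include <string>

struct ContextSketch {
    virtual ~ContextSketch() = default;
    virtual std::string who() const { return "Hacker"; }
};

// Each level wraps the one below it, a strictly linear hierarchy:
struct FrameAnalysisSketch : ContextSketch {
    std::string who() const override { return "FA<" + ContextSketch::who() + ">"; }
};

struct HookedFrameAnalysisSketch : FrameAnalysisSketch {
    std::string who() const override { return "Hooked<" + FrameAnalysisSketch::who() + ">"; }
};
```

Because each override calls its direct superclass by name, the chain stays readable and there is no diamond to debug.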


This does suggest that maybe we should just be using the HookedContext as our root object instead. Is there any reason why we don't enable hooking always? We already know wrapping is inferior in at least a few cases. What was the MGS problem that led to hooking?

I'd be hesitant of that map lookup for every call, but if there is no downside, hooking would be more compatible.

Don't know, just thinking out loud. Our hands are tied in a lot of respects because of how games work, but I feel like we are missing something that would be a more solid architecture.

DarkStarSword commented 6 years ago

Without this sort of support with just a raw editor, you have to keep all that in your head, which makes it a lot harder, and a lot more reading. The flip side is that you know what file all the functions are in, and I literally never know where anything is stored, because I don't use the flat file hierarchy at all.

IDEs were pretty much shunned for Linux kernel development - if you couldn't keep the structure of the code in your head without relying on one you were considered a novice, and it was pretty much routine for code reviewers to point out telltale signs indicating that someone had used an IDE. Tools like ctags and cscope were acceptable to locate and navigate, but the emphasis was very much that the code itself should be understandable, and the coding style is there to help with that.

The flip side of not using an IDE is that I usually have half a dozen code windows opened and tiled in tmux (when on Windows) or i3 (when on Linux), so I'm actually looking at a lot of the code at the same time. So, while I have never adhered to the strict 80 column limit that some devs do, I place great value in not having insanely long lines, because it makes the code readable without having to constantly scroll and allows for more terminal windows on the screen at the same time.

Also, I couldn't think of the proper name for the OO spaghetti code problem yesterday - it's the Yo-Yo anti-pattern: https://en.wikipedia.org/wiki/Yo-yo_problem

Actually the ID3D11DeviceContext object is not a true C++ object, because of that tweaky Microsoft QueryInterface stuff. This is why it's in a has-a relationship to HackerContext.

I'm aware of that, but it makes no difference to me whether it is language provided OO or not. You can do OO in C - it's just harder (and usually done terribly - last time I looked at the OO that GTK provided it was really horrible and required all sorts of machine generated junk to work, but as usual the Linux kernel developers managed to do a good job of OO in C, though coming from a language that supports native OO the implementation in Linux seems upside down and backwards at first with some strange macros to go along with it - but it turns out to have really good reasons to be like that).

I did not try to return our HackerContext object while not being a descendant of ID3D11DeviceContext, but I'll try that when looking into restructuring. That would clarify the ownership rules, and make it more clear that the ID3D11DeviceContext is a fake object.

It's not a fake object at all (well, if you only consider C++ objects as real, then sure - by that interpretation it is fake) - it's just one possible interface that a larger underlying object implements, and QueryInterface is used to find other interfaces that the same object implements (the reason games can go from IDXGIDevice to ID3D11Device and back is because they are both interfaces to the same object).

If you are going to implement that with inheritance instead of COM, you do it by defining the ID3D11Device interface as one class that has no implementation, IDXGIDevice as another with no implementation, then you create one object that inherits from both of those and implements all of their methods. Composition can also achieve the same thing, but without the nasty inheritance hierarchy. The newer versions of the interfaces can inherit from the old versions, but the key is that they still don't actually implement anything themselves - they only provide an API contract that some larger object implements.

Either way that's OO - it's just a question of whether you used the native C++ language's OO constructs, or a different set of constructs to achieve it.
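That description maps directly onto pure-virtual interface classes. A sketch under the stated assumption (the `IDeviceLike`/`IDXGILike`/`DeviceObject` names are illustrative, and `dynamic_cast` stands in for COM's `QueryInterface`, which has its own ABI and identity rules):

```cpp
#include <cassert>

// Two interfaces with no implementation of their own - pure API contracts:
struct IDeviceLike {
    virtual ~IDeviceLike() = default;
    virtual int device_id() const = 0;
};
struct IDXGILike {
    virtual ~IDXGILike() = default;
    virtual int adapter_id() const = 0;
};

// One larger object implements both interfaces:
class DeviceObject : public IDeviceLike, public IDXGILike {
public:
    int device_id() const override  { return 11; }
    int adapter_id() const override { return 7; }
};

// "QueryInterface": hop from one interface to another on the same object.
inline IDXGILike* query_dxgi(IDeviceLike* dev)
{
    return dynamic_cast<IDXGILike*>(dev);
}
```

This is the "one class supports ID3D11Device, IDXGIDevice, and all the 1, 2, ... versions" shape suggested in the next paragraph.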

In some ways, we might be better off doing the same - having one larger object for all related interfaces that supports all of those interfaces - ID3D11Device, IDXGIDevice, and all the 1, 2, etc versions of it would be supported by one class, all the Context interfaces would be supported by another.

The thing to avoid is doing too much inheritance - it's too easy to misuse, too easy to wind up with inheritance used to add functionality to a class that would have been better achieved with composition, etc. We had a prime example of exactly this in 3DMigoto some time back - it's been a while now and I can't recall all the details, but both of us implemented different features in our key input infrastructure using inheritance and we ended up with multiple inheritance and there might have even been a diamond relationship (or maybe we were heading in that direction) - I pulled the inheritance out of both of those and changed it to use composition, which is a much cleaner design.

Depending upon the scenario, it might make more sense to simply override the function and do all the work in the subclass FrameAnalysisContext method, and not call the super class.

Not unless it makes some logical sense to do so beyond "well, this works". If FrameAnalysisContext had some reason to override the behaviour of HackerContext, maybe - but it doesn't and it shouldn't, and the last thing I want is to be hacking on HackerContext and wondering why my changes aren't having any effect - because the real work is being done elsewhere and has been hidden. That's an anti-pattern - just no, don't do that. Plus there's the related issue that duplicate code tends to lead to two bugs with only one getting fixed.

Please note that I really appreciate that you spend time revisiting some of our architectures and do restructuring work. A lot of developers only do the 'fun' parts of adding features, and skip the tedious and unglamorous stuff.

It's the most important part - no point adding all those features if you end up with a steaming pile of **** ;-)

It's almost like a small object variant subclass, where a given method is specifically overridden, and everything else is left in the superclass. I know our variant does all routines, but it seems like we could do just the subsets we care about.

Well, I specifically did that so that we could pretty much ignore HookedContext once it was working - we wouldn't get into a situation where we add a new feature that depends on, say, ClearRenderTarget (oh, what do you know, I just added one) and find out later that it doesn't work when using hooking because we neglected to hook that method. The exceptions we've been forced to have because of crashes, and the problems they actively cause for frame analysis, are prime examples of this.

This does suggest that maybe we should just be using the HookedContext as our root object instead. Is there any reason why we don't enable hooking always? We already know wrapping is inferior in at least a few cases. What was the MGS problem that led to hooking?

I'd be hesitant of that map lookup for every call, but if there is no downside, hooking would be more compatible.

We definitely can't do hooking all the time - not until we can explain and resolve the crashes hooking causes when specific arguments are passed to XXSetShaderResources, XXSetSamplers and RSSetRasterizerState on various OS and patch levels, and because hooking makes the frame analysis log completely useless for tracing a broken texture back to the shader that rendered it (and I have wasted time trying and got very confused before I realised I had hooking enabled).

I was trying to debug them using Witcher 3, since that let the debugger in, but that game had a very inconsistent crash that was pretty random and hard to catch. When I was able to catch it, I found it was just off in la-la land executing random memory, but at least I did manage to implicate the XXSetSamplers hook in that one. In MGSV I was able to identify that varying combinations of arguments (e.g. XXSetShaderResources with NumViews=0), OS and patch levels would trigger it, and unlike Witcher 3 it seemed to trigger reliably when those combinations were hit - but it manifests as entirely different hooks crashing on different OS/patch levels, so we literally have to test Win 7 with and without evil and Win 10 with and without the anniversary update to see all known crashes (and IIRC there might have been reports of crashes on Win 8/8.1 as well, but I don't have either of those). It scares me that until we resolve that we are vulnerable to future OS updates triggering a new crash.

And I know that we are becoming more reliant on hooking with DXGI and the hardware mouse cursor suppression and it seems reliable for the cases where it works, but I still don't really trust a library that relies on binary patching running program code unless Linus himself wrote it, and I have been taking the approach of only installing hooks when the features that depend on them are enabled. It might be worthwhile trying an updated version of the hooking library at some stage to see if that resolves the crashes (speaking of which, it looks like Nektra finally got around to renewing their domain name).

DarkStarSword commented 6 years ago

What was the MGS problem that lead to hooking?

Now that's actually quite interesting - they were checking that the pointers in the vtable were in the expected library. The only ones we could override were the ones from IUnknown and ID3D11DeviceChild - changing any other pointer in that table would trigger the game's anti-modder code and kill it. I know this because I very deliberately returned the original vtable from DirectX, only changing exactly what I wanted to. The changed pointers never got called - the game noticed them and killed itself first. Hooking works because it patches the machine code of the function that is called - so the pointers still point to the d3d11.dll the game expects them to be in, but since the system d3d11.dll could change, they can't scan it for untrusted code, and hence we can get in that way.

DarkStarSword commented 6 years ago

6 - I can't find anywhere in the DX11 project where we still check for NULL pointers after new

bo3b commented 6 years ago

6 - I can't find anywhere in the DX11 project where we still check for NULL pointers after new

Thanks for checking. Only real question then is whether we want to change our exception policy to try to handle C++ instantiation exceptions. Right now we'll just crash with no notice. But we also don't actually ever expect to see these.

DarkStarSword commented 6 years ago

2: Far Cry 4, Witcher 3 and Arkham Knight are now in my 3d-fixes repo, and remembering how annoying filter-branch is to use on Windows (because of Windows' filesystem locking policies) I've pushed the history for the remaining games up to a separate branch. I didn't include the history of any example shaders, since those are staying in the 3DMigoto repository under TestShaders.

https://github.com/bo3b/3Dmigoto/tree/game-fixes

Feel free to set up a new repository with that as master, then we can delete them from this repository.

bo3b commented 6 years ago

2: Far Cry 4, Witcher 3 and Arkham Knight are now in my 3d-fixes repo, and remembering how annoying filter-branch is to use on Windows (because of Windows' filesystem locking policies) I've pushed the history for the remaining games up to a separate branch. I didn't include the history of any example shaders, since those are staying in the 3DMigoto repository under TestShaders.

https://github.com/bo3b/3Dmigoto/tree/game-fixes

Feel free to set up a new repository with that as master, then we can delete them from this repository.

Sounds good. I won't be able to get to this for about a week, but created the bo3b/3d-fixes repo and added you as collaborator in case you wanted it sooner.

DarkStarSword commented 6 years ago

Sounds good. I won't be able to get to this for about a week, but created the bo3b/3d-fixes repo and added you as collaborator in case you wanted it sooner.

Done - nice to get these out of the repo at long last :)

bo3b commented 6 years ago

Sounds good. I won't be able to get to this for about a week, but created the bo3b/3d-fixes repo and added you as collaborator in case you wanted it sooner.

Done - nice to get these out of the repo at long last :)

Thank you kindly for doing that, very nice.

bo3b commented 6 years ago

I was trying to debug them using Witcher 3, since that let the debugger in, but that game had a very inconsistent crash that was pretty random and hard to catch. When I was able to catch it, I found it was just off in la-la land executing random memory, but at least I did manage to implicate the XXSetSamplers hook in that one. In MGSV I was able to identify that varying combinations of arguments (e.g. XXSetShaderResources with NumViews=0), OS and patch levels would trigger it, and unlike Witcher 3 it seemed to trigger reliably when those combinations were hit - but it manifests as entirely different hooks crashing on different OS/patch levels, so we literally have to test Win 7 with and without evil and Win 10 with and without the anniversary update to see all known crashes (and IIRC there might have been reports of crashes on Win 8/8.1 as well, but I don't have either of those). It scares me that until we resolve that we are vulnerable to future OS updates triggering a new crash.

Not sure of course, but this behavior reminds me very much of the weird problem I stumbled across in the DX9Ex interfaces, using the similar C interface approach. The problem manifested as random crashes, with stack misalignment. With the debug layer enabled, I'd get bad parameters, and reports of stack inconsistencies.

After much pain, the answer was that the header file from DX9 is actually wrong. It is missing a method in the C interface, and so the vtable is off by one from that point down. Naturally, that causes a lot of problems, but it would only crash on my one specific DX9Ex call. :->

Might be unrelated, but this behavior sounds suspiciously similar, and could be a bad entry in the C tables for DX11, or other vtable shenanigans.

As a general idea, the reason I chose Nektra in-proc over other packages I studied was because I'm pretty confident they have solid code and solid architecture. It's harder to use and not well documented, but I think it's as solid as Detours.

DarkStarSword commented 6 years ago

OK, the changes to frame analysis logging are officially causing me problems, hiding the hashes of shaders replaced via F10 reload in the log because it now sees the changed shader from 3DMigoto, not the address passed from the game that it is expecting:

https://github.com/bo3b/3Dmigoto/issues/86#issuecomment-356921455

I'm tempted to revert these changes given they only led to < 1fps performance improvement, but a bigger concern at this point is that doing so will be pretty much guaranteed to cause a merge conflict with the exp_single_layer branch, which will be a major headache to resolve.

Looking forward, I think it is clear that the frame analysis logging needs to happen before HackerContext to resolve this. By itself that would cause some other issues in the frame analysis log, where the command list logging will be out of order compared to the call from the game, but let's just deal with it and play with the logging a little to find a way to make it acceptable - for example, the command list logging could log when it starts and/or ends executing a pre or post command list, so that it is clear that the original call would go between them.

bo3b commented 6 years ago

Is that <1fps in a known CPU bottleneck test? I would expect that change to have an outsized influence in a CPU bottleneck, and is part of my earlier motivation to drive our CPU usage as low as possible.

If it's a problem, I can do that merge without too much trouble. The graphical merge tool in VS is quite good, and allows me to click through changes easily. I could fix this complex merge in HackerContext in less than an hour I think. Worst case I can redo it without trouble, 3rd time is much, much easier and faster. Even the changes to FrameAnalysis.cpp are all find/replace.

Judgement call, I'm not sure what is best here. Not a huge problem to merge I think. I would lean toward what is best for long term use.


The value add for exp_single_layer has jumped up. I successfully got the Overlay to draw in Dishonored2, and load the game without a crash. Logging was working, but hunting was broken because of the use of Globals. The active swapchain was not connected to the active device so the hunting list was empty. Overlay is first use, others are last use. Dishonored2 creates Devices 8 times. I'll figure out something.

Given the success with Batman and Dishonored2, this is looking pretty promising.

DarkStarSword commented 6 years ago

Is that <1fps in a known CPU bottleneck test? I would expect that change to have an outsized influence in a CPU bottleneck, and is part of my earlier motivation to drive our CPU usage as low as possible.

Yeah, that <=1fps was the amount @masterotaku reported it improved performance in Grim Dawn when CPU bound (51fps before, 52fps after), which was consistent with my prediction based on the performance improvement he reported from other optimisations (1% CPU use on my system equated to about 2fps on his system, and adding up all frame analysis logging calls was totalling about 0.45% CPU use on my system).

There's still some performance difference to look at - masterotaku reports we are still about 6fps off vanilla performance in that game, but the only other significant blip showing up in my profiling was the software mouse, which he reported made no measurable difference to framerate when disabled. Anything else was pretty small, and definitely in the realm of diminishing returns - unless we can find a more major bottleneck that didn't show up in my profiling (we may need the DLC and his save) I'm not sure how much else we can do there.

If it's a problem, I can do that merge without too much trouble. The graphical merge tool in VS is quite good, and allows me to click through changes easily.

It was a fairly big change (expand the diffs), and the kind of thing that is long enough that it can be easy to zone out and miss something when merging: https://github.com/bo3b/3Dmigoto/commit/1c3fcab97cdfefb839c5e6b1af0bf2606e6f3119

I'll give it some thought - swapping the order of HackerContext and FrameAnalysisContext around is doable, but aside from the changes to the logging (which I think we can deal with), it would also remove our ability to enable frame analysis on the fly if the game was launched without hunting enabled. That's not ideal, since it is nice to be able to use that if we come across something we need to examine and aren't in a position to restart the game... but on the other hand, I almost always run with hunting=2 anyway, and we already accept that a lot of 3DMigoto functionality needs hunting enabled at launch to work, so maybe that's a limitation we can just live with.

The value add for exp_single_layer has jumped up. I successfully got the Overlay to draw in Dishonored2, and load the game without a crash.

Sweet :)

hunting was broken because of the use of Globals. The active swapchain was not connected to the active device so the hunting list was empty. Overlay is first use, others are last use. Dishonored2 creates Devices 8 times. I'll figure out something.

Interesting that we hit that so quickly with this change - I expected it to be rarer that we would hit sometime down the track.

Given the success with Batman and Dishonored2, this is looking pretty promising.

Definitely :)

DarkStarSword commented 6 years ago

One comment on profiling - I think you've mentioned you focussed on the exclusive samples column? It's worth keeping in mind that the inclusive samples column can also be indicative of performance bottlenecks in 3DMigoto - ideally we would exclude only those DirectX calls that we pass through from the game, but include any that we have added, but there isn't a column for that so it needs a bit of a closer examination.

e.g. HackerContext::DrawIndexed() only showed 0.12% exclusive samples in my profiling, but showed 5.25% inclusive - however the original DirectX call only measured 0.4% CPU, so we were actually adding 4.85% CPU from that call, not 0.12% (even totalling up the exclusive samples of things like BeforeDraw, AfterDraw, etc. didn't add up to much). After optimising it was down to 0.11% exclusive / 0.66% inclusive - a reduction of 4.6% CPU, not just 0.01%.

bo3b commented 6 years ago

e.g. HackerContext::DrawIndexed() only showed 0.12% exclusive samples in my profiling, but showed 5.25% inclusive - however the original DirectX call only measured 0.4% CPU, so we were actually adding 4.85% CPU from that call, not 0.12% (even totalling up the exclusive samples of things like BeforeDraw, AfterDraw, etc. didn't add up to much). After optimising it was down to 0.11% exclusive / 0.66% inclusive - a reduction of 4.6% CPU, not just 0.01%.

This might just be an artifact of how we respectively use the tool, but that doesn't add up to me. If I'm looking at Exclusive, I don't want any mystery missing bits, I want just the stuff that our dll does. I sort by our dll, and use exclusive. If Inclusive is showing 10x more, but doesn't properly account for it, that's not good. Is the difference in terms of our raw code, versus functions? Either way, I can't have the tool lying to me.

Here's an older performance analysis: #44

Edit: In this example, the DrawIndexed is showing that 4.62% is used in Inclusive, 0.01% in Exclusive. The difference is taken up by the nvidia driver (nvwgf2umx.dll). Not sure, maybe I'm confused.

bo3b commented 6 years ago

It was a fairly big change (expand the diffs), and the kind of thing that is long enough that it can be easy to zone out and miss something when merging: 1c3fcab

I did a preflight of this revert on exp_single_layer, and there are 85 conflicts. Everything else is a clean merge. The conflicts revolve around my renaming mOrigContext to mOrigContext1, so a good approach would be to find/replace/grep those back, do the revert, then find/replace/grep back.


Also, based on your approach there, we took a different tack for naming the objects. I went with mOrigContext1 storing an ID3D11DeviceContext1. You kept mOrigContext and mPassThroughContext, and they might contain ID3D11DeviceContext1 or ID3D11DeviceContext depending upon d3dx.ini.

Just checking to see if you think naming it mOrigContext1 is helpful or troublesome. I chose that naming because in exp_single_layer we only expect one layer, and it is either ID3D11DeviceContext1 or ID3D11DeviceContext, and I did not want to have two variables. It didn't feel right to name it mOrigContext when the composition object would be ID3D11DeviceContext1, which is a subclass of ID3D11DeviceContext - which is why I don't mind (much) the no-platform_update scenario where the name is misleading.

I don't have a strong preference here, keeping the name mOrigContext with the understanding that it references a ID3D11DeviceContext1 doesn't seem that out of place. I went back and forth on this a couple of times already, so clearly I'm waffling. :->

DarkStarSword commented 6 years ago

This might just be an artifact of how we respectively use the tool, but that doesn't add up to me. If I'm looking at Exclusive, I don't want any mystery missing bits, I want just the stuff that our dll does. I sort by our dll, and use exclusive. If Inclusive is showing 10x more, but doesn't properly account for it, that's not good. Is the difference in terms of our raw code, versus functions? Either way, I can't have the tool lying to me.

The problem is you can't just count the samples recorded in our DLL - you have to also consider the samples in the original DirectX DLL (and nvapi and whatever else) if they are calls that we added, not just calls that we passed through from the game. One example from my optimisations were removing the SetShaderResources calls to bind StereoParams and IniParms from BeforeDraw() - these barely showed up in exclusive samples at all because most of the work happened in the original DirectX DLL, not in 3DMigoto, but 3DMigoto was responsible for calling them, so we still need to count their overhead as our overhead. But, we shouldn't count the CPU usage from the DrawIndexed() call in the original DirectX DLL, because the game called that, not us - we just passed it through.

I guess you can think of it as the exclusive samples giving a minimum bound for our CPU overhead, and inclusive giving a maximum bound - the actual value would be somewhere between the two. If exclusive is high we definitely need to check that out because that's definitely us, but we also need to take a quick peek at anything with a high inclusive to check if most of it is coming from the original DirectX call we are passing through, or something we added.

Edit: In this example, the DrawIndexed is showing that 4.62% is used in Inclusive, 0.01% in Exclusive. The difference is taken up by the nvidia driver (nvwgf2umx.dll). Not sure, maybe I'm confused.

Most of the overhead we've added to the DrawIndexed() call will be in BeforeDraw(), which is showing 0.89% CPU inclusive / 0.18% exclusive. Since we know that everything inside that function is new functionality we've added we can completely disregard the exclusive samples for it and focus totally on inclusive samples, because even if it's in another function or DLL, it is going to be overhead that we've added.

But of course we aren't responsible for the whole 4.62% of the inclusive samples of the whole DrawIndexed() call - to look at that one we need to take that number and subtract however many occurred within the original DirectX DrawIndexed() call - whatever remains is overhead that we have added, regardless of whether it occurred within our DLL, DirectX, nvapi or some other system DLL.

Also, based on your approach there, we took a different tack for naming the objects. I went with mOrigContext1 storing an ID3D11DeviceContext1. You kept mOrigContext and mPassThroughContext, and they might contain ID3D11DeviceContext1 or ID3D11DeviceContext depending upon d3dx.ini.

I mostly kept the naming from the original HackerContext.cpp so that diffing the two files would give a somewhat readable diff. I'm quite happy for this to be renamed to mOrigContext1 if that's what we end up going with in HackerContext.

I was sort of thinking it might be useful if we put "ID3D11DeviceContext mOrigContext" and "ID3D11DeviceContext1 mOrigContext1" in a union together so either name will work and refer to the same object. That way, whenever we refer to it as "mOrigContext" the compiler would prevent us using any Context1 functions by accident forcing us to explicitly acknowledge it when we use one, and the union means we avoid any verbose type casting that we would get if we only stored it as a regular context. Best of both worlds type of thing - but I know not everyone likes unions, so I don't really mind what we end up going with.

DarkStarSword commented 6 years ago

I guess one thing to consider is that while our object will expose all the Context1 functions, the object we are wrapping may or may not - so in that regard maybe we shouldn't call it mOrigContext1, because it may not be a Context1... I dunno - this decision is the absolute definition of bike shedding, and ultimately it's not going to matter too much what we go with.

bo3b commented 6 years ago

I guess one thing to consider is that while our object will expose all the Context1 functions, the object we are wrapping may or may not - so in that regard maybe we shouldn't call it mOrigContext1, because it may not be a Context1... I dunno - this decision is the absolute definition of bike shedding, and ultimately it's not going to matter too much what we go with.

But really, what color should the roof be? I hadn't heard of bike shedding before your earlier post, and looked it up.

For this one then, I'll leave it as Context1, because our goal with this branch (and future, assuming it works) is to always upcast to the higher object if we can, no matter what the game requests. That's how the code runs right now. So, since that's a main goal, it was the deciding factor for me.

Edit: Let me think about that union idea. I'm not necessarily opposed, it just seems like extra complexity with low payoff.

bo3b commented 6 years ago

The problem is you can't just count the samples recorded in our DLL - you have to also consider the samples in the original DirectX DLL (and nvapi and whatever else) if they are calls that we added, not just calls that we passed through from the game. One example from my optimisations were removing the SetShaderResources calls to bind StereoParams and IniParms from BeforeDraw() - these barely showed up in exclusive samples at all because most of the work happened in the original DirectX DLL, not in 3DMigoto, but 3DMigoto was responsible for calling them, so we still need to count their overhead as our overhead. But, we shouldn't count the CPU usage from the DrawIndexed() call in the original DirectX DLL, because the game called that, not us - we just passed it through.

Ah. Good example. I understand what you are saying. I'll keep an eye out for it.

As a general rule for the much earlier performance testing, for hot paths we were doing pretty much straight pass through (map lookups, replacements), with no processing to speak of, so it was easier to ignore Inclusive.

DarkStarSword commented 6 years ago

I've gone ahead and reverted the frame analysis changes - they are causing me too much of a headache at the moment to justify leaving them in.

We can either pull them out of the exp_single_layer branch as well with a plan to revive them at a later date, or alternatively we could merge the commit just prior to the reversion into that branch, make the changes that I think should make it work, then merge the reversion commit with "git merge -s ours" to discard the reversion.

Any thoughts?

bo3b commented 6 years ago

Since we are close to shipping the 1.3 branch, I'm going to update our DirectXTK to the latest version there, but switch it to a submodule. Now seems like the time for 1.3 since there are so many other big changes.

I don't think this is worth the effort for 1.2 branch since it is close to archival, and would rock the boat for low payoff.

Edit: After studying this a bit, I don't think we want a submodule for the DirectXTK, because we need to modify the build parameters in the VS projects. So, it's not just a plug-in, and we do not want to push those changes upstream, but do want them in our project.

DarkStarSword commented 6 years ago

Edit: After studying this a bit, I don't think we want a submodule for the DirectXTK, because we need to modify the build parameters in the VS projects. So, it's not just a plug-in, and we do not want to push those changes upstream, but do want them in our project.

Another option is to fork DirectXTK on github with whatever changes we need and point the submodule at that fork.

DarkStarSword commented 6 years ago

Just FYI - I'm getting a crash on launch in Tekken 7 with the exp_single_layer branch at the moment. I was trying to test whether the 1.2.71 merge was good or not, but it seems the crash originated from before that point anyway. Debug log attached. Haven't dug into it yet - still sorting out the vs2015 branch, just thought you should know.

d3d11_log.txt

DarkStarSword commented 6 years ago

git bisect got it down this far - due to compile errors in these commits I couldn't track it down further.

There are only 'skip'ped commits left to test.
The first bad commit could be any of:
  eefcb65a8f6ea935204c17892386ffefcaa41a02
  f56973882398d6c5a14e71ab70746276b9d7c7f2
  8f6debdf2bca2edf366f13c2d1ae7348c9d850b5
We cannot bisect more!

DarkStarSword commented 6 years ago

vs2015 branch sorted, now I can debug. Crash is coming from creating the overlay in HackerSwapChain, called from Hooked_CreateSwapChain

bo3b commented 6 years ago

In that log, the Overlay is not properly created and causes an exception, which is caught. We possibly do not check for the overlay being null later, which might be the cause of the crash. It looks like it crashes at the first Present call, which would call DrawOverlay.

Edit: The d3dx.ini does not match the 1.3 branch version here. Not sure it matters, but it is forcing a downgrade to 11.0 for the game.

[System]
  allow_check_interface=0
  allow_create_device=2
  allow_platform_update=0

should be:
[System]
  allow_check_interface=1
  allow_create_device=1
  allow_platform_update=1

DarkStarSword commented 6 years ago

Looks like simplifying HackerDevice::QueryInterface was the cause - UE4 used that to get an IDXGIDevice, bypassing us and causing this crash. We no longer have an IDXGIDevice wrapper we can use, right?

DarkStarSword commented 6 years ago

To be clear - UE4 passed the IDXGIDevice it got from HackerDevice::QueryInterface into CreateSwapChain(), which we attempted to reinterpret_cast to a HackerDevice, which we can't do.

bo3b commented 6 years ago

To be clear - UE4 passed the IDXGIDevice it got from HackerDevice::QueryInterface into CreateSwapChain(), which we attempted to reinterpret_cast to a HackerDevice, which we can't do.

OK. That's not supposed to be legal; according to the docs, an IDXGIDevice != ID3D11Device, but it probably works for the normal calls. Given the slop there, I would just try returning the HackerDevice in QueryInterface instead of the IDXGIDevice. Nobody actually cares about that device, as far as I can tell. Pretty sure someone was going fast or missed that these are not the same. The normal path is to use that in-between IDXGIDevice to fetch the parent factory for CreateSwapChain, but still use a real device in the call.

We no longer have a DXGIDevice and I really think we do not want one. That whole daisy-chain of madness is bad.

Also, I'm fairly certain that I tested a UE4 game earlier. I'll double check to see if it's all UE4 or something unique to Tekken.

Edit: no, that won't work - it's needed to fetch the parent.

DarkStarSword commented 6 years ago

Edit: The d3dx.ini does not match the 1.3 branch version here. Not sure it matters, but it is forcing a downgrade to 11.0 for the game.

[System]
  allow_check_interface=0
  allow_create_device=2
  allow_platform_update=0

should be:

[System]
  allow_check_interface=1
  allow_create_device=1
  allow_platform_update=1

No difference :(

OK. That's not supposed to be legal, according to the docs, an IDXGIDevice != ID3D11Device

It's perfectly legal - CreateSwapChain() takes an IUnknown and calls QueryInterface on it, and since both of those are interfaces to the same object it works just fine.

Given the slop there, I would just try returning the HackerDevice in QueryInterface instead of the IDXGIDevice.

I don't like that idea - they may be interfaces to the same object, but they are not compatible interfaces - if we return the wrong one and something calls a function pointer on it - BOOM.

We no longer have a DXGIDevice and I really think we do not want one. That whole daisy-chain of madness is bad.

How about we use a reverse mapping of ID3D11Device (or even IUnknown) to HackerDevice, then in CreateSwapChain() we can use QueryInterface() to get the ID3D11Device to look up the HackerDevice* in the map?

DarkStarSword commented 6 years ago

Yeah, I think I'm going to experiment with writing a FindHackerDeviceFromIUnknown() that tries to take care of all cases - hooked, non-hooked, HackerDevice, ID3D11Device, IDXGIDevice, etc. Try to give us one true way to get our HackerDevice when we've been passed some unknown device.

Might add a GUID for HackerDevice while I'm at it so we can QueryInterface for that.

bo3b commented 6 years ago

OK. That's not supposed to be legal, according to the docs, an IDXGIDevice != ID3D11Device

It's perfectly legal - CreateSwapChain() takes an IUnknown and calls QueryInterface on it, and since both of those are interfaces to the same object it works just fine.

No, it's not legal. https://msdn.microsoft.com/en-us/library/windows/desktop/bb174537(v=vs.85).aspx

For Direct3D 11, and earlier versions of Direct3D, this is a pointer to the Direct3D device for the swap chain.


The fact that it works is different from it being legal. It's documented to specifically look for a Direct3D device, and a DXGI device is not the same. It doesn't have any of the same methods; you cannot call CreateVertexShader using an IDXGIDevice. They are not documented as being the same, or as coming back with the same pointer.

This is either slop that snuck through, or Microsoft making a specific workaround for some reason.

It doesn't really matter, we have to support what the games do, even if they are buggy.


If we were to do a QueryInterface back to ID3D11Device in CreateSwapChain, then your HackerDevice lookup would succeed.

bo3b commented 6 years ago

Please feel free to look at this if you prefer. I'm happy to fix this, but I can't look at it until tomorrow. BTW, this is the sort of stuff I was expecting after a big refactor and simplification here, it's not a surprise. Pretty sure it's no surprise to you either.

Edit: I'm not in love with the idea of using a GUID here, but that might be a good choice. We are certain that we catch every CreateDevice call, and turn them into HackerDevices. At any random time like CreateSwapChain, we could QueryInterface(HackerDevice) and be certain to get us back.

Assuming that you are always correct that IDXGIDevice and ID3D11Device are the same object. Documentation for implementing COM specifically says you cannot rely on that connection, but in practice it might be true.

Edit2: I can reproduce this on Obduction as well. UE4 of the right vintage.

DarkStarSword commented 6 years ago

No, it's not legal. https://msdn.microsoft.com/en-us/library/windows/desktop/bb174537(v=vs.85).aspx

For Direct3D 11, and earlier versions of Direct3D, this is a pointer to the Direct3D device for the swap chain.

That documentation is not asking for an ID3D11Device or ID3D10Device, it is asking for a Direct3D device, and if an IDXGIDevice supports those interfaces then it is one of those.

The fact that works is different from it being legal. It's documented to specifically look for a Direct3D device, and a DXGI device is not the same.

But they are the same - this may be an implementation detail that they don't necessarily want to advertise, but that's how DirectX has worked ever since DXGI was introduced.

It doesn't have any of the same methods, you cannot call CreateVertexShader using an IDXGIDevice.

This is true - they are two separate interfaces and cannot be cast to one another with C style casting, but that doesn't mean that they are separate objects, just separate interfaces. The key difference between this and, say, ID3D11Device1 vs ID3D11Device is that the latter pair uses interface inheritance, so you can also cast between them - but COM isn't limited to interface inheritance, so the fact that neither ID3D11Device nor IDXGIDevice inherits from the other doesn't mean that they are separate.

They are not documented as being the same, or that they would come back with the same pointer.

Exactly - if you check you will find that they actually do come back with the same pointer. We've been over this - I have long suspected that IDXGIDevice and ID3D11Device were part of the same larger object, and they are. Remember in COM you need to check the IUnknown* (using QueryInterface) to find out if two interfaces are part of the same object, so here's the proof - this code:

HRESULT STDMETHODCALLTYPE HackerDevice::QueryInterface(
    /* [in] */ REFIID riid,
    /* [iid_is][out] */ _COM_Outptr_ void __RPC_FAR *__RPC_FAR *ppvObject)
{
    LogDebug("HackerDevice::QueryInterface(%s@%p) called with IID: %s\n", type_name(this), this, NameFromIID(riid).c_str());

    HRESULT hr = mOrigDevice1->QueryInterface(riid, ppvObject);
    if (FAILED(hr))
    {
        LogInfo("  failed result = %x for %p\n", hr, ppvObject);
        return hr;
    }

    {
        IUnknown *test;

        if (SUCCEEDED(mOrigDevice1->QueryInterface(IID_IUnknown, (void**)&test))) {
            LogInfo("0x%p->QueryInterface(IID_IUnknown): 0x%p\n", mOrigDevice1, test);
            test->Release();
        }

        if (SUCCEEDED((*(IUnknown**)ppvObject)->QueryInterface(IID_IUnknown, (void**)&test))) {
            LogInfo("0x%p->QueryInterface(IID_IUnknown): 0x%p\n", *ppvObject, test);
            test->Release();
        }
    }
    // ...rest of the function elided...

Produces this output:

HackerDevice::QueryInterface(class HackerDevice@00000000180D9300) called with IID: IDXGIDevice
0x00000000187150D0->QueryInterface(IID_IUnknown): 0x0000000018715070
0x00000000187150C8->QueryInterface(IID_IUnknown): 0x0000000018715070
  returns result = 0 for 00000000187150C8

The IUnknown pointers match, so this proves that they are the same object. They are two different interfaces and they can't be cast to each other with a C-style cast or reinterpret_cast or whatever - only QueryInterface can traverse between the two - but they are the same object. It's also worth noting that the ID3D11Device and IDXGIDevice interface pointers are only 8 bytes apart, and in actual fact they are at offsets 88 and 96 in some other larger object that we don't know about beyond its IUnknown* (that would be the real DirectX device implementation that is never exposed to us mere applications).

It doesn't really matter, we have to support what the games do, even if they are buggy.

I don't actually think this is buggy - unusual, yes, but not buggy. And yeah, we have to support it.

If we were to do a QueryInterface back to ID3D11Device in CreateSwapChain, then your HackerDevice lookup would succeed.

I'm thinking we should try reducing everything to IUnknown pointers - I feel like that's going to be the most likely to be future proof. I'll try it out and see how it goes.

Please feel free to look at this if you prefer.

Sure thing :)

BTW, this is the sort of stuff I was expecting after a big refactor and simplification here, it's not a surprise. Pretty sure it's no surprise to you either.

Yep, absolutely expected some things like this to crop up :)

DarkStarSword commented 6 years ago

There is actually a deeper issue here - if the game has a way to get to the real IDXGIDevice, then it can use that to get to the real ID3D11Device and effectively unhook us. We might need to think about the possibility of bringing back our IDXGIDevice wrapper (if we do, I'd rather bring it back as part of HackerDevice or a larger composite device and try to replicate the reality of the two objects we are wrapping - essentially I want to wrap the IUnknown, not the ID3D11Device or IDXGIDevice), or maybe we just hook QueryInterface so we can return our HackerDevice when something asks for an ID3D11Device* - but that might be a problem if DirectX itself calls that.

For now, let's see if we can avoid bringing that back - even if we go with a simpler design it is added complexity that would be better to avoid.

I've been thinking about this for a while: what would happen if we changed our QueryInterface approach to a whitelist instead of a blacklist? Really we need to be able to wrap anything and everything that call can return, since anything that can be returned can be used to get back to an ID3D11Device, so in theory our best strategy would be to deny any request for an interface we don't support. But that might not be practical: some of those might be a lot of work for something we don't actually care about, or even undocumented interfaces we have no way to implement, and if a game/overlay/driver/whatever treats one of those as a hard failure we'll be in trouble - we'd essentially be relying on a fallback path that probably doesn't exist in a lot of cases. It's probably not something we can do if we hook QueryInterface, because then DirectX itself would call into our whitelist, and we know nothing about its internal interfaces. Might be worth an experiment anyway.

DarkStarSword commented 6 years ago

The IUnknown pointers match, so this proves that they are the same object.

Out of curiosity I went back and tested this on Win 7 with and without the evil update to check if this detail has ever changed, and can confirm that it was the same back then as well - the IUnknown pointers match for ID3D11Device and IDXGIDevice. That gives me more confidence that using the IUnknown pointers is the right way to relate these two objects.

Also confirmed that this issue exists on Submerged, which is UE4.8; Tekken 7 is UE4.13 (and confirmed that this is how UE4 has always done it, ever since Epic released the source code).

DarkStarSword commented 6 years ago

I've pushed up a solution for this, mapping IUnknowns to HackerDevices. Tekken 7 now launches, as do other UE4 games, and I verified that this also works with hooking on MGSV.

bo3b commented 6 years ago

I think you are letting your enthusiasm for recognizing that IDXGIDevice returns the same pointer as ID3D11Device overrule your normal tendencies. You've spoken many times about not counting on implementation details. And this most certainly is an implementation detail.

The documentation never, ever says the two objects are the same, that the pointers might be the same, or that they can be treated the same. They specifically discourage that in fact, as part of their black-box approach.


You skipped the documentation I added to that QueryInterface routine above: For any one object, a specific query for the IUnknown interface on any of the object's interfaces must always return the same pointer value. This enables a client to determine whether two pointers point to the same component by calling QueryInterface with IID_IUnknown and comparing the results. It is specifically not the case that queries for interfaces other than IUnknown (even the same interface through the same pointer) must return the same pointer value.

If a caller asks specifically for QueryInterface(IDXGIDevice) then that documentation right there says that they cannot rely upon it being the same pointer. By the strict reading, it is not legal to pass this to CreateSwapChain.


As defined in the documentation, a Direct3D device most definitely is not an IDXGIDevice:

https://msdn.microsoft.com/en-us/library/windows/desktop/bb174527(v=vs.85).aspx

The IDXGIDevice interface is designed for use by DXGI objects that need access to other DXGI objects. This interface is useful to applications that do not use Direct3D to communicate with DXGI.

The Direct3D create device functions return a Direct3D device object. This Direct3D device object implements the IUnknown interface. You can query this Direct3D device object for the device's corresponding IDXGIDevice interface. To retrieve the IDXGIDevice interface of a Direct3D device, use the following code:


If these are in fact the same object, and we can just slop around and use one when we mean the other, like in the call to CreateSwapChain... then surely it is also OK to simply call Release on whichever one we happen to be holding, i.e. it doesn't matter if it's IDXGIDevice->Release or ID3D11Device->Release.

It's the same object, right? Are you willing to bet the ranch on that behavior?

https://msdn.microsoft.com/en-us/library/windows/desktop/ms692481(v=vs.85).aspx

From a COM client's perspective, reference counting is always done for each interface. Clients should never assume that an object uses the same counter for all interfaces.

When they pass the wrong object to CreateSwapChain, the ref counts are not correct.


You and I know that it works, but that is not what the documentation says. Whether we want to rely upon undocumented behavior is a different question.

However, I think your approach of using IUnknown for the pointer is correct, and does not break any rules I've read. Any query for IUnknown must always give the same result. Using the ID3D11Device pointer would possibly work, but would not match the documentation.

bo3b commented 6 years ago

There is actually a deeper issue here - if the game has a way to get to the real IDXGIDevice, then it can use that to get to the real ID3D11Device and effectively unhook us. We might need to think about the possibility of bringing back our IDXGIDevice wrapper (if we do, I'd rather bring it back as part of HackerDevice or a larger composite device and try to replicate the reality of the two objects we are wrapping - essentially I want to wrap the IUnknown, not the ID3D11Device or IDXGIDevice), or maybe we just hook QueryInterface so we can return our HackerDevice when something asks for an ID3D11Device* - but that might be a problem if DirectX itself calls that.

No, this cannot happen. There is no legal way for them to fetch an IDXGIDevice without using an existing ID3D11Device. There is no documentation showing any way to create one directly.

Someone might have hacked something up that we'll have to deal with, like some Factory->QueryInterface or something, but there are no extant examples in my searches.


One major point of doing this refactoring was simplification. If we are just going to junk it all up again with all the myriad ways it might fail, then there was no point. I really, really do not want to add more complexity.