Closed RomanHargrave closed 8 years ago
Is there no way to jmp to something in between that is reliable to work around the issue? Do you have a way to locate an address by doing an Array of Bytes scan?
Locating the address is not the problem. The problem we are running in to is the way that the kernel arranges libraries in memory.
When you look at the memory maps for PAYDAY 2, you will notice that the process image resides at a far lower space in virtual memory than any other libraries. The "distance", if you can call it that, between PAYDAY 2 and the libraries that it statically links, which make up the image, and our library along with others like the steam overlay and API is larger than the maximum value that opcode E9 will accept as a relative jump offset.
This is done intentionally, but for other reasons. It is just not particularly beneficial to our rather specific scenario.
I take it reducing the amount of memory available for the PD2 process and any libs it links statically is probably not an option? Something like a docker container to isolate the processes from the rest of the system. This isn't my forte, wish I could help in some way though.
I finally found some time to try to help with the hooking situation. I've made a temporary fork of subhook which pushes the address onto the stack and returns rather than uses a jump: https://github.com/Ozymandias117/subhook
See if this works any better for you.
@Ozymandias117 Hm, this is interesting.
I've tried yours with this branch: https://github.com/LeonardKoenig/blt4l/tree/PRQ_clenaup_and_fixes
And I at least can see that our hooked function is getting called (instead of the original) -- which is definitely an improvement, thanks!
Sadly it still hangs (sometimes) when trying to execute std::__1::recursive_mutex::lock
:/
@Ozymandias117 has effectively solved the core issue for this thread, which I appreciate, because I really don't like to deal with Intel-family CISC assembly, so I'm going to close the issue for now.
I had researched different methods for function intercepts previously, and this was certainly one that was suggested, though I had trouble researching it due to the rather general nature of the terminology involved. (Just try searching for information about direct and indirect jumps and the problems introduced by the 64-bit extensions. I dare you.)
That tangent aside, I think that we should certainly continue to investigate the SubHook option, albeit with @Ozymandias117 fork; additionally, would go so far as to suggest you open a PR to subhook with that modification if you don't mind putting the time in to backing up your changes on the PR page.
Now, with regards to the changes and their affect on process longevity.
From my limited understanding of the exact behaviour implied by RET, we shouldn't be contaminating the stack any more than we would be with the JMP behaviour; and, if the material that I just read (in haste) would suggest anything on the matter, it would be that this is the more "correct" way to do something this far south of "correct".
Yes, the stack should be clean (as can be seen by examining %rsp in gdb).
I can look into a PR, but I'll still need to clean up the Trampoline case first. Currently my fork can create "correct" trampolines that will throw you off into invalid memory still. This is why I'd mentioned in the other thread that it should only be used with ScopedRemove (or c-style remove/install).
Using this though, I've been able to hook into do_game_update and play a full mission without any crashes (Just printing "Enter game update" "Leaving game update" on either side of the original call). I think a lot of the effort now is going to be figuring out how to attach to the current BLT infrastructure.
That's good. I'm going to toy with your update as the submodule inplace of the standard upstream.
We do not, and probably will not in the foreseeable future, use trampolines, and if so, sparingly as very few functions are even "twiddled" with, if you will.
I will also have to investigate @LeonardKoenig 's claim about thread mutexes, because I really want to steer clear of anything that would involve both C++ and threads (moreso the C++ part. at least threads make sense and do what you tell them to do).
Also, if by "BLT" infrastructure you mean mod compatibility, we should have 100% compatibility with a given mod provided that there are no platform "quirks" and that the mod does not use drive letters. I can always modify the runtime to translate windows path separators to real path separators, but I would like to avoid it, because that's really LUA's job as a cross-platform (*) language, and not ours (though the modder should have known better and used real path separators).
Yeah, I was trying to reproduce @LeonardKoenig's issue, but I wasn't able to get this hook to run correctly. So I just added the do_game_update hook to the small main hook I'd used to make the changes to SubHook. If I can get a build working that has that issue, I can try to investigate. Speaking of which, is there any reason your CMakeLists requires cmake 3.4.1?
The BLT infrastructure I'm referring to is the rest of that Payday-2-BLT /src folder. My understanding of BLT is that it exposes some additional features that may require some work for the Linux side.
@Ozymandias117
RET
mod, and PAYDAY boots, but is slow as molasses. I'll remove the lua_newstate
hook I added, but I fear that blt4l's game update hook is the cause of the lag. Can you please write (and test) a short POC that demonstrates how you hooked it in such a manner that the game was playable? Any other details would be greatly appreciated.BTW lets carry this dialog over to #5
Currently, SubHook uses the JMP opcode E9 in order to perform a relative jump to the target function.
Due to the way that BLT4L is loaded, our functions live in address space that is extremely far from the LUA function calls we need access to.
The actual jump offsets we would need to use are larger than a signed 32-bit integer would allow. Unfortunately, 0xE9 expects a signed paramater in order to support backward jumps.
Inside
subhook_make_jmp
, the result of the offset calculation is cast to (int32_t) which truncates the jump value, and can result in a negative jump when a positive jump is needed, not to mention that the jump offset is far to small on the order of several magnitudes.