alexandernst / monks

Procmon alternative for Linux

Use stub functions for the un/hook process #31

Closed alexandernst closed 10 years ago

alexandernst commented 11 years ago

The current hook method will block the module's unloading until the last sleeping process issues a call to every syscall it used at some point in its execution. Using stubs should fix the situation, as per the comments in #17.

alexandernst commented 10 years ago

Hi @milabs ! Have you managed to get some free time and work on this? How is it going?

milabs commented 10 years ago

@alexandernst Not really, I'm sorry man :(

alexandernst commented 10 years ago

@milabs Don't worry :wink: Do you plan to work on it, or should I take it? If the latter, can you give me some tips (maybe links to some docs) on how to implement this?

milabs commented 10 years ago

@alexandernst It may be the best in that case. Feel free to ask questions ;)

alexandernst commented 10 years ago

@milabs Ok :) So, I had a reading session and a lot of things still aren't clear to me. A stub function is basically an empty function, or a function that returns a known value, which is useful for debugging. I'm not sure how I'd use a stub function here.

alexandernst commented 10 years ago

Maybe http://stackoverflow.com/questions/10405436/anonymous-functions-using-gcc-statement-expressions ?

milabs commented 10 years ago

@alexandernst A stub is a piece of code that you need to cook up. I think you'll need to write a pattern in assembly (like call 0 // call 0 // call 0 // ret, see #17). Next, you'll need to make a copy of the stub for each syscall and replace the zeroes with the proper values... udis86 is very useful here, as you know :)
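The "copy the pattern and replace the zeroes" idea can be sketched in plain userspace C. This is only an illustration, not the module's code: `stub_template`, `make_stub` and the addresses used with it are hypothetical, and the sketch assumes the placeholders are `call rel32` instructions (opcode E8) on a little-endian host. Each displacement is computed relative to the address of the instruction that follows the call.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Template: call 0; call 0; call 0; ret  (E8 = call rel32, C3 = ret). */
static const uint8_t stub_template[] = {
    0xE8, 0x00, 0x00, 0x00, 0x00,
    0xE8, 0x00, 0x00, 0x00, 0x00,
    0xE8, 0x00, 0x00, 0x00, 0x00,
    0xC3,
};

enum { STUB_SIZE = sizeof(stub_template), CALL_LEN = 5, NUM_CALLS = 3 };

/*
 * Copy the template into `out` and patch each call so that it reaches
 * targets[i] once the stub lives at address `stub_addr` (hypothetical).
 * Returns 0 on success, -1 if a target does not fit in a rel32.
 */
static int make_stub(uint8_t out[STUB_SIZE], uint64_t stub_addr,
                     const uint64_t targets[NUM_CALLS])
{
    assert(out != NULL && targets != NULL);
    memcpy(out, stub_template, STUB_SIZE);

    for (int i = 0; i < NUM_CALLS; i++) {
        /* rel32 is relative to the instruction after the call. */
        uint64_t next_insn = stub_addr + (uint64_t)(i + 1) * CALL_LEN;
        int64_t disp = (int64_t)(targets[i] - next_insn);

        if (disp < INT32_MIN || disp > INT32_MAX)
            return -1;

        int32_t rel32 = (int32_t)disp;
        /* Skip the E8 opcode byte and overwrite the four zero bytes. */
        memcpy(&out[i * CALL_LEN + 1], &rel32, sizeof(rel32));
    }
    return 0;
}
```

In a real module the three targets would be the pre-hook, the original syscall and the post-hook, and the patched bytes would be copied into executable memory instead of an ordinary buffer.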

alexandernst commented 10 years ago

@milabs Oh, ok. I think I'm starting to understand. Next question: who allocates the memory that will hold the stub? And what happens when the module is unloaded? Will that memory stay "occupied" forever? (until next reboot, ofc)

alexandernst commented 10 years ago

@milabs Hmm, and yet another thing. That stub is just some ASM calling the functions from my module (from #17, sys_read_post_hook_action), which won't exist anymore when I unload the module.

Perhaps I should create a stub (as in executable memory area) and place there the entire sys_read_post_hook_action function, right?

milabs commented 10 years ago

When unloading, you'll need to replace the calls with NOPs. That prevents the system from jumping into an unloaded function. And you are right, the memory won't be freed =) One more thing: take a look at the stop_machine interface. It helps us do big things, like nop'ping the stubs, atomically.
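As a rough userspace illustration of the NOP step, the hypothetical helper below overwrites one 5-byte `call rel32` with five single-byte NOPs (0x90). It only shows the byte-level change; it does nothing about other CPUs executing the stub at the same moment, which is the problem stop_machine is meant to address in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { CALL_OPCODE = 0xE8, NOP_OPCODE = 0x90, CALL_LEN = 5 };

/*
 * Replace the call rel32 that starts at code[off] with NOPs.
 * Returns 0 on success, -1 if there is no complete call at that offset.
 */
static int nop_out_call(uint8_t *code, size_t len, size_t off)
{
    assert(code != NULL);

    if (off > len || len - off < (size_t)CALL_LEN || code[off] != CALL_OPCODE)
        return -1;

    memset(&code[off], NOP_OPCODE, CALL_LEN);
    return 0;
}
```

After the patch, execution simply falls through the NOPs to whatever instruction follows, so the stub never jumps into code that was unloaded with the module.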

alexandernst commented 10 years ago

Ok, more questions :) How can I create an executable memory area? Is there anything in the kernel that will allow me to (remotely) do that?

milabs commented 10 years ago

@alexandernst I used the module_alloc function a while ago. It's not exported, but that didn't stop me from hacking :) You can start by reading that function, and if you figure out a simple way to create executable memory, I'll be happy :)

http://lxr.free-electrons.com/source/arch/x86/kernel/module.c?v=3.8#L46

alexandernst commented 10 years ago

Yikes, an un-exported function :/ Maybe I'll have to write my own function (copying module_alloc) to guard against past/future changes in the kernel. Ok, I think that's all for now, I'll try to write a POC. If I get stuck again (most probably) I'll ask you :) Thank you!

alexandernst commented 10 years ago

Wait... I think... wouldn't kmalloc with `GFP_KERNEL | PAGE_KERNEL_EXEC` be enough?

alexandernst commented 10 years ago

Ah, no, sorry, no such flag in kmalloc, instead __vmalloc(byte_size, GFP_KERNEL, PAGE_KERNEL_EXEC); should mimic perfectly what that function is doing. Anyways, I should get some sleep now (1am here). I'll play with that tomorrow and let you know if I get stuck :smile:

alexandernst commented 10 years ago

After thinking about it for a few hours I think I have all the steps:

  1. Load module
  2. Create a stub like the following one:

    stub: 
    CALL sys_read_pre_hook_action
    CALL real_sys_read
    IF <some conditions>
       call sys_read_post_hook_action
    IF <counter for remaining syscalls calls == 0>
       restore original syscall address in the syscall table
       free <this stub>
    RET
  3. Replace the original syscall address in the syscall table with the address of the stub we just created.
  4. Do some stuff.
  5. Replace with NOPs lines 1, 3 and 4 from the stub.
  6. Unload the module without free-ing the stub.

I'd need to create some kind of macro/template for creating those stubs, as I'll have one for each syscall. What do you think? Am I missing something?

alexandernst commented 10 years ago

@milabs Ok, I got to another mental-block. Can you help me?

So, let's say I create the stub like this:

CALL real_sys_read
CALL sys_read_post_hook

This will work perfectly, as when I unload the module, I'll just change the stub to:

CALL real_sys_read
NOP NOP NOP NOP NOP

Then that stub will stay in memory till the next reboot. So far so good. But now, I'd like to improve it. I'd like to make the stub free itself. For that to happen I need to keep the current __INCR and __DECR macros and create the stub as I already said in my last comment.

The first line of the stub will call the __INCR macro, then the second line will call the real syscall, and then I'd do some checks to see if I should call the fake syscall or free the stub itself.

Let's have a look at the __INCR macro:

#define __INCR(F) atomic_inc(&__syscall_info___NR_##F.counter);

That's pretty clear. A single line that will make an atomic increase of the value of the syscall struct. For that to keep working I need to a) allocate __syscall_info___NR_##F in memory (which is really easy) and b) allocate the macro in memory, which I have no idea how to do.

My question is: How can I allocate the macro __INCR in memory (in a stub, like the one I'm already creating) ?
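One way to look at this question: a macro only exists at compile time, so there is nothing to "allocate" for it. What survives at runtime is the counter and the instruction that touches it, which means a stub only needs the counter's address. The sketch below is a userspace analogy using C11 `atomic_int` in place of the kernel's `atomic_t`; `struct syscall_info`, `hook_enter` and `hook_leave` are hypothetical names, not the module's.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct syscall_info {
    atomic_int counter;   /* calls currently inside the hooked syscall */
};

/* Roughly what __INCR(F) does once the counter's address is known. */
static void hook_enter(struct syscall_info *info)
{
    assert(info != NULL);
    atomic_fetch_add(&info->counter, 1);
}

/* Counterpart of __DECR(F); returns how many calls are still in flight. */
static int hook_leave(struct syscall_info *info)
{
    assert(info != NULL);
    /* atomic_fetch_sub returns the value before the subtraction. */
    return atomic_fetch_sub(&info->counter, 1) - 1;
}
```

A return value of 0 from `hook_leave` would be the point at which nothing is left inside the hook.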

alexandernst commented 10 years ago

Oh, I think I just found a way (and my question wasn't that smart anyways!) :smile:

alexandernst commented 10 years ago

@milabs Hi again! Do you know any library/thing that will let me generate binary code out of ASM at runtime? (so I can generate that code and memcpy it to the stub)

milabs commented 10 years ago

@alexandernst Do you really need this??

alexandernst commented 10 years ago

@milabs Hmmm... Maybe I'm not asking for the right tool. But then, I'd like to be able to let the stub know about the address of the atomic counter from here https://github.com/alexandernst/procmon/commit/1db42c3c61d5b93be86ee914ca19e5933b4ead48 so the stub can know when to free itself. How could I do that?

milabs commented 10 years ago

@alexandernst Write stub in assembly and then use udis86 to fixup the refs.

alexandernst commented 10 years ago

@milabs Ok, I think I'll manage to do that. :smile:

alexandernst commented 10 years ago

@milabs I thought it would be easier, but even a simple "Hello world" with opcode won't run and it will just trigger a kernel oops. I wrote a simple demo and asked in SO: http://stackoverflow.com/questions/20430835/running-code-inside-executable-memory Can you give me a hint, please?

milabs commented 10 years ago

@alexandernst Still have no answer?

alexandernst commented 10 years ago

@milabs I'm almost there. The only missing thing is how to do an indirect call (E8 xx xx xx xx only holds a 4-byte relative offset, which means not all addresses can be reached).

alexandernst commented 10 years ago

@milabs Ok, I finished the POC code to generate some opcode: http://pastebin.com/CWNhruDG

Anyways, after loading the module, it generates this opcode (addresses may vary, of course):

48 bf 24 00 18 a0 ff ff ff ff 48 bf 2d 00 18 a0 ff ff ff ff 48 c7 c0 02 00 00 00 48 ba ab 05 6c 81 ff ff ff ff ff d2 c3

which udcli disassembles as:

mov rdi, 0xffffffffa0180024
mov rdi, 0xffffffffa018002d
mov rax, 0x2
mov rdx, 0xffffffff816c05ab
call rdx
ret

which is correct. That's exactly what my original code looked like. Anyways, it won't work. It won't do anything at all. I mean, the entire output caused by the module is:

[  704.004855] hello: module license 'unspecified' taints kernel.
[  704.005315] &printk: ffffffff816c05ab
[  704.005320] Bytecode: 
[  704.005323] 48bf240018a0ffffffff48bf2d0018a0ffffffff48c7c00200000048baab056c81ffffffffffd2c3
[  704.005323] End

The "Hello world!" message is missing! Why? Why isn't my code running? Or maybe it's running but it isn't causing any output?

milabs commented 10 years ago

@alexandernst The x86_64 calling convention expects function args in the registers RDI, RSI, RDX and RCX. Your code must look like this:

// printk("\n\n\n%s\n\n\n", "hello world");
mov rdi, offset of ("\n\n\n%s\n\n\n")
mov rsi, offset of ("hello world")
mov rax, &printk
call rax

http://en.wikipedia.org/wiki/X86_calling_conventions#x86-64_calling_conventions
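The corrected sequence can be generated with a few lines of C. This is a hedged sketch, not the pastebin code: `emit_mov_imm64` and `emit_call2` are hypothetical helpers that only write bytes into a buffer, using the `REX.W B8+r` encoding of `mov reg, imm64` (the same form that appears as `48 bf` and `48 ba` in the dump above) and `FF D0` for `call rax`. It ignores everything else a real stub would need, such as saving registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Second opcode byte of `mov reg, imm64` is 0xB8 + register number. */
enum { MOV_RAX = 0xB8, MOV_RSI = 0xBE, MOV_RDI = 0xBF, CALL_SEQ_LEN = 33 };

/* Append `mov reg, imm64` at buf[pos]; returns the new write position. */
static size_t emit_mov_imm64(uint8_t *buf, size_t pos, uint8_t opcode, uint64_t imm)
{
    buf[pos++] = 0x48;                            /* REX.W prefix */
    buf[pos++] = opcode;
    for (int i = 0; i < 8; i++)
        buf[pos++] = (uint8_t)(imm >> (8 * i));   /* little-endian immediate */
    return pos;
}

/*
 * Emit: mov rdi, arg1; mov rsi, arg2; mov rax, fn; call rax; ret
 * All three values are absolute 64-bit addresses, so no relative
 * offsets are involved. Returns the number of bytes written.
 */
static size_t emit_call2(uint8_t *buf, size_t cap, uint64_t fn,
                         uint64_t arg1, uint64_t arg2)
{
    assert(buf != NULL && cap >= CALL_SEQ_LEN);

    size_t pos = 0;
    pos = emit_mov_imm64(buf, pos, MOV_RDI, arg1);  /* 1st argument */
    pos = emit_mov_imm64(buf, pos, MOV_RSI, arg2);  /* 2nd argument */
    pos = emit_mov_imm64(buf, pos, MOV_RAX, fn);    /* call target  */
    buf[pos++] = 0xFF;
    buf[pos++] = 0xD0;                              /* call rax */
    buf[pos++] = 0xC3;                              /* ret */
    return pos;
}
```

Fed with the addresses from the log above, the only differences from the original dump are the second `mov` (now targeting RSI instead of RDI again) and the register used for the call.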

alexandernst commented 10 years ago

@milabs What exactly is the offset in your example code?

offset = <variable addr> - <(current address + 5)> ?

milabs commented 10 years ago

@alexandernst No. Since RDI and RSI are 64-bit registers, the offset is not relative: the offset is the variable's address. Think about the CPU. It fetches instructions one by one. RIP-relative addressing means that the address is relative to the RIP pointer. But the CPU doesn't know anything about an instruction before it has fetched it. After the fetch, RIP points to the next instruction, and all relative offsets are relative to that RIP.

alexandernst commented 10 years ago

@milabs Hmmm, ok, I'll try it as soon as I get home (at the office now) :smile:

alexandernst commented 10 years ago

@milabs It works !!!!!!!! :smile: Now I need to get a simple "Hello world" for x86 (which I don't think will be any different) and then start coding the real part.

milabs commented 10 years ago

@alexandernst Excellent :) Tell that to all the SO people :)

alexandernst commented 10 years ago

@milabs I was re-reading the calling conventions and I have 2 questions.

First question:

In x64:

  - Userland uses RDI, RSI, RDX, RCX, R8, R9; if there are more than 6 arguments, the stack is used too.
  - Syscalls use RDI, RSI, RDX, R10, R8, R9; if there are more than 6 arguments, the stack is used too.

In x86:

  - Userland uses the stack for all arguments.
  - Syscalls use EBX, ECX, EDX, ESI, EDI, EBP; if there are more than 6 arguments, the stack is used too.

Have I understood the docs right?

Second question: Are there syscalls with more than 6 arguments? Which ones?

alexandernst commented 10 years ago

@milabs Look at https://github.com/alexandernst/procmon/commit/d50d2e16bcde7b6d0686bca001a677d05e1e3aa7 I'm almost done!! :smile: :smile: :smile:

I'm only missing the unhook part, which can be done in two different ways.

The first way, which is the less efficient one, is to completely remove the fake syscall call and leave only the real syscall call. This way procmon will waste around 60 bytes for each syscall. Not much, but it feels kind of dirty.

The second way is to check inside the stub if the atomic counter has reached 0, and if so, do 3 things. a) restore the original syscall address b) kfree itself c) place the result of the last syscall in eax/rax.

"a" shouldn't be that hard to do, even in plain ASM. Problem is, how to kfree the stub itself, and also make it finish running itself so it can place the result of the syscall.

BTW: If I go with method 1, we won't need atomic inc/dec anymore! Right?

milabs commented 10 years ago

@alexandernst Great!

First way, I think. And you can use a single memory area for all the stubs, as you always know the number of hooked calls. Just preallocate the memory and split it later.
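The "preallocate and split" suggestion could look roughly like the sketch below. It is a userspace illustration under stated assumptions: `struct stub_pool` and its functions are hypothetical, the 64-byte slot size is just a round-up of the "around 60 bytes" figure mentioned earlier in the thread, and `calloc` stands in for whatever executable allocation the module would really use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { STUB_SLOT_SIZE = 64 };   /* assumed upper bound for one stub */

struct stub_pool {
    uint8_t *base;   /* one contiguous area holding every stub */
    size_t count;    /* number of slots, i.e. hooked syscalls */
};

/* Allocate one area big enough for `count` stubs. Returns 0 or -1. */
static int stub_pool_init(struct stub_pool *pool, size_t count)
{
    assert(pool != NULL && count > 0);
    pool->base = calloc(count, STUB_SLOT_SIZE);
    pool->count = pool->base ? count : 0;
    return pool->base ? 0 : -1;
}

/* Return the slot reserved for the index-th hooked syscall, or NULL. */
static uint8_t *stub_pool_slot(const struct stub_pool *pool, size_t index)
{
    assert(pool != NULL);
    if (index >= pool->count)
        return NULL;
    return pool->base + index * STUB_SLOT_SIZE;
}

/* Release the whole area at once. */
static void stub_pool_destroy(struct stub_pool *pool)
{
    assert(pool != NULL);
    free(pool->base);
    pool->base = NULL;
    pool->count = 0;
}
```

Because the number of hooked syscalls is known up front, a single allocation covers all stubs and each syscall's slot is found by index.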

As for the second way, I think it's too complex, and having the stub kfree itself is not a good idea..

And one more thing. Why do we need a counter for each hooked syscall rather than a single generic one?

alexandernst commented 10 years ago

@milabs Hi! Sorry for taking so long to reply!

Ok, I'll reconsider this in a future version maybe. :)

Well... an individual counter (per syscall) is needed because there are some syscalls that can be "restored" immediately, but others can't (like __READ). Anyways, actually it doesn't matter if some of the syscalls can be restored earlier than others because right now the entire module is kept in memory until all of the syscalls are restored. And when I merge the new branch, I won't need any of the counters at all :)

alexandernst commented 10 years ago

@milabs It's done !!!!!!!!!!!!! I made it!!!!!!!!!!!! :D:D:D It took me almost 2 months of work and +70 commits, but I finally made it! Thank you for all the tips and help :smile:

milabs commented 10 years ago

@alexandernst Great work!