Open andreafioraldi opened 4 years ago
We should start by simply supporting case 1.
Great, what multi threaded target app would you propose for developing/testing?
IDK, maybe a web server not too huge
"good first issue" is a lie ;)
I think we can achieve it by walking all threads in take_snapshot(). For example, we can use walk_process_tree(), passing our snapshot function as the callback, which will dump everything needed for every thread, starting from the top one.
But the question is how to deal with thread death and birth.
I mean, if we dump when the target has two threads and roll back when it still has the same two threads with the same pids, there is no problem.
But if one of them reaches do_task_dead() or do_exit(), its resources are freed.
The same goes if a new thread was born: when rolling back, what should we do with its resources?
I think we can solve the first problem by hooking the exit functions and checking whether the exiting task is our client. If so, we can just unlink it in the pid struct (pid struct * + 0x08 offset on old kernels, https://elixir.bootlin.com/linux/v4.19.160/source/include/linux/pid.h#L62). After that it should disappear from the process tree and become fully invisible to the whole system: procfs, syscalls, even some kernel functions (the last could be a bit of trouble for ourselves). But it will continue to be scheduled. To prevent this we can freeze it somehow; IDK how yet, but I think it's not the hardest part.
As for newly born threads, I think we can just terminate them. But I think we should restore the stack and memory state of the parent thread in this case, to prevent sync troubles and false-positive crashes.
So, we'll have:
1) A dump of the main thread and each child thread at the moment the snapshot is created
2) Threads that decided to exit after the snapshot was taken, which become frozen threads with a valid task_struct, mm, vma, pid, etc.
3) New threads, if any.
We just need to keep a detailed list of them.
And so, when recover_state() is called, we do the following:
1) lock & pause the main task and all child tasks.
2) check if there are threads we don't know about (ones that weren't present when we took the snapshot)
3) terminate them (don't know how yet; we can do a lot of stuff with a kernel_thread when it is paused)
4) check if there are threads that were hooked and frozen on an exit attempt
5) restore the state of all threads one by one.
6) release the pause & unlock.
7) Fuzz a multi-threaded web server and be happy?
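The ordering of steps 1-6 can be checked with a small userspace model. This is only a sketch: `thread_rec`, `restore_stats` and `recover_state_model()` are hypothetical names, the `regs` field stands in for the real register/stack/memory state, and nothing here touches a real task_struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread record; not the module's real bookkeeping. */
enum thread_state { T_RUNNING, T_PAUSED, T_FROZEN_ON_EXIT, T_TERMINATED };

struct thread_rec {
    int tid;
    enum thread_state state;
    bool in_snapshot;   /* true if the thread existed at snapshot time */
    long regs;          /* stand-in for registers/stack/memory state */
    long saved_regs;    /* value captured when the snapshot was taken */
};

struct restore_stats { int terminated; int thawed; int restored; };

struct restore_stats recover_state_model(struct thread_rec *t, size_t n)
{
    struct restore_stats st = {0, 0, 0};
    size_t i;

    assert(t != NULL || n == 0);

    /* Step 1: pause every thread that is still running. */
    for (i = 0; i < n; i++)
        if (t[i].state == T_RUNNING)
            t[i].state = T_PAUSED;

    /* Steps 2-3: threads the snapshot does not know about are terminated. */
    for (i = 0; i < n; i++)
        if (!t[i].in_snapshot && t[i].state != T_TERMINATED) {
            t[i].state = T_TERMINATED;
            st.terminated++;
        }

    /* Step 4: threads frozen by the exit hook rejoin the paused set. */
    for (i = 0; i < n; i++)
        if (t[i].in_snapshot && t[i].state == T_FROZEN_ON_EXIT) {
            t[i].state = T_PAUSED;
            st.thawed++;
        }

    /* Step 5: restore the saved state of every snapshotted thread. */
    for (i = 0; i < n; i++)
        if (t[i].in_snapshot) {
            t[i].regs = t[i].saved_regs;
            st.restored++;
        }

    /* Step 6: release the pause. */
    for (i = 0; i < n; i++)
        if (t[i].state == T_PAUSED)
            t[i].state = T_RUNNING;

    return st;
}
```

In the model, terminating unknown threads happens before any state is restored, so a new thread can never observe half-restored memory. Whether the real module can keep that ordering depends on how step 3 ends up being implemented.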
One more question is the design of calling afl_take_snapshot(). I mean, why MUST we call this from the target thread? Or why MUST we call afl_recover_snapshot() from the target thread?
We could just send the pid_nr of the target thread to the ioctl from AFL, or from the forkserver, look up its task_struct, and do the same things for that task.
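A rough sketch of what such a pid-carrying request could look like from userspace. Everything named here (`snapshot_req`, `SNAPSHOT_IOC_MAGIC`, `SNAPSHOT_IOC_TAKE`, `SNAPSHOT_IOC_RECOVER`, `make_req()`) is made up for illustration and is not the module's actual ABI; only the `_IOW` encoding macros are the standard Linux ones.

```c
#include <assert.h>
#include <linux/ioctl.h>
#include <sys/types.h>

/* Hypothetical request layout: the caller names the target by pid
 * instead of having to be the target itself. */
struct snapshot_req {
    pid_t target;   /* pid number of the thread to snapshot or restore */
    int flags;      /* reserved in this sketch */
};

#define SNAPSHOT_IOC_MAGIC   'S'  /* hypothetical magic number */
#define SNAPSHOT_IOC_TAKE    _IOW(SNAPSHOT_IOC_MAGIC, 1, struct snapshot_req)
#define SNAPSHOT_IOC_RECOVER _IOW(SNAPSHOT_IOC_MAGIC, 2, struct snapshot_req)

/* Build a request for the given target thread. */
struct snapshot_req make_req(pid_t target)
{
    struct snapshot_req req;

    assert(target > 0);
    req.target = target;
    req.flags = 0;
    return req;
}

/* Intended use from AFL or the forkserver (fd is the module's device):
 *
 *     struct snapshot_req req = make_req(child_pid);
 *     ioctl(fd, SNAPSHOT_IOC_TAKE, &req);
 */
```

On the kernel side the handler would presumably resolve the number with something like find_get_pid() and get_pid_task() before operating on the task. It would also need to make sure the target is stopped while its state is read, which is automatic when the target calls the ioctl itself.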
P.s.
if someone wants to pick this issue feel free to do that but before comment here.
I fixed some things and switched to the current ftrace branch. I already sent a PR. Please check it when you have some time. Thanks.
I want to add support for snapshotting the state of all threads. There are 2 cases:
Case 1 is simple: we just terminate thread B when A does the snapshot restore. Case 2 has 2 subcases:
2.1. thread B is still alive when A does the restore
2.2. thread B is already dead when A does the restore
For 2.1, we stop thread B, restore its context and restart it. For 2.2, we hook thread exit and, instead of letting B exit before A, we pause it and mark it as waiting for restore. When A calls restore, we also restore the context of B and restart it.
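The two subcases can be modelled as a small state machine for thread B. This is a userspace sketch with hypothetical names (`thread_b`, `on_exit_hook()`, `on_restore()`); the `ctx` field is a placeholder for the real saved context, and the real hook would sit on the kernel exit path rather than being called directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum b_state { B_RUNNING, B_STOPPED, B_WAITING_RESTORE, B_DEAD };

struct thread_b {
    enum b_state state;
    bool in_snapshot;   /* B was captured by A's snapshot */
    int ctx;            /* stand-in for B's live context */
    int saved_ctx;      /* context captured at snapshot time */
};

/* Exit hook: returns true if the real exit may proceed.
 * A snapshotted thread is parked instead of being allowed to die. */
bool on_exit_hook(struct thread_b *b)
{
    assert(b != NULL);
    if (!b->in_snapshot) {
        b->state = B_DEAD;
        return true;
    }
    b->state = B_WAITING_RESTORE;   /* case 2.2: wait for A's restore */
    return false;
}

/* Called for B when A performs the restore. */
void on_restore(struct thread_b *b)
{
    assert(b != NULL);
    if (!b->in_snapshot || b->state == B_DEAD)
        return;                     /* not covered by this sketch */
    if (b->state == B_RUNNING)
        b->state = B_STOPPED;       /* case 2.1: stop it first */
    b->ctx = b->saved_ctx;          /* both subcases: restore context */
    b->state = B_RUNNING;           /* restart */
}
```

Both subcases converge on the same restore-and-restart path. The only difference is whether B was stopped by the restore itself or had already been parked by the exit hook.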
The current implementation works at the task_struct level and does not support this. I will code this eventually, but I don't have enough time ATM. If someone wants to pick up this issue, feel free to do so, but comment here first.