RiS3-Lab / kubo

Uses on-demand control- and data-flow slicing, combined with taint analysis and symbolic execution, to achieve scalable and precise UB detection for the Linux kernel.

Group child modules into a full module (vmlinux.bc) #5

Open boti-li opened 3 years ago

boti-li commented 3 years ago

Hello. When I use kubo's compile tools to build the kernel, I run into some problems. I find a built-in.ll for each module in the corresponding directory, but it seems that you do not generate a full module for the whole Linux kernel the way wllvm does. I want to generate a full module, vmlinux.bc, for the kernel from the built-in.ll of each child module.

But for modules under drivers/..., kubo may not generate correct bitcode. When I looked into these modules, I found that drivers/staging/.... is the cause. As we all know, drivers/staging holds immature code of unknown quality, so I simply ignored those modules and linked the others with llvm-link to generate vmlinux.bc.
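A minimal Python sketch of that linking step, assuming it runs from the kernel source root, that each directory's built-in.ll holds only that directory's own code (if a parent's built-in.ll already includes its children, link only the top-level files, or llvm-link will report duplicate definitions), and that excluding drivers/staging/ is enough. The helper names are hypothetical; only llvm-link and its -o option are real.

```python
from pathlib import Path
from typing import Iterable, List, Sequence

# Subtrees to leave out of the whole-kernel module (assumption: only staging).
EXCLUDED_PREFIXES: Sequence[str] = ("drivers/staging/",)


def collect_builtin_modules(paths: Iterable[str],
                            excluded: Sequence[str] = EXCLUDED_PREFIXES) -> List[str]:
    """Keep per-directory built-in.ll files, skipping excluded subtrees."""
    kept = []
    for p in paths:
        rel = Path(p).as_posix()
        if Path(rel).name != "built-in.ll":
            continue  # skip the individual per-file modules
        if any(rel.startswith(prefix) for prefix in excluded):
            continue  # skip drivers/staging and any other excluded subtree
        kept.append(rel)
    return sorted(kept)


def find_builtin_modules(kernel_root: str) -> List[str]:
    """Return every built-in.ll under kernel_root, relative to it, minus exclusions."""
    root = Path(kernel_root)
    return collect_builtin_modules(
        p.relative_to(root).as_posix() for p in root.rglob("built-in.ll"))


def llvm_link_command(modules: List[str], output: str = "vmlinux.bc") -> List[str]:
    """Build an llvm-link invocation; it accepts .ll and .bc inputs and writes bitcode to -o."""
    return ["llvm-link", "-o", output, *modules]
```

From the kernel root, subprocess.run(llvm_link_command(find_builtin_modules(".")), check=True) would then perform the link in one llvm-link call.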

However, this link operation adds bitcast instructions to some functions, because numeric suffixes are appended to structure type names during linking; when kubo executes these paths, it crashes. How can I generate vmlinux and avoid this? My kernel version is 5.4.1.

For example, the call

    %call51 = call i32 @unshare_nsproxy_namespaces(...) #8

changes to

    %30 = bitcast i32 (i64, %struct.nsproxy.49395**, %struct.cred.49401*, %struct.fs_struct*)* @unshare_nsproxy_namespaces to i32 (i64, %struct.nsproxy.42579**, %struct.cred*, %struct.fs_struct.42465*)*
    %call35 = call i32 %30(i64 %6, %struct.nsproxy.42579** nonnull %new_nsproxy, %struct.cred* null, %struct.fs_struct.42465* %29)

and when kubo executes this node, it crashes.
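Because the renamed types differ only in their numeric suffix, one way to flag such linker-introduced casts is to compare the two function types with the suffixes removed. A minimal Python sketch over textual IR, assuming struct and union names are plain C identifiers. Note that it also merges distinct anonymous types such as %struct.anon.N, and that equal stripped types are only a hint: llvm-link kept the types apart precisely because it could not unify them. The helper names are hypothetical.

```python
import re

# A named struct/union type followed by a numeric suffix, e.g.
# "%struct.nsproxy.42579" -> group 1 captures "%struct.nsproxy".
_SUFFIXED_TYPE = re.compile(r"(%(?:struct|union)\.[A-Za-z_][A-Za-z0-9_]*)\.\d+\b")


def strip_type_suffixes(ir_type: str) -> str:
    """Drop numeric suffixes appended to struct/union names when types were renamed."""
    return _SUFFIXED_TYPE.sub(r"\1", ir_type)


def is_suffix_only_cast(src_type: str, dst_type: str) -> bool:
    """True when two IR types differ, but only in struct/union name suffixes."""
    return (src_type != dst_type
            and strip_type_suffixes(src_type) == strip_type_suffixes(dst_type))
```

Applied to the two function types of the unshare_nsproxy_namespaces cast, both sides reduce to i32 (i64, %struct.nsproxy**, %struct.cred*, %struct.fs_struct*)*, so the cast is flagged as a pure renaming artifact.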

Besides, is the clang in llvm/kubo-bins-9.0/build/bin your modified frontend compiler? I pointed wllvm at this compiler to extract vmlinux.
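For reference, wllvm reads its configuration from environment variables: LLVM_COMPILER selects the front end (clang here), and LLVM_COMPILER_PATH names the directory containing clang and the other LLVM tools it calls, such as llvm-link. A minimal Python sketch of pointing it at the kubo-bins clang, assuming the kernel is built with make CC=wllvm and that the kernel's linker script keeps the section where wllvm records bitcode locations (if that section is dropped, extract-bc has nothing to find in vmlinux). The helper name wllvm_kernel_plan is hypothetical.

```python
import os
from typing import Dict, List, Tuple


def wllvm_kernel_plan(clang_bin_dir: str,
                      kernel_dir: str = ".") -> Tuple[Dict[str, str], List[List[str]]]:
    """Return the environment and commands for a wllvm kernel build plus extraction."""
    env = dict(os.environ)
    env["LLVM_COMPILER"] = "clang"             # wllvm drives clang
    env["LLVM_COMPILER_PATH"] = clang_bin_dir  # e.g. the modified clang's bin/ directory
    commands = [
        # Compile the kernel; wllvm also emits a .bc file next to each object.
        ["make", "-C", kernel_dir, "CC=wllvm"],
        # Collect and link those .bc files into vmlinux.bc.
        ["extract-bc", os.path.join(kernel_dir, "vmlinux")],
    ]
    return env, commands
```

Each command can then be run with subprocess.run(cmd, env=env, check=True); extract-bc writes its result next to the input, as vmlinux.bc.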

Lawliar commented 3 years ago

When kubo executes this node, it crashes.

May I ask what the error message of the crash is? I'm not sure whether kubo works properly with wllvm (maybe wllvm adds some special instrumentation?)

Besides, is the clang in llvm/kubo-bins-9.0/build/bin your modified frontend compiler?

Yes. More specifically, I used the modified clang when compiling the Linux kernel, and the pre-built clang from LLVM when running kubo (because it's faster).

boti-li commented 3 years ago

May I ask what the error message of the crash is? I'm not sure whether kubo works properly with wllvm (maybe wllvm adds some special instrumentation?)

The log is too long to attach here, so I'll summarize it. As the example above shows, the link operation (wllvm's extract-bc on vmlinux) adds bitcast instructions to the raw IR. Here is the node kubo crashed at:

    IR(5) : br i1 %tobool5.i, label %unshare_fd.exit, label %unshare_fd.exit.thread

Before linking, the basic block in built-in.ll is:

    BB(14)if.then.i
    IR(1) : %call4.i = call %struct.files_struct* @dup_fd(%struct.files_struct* nonnull %22, i32* nonnull %error.i) #8
    IR(2) : %26 = ptrtoint %struct.files_struct* %call4.i to i64
    IR(3) : %tobool5.i = icmp eq %struct.files_struct* %call4.i, null
    IR(4) : %27 = load i32, i32* %error.i, align 4
    IR(5) : br i1 %tobool5.i, label %unshare_fd.exit, label %unshare_fd.exit.thread

After linking, it becomes:

    BB(14)if.then.i
    IR(1) : %30 = bitcast %struct.files_struct* (%struct.files_struct*, i32*)* @dup_fd to %struct.files_struct.3073* (%struct.files_struct.3073*, i32*)*
    IR(2) : %call4.i = call %struct.files_struct.3073* %30(%struct.files_struct.3073* nonnull %26, i32* nonnull %error.i) #9
    IR(3) : %31 = ptrtoint %struct.files_struct.3073* %call4.i to i64
    IR(4) : %tobool5.i = icmp eq %struct.files_struct.3073* %call4.i, null
    IR(5) : %32 = load i32, i32* %error.i, align 4
    IR(6) : br i1 %tobool5.i, label %unshare_fd.exit, label %unshare_fd.exit.thread

So the added bitcast instruction seems to be what causes the crash, because kubo can't handle this case.

And the crash log shows that the error arises in SEGraph::mergeSEG(...), which cannot handle the node above. I located it through this message: unhandled merged node: br i1 %tobool5.i, label %unshare_fd.exit, label %unshare_fd.exit.thread, after which it executes calleeGraph->displayForDeps() before crashing.
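One plausible mechanism, not verified against kubo's source: in LLVM's C++ API, CallBase::getCalledFunction() returns null when the called operand is a bitcast rather than a Function, so after linking the dup_fd call above looks like an indirect call, while calling Value::stripPointerCasts() on the called operand looks through the cast. The same idea as a minimal Python sketch over the textual IR from the log, with hypothetical helper names; it only handles the separate `%N = bitcast ... @f to ...` form shown there, not inline constant-expression casts.

```python
import re
from typing import Dict, Iterable, Optional

# "%30 = bitcast <fn type> @dup_fd to <fn type>": remember that %30 stands for @dup_fd.
_CAST_DEF = re.compile(r"(%[\w.]+)\s*=\s*bitcast\s+.*?(@[\w.]+)\s+to\s")
# The callee is the operand written immediately before the argument list.
_CALLEE = re.compile(r"\bcall\s+.*?([%@][\w.]+)\(")


def collect_function_casts(lines: Iterable[str]) -> Dict[str, str]:
    """Map each local value that is a bitcast of a global function to that function."""
    casts: Dict[str, str] = {}
    for line in lines:
        m = _CAST_DEF.search(line)
        if m:
            casts[m.group(1)] = m.group(2)
    return casts


def resolve_callee(call_line: str, casts: Dict[str, str]) -> Optional[str]:
    """Return the function a call really targets, or None for a genuine indirect call."""
    m = _CALLEE.search(call_line)
    if m is None:
        return None  # not a call instruction
    callee = m.group(1)
    if callee.startswith("@"):
        return callee  # plain direct call
    return casts.get(callee)  # direct call hidden behind a bitcast, else indirect
```

With the linked block above, collect_function_casts maps %30 to @dup_fd, so the call in IR(2) resolves back to a direct call of dup_fd.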

But I have almost given up on solving this, since it requires digging deep into the symbolize implementation. (ಥ﹏ಥ)

I have a question: since you can group each single .bc file in a folder into a built-in.ll, why don't you group the built-in.ll files into a whole module? That would be more helpful. If you can do it, could you show me how? And could you tell me more about how, at the code level, the traces of indirect calls are merged when doing interprocedural analysis? Thanks a lot!

Lawliar commented 3 years ago

unhandled merged node: br i1 %tobool5.i, label %unshare_fd.exit, label %unshare_fd.exit.thread, after which it executes calleeGraph->displayForDeps() before crashing.

I see, this is weird: a branch instruction is certainly a SENodeInst, so I wonder why it would cause the crash.

But I have almost given up on solving this, since it requires digging deep into the symbolize implementation. (ಥ﹏ಥ)

You might be right; the implementation of mergeSEG is not my proudest code and is kind of messy. That said, if you can send me the .bc file (hopefully small enough to share) that triggers this crash, I can take a look. (I can't 100% guarantee a fix, given my limited bandwidth and fading memory :p)

I have a question: since you can group each single .bc file in a folder into a built-in.ll, why don't you group the built-in.ll files into a whole module?

Technically, how far you group is just a matter of a few more (or fewer) linking steps, so I guess it's doable (although I haven't tried it myself). I think the reason for grouping at this level (the "submodule" level) is mainly that it is "enough": function calls mostly stay within each submodule (this needs some hard facts to back it up), and many other implementations do the same (both symbolic execution on the Linux kernel and other analyses, e.g., call graphs), so I just didn't bother taking it a level further. It would be interesting to see whether it works on the whole module, though.
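The claim that function calls mostly stay within a submodule could be measured directly. A minimal Python sketch, assuming call-graph edges as (caller, callee) name pairs and a hypothetical map from each function to the source file defining it (obtainable, for example, from debug info); a "submodule" is taken to be the file's directory, matching one built-in.ll per folder. Edges whose endpoints have no known location, such as unresolved indirect calls, are simply skipped.

```python
import posixpath
from typing import Dict, Iterable, Tuple


def intra_submodule_ratio(edges: Iterable[Tuple[str, str]],
                          defined_in: Dict[str, str]) -> float:
    """Fraction of call edges whose caller and callee are defined in the same directory."""
    same = total = 0
    for caller, callee in edges:
        if caller not in defined_in or callee not in defined_in:
            continue  # no source location known for one endpoint
        total += 1
        if posixpath.dirname(defined_in[caller]) == posixpath.dirname(defined_in[callee]):
            same += 1
    return same / total if total else 0.0
```

A ratio close to 1.0 over the whole kernel would support linking per submodule; a low ratio would argue for the whole-module vmlinux.bc.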

And could you tell me more about how, at the code level, the traces of indirect calls are merged when doing interprocedural analysis?

Conceptually, indirect calls are handled just like direct calls (both kinds of call relations come from the call graph analysis), except that an indirect call can have multiple callees.
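A minimal Python sketch of that idea, not kubo's actual mergeSEG code: the call graph maps each call site to its set of callees (one for a direct call, possibly several for an indirect call), and the caller merges the per-callee summaries. Here a summary is a hypothetical set of facts and the merge is set union, as in a "may" analysis.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical summary: the set of facts that may hold after a call returns.
Summary = FrozenSet[str]


def summarize_call_site(site: str,
                        callees_of: Dict[str, Set[str]],
                        summaries: Dict[str, Summary]) -> Summary:
    """Merge the summaries of every callee the call graph lists for this site.

    A direct call has one callee; an indirect call may have several.
    Merging by union keeps every fact that holds on at least one callee path.
    """
    merged: Set[str] = set()
    for callee in callees_of.get(site, set()):
        merged |= summaries.get(callee, frozenset())
    return frozenset(merged)
```

A "must" analysis would intersect the callee summaries instead; either way, a direct call is just the one-callee case, which is why both can share the same code path.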