Open jclark opened 3 years ago
@jclark, when we are adding the statepoint intrinsic for call/invoke instructions to track the GC roots in StackMap section, what we can do is running the opt command (with --rewrite-statepoints-for-gc
flag). But in gollvm, they have written a pass called go-statepoints which is same as rewrite-statepoints-for-gc in llvm.
What can we do for our case?
I don’t understand the question. Why can we not use opt —rewrite-statepoints-for-gc
for now? I guess eventually we will have to add our own pass. This is what you need to figure out…
When I was testing the statepoints, I also used opt —rewrite-statepoints-for-gc
. When I am looking at the gollvm source code I saw they are using a different pass for that. That is why I need to ask whether we use the command-line option or a different pass in our initial implementation.
Now I understood.
How are we going to deal with derived pointers?
If we define a derived pointer and we use that pointer after the statepoint, RS4GC pass(--rewrite-statepoints-for-gc) identifies that derived pointer and rematerialize. Please check the following example.
Before run RS4GC:
define void @dummy_func() { ret void } define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj) gc "statepoint-example" { %gep = getelementptr i8, i8 addrspace(1)* %obj, i64 20000 call void @dummy_func() %p = getelementptr i8, i8 addrspace(1)* %gep, i64 -20000 ret i8 addrspace(1)* %p } define dso_local void @main() #0 { %1 = alloca i8*, align 8 %2 = call noalias i8 addrspace(1)* @malloc(i64 8) #3 %3 = call i8 addrspace(1)* @test1(i8 addrspace(1)* %2) ret void } declare dso_local noalias i8 addrspace(1)* @malloc(i64) #1
After running RS4GC (changed method is pasted):
define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj) gc "statepoint-example" { %gep = getelementptr i8, i8 addrspace(1)* %obj, i64 20000 %statepoint_token = call token (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @dummy_func, i32 0, i32 0, i32 0, i32 0) [ "gc-live"(i8 addrspace(1)* %obj) ] %obj.relocated = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %statepoint_token, i32 0, i32 0) ; (%obj, %obj) %gep.remat = getelementptr i8, i8 addrspace(1)* %obj.relocated, i64 20000 %p = getelementptr i8, i8 addrspace(1)* %gep.remat, i64 -20000 ret i8 addrspace(1)* %p }
Oh I see. That's clever. We need to make sure this works with our tagged pointers. Can you check that it works with the ptrmask intrinsic?
Is there a way to control which functions get treated as safepoints and rewritten via RS4GC? I think we will want to not treat some of the calls to langlib functions as safepoints.
This relates to another problem. How to deal with GC roots in the stack frames from runtime functions written in C?
Oh I see. That's clever. We need to make sure this works with our tagged pointers. Can you check that it works with the ptrmask intrinsic?
RS4GC identifies the output of ptrmask intrinsic as a gc-live pointer and it is added to the gc.statepoint intrinsic. Please check this LLVM IR, generated via RS4GC for the below example.
import ballerina/io;
public function main() {
any[] v = [1234567890123456789];
io:println(v[0]);
}
Is there a way to control which functions get treated as safepoints and rewritten via RS4GC? I think we will want to not treat some of the calls to langlib functions as safepoints.
Yes, we can do it by using "gc-leaf-function"
.
Ex:
declare i8 addrspace(1)* @_bal_int_to_tagged(i64) "gc-leaf-function" ; _bal_int_to_tagged is not a statepoint
This relates to another problem. How to deal with GC roots in the stack frames from runtime functions written in C?
I also need the think about this further. But the current approach I came up is explained below. The output pointer value of the runtime function is a gc-live pointer. So it is available in gc.statepoint intrinsic. When we are marking the roots, we know the value of that pointer and using that we can find the actual root.
For example let's say we call _bal_mapping_construct
and we get a tagged pointer. So this is a gc-live pointer but it does not represent an actual heap allocation. But we can remove the tag and mark it as a root. (i.e. ((uint64_t)p0 ^ (uint64_t)TAG_MAPPING_RW << TAG_SHIFT))
For example let's say we call
_bal_mapping_construct
and we get a tagged pointer. So this is a gc-live pointer but it does not represent an actual heap allocation. But we can remove the tag and mark it as a root. (i.e. ((uint64_t)p0 ^ (uint64_t)TAG_MAPPING_RW << TAG_SHIFT))
This is the same as with stack frames from our compiler: address space 1 pointers may or may not have a tag. I don't think that's a problem.
The problem I am concerned about is something like _bal_mapping_set. Suppose it needs to grow the array (calling _bal_array_grow). Suppose this triggers a garbage collection. This may cause the heap objects referenced by the key
and value
parameters to be moved. Now when _bal_mapping_set accesses its key
and value
parameters after _bal_array_grow, it will be accessing the wrong memory locations.
A simpler case is _Barray_push
.
https://github.com/KavinduZoysa/test-GCs/tree/find-gc-roots, Please find the progress of the issue in this repo
Two alternative strategies for https://github.com/ballerina-platform/nballerina/issues/165#issuecomment-884145261
add_gc_root(context, &ptr)
gc "strategy"
) to look like the code that we generate from .balIn any case, I think we will want to distinguish between our C runtime functions that can trigger (or allow) a garbage collection and those that don't. The former are the ones we need to worry about. The latter can be declared as "gc-leaf-function" and don't need any special treatment. Functions that can themselves call _bal_panic need to be in the former category.
I should say that I definitely prefer strategy 2 if we can make it work, since it makes the C code consistent with the code we generate, and should be more reliable.
I should say that I definitely prefer strategy 2 if we can make it work, since it makes the C code consistent with the code we generate, and should be more reliable.
I have started to develop a working example. Please find the progress here. https://github.com/KavinduZoysa/test-GCs/tree/find-gc-roots/solution2
https://github.com/KavinduZoysa/test-GCs/tree/79d28f9836c0ed0feb9e4bf864ba1217a636b06b
Here I have tested the identifying roots for a runtime function according to the 2nd approach of https://github.com/ballerina-platform/nballerina/issues/165#issuecomment-884854477. These are the steps I followed.
A few comments:
With approach 2, clang is processing address space 1 without knowing that it is non-integral. I am concerned that this may mean that clang does optimizations that will make the stack map not correct (this is why we have non-integral address spaces). If we turn off optimization, then it gives functions a noinline attribute. Maybe we should turn off optimization but remove or otherwise override this attribute.
Shouldn't we be combining stack maps at link time?
I thought we were going to use https://github.com/kavon/llvm-statepoint-utils/tree/master/test
- With approach 2, clang is processing address space 1 without knowing that it is non-integral. I am concerned that this may mean that clang does optimizations that will make the stack map not correct (this is why we have non-integral address spaces). If we turn off optimization, then it gives functions a noinline attribute. Maybe we should turn off optimization but remove or otherwise override this attribute.
+1
- Shouldn't we be combining stack maps at link time?
If we check the object dump of final executable we can se there are two stack maps.
- I thought we were going to use https://github.com/kavon/llvm-statepoint-utils/tree/master/test
Yes, I am using that to generate the hash table and lookup frame info for the given return address https://github.com/KavinduZoysa/test-GCs/tree/find-gc-roots/solution2/runtime/gc. Also, I am using the same approach to get the rsp https://github.com/KavinduZoysa/test-GCs/blob/find-gc-roots/solution2/runtime/shim.s.
If we check the object dump of final executable we can se there are two stack maps.
Right. So we would need to do extra work at link time to combine them. Can you look at how Julia does this?
Right. So we would need to do extra work at link time to combine them. Can you look at how Julia does this?
Sure, I will work on that.
Julia's GC strategy is similar to shadow stack and it does not use LLVM's GC framework, which means they do not use LLVM stackmaps to identify GC roots. Therefore we cannot gather information related to LLVM stackmaps from Julia.
The other thing is, how does Julia handle the roots in C functions. Form C side it calls method JL_GC_PUSHx() to track the roots.https://docs.julialang.org/en/v1/manual/embedding/#Memory-Management
Interesting though that Julia is using non-integral pointers in LLVM code, plus something explicit in C code. Does Julia have a moving GC? I guess not.
The shadow stack thing is interesting: I wonder if that might be a good way to get started with GC.
Interesting though that Julia is using non-integral pointers in LLVM code, plus something explicit in C code. Does Julia have a moving GC? I guess not.
They are using custom address spaces(10 - GC tracked pointers, 11 - Derived pointers, etc) and non integral address spaces to keep the GC related information safe, even after the program passes through optimizations(This point is explained here). They have a GC root placement LLVM pass at the end of their pass pipeline. In that pass, they are creating struct call gc frame to hold in roots informations. In C side JL_GC_PUSHx() method is doing the same thing (create gc frame) https://groups.google.com/g/julia-users/c/7yW0zqiTwc4.
Yes, it is a non-relocating GC.
The shadow stack thing is interesting: I wonder if that might be a good way to get started with GC.
AFAIU, shadow stack approach and statepoint approach we are using currenlty, are completely different approaches. So in the future, if we need to move statepoint approach again we have to spend some considerable time on that. But in the document, it is mentioned that, shadow stack is an easy way to get started.
Please have a look at #300, to see how this fits in, in particular point 7.
I think we've done enough exploring. It's time to write the code to resolve this issue. I suggest you start a new branch "gc". Can you make statepoints work for both C and LLVM-generated code? If so, we should do that. When you've done this, then go on to #300. We will need some test programs that do a lot of allocation and deallocation. This will determine whether it's all working. I suggest you focus on getting it working reliably first (including #300). After that, we can focus on getting the code quality to where it needs to be. Then we can merge the branch in.
I will go to #300.
@jclark, when enabling GC for runtime functions, first I selected list.c and I am trying to run the following commands.
clang-11 -O2 -S -emit-llvm -o list.ll list.c
sed -i -e '/define /s/ {/ gc "statepoint-example" {/' -e '/target datalayout/ s/"$$/-ni:1"/' list.ll
opt-11 -S -O2 --rewrite-statepoints-for-gc -o list_opt.ll list.ll
llc-11 -O2 --filetype=obj -o list.o list_opt.ll
But opt command crashes and it is unable to generate statepoints. I ran opt-11 -S -O2 -o list_opt.ll list.ll
and output LLVM IR is attached here.
I went through the llvm IR and I found that compiler crashes at L60-L61. I am working on finding the root cause to fix this issue.
I thought the plan was to use -O0 with -emit-llvm and then edit out the noinline attribute.
Also we should mark functions as leaves where appropriate.
If we still get LLVM crashes, then we should try an alternative approach. At this stage, we do not want to get into debugging LLVM.
I thought the plan was to use -O0 with -emit-llvm and then edit out the noinline attribute.
I tried this and LLVM has crashed again. Then I marked noinline for taggedToPtr
. Then it works without crashing LLVM. The steps I followed in breif:
clang-11 -S -emit-llvm -o list.ll list.c
taggedToPtr
sed -i -e '/define /s/ {/ gc "statepoint-example" {/' -e '/target datalayout/ s/"$$/-ni:1"/' list.ll
Also we should mark functions as leaves where appropriate.
Yes, I will add
taggedToPtr should really be done with the ptrmask intrinsic. See #201.
So really what we should do is:
So please fix #201 and submit a pull request to main.
Based on #149 and #164, we need to build the capability to identify at runtime the GC roots on the stack.
Concretely, we want to add a C file to the runtime directory that provides a
markRoots
function that can be used by our garbage collector:Not sure if this is the best API. We should probably look MMTK and figure out something that is a good fit.