magnatelee opened this issue 5 years ago
Is there a work around for this issue in the short term? The runtime warnings are really annoying.
The runtime warnings should all be off by default now unless you explicitly enable them.
I want to make a note that many of the Regent tests are also leaking layout constraints.
Aren't our layout constraints all static? How exactly are we supposed to clean them up? Do we need a cleanup thunk to clean up the results of our registration thunk?
It's a consequence of field spaces not being destroyed; field spaces are holding references to layout constraints, so until they are destroyed, those references are still alive.
I think we fixed this a long time ago?
Do you want to try turning on the runtime checks and confirming that? 😉
We've been through several rounds of that and have fixed various issues along the way. I don't think we run legion_gc.py in CI (as far as I know), but I believe we've cleared several large applications of leaks and am not aware of any outstanding ones...
I didn't mean legion_gc.py. I meant running with the -lg:leaks flag, which will give you a warning if you leak any resources that you created. If you convert warnings to errors, you'll also get hard crashes in the CI if you are leaking anything.
If you convert warnings to errors
How do you do this?
Build with -DLEGION_WARNINGS_FATAL
I can't actually run the Regent test suite in this mode because it causes warnings like the following to turn into errors as well:
$ ./regent.py tests/regent/run_pass/call_task_polymorphic1.rg
[0 - 7ff84f868100] 0.000151 {4}{threads}: reservation ('utility proc 1d00000000000000') cannot be satisfied
[0 - 70000b5c6000] 0.103747 {4}{runtime}: [warning 1071] LEGION WARNING: Region requirement 0 of operation t1 (UID 3, provenance: tests/regent/run_pass/call_task_polymorphic1.rg:91) in parent task main (UID 2) is using uninitialized data for field(s) 102,103 of logical region (1,1,1) (from file /Users/elliott/Programming/Legion/legion/runtime/legion/legion_ops.cc:1838)
For more information see:
http://legion.stanford.edu/messages/warning_code.html#warning_code_1071
What I want is to make only the warnings from -lg:leaks into errors.
Anyway, there is a change in https://gitlab.com/StanfordLegion/legion/-/merge_requests/1257 which cleans up the leaks I am currently aware of (and have had the patience to manually check, since I cannot rely on an automated test suite to confirm).
Actually it's worse than that, because I have to build with -DLEGION_WARNINGS_FATAL to get the list of tests that might be hitting leak errors, then recompile without -DLEGION_WARNINGS_FATAL so that I can actually run them to completion to see if they actually leak (because if a test hits some other warning-turned-into-an-error first, I can't know whether the code would have leaked had it run to completion, since execution is terminated prematurely).
So I think I cannot validate the current situation at all except through extensive manual effort, which I am not willing to do right now.
You can manually turn the warnings in this function into errors to get the behavior you want: https://gitlab.com/StanfordLegion/legion/-/blob/master/runtime/legion/legion_context.cc?ref_type=heads#L6334-6523
I don't think it's a common enough use case to do this generally for everyone.
Just for posterity, the issue I am seeing now (even without leak checks or hard errors) is that index spaces are being deleted too eagerly. A sample reproducer and some explanatory comments are available here:
https://gitlab.com/StanfordLegion/legion/-/merge_requests/1257#note_1901183719
I've removed the index space requirement checks in the inordercommit branch since they don't do anything anymore there. That branch should make it into the master branch later today.
I think that https://gitlab.com/StanfordLegion/legion/-/merge_requests/1257 is ready to go now, just waiting for users to confirm that it doesn't break anything.
Good to go from my side if you are happy with it. The inordercommit branch has merged.
There are two major caveats currently remaining.

The first is that the compiler's escape analysis can miss objects that escape indirectly through __raw. For example, S3D has code that looks like:
task make_partition(...)
var part = ...
var part_color = c.legion_index_partition_get_color(..., __raw(part))
return part_color
end
The compiler has a very simple escape analysis that would detect if part were returned directly. But because we're getting the __raw value and passing that into a C API call, and then returning the color, I see basically no reasonable way for the compiler to determine that there's a dataflow here.
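For contrast, here is a minimal sketch (the task name and body are illustrative, not from S3D) of the pattern the escape analysis does handle:

task make_partition_direct(...)
  var part = ...
  -- the partition itself is returned, so the compiler can see that it
  -- escapes and must not be collected at the end of the task
  return part
end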
I see two possible solutions:
1. Treat __raw as always escaping: if you ever use __raw for any purpose, we'll leak the object.
2. Add a __leak operator to explicitly leak objects so that we do not collect the partition at the end of the task. This means that some user programs (like S3D) will need to be modified to account for this change.

The second caveat is that this can get really subtle because Regent has to create implicit index spaces in some cases. Here are some examples of code that creates index spaces implicitly:
for i = 0, N do
c.legion_begin_trace(...)
__demand(__index_launch)
for j = 0, M do
...
end
c.legion_end_trace(...)
end
Regent has to create an index space [0, M) for the j loop in order to do the index launch. Previously, we simply leaked all such index spaces. Now we collect them once the launch is issued; but this means that we now cannot trace the loop. This has to be fixed by creating an index space explicitly:
var M_space = ispace(int1d, M)
for i = 0, N do
c.legion_begin_trace(...)
__demand(__index_launch)
for j in M_space do
...
end
c.legion_end_trace(...)
end
This is obnoxious but at least reasonably straightforward. But here's a case where the reason why the index space gets created is pretty non-obvious:
for i = 0, N do
c.legion_begin_trace(...)
some_task_launch(...)
c.legion_end_trace(...)
end
Do you see the index space? No?
It's on the very first line, for i = 0, N. The reason is that when Regent launches some_task_launch, it tries to be helpful and assumes that [0, N) is the sharding space for the launch. This is maybe not the best choice in this case, but it makes more sense in cases where we have inner loops that cannot be turned into index launches: at least Regent can associate a sharding space and point with each individual task and more or less get the right distribution.
I don't see a way to fix this without some heavy-duty LICM (loop-invariant code motion) in the compiler. And worse, because the user can call C API methods directly for the tracing, we don't even necessarily know that the code is going to be traced.
This could also be fixed by the runtime supporting index space deletion inside traces, but I don't know if that is anywhere on the horizon.
This could also be fixed by the runtime supporting index space deletion inside traces, but I don't know if that is anywhere on the horizon.
I'm not going to support deletion inside traces because those traces would immediately become non-replayable forever: you can't delete the same index space (or any resource really) twice.
I don't see a way to fix this without some heavy-duty LICM in the compiler.
Why don't you just move all deletions of implicit index spaces to the very end of a task body? I realize that might create a sub-optimal lifetime for some resources (especially in long-running tasks), but it would be sound. It's like when C++ implicitly destructs objects on the stack at the end of a scope.
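For concreteness, here is a rough sketch based on the earlier example (the trailing comment is an assumption about where the compiler would insert the cleanup, not what Regent currently emits):

task main()
  for i = 0, N do
    c.legion_begin_trace(...)
    __demand(__index_launch)
    for j = 0, M do -- implicit index space for [0, M) created here
      ...
    end
    c.legion_end_trace(...)
  end
  -- under this proposal, every implicit index space would be destroyed
  -- here, at the very end of the task body, so no deletion ever lands
  -- inside a trace
end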
the very end of a task body? [...] It's like when C++ implicitly destructs objects on the stack at the end of a scope.
You're right about it being sound (in the part of this quote I elided), but there is a direct contradiction here: (a) end of task body vs (b) end of a scope.
Regent is currently attempting to do (b), just like C++ with RAII. A temporary object has no scope so it gets destroyed immediately. Thus the issue.
I could do (a), but I'm not sure if that's really better. It does allow tracing but you're accumulating a potentially unbounded number of index spaces, which is still a de facto leak (even if they'll all be collected by the end of the task).
I guess if you're doing it via scopes, then in the case of this code:
for i = 0, N do
c.legion_begin_trace(...)
__demand(__index_launch)
for j = 0, M do
...
end
c.legion_end_trace(...)
end
why isn't the "end of the scope" after the last statement, e.g. the end trace call? So the deletion is done outside of the trace.
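In other words, something like this rough sketch (the deletion comment marks where I would expect the implicit cleanup to go; this is an assumption, not what Regent currently emits):

for i = 0, N do
  c.legion_begin_trace(...)
  __demand(__index_launch)
  for j = 0, M do -- implicit index space for [0, M) created here
    ...
  end
  c.legion_end_trace(...)
  -- delete the implicit index space here, after the last statement of
  -- the enclosing scope, so the deletion falls outside the trace
end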
The thing to remember is that these are just demonstrative cases and actual code can get arbitrarily more complicated. For example, here's one variation that I had forgotten about in my original comment:
var M : int
while ... do
c.legion_begin_trace(...)
M = 0
for i = 0, N do
M += 1 -- or any other arbitrary computation
__demand(__index_launch)
for j = 0, M do
...
end
end
c.legion_end_trace(...)
end
M is now a loop-carried dependency. (Or we can model this as any other arbitrary computation.) This means there is no opportunity to do LICM; we are simply stuck creating it inline.
Additionally, there can be any level of nesting between the traced loop and our index launch, or even function calls. Not every Regent application has the entirety of its main loop exposed in this way; there can be additional levels of indirection through __demand(__local) task calls.
Why is it not possible to defer the index space deletion to the end of the trace? That also seems like a sound implementation. If you support index space creation it seems like the inverse operation should also be supported.
LICM is probably what we really want here for optimization purposes, but absent that, this feels like the right semantics.
If you're worried about the performance of back-to-back trace replays, we can actually safely add deletion operations to the list of special operations we allow between traces without invalidating preconditions, since by definition deletions can't mess with the preconditions for a trace (that would be a use-after-free). I can put that optimization in so that even if you have those implicit deletions between traces you'll still get the fast back-to-back trace replays.
Why is it not possible to defer the index space deletion to the end of the trace? That also seems like a sound implementation. If you support index space creation it seems like the inverse operation should also be supported.
We don't support index space creation inside of traces either. It happens to work in the case of immediate index space creations because they don't produce operations that are captured by the trace. If you tried to do a deferred index space creation (e.g. from a future) that will result in an error.
As for why you can't delete an index space inside a trace: the region requirements for that deletion operation would have to be different on the next replay of the trace, which violates the requirements of tracing.
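A rough sketch of the failure mode (c.legion_index_space_destroy is used here purely to illustrate a deletion issued inside a trace):

c.legion_begin_trace(...)
... -- operations captured by the trace
c.legion_index_space_destroy(...) -- suppose this deletion were captured too
c.legion_end_trace(...)
-- the next replay would try to issue the same deletion again, but the
-- index space is already gone; deleting the same resource twice is a
-- use-after-free, so the trace can never be replayed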
So the only way I see to implement this at the moment is to modify the C API to buffer index space deletions and apply them all after the legion_runtime_end_trace call. Everything else is insufficient because of (a) local task launches, (b) across compilation units.
The problem with doing it at the C API layer is that if the user uses a C++ library that does tracing, I can't capture that. But I suppose this is enough of a corner case that maybe it's fine.
So the only way I see to implement this at the moment is to modify the C API to buffer index space deletions and apply them all after the legion_runtime_end_trace call. Everything else is insufficient because of (a) local task launches, (b) across compilation units.
Can you explain why that is? Why can't Regent just look at the syntax of the body of the loop and say I'm going to put the deletion statements after all the statements in the scope? It shouldn't even need to care whether the statements in the scope are Regent statements or some black-box external C statements.
Fundamentally, tracing uses dynamic scope, while loop bodies are lexical scope. That means you can end up in situations like:
__demand(__local)
task rk_stage(N : int)
__demand(__index_launch)
for i = 0, N do -- implicit index space created here
...
end
...
end
And then use it later like:
for t = 0, T do
c.legion_runtime_begin_trace()
for rk = 0, 4 do
rk_stage(N)
end
... -- other stuff
c.legion_runtime_end_trace()
end
In theory these tasks could even be in different compilation units and thus not visible to the compiler at the same time. And again, this is just a demonstrative example, and real codes can apply arbitrary additional levels of loops, scopes, task calls (as long as they are local), etc.
There is in general no meaningful way for Regent to lift the implicit index spaces out far enough such that they would be outside of the enclosing trace calls.
Yet another reason for automatic tracing.
I guess the other thing I'm confused about is why you need to make an implicit index space. We still support "immediate" domains for index space launches: https://gitlab.com/StanfordLegion/legion/-/blob/master/runtime/legion.h?ref_type=heads#L1675 You shouldn't even need to make an implicit index space in these cases.
The only reason to make an implicit index space is if you're doing LICM and you're going to reuse the index space over and over.
The sharding space for single task launches must always be a full index space:
https://gitlab.com/StanfordLegion/legion/-/blob/master/runtime/legion.h?ref_type=heads#L1574
I need to look at the index task launch case further; I do not immediately see where in the compiler we're actually making index spaces. The code that I do see appears to make only domains, as you suggest. (And the sharding space in that case is unset, so it inherits from the launch domain.)
Ok, I see it now. Regent was lifting += into a task and issuing a single task launch for that task, to which it subsequently assigned a sharding space. The sharding space goes through the above API, which forces a conversion into a full index space.
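Roughly, and purely as a hypothetical illustration (the task name is made up; this is not the code the compiler actually generates), the lifting looks like:

-- the M += 1 statement gets lifted into a task...
task __lifted_increment(M : int) : int
  return M + 1
end
-- ...which is issued as a single task launch; the compiler then assigns
-- [0, N) as its sharding space, and that sharding space is converted by
-- the runtime API above into a full (implicit) index space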
because I simply don't delete index partitions 🤣. So that will need to be dealt with at some point, but I think not yet. If we recursively delete from the root index space then it will not be an issue for leaks at program termination, though it could potentially contribute to memory usage growth during application execution.
Right, this will only apply to use cases where you are making lots of index partitions over the lifetime of the application. If you're making just a few and reusing them then it won't be an issue, and when you delete the root index space Legion will just clean everything up.
Re: my point about LICM here https://github.com/StanfordLegion/legion/issues/606#issuecomment-2197683702. Unfortunately this is not sufficient. S3D has deeper loop nesting which (unsurprisingly) causes my simple hack to fail. So now we're back to these options:
In this case the user presumably wrote __demand(__index_launch) and didn't realize it wasn't being applied. Therefore we could reasonably shut off sharding in that case because it likely wouldn't have helped anyway, and isn't likely to be catastrophic to performance. However, we are still making assumptions about the structure of user programs (e.g., that users do not mix index launches and single task launches in the same outer loop where the user would expect the single task launches to get sharded).
Regent tasks are currently leaking index and field spaces when they exit. The compiler should emit code to clean them up.