StanfordLegion / legion

The Legion Parallel Programming System
https://legion.stanford.edu
Apache License 2.0

Assertion `consumer_depth >= producer_depth' failed. #1585

Open jiazhihao opened 10 months ago

jiazhihao commented 10 months ago

I ran into the following Legion assertion failure. I think this might be an application-side bug, but it would be great if I could get some help understanding this assertion.

Legion::Internal::FutureImpl::register_dependence(Legion::Internal::Operation*): Assertion `consumer_depth >= producer_depth' failed.

Backtrace:

(gdb) bt
#0  0x00007fffccabda9f in raise () from /lib64/libc.so.6
#1  0x00007fffcca90e05 in abort () from /lib64/libc.so.6
#2  0x00007fffcca90cd9 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x00007fffccab63f6 in __assert_fail () from /lib64/libc.so.6
#4  0x00007fffd2ff6109 in Legion::Internal::FutureImpl::register_dependence (this=0x7ff5d889ab50, consumer_op=0x7ff5e0225df0)
    at /home/zhihaoj2/FlexFlow/deps/legion/runtime/legion/runtime.cc:2404
#5  0x00007fffd2deee16 in Legion::Internal::IndexTask::perform_base_dependence_analysis (this=0x7ff5e0225c10)
    at /home/zhihaoj2/FlexFlow/deps/legion/runtime/legion/legion_tasks.cc:9197
#6  0x00007fffd2dee315 in Legion::Internal::IndexTask::trigger_dependence_analysis (this=0x7ff5e0225c10)
    at /home/zhihaoj2/FlexFlow/deps/legion/runtime/legion/legion_tasks.cc:9037
#7  0x00007fffd319bec4 in Legion::Internal::Predicated<Legion::Internal::IndexTask>::trigger_dependence_analysis (this=0x7ff5e0225c10)
    at /home/zhihaoj2/FlexFlow/deps/legion/runtime/legion/legion_ops.inl:170
#8  0x00007fffd2bfd4ac in Legion::Internal::Operation::execute_dependence_analysis (this=0x7ff5e0225df0)
    at /home/zhihaoj2/FlexFlow/deps/legion/runtime/legion/legion_ops.cc:1643
#9  0x00007fffd2ac7b9e in Legion::Internal::InnerContext::process_dependence_stage (this=0x7ff5e42015a0)
    at /home/zhihaoj2/FlexFlow/deps/legion/runtime/legion/legion_context.cc:7936
#10 0x00007fffd2ad6b58 in Legion::Internal::InnerContext::handle_dependence_stage (args=0x7ff5d88a42f0)
    at /home/zhihaoj2/FlexFlow/deps/legion/runtime/legion/legion_context.cc:11725
#11 0x00007fffd3069839 in Legion::Internal::Runtime::legion_runtime_task (args=0x7ff5d88a42f0, arglen=12, userdata=0x1b6cc50, userlen=8, p=...)
    at /home/zhihaoj2/FlexFlow/deps/legion/runtime/legion/runtime.cc:32068
#12 0x00007fffce8fdf1e in Realm::LocalTaskProcessor::execute_task (this=0x1ab8420, func_id=4, task_args=...)
    at /home/zhihaoj2/FlexFlow/deps/legion/runtime/realm/proc_impl.cc:1175
#13 0x00007fffce9729f5 in Realm::Task::execute_on_processor (this=0x7ff5d88a4170, p=...) at /home/zhihaoj2/FlexFlow/deps/legion/runtime/realm/tasks.cc:326
#14 0x00007fffce977846 in Realm::UserThreadTaskScheduler::execute_task (this=0x1ab87c0, task=0x7ff5d88a4170)
    at /home/zhihaoj2/FlexFlow/deps/legion/runtime/realm/tasks.cc:1687
#15 0x00007fffce97584a in Realm::ThreadedTaskScheduler::scheduler_loop (this=0x1ab87c0) at /home/zhihaoj2/FlexFlow/deps/legion/runtime/realm/tasks.cc:1160
#16 0x00007fffce97cf7e in Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop> (obj=0x1ab87c0)
    at /home/zhihaoj2/FlexFlow/deps/legion/runtime/realm/threads.inl:97
#17 0x00007fffce98a631 in Realm::UserThread::uthread_entry () at /home/zhihaoj2/FlexFlow/deps/legion/runtime/realm/threads.cc:1355
#18 0x00007fffcca930b0 in ?? () from /lib64/libc.so.6
#19 0x0000000000000000 in ?? ()
jiazhihao commented 10 months ago

More context:

lightsighter commented 10 months ago

This is definitely an application bug (one that deserves a better error message). Effectively, this is saying that you're trying to use, in a parent task further up the task tree, a Future that was produced in a child task, which is illegal. You can't store Legion futures on the heap and then pass them out-of-band up the task tree. Futures' lifetimes have to end at the task that produced them.

> In frame 5 (on the consumer side), IndexTask's get_depth() returns 1 while IndexTask's get_context()->get_depth() returns 0. The IndexTask was launched in a child task of top_level_task, so I would expect its depth to be 1.

Right, so this future was made in the "child task of top_level_task", but it is being used in the top_level_task, which is illegal. You need to pass things like this by value back up the task tree.
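To make the rule concrete, here is a minimal, self-contained C++ sketch. It is a hypothetical toy model, not Legion code: `ToyContext`, `ToyFuture`, `ScenarioResult`, and `run_thread_scenario` are illustrative names, and the bookkeeping simply follows the explanation in this thread (the top-level task has depth 0, and a future is tagged with the depth of the context that launched its producer). It mirrors only the depth comparison named in the assertion; the real runtime tracks more than this.

```cpp
#include <cassert>  // assert(), for checking the scenario results

// Hypothetical toy model of the rule behind the assertion; these are not
// Legion classes. A context's depth equals the depth of the task whose body
// runs in it (top_level_task = 0), and a future records the depth of the
// context that launched the operation producing it.
struct ToyFuture {
  int producer_depth;  // depth of the launching (producer) context
  int value;           // result the producer returns
};

struct ToyContext {
  int depth;  // depth of the task whose body runs in this context

  // Any launch from this context yields a future tagged with its depth.
  ToyFuture launch(int result) const { return ToyFuture{depth, result}; }

  // Same comparison as the failed assertion: consumer_depth >= producer_depth.
  bool may_consume(const ToyFuture& f) const {
    return depth >= f.producer_depth;
  }
};

struct ScenarioResult {
  bool t2_uses_t1_in_t_b;          // T_B passes T_1's future to T_2
  bool top_level_reuses_t1;        // top-level task uses T_1's future directly
  bool top_level_uses_t_b_result;  // T_B returns the value by value instead
};

// Replays the situation discussed in this issue (42 is an arbitrary value).
ScenarioResult run_thread_scenario() {
  ToyContext top_level{0};  // top_level_task
  ToyContext t_b{1};        // T_B, a child of top_level_task

  // T_1, launched from T_B's context.
  ToyFuture f_t1 = t_b.launch(42);
  // The future the top-level task gets for T_B itself, carrying the value
  // T_B returned (T_1's result passed back up by value).
  ToyFuture f_t_b = top_level.launch(f_t1.value);

  return ScenarioResult{
      t_b.may_consume(f_t1),         // 1 >= 1: allowed
      top_level.may_consume(f_t1),   // 0 >= 1: the reported failure
      top_level.may_consume(f_t_b),  // 0 >= 0: allowed
  };
}
```

In this model, `run_thread_scenario()` returns `{true, false, true}`: only the middle case, where the top-level task reuses a future produced inside T_B, violates `consumer_depth >= producer_depth`.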

jiazhihao commented 10 months ago

@lightsighter In my use case, I created a background task T_B that behaves like a worker: it iteratively pulls requests from a work queue and launches Legion tasks. T_B is a child task of the top_level_task. T_B launches two tasks, T_1 and T_2, sequentially. T_1 is a normal task launch and T_2 is an index task launch. T_1's result is passed to T_2 as a future. Is this use case allowed by Legion?

jiazhihao commented 10 months ago

BTW, is it expected that IndexTask's get_depth() returns 1 while IndexTask's get_context()->get_depth() returns 0?

jiazhihao commented 10 months ago

@lightsighter I was able to identify the issue. Am I correct that we can only call Runtime::get_context() in the top_level task? It seems to give me an incorrect Context if I call it in a child task.

lightsighter commented 10 months ago

> T_B launches two tasks, T_1 and T_2, sequentially. T_1 is a normal task launch and T_2 is an index task launch. T_1's result is passed to T_2 as a future. Is this use case allowed by Legion?

That should be allowed, but I don't think that is where your error came from. The error reported above would occur if the top-level task tried to launch a sub-task using the future produced by launching T_1 from T_B's context. consumer_depth = 0 means that you were definitely using the future in the top-level task; producer_depth = 1 means that the future was produced in a child task of the top-level task (e.g. T_B).

> BTW, is it expected that IndexTask's get_depth() returns 1 while IndexTask's get_context()->get_depth() returns 0?

Yes, because you asked for the depth of the context of the task. By definition the depth of the context of a task is always one less than the depth of the task itself.
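The depth relationship can be sketched with another hypothetical toy model. `ToyContext`, `ToyOperation`, and `launch_from` are illustrative names, not Legion APIs; the model only assumes what this thread states: a context's depth equals the depth of the task whose body runs in it, and an operation launched from a context sits one level deeper.

```cpp
#include <cassert>  // assert(), for checking the depths

// Hypothetical toy model of the depth bookkeeping described in this thread;
// these are not Legion classes.
struct ToyContext {
  int depth;  // depth of the task whose body runs in this context
};

struct ToyOperation {
  ToyContext parent;  // the context this operation was launched from

  // A launched operation sits one level below the context that launched it,
  // so its context's depth is always one less than its own depth.
  int get_depth() const { return parent.depth + 1; }
  const ToyContext& get_context() const { return parent; }

  // When a launched task runs, its body gets a context at the task's own depth.
  ToyContext body_context() const { return ToyContext{get_depth()}; }
};

ToyOperation launch_from(const ToyContext& ctx) { return ToyOperation{ctx}; }
```

In this model, an index task launched from the top-level task's context (depth 0) reports `get_depth() == 1` and `get_context().depth == 0`, matching frame 5. T_1, launched from T_B's body context, has a context depth of 1, which is the producer_depth the assertion compared against.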

> Am I correct that we can only call Runtime::get_context() in the top_level task? It seems to give me an incorrect Context if I call it in a child task.

You can call Runtime::get_context anywhere inside a Legion task. I sincerely doubt it is giving you the wrong context: it always returns the same Context that was passed as an argument when the task started. If you think you have a bug with Runtime::get_context, then make a small reproducer so we can look at it, but I would be very surprised if that code is wrong, as lots of people rely on it now.

lightsighter commented 10 months ago

I pushed a better error message for this failure mode if you want to try again and see what it says.