Closed mariodirenzo closed 1 year ago
That's not a bug. A delta of 0
means "don't care". In order to say you don't want any padding then you need to give a negative delta. I even wrote that down here:
https://gitlab.com/StanfordLegion/legion/-/blob/master/runtime/legion/legion_constraint.h?ref_type=heads#L704-707
Also note that that commit didn't change those semantics, just better abided by them.
I've evidence that your version of the function allows the runtime to map region requirements with padding constraints on instances without padding. Note that such mappings were not allowed before 1626f5638181c210bc1b74c96838b05e5339ba6d
and should not be allowed
map region requirements with padding constraints
Non-zero padding constraints? Also, you're sure the safe mapper checks are on? They are off by default in release mode. Without them the runtime will let you do all sorts of illegal things.
It's worth noting that the old code was overly strict compared to the documented semantics. If you were relying on 0
bounds before to mean that there was no padding it would have worked but the implementation was inconsistent with the documentation. Your proposed fix is also overly strict and doesn't handle the cases with zero or negative values correctly.
If you want we can try switching the semantics so that 0
means absolutely no padding and any negative integer means "don't care".
The semantics 0
== "I do not care" can be applicable only when you are talking about region requirements. When you are dealing with the LayoutConstraints
of an InstanceManager
, 0
means that the underlying instance does not have any padding space.
For instance, you are checking if the LayoutConstraints
of a given InstanceManager
are compatible with the constraints of the selected task variant at https://gitlab.com/StanfordLegion/legion/-/blob/control_replication/runtime/legion/mapper_manager.cc?ref_type=heads#L1848 . The current version of the code will not recognize a conflict between an instance that has zero padding and a region requirement with a positive padding constraint.
A similar check is in place in many functions of the runtime.
Note 1: your new function would work as expected if you start checking if the layout constraints of the selected task variant conflict with the layout constraints of the selected instance
Note 2: the semantics 0
== "I do not care" is applied by my patch and by your old implementation as well. In fact, the constraints of the task variants are contained in the other.delta
in the function. If other.delta
is zero along one direction, you will never enter in any of the if statements
+ if ((delta.lo()[idx]*other.delta.lo()[idx]) < 0)
+ return true;
+ else if (delta.lo()[idx] < other.delta.lo()[idx])
+ return true;
+
+ if ((delta.hi()[idx]*other.delta.hi()[idx]) < 0)
+ return true;
+ else if (delta.hi()[idx] < other.delta.hi()[idx])
+ return true;
regardless of the value of delta
, which rapresentes of the size of the padding space of the instance.
The semantics 0 == "I do not care" can be applicable only when you are talking about region requirements.
You missed this part: https://gitlab.com/StanfordLegion/legion/-/blob/master/runtime/legion/legion_instances.cc?ref_type=heads#L3909-3918
There are no instances with 0
constraints.
Do you want a reproducer on sapling?
BTW, your comment is incorrect. If an instance is created with padding constraints only in one direction, it does not go through that path. For instance, using this padding constraint
const PaddingConstraint padding_1D_X(Point<3>{3, 0, 0}, Point<3>{3, 0, 0});
the runtime will most likely try to create an instance with 3 points of padding on the x
sides and 0 points on the other.
If afterwards you launch a task with
const PaddingConstraint padding_1D_Y(Point<3>{0, 3, 0}, Point<3>{0, 3, 0});
the runtime will not recognize that the instance created for the previous task is not compatible with this layout constraint.
If you want to keep your version of the conflict check, you will need to apply the following patch
diff --git a/runtime/legion/legion_instances.cc b/runtime/legion/legion_instances.cc
index 1f6e0e784..eba4ad0a1 100644
--- a/runtime/legion/legion_instances.cc
+++ b/runtime/legion/legion_instances.cc
@@ -3924,6 +3924,14 @@ namespace Legion {
for (unsigned dim = 0; dim < num_dims; dim++)
empty[dim] = -1; // no padding
constraints.padding_constraint.delta = Domain(empty, empty);
+ } else {
+ DomainPoint lo = constraints.padding_constraint.delta.lo();
+ DomainPoint hi = constraints.padding_constraint.delta.hi();
+ for (unsigned dim = 0; dim < num_dims; dim++) {
+ if (lo[dim] == 0) lo[dim] = -1;
+ if (hi[dim] == 0) hi[dim] = -1;
+ }
+ constraints.padding_constraint.delta = Domain(lo, hi);
}
// Now let's find the layout constraints to use for this instance
LayoutDescription *layout = field_space_node->find_layout_description(
Note that also the following fix is required for instances created with delta equal to -1
in some directions
diff --git a/runtime/legion/runtime.cc b/runtime/legion/runtime.cc
index 4bb49bc11..99ec6cb61 100644
--- a/runtime/legion/runtime.cc
+++ b/runtime/legion/runtime.cc
@@ -5323,8 +5323,15 @@ namespace Legion {
#ifdef DEBUG_LEGION
assert(domain.get_dim() == delta.get_dim());
#endif
+ DomainPoint lo = delta.lo();
+ DomainPoint hi = delta.hi();
+ for (int dim = 0; dim < delta.get_dim(); dim++) {
+ lo[dim] = std::max(coord_t(0), lo[dim]);
+ hi[dim] = std::max(coord_t(0), hi[dim]);
+ }
+
const Domain padded_bounds =
- Domain(domain.lo() - delta.lo(), domain.hi() + delta.hi());
+ Domain(domain.lo() - lo, domain.hi() + hi);
switch (domain.get_dim())
{
#define DIMFUNC(DIM) \
@@ -5436,7 +5443,13 @@ namespace Legion {
#ifdef DEBUG_LEGION
assert(bounds.get_dim() == delta.get_dim());
#endif
- outer = Domain(bounds.lo() - delta.lo(), bounds.hi() + delta.hi());
+ DomainPoint lo = delta.lo();
+ DomainPoint hi = delta.hi();
+ for (int dim = 0; dim < delta.get_dim(); dim++) {
+ lo[dim] = std::max(coord_t(0), lo[dim]);
+ hi[dim] = std::max(coord_t(0), hi[dim]);
+ }
+ outer = Domain(bounds.lo() - lo, bounds.hi() + hi);
}
else
outer = bounds;
BTW, your comment is incorrect. If an instance is created with padding constraints only in one direction, it does not go through that path.
That's undefined behavior. You should never be trying to create an instance with "don't-care" padding. I'm just going to reverse the polarity of "don't care" and "no constraints" so that nobody else makes the same mistake that you did.
Pull and try again.
You still need to apply the patch above to runtime/legion/runtime.cc
to make sure that the outer bounds of the instance are properly computed when delta is -1
.
Look again, I already did that.
Although I still think that should be undefined behavior/an error.
the execution of the unit test still fails with
(gdb) bt
#0 0x00007f0411a8323f in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x55c1159a4920, rem=rem@entry=0x55c1159a4920) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
#1 0x00007f0411a88ec7 in __GI___nanosleep (requested_time=requested_time@entry=0x55c1159a4920, remaining=remaining@entry=0x55c1159a4920) at nanosleep.c:27
#2 0x00007f0411a88dfe in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
#3 0x00007f0415080041 in Realm::realm_freeze (signal=6) at /home/mariodr/legion3/runtime/realm/runtime_impl.cc:200
#4 <signal handler called>
#5 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#6 0x00007f04119c8859 in __GI_abort () at abort.c:79
#7 0x00007f04119c8729 in __assert_fail_base (fmt=0x7f0411b5e588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7f0415d08f87 "lo[i] >= 0", file=0x7f0415cfeaf0 "/home/mariodr/legion3/runtime/legion/region_tree.inl", line=505, function=<optimized out>)
at assert.c:92
#8 0x00007f04119d9fd6 in __GI___assert_fail (assertion=0x7f0415d08f87 "lo[i] >= 0", file=0x7f0415cfeaf0 "/home/mariodr/legion3/runtime/legion/region_tree.inl", line=505,
function=0x7f0415d0bca0 "Realm::InstanceLayoutGeneric* Legion::Internal::IndexSpaceExpression::create_layout_internal(const Realm::IndexSpace<N, T>&, const Legion::LayoutConstraintSet&, const std::vector<unsigned int>&, const"...) at assert.c:101
#9 0x00007f0414c6dd5d in Legion::Internal::IndexSpaceExpression::create_layout_internal<3, long long> (this=0x7f03f2869860, space=..., constraints=..., field_ids=std::vector of length 2, capacity 2 = {...}, field_sizes=std::vector of length 2, capacity 2 = {...},
compact=false, piece_list=0x7f03f2f2d7f8, piece_list_size=0x7f03f2f2d800, num_pieces=0x7f03f2f2c2f0) at /home/mariodr/legion3/runtime/legion/region_tree.inl:505
#10 0x00007f0414c4acf2 in Legion::Internal::IndexSpaceNodeT<3, long long>::create_layout (this=0x7f03f2869530, constraints=..., field_ids=std::vector of length 2, capacity 2 = {...}, field_sizes=std::vector of length 2, capacity 2 = {...}, compact=false,
piece_list=0x7f03f2f2d7f8, piece_list_size=0x7f03f2f2d800, num_pieces=0x7f03f2f2c2f0) at /home/mariodr/legion3/runtime/legion/region_tree.inl:4905
#11 0x00007f04145a00d6 in Legion::Internal::InstanceBuilder::create_physical_instance (this=0x7f03f2f2d5e0, forest=0x55c1123ca840, unsat_kind=0x0, unsat_index=0x0, footprint=0x7f03f2f2d408, precondition=...)
at /home/mariodr/legion3/runtime/legion/legion_instances.cc:3814
#12 0x00007f04148d7c27 in Legion::Internal::MemoryManager::allocate_physical_instance (this=0x7f03f27109e0, builder=..., footprint=0x7f03f2f2ef90, unsat_kind=0x0, unsat_index=0x0) at /home/mariodr/legion3/runtime/legion/runtime.cc:10366
#13 0x00007f04148cfd5c in Legion::Internal::MemoryManager::find_or_create_physical_instance (this=0x7f03f27109e0, constraints=..., regions=std::vector of length 1, capacity 1 = {...}, result=..., created=@0x7f03f2f2eb29: false, processor=..., acquire=true, priority=0,
tight_region_bounds=false, unsat_kind=0x0, unsat_index=0x0, footprint=0x7f03f2f2ef90, creator_id=398, remote=false) at /home/mariodr/legion3/runtime/legion/runtime.cc:8728
#14 0x00007f0414913653 in Legion::Internal::Runtime::find_or_create_physical_instance (this=0x55c11344d2e0, target_memory=..., constraints=..., regions=std::vector of length 1, capacity 1 = {...}, result=..., created=@0x7f03f2f2eb29: false, processor=..., acquire=true,
priority=0, tight_bounds=false, unsat=0x0, footprint=0x7f03f2f2ef90, creator_id=398) at /home/mariodr/legion3/runtime/legion/runtime.cc:26056
#15 0x00007f0414a98446 in Legion::Internal::MapperManager::find_or_create_physical_instance (this=0x55c111d9e360, ctx=0x7f03f29ee4d0, target_memory=..., constraints=..., regions=std::vector of length 1, capacity 1 = {...}, result=..., created=@0x7f03f2f2eb29: false,
acquire=true, priority=0, tight_region_bounds=false, footprint=0x7f03f2f2ef90, unsat=0x0) at /home/mariodr/legion3/runtime/legion/mapper_manager.cc:2043
#16 0x00007f041473c661 in Legion::Mapping::MapperRuntime::find_or_create_physical_instance (this=0x55c112382540, ctx=0x7f03f29ee4d0, target_memory=..., constraints=..., regions=std::vector of length 1, capacity 1 = {...}, result=..., created=@0x7f03f2f2eb29: false,
acquire=true, priority=0, tight_bounds=false, footprint=0x7f03f2f2ef90, unsat=0x0) at /home/mariodr/legion3/runtime/legion/legion_mapping.cc:875
#17 0x00007f0414fe5940 in Legion::Mapping::DefaultMapper::default_make_instance (this=0x55c111d9df20, ctx=0x7f03f29ee4d0, target_memory=..., constraints=..., result=..., kind=Legion::Mapping::DefaultMapper::TASK_MAPPING, force_new=false, meets=true, req=...,
footprint=0x7f03f2f2ef90) at /home/mariodr/legion3/runtime/mappers/default_mapper.cc:2291
#18 0x00007f0414fe3997 in Legion::Mapping::DefaultMapper::default_create_custom_instances (this=0x55c111d9df20, ctx=0x7f03f29ee4d0, target_proc=..., target_memory=..., req=..., index=1, needed_fields=std::set with 0 elements, layout_constraints=...,
needs_field_constraint_check=false, instances=std::vector of length 1, capacity 1 = {...}, footprint=0x7f03f2f2ef90) at /home/mariodr/legion3/runtime/mappers/default_mapper.cc:1933
#19 0x00007f0414fe1a97 in Legion::Mapping::DefaultMapper::map_task (this=0x55c111d9df20, ctx=0x7f03f29ee4d0, task=..., input=..., output=...) at /home/mariodr/legion3/runtime/mappers/default_mapper.cc:1462
#20 0x00007f0414a91ad0 in Legion::Internal::MapperManager::invoke_map_task (this=0x55c111d9e360, task=0x7f03f2a222d0, input=0x7f03f2f2f110, output=0x7f03f2f2f160, info=0x7f03f29ee4d0) at /home/mariodr/legion3/runtime/legion/mapper_manager.cc:176
#21 0x00007f041430f593 in Legion::Internal::SingleTask::invoke_mapper (this=0x7f03f2a222d0, must_epoch_owner=0x0) at /home/mariodr/legion3/runtime/legion/legion_tasks.cc:3590
#22 0x00007f0414311beb in Legion::Internal::SingleTask::map_all_regions (this=0x7f03f2a222d0, must_epoch_op=0x0, defer_args=0x7f03f2a09560) at /home/mariodr/legion3/runtime/legion/legion_tasks.cc:3984
#23 0x00007f041431fdf9 in Legion::Internal::PointTask::perform_mapping (this=0x7f03f2a222d0, must_epoch_owner=0x0, args=0x7f03f2a09560) at /home/mariodr/legion3/runtime/legion/legion_tasks.cc:7181
#24 0x00007f041492c157 in Legion::Internal::Runtime::legion_runtime_task (args=0x7f03f2a09560, arglen=52, userdata=0x55c11595eb40, userlen=8, p=...) at /home/mariodr/legion3/runtime/legion/runtime.cc:32257
#25 0x00007f041540036e in Realm::LocalTaskProcessor::execute_task (this=0x55c112381ae0, func_id=4, task_args=...) at /home/mariodr/legion3/runtime/realm/proc_impl.cc:1175
#26 0x00007f0415205d88 in Realm::Task::execute_on_processor (this=0x7f03f2a093e0, p=...) at /home/mariodr/legion3/runtime/realm/tasks.cc:326
#27 0x00007f041520b114 in Realm::UserThreadTaskScheduler::execute_task (this=0x55c112381ed0, task=0x7f03f2a093e0) at /home/mariodr/legion3/runtime/realm/tasks.cc:1687
#28 0x00007f0415208ebc in Realm::ThreadedTaskScheduler::scheduler_loop (this=0x55c112381ed0) at /home/mariodr/legion3/runtime/realm/tasks.cc:1160
#29 0x00007f0415211ef8 in Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop> (obj=0x55c112381ed0) at /home/mariodr/legion3/runtime/realm/threads.inl:97
#30 0x00007f04151df542 in Realm::UserThread::uthread_entry () at /home/mariodr/legion3/runtime/realm/threads.cc:1355
#31 0x00007f0411a014e0 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 from /lib/x86_64-linux-gnu/libc.so.6
#32 0x0000000000000000 in ?? ()
(gdb) f 9
#9 0x00007f0414c6dd5d in Legion::Internal::IndexSpaceExpression::create_layout_internal<3, long long> (this=0x7f03f2869860, space=..., constraints=..., field_ids=std::vector of length 2, capacity 2 = {...}, field_sizes=std::vector of length 2, capacity 2 = {...},
compact=false, piece_list=0x7f03f2f2d7f8, piece_list_size=0x7f03f2f2d800, num_pieces=0x7f03f2f2c2f0) at /home/mariodr/legion3/runtime/legion/region_tree.inl:505
505 assert(lo[i] >= 0);
(gdb) p delta
$1 = (const Legion::Domain &) @0x7f03f2f2d670: {static MAX_RECT_DIM = 3, static NO_DOMAIN = {static MAX_RECT_DIM = 3, static NO_DOMAIN = <same as static member of an already seen type>, is_id = 0, is_type = 0, dim = 0, rect_data = {0, 0, 0, 0, 0, 0}}, is_id = 0,
is_type = 0, dim = 3, rect_data = {3, -1, -1, 3, -1, -1}}
I pushed a fix to the master
branch, but need @elliottslaughter to do the merge into control replication to resolve a CMake conflict.
The fix was merged into the control replication branch.
There is still a problem. An unset padding constraint conflucts with a zero padding constraint as you can see in the following case:
(gdb) p $11.padding_constraint
$13 = {<Legion::LayoutConstraint> = {_vptr.LayoutConstraint = 0x5651f277b670 <vtable for Legion::PaddingConstraint+16>}, static constraint_kind = LEGION_PADDING_CONSTRAINT, delta = {static MAX_RECT_DIM = 3, static NO_DOMAIN = {static MAX_RECT_DIM = 3,
static NO_DOMAIN = <same as static member of an already seen type>, is_id = 0, is_type = 0, dim = 0, rect_data = {0, 0, 0, 0, 0, 0}}, is_id = 0, is_type = 0, dim = 0, rect_data = {0, 0, 0, 0, 0, 0}}}
(gdb) p $12.padding_constraint
$14 = {<Legion::LayoutConstraint> = {_vptr.LayoutConstraint = 0x5651f277b670 <vtable for Legion::PaddingConstraint+16>}, static constraint_kind = LEGION_PADDING_CONSTRAINT, delta = {static MAX_RECT_DIM = 3, static NO_DOMAIN = {static MAX_RECT_DIM = 3,
static NO_DOMAIN = <same as static member of an already seen type>, is_id = 0, is_type = 0, dim = 0, rect_data = {0, 0, 0, 0, 0, 0}}, is_id = 0, is_type = 0, dim = 3, rect_data = {0, 0, 0, 0, 0, 0}}}
(gdb) p $11.padding_constraint.conflicts($12.padding_constraint)
$15 = true
This issue prevents a region requirement without padding constraints from being mapped on a region mapped with zero padding using the current logic available in the default mapper (https://gitlab.com/StanfordLegion/legion/-/blob/master/runtime/mappers/default_mapper.cc#L1976)
This is fixed by the following patch
diff --git a/runtime/legion/legion_constraint.cc b/runtime/legion/legion_constraint.cc
index e0a757f0f..8f327b296 100644
--- a/runtime/legion/legion_constraint.cc
+++ b/runtime/legion/legion_constraint.cc
@@ -1763,8 +1763,16 @@ namespace Legion {
{
if (other.delta.get_dim() > 0)
{
- if (delta.get_dim() != other.delta.get_dim())
- return true;
+ if (delta.get_dim() != other.delta.get_dim()) {
+ if (delta.get_dim() == 0) {
+ for (int idx = 0; idx < other.delta.get_dim(); idx++) {
+ if (other.delta.lo()[idx] > 0) return true;
+ if (other.delta.hi()[idx] > 0) return true;
+ }
+ return false;
+ } else
+ return true;
+ }
for (int idx = 0; idx < delta.get_dim(); idx++)
{
if ((delta.lo()[idx] >= 0) && (other.delta.lo()[idx] >= 0))
I pushed a less restrictive change that will at least handle the case. Conflict testing should only fail if both constraint sets strictly conflict with each other. If either one is "don't care" then they shouldn't ever conflict with each other. For example, if either constraint has a zero dimension, that is considered a "don't care" and therefore will never be able to conflict with another set of constraints.
Sounds good. Thanks for fixing the issue
Thanks for dealing with the churn. It was necessary to make the implementation more robust.
One of the HTR unit tests has shown that
1626f5638181c210bc1b74c96838b05e5339ba6d
has introduced a bug in the detection of the conflicts between padded constraints. In particular, if a constraint with0
delta
is tested against a constraint with positivedelta
, the conflict is not detected.The following patch fixes the issue
@elliottslaughter, could you please add this issue to #1032 with high priority?