StanfordLegion / legion

The Legion Parallel Programming System
https://legion.stanford.edu
Apache License 2.0

Fuzzer: assert in record_subscriptions #1667

Closed by elliottslaughter 3 months ago

elliottslaughter commented 3 months ago

Here's a new issue that I see once all previous patches have been merged.

Assertion failed: ((subscriptions.find(it->first) == subscriptions.end()) || (subscriptions.find(it->first)->second * it->second)), function record_subscriptions, file legion_analysis.cc, line 21676.
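For readers unfamiliar with the check, here is a minimal sketch of the invariant as I read it from the assertion text: a key may be recorded again only if the fields already subscribed for it are disjoint from the incoming ones. This assumes `*` on Legion field masks is a disjointness test. The `Mask` alias, the `disjoint` helper, the `Tracker` struct, and the `int` keys are all hypothetical stand-ins, not the actual `EqSetTracker` code.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for a Legion field mask: one bit per field.
using Mask = std::uint64_t;

// True when the two masks share no fields. This plays the role that
// `a * b` is assumed to play in the assertion.
inline bool disjoint(Mask a, Mask b) { return (a & b) == 0; }

// Hypothetical, simplified tracker. Keys are ints here; in the backtrace
// the subscriptions are keyed by EqKDTree entries.
struct Tracker {
  std::map<int, Mask> subscriptions;

  // Returns false at the point where the real code would assert.
  // Entries processed before a violation stay recorded.
  bool record_subscriptions(const std::map<int, Mask>& incoming) {
    for (const auto& entry : incoming) {
      const auto finder = subscriptions.find(entry.first);
      // Invariant: the key is new, or its existing fields do not
      // overlap the incoming fields.
      if (finder != subscriptions.end() &&
          !disjoint(finder->second, entry.second))
        return false;
      subscriptions[entry.first] |= entry.second;
    }
    return true;
  }
};
```

In this model, recording the same key twice with overlapping fields is what trips the check, which would be consistent with two responses racing to subscribe the same fields, though the thread does not confirm the root cause.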

Fuzzer version: https://github.com/StanfordLegion/fuzzer/commit/1d21e5b5ecfec820ae1a35e91f32591abb055cb6

Command line:

./fuzzer -fuzz:seed 534 -fuzz:ops 10 -fuzz:skip 1 -level 4

The issue is non-deterministic and reproduces maybe 50% of the time.

Because it's non-deterministic, it can be helpful to run it in a loop:

i=0; while build/src/fuzzer -fuzz:seed 534 -fuzz:ops 10 -fuzz:skip 1 -level 4; do let i++; echo $i; done

Backtrace:

  * frame #4: 0x0000000102934273 liblegion.1.dylib`Legion::Internal::EqSetTracker::record_subscriptions(unsigned int, Legion::Internal::FieldMaskSet<Legion::Internal::EqKDTree, (Legion::Internal::AllocationType)106, false> const&) (.cold.2) at legion_analysis.cc:21675:11 [opt]
    frame #5: 0x0000000101e62989 liblegion.1.dylib`Legion::Internal::EqSetTracker::record_subscriptions(this=0x00007fa36af3f210, source=0, new_subscriptions=0x0000600000ff7690) at legion_analysis.cc:21675:11 [opt]
    frame #6: 0x0000000101e61e6d liblegion.1.dylib`Legion::Internal::EqSetTracker::record_equivalence_sets(this=0x00007fa36af3f210, context=0x00007fa36b008e00, mask=0x00007000092775a0, eq_sets=0x00007000092772b0, to_create=0x0000700009277390, creation_rects=size=1, creation_srcs=size=0, new_subscriptions=0x0000700009277320, new_references=25, source=<unavailable>, total_responses=1, ready_events=size=3, target_mapping=0x00007000092771e0, targets=size=1, creation_target_space=0) at legion_analysis.cc:21574:11 [opt]
    frame #7: 0x0000000101ee0317 liblegion.1.dylib`Legion::Internal::InnerContext::report_equivalence_sets(this=0x00007fa36b008e00, target_mapping=0x00007000092771e0, targets=<unavailable>, creation_target_space=0, mask=0x00007000092775a0, new_target_references=size=1, eq_sets=0x00007000092772b0, new_subscriptions=0x0000700009277320, to_create=0x0000700009277390, creation_rects=size=1, creation_srcs=size=0, expected_responses=<unavailable>, ready_events=size=3) at legion_context.cc:2366:32 [opt]
    frame #8: 0x0000000101eddd7a liblegion.1.dylib`Legion::Internal::InnerContext::compute_equivalence_sets(this=0x00007fa36b008e00, req_index=<unavailable>, targets=size=1, target_spaces=size=1, creation_target_space=0, expr=0x00007fa36b020400, mask=0x00007000092775a0) at legion_context.cc:2233:14 [opt]
    frame #9: 0x0000000101e7c3f6 liblegion.1.dylib`Legion::Internal::VersionManager::perform_versioning_analysis(this=0x00007fa36af3f210, context=0x00007fa36b008e00, version_info=0x00006000010f8e00, region_node=0x00007fa36b0b2400, version_mask=<unavailable>, op=0x00007fa36b02ca00, index=0, parent_req_index=0, ready_events=size=1, output_region_ready=0x0000000000000000, collective_rendezvous=<unavailable>) at legion_analysis.cc:23832:28 [opt]
    frame #10: 0x000000010227154c liblegion.1.dylib`Legion::Internal::RegionTreeForest::perform_versioning_analysis(Legion::Internal::Operation*, unsigned int, Legion::RegionRequirement const&, Legion::Internal::VersionInfo&, std::__1::set<Legion::Internal::RtEvent, std::__1::less<Legion::Internal::RtEvent>, std::__1::allocator<Legion::Internal::RtEvent>>&, Legion::Internal::RtEvent*, bool) [inlined] Legion::Internal::RegionNode::perform_versioning_analysis(this=0x00007fa36b0b2400, ctx=1, parent_ctx=0x00007fa36b008e00, version_info=0x00006000010f8e00, mask=0x0000700009277690, op=0x00007fa36b02ca00, index=0, parent_req_index=0, applied=size=1, output_region_ready=<unavailable>, collective_rendezvous=<unavailable>) at region_tree.cc:17354:15 [opt]
    frame #11: 0x00000001022714f0 liblegion.1.dylib`Legion::Internal::RegionTreeForest::perform_versioning_analysis(this=<unavailable>, op=0x00007fa36b02ca00, index=0, req=<unavailable>, version_info=0x00006000010f8e00, ready_events=size=1, output_region_ready=0x0000000000000000, collective_rendezvous=<unavailable>) at region_tree.cc:1703:20 [opt]
    frame #12: 0x000000010212dc2f liblegion.1.dylib`Legion::Internal::SingleTask::perform_versioning_analysis(this=0x00007fa36b02ca00, post_mapper=false) at legion_tasks.cc:2683:28 [opt]
    frame #13: 0x0000000102136716 liblegion.1.dylib`Legion::Internal::SingleTask::map_all_regions(this=0x00007fa36b02ca00, must_epoch_op=0x0000000000000000, defer_args=0x0000000000000000) at legion_tasks.cc:4158:15 [opt]
    frame #14: 0x00000001021424c3 liblegion.1.dylib`Legion::Internal::IndividualTask::perform_mapping(this=<unavailable>, must_epoch_owner=<unavailable>, args=<unavailable>) at legion_tasks.cc:6279:32 [opt]
    frame #15: 0x000000010212d3de liblegion.1.dylib`Legion::Internal::SingleTask::trigger_mapping(this=0x00007fa36b02ca00) at legion_tasks.cc:0 [opt]
    frame #16: 0x00000001023809e2 liblegion.1.dylib`Legion::Internal::Runtime::legion_runtime_task(args=0x00007fa31af07c58, arglen=<unavailable>, userdata=<unavailable>, userlen=<unavailable>, p=<unavailable>) at runtime.cc:32353:31 [opt]
    frame #17: 0x00000001036e12f9 librealm.1.dylib`Realm::LocalTaskProcessor::execute_task(this=<unavailable>, func_id=4, task_args=0x0000700009277ca8) at proc_impl.cc:1176:5 [opt]
    frame #18: 0x000000010371ec80 librealm.1.dylib`Realm::Task::execute_on_processor(this=0x00007fa31af07b20, p=(id = 2089670227099910144)) at tasks.cc:326:40 [opt]
    frame #19: 0x0000000103724f16 librealm.1.dylib`Realm::KernelThreadTaskScheduler::execute_task(this=<unavailable>, task=<unavailable>) at tasks.cc:1421:11 [opt]
    frame #20: 0x0000000103723563 librealm.1.dylib`Realm::ThreadedTaskScheduler::scheduler_loop(this=0x00007fa36b8046c0) at tasks.cc:1160:6 [opt]
    frame #21: 0x0000000103726ffe librealm.1.dylib`void Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop_wlock()>(void*) [inlined] Realm::ThreadedTaskScheduler::scheduler_loop_wlock(this=0x00007fa36b8046c0) at tasks.cc:1272:5 [opt]
    frame #22: 0x0000000103726fea librealm.1.dylib`void Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop_wlock()>(obj=0x00007fa36b8046c0) at threads.inl:97:5 [opt]

lightsighter commented 3 months ago

Fixed with: https://gitlab.com/StanfordLegion/legion/-/commit/3fc0fc4115bd3666b0577b179c6707cf1b6d5693