StanfordLegion / legion

The Legion Parallel Programming System
https://legion.stanford.edu
Apache License 2.0

Fuzzer: assert in add_gc_reference #1665

Closed elliottslaughter closed 3 months ago

elliottslaughter commented 3 months ago

I have once again merged all your existing fixes together. I am now seeing:

Assertion failed: (is_global<false >()), function add_gc_reference, file garbage_collection.cc, line 130.

The failure is nondeterministic; it occurs roughly 60-70% of the time on my Mac.

Fuzzer at this commit: https://github.com/StanfordLegion/fuzzer/commit/aea5acf7ab31cc7787500ca83dfc70bdd5ecf34a

Command line:

./fuzzer -fuzz:seed 367 -fuzz:ops 71 -fuzz:skip 3 -level 4
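Because the failure is nondeterministic, a single run of the command above says little about whether a fix works. A minimal sketch of a repeat-and-count helper follows, assuming the `fuzzer` binary sits in the current directory and that a failed assertion makes the process exit with a nonzero status. The names `FUZZER_CMD` and `count_failures` are hypothetical and are not part of the fuzzer or Legion.

```python
import subprocess

# Command line copied from this report; assumes ./fuzzer is in the current directory.
FUZZER_CMD = ["./fuzzer", "-fuzz:seed", "367", "-fuzz:ops", "71",
              "-fuzz:skip", "3", "-level", "4"]


def count_failures(cmd, runs):
    """Run `cmd` `runs` times and return how many runs exited with a nonzero status."""
    failures = 0
    for _ in range(runs):
        # Output is discarded; only the exit status is inspected.
        result = subprocess.run(cmd,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        # An abort from a failed assertion yields a nonzero (on POSIX, negative) code.
        if result.returncode != 0:
            failures += 1
    return failures
```

For example, `count_failures(FUZZER_CMD, 20)` returns the number of failing runs out of 20. With a failure rate near 60-70%, a few dozen clean runs in a row would be reasonable evidence that the assertion no longer triggers.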

Backtrace:

  * frame #4: 0x0000000101d9f10b liblegion.1.dylib`Legion::Internal::DistributedCollectable::add_gc_reference(this=0x00007ff3f7877000, cnt=1) at garbage_collection.cc:130:7 [opt]
    frame #5: 0x0000000101e7123a liblegion.1.dylib`Legion::Internal::EqSetTracker::finalize_equivalence_sets(Legion::Internal::RtUserEvent, Legion::Internal::InnerContext*, Legion::Internal::Runtime*, unsigned int, Legion::Internal::IndexSpaceExpression*, unsigned long long) [inlined] Legion::Internal::DistributedCollectable::add_base_gc_ref(this=<unavailable>, source=VERSION_MANAGER_REF, cnt=1) at garbage_collection.h:570:7 [opt]
    frame #6: 0x0000000101e7122a liblegion.1.dylib`Legion::Internal::EqSetTracker::finalize_equivalence_sets(this=0x00007ff402729440, done_event=RtUserEvent @ 0x000070000746d258, context=0x00007ff403822200, runtime=0x00007ff403813200, parent_req_index=0, expr=0x00007ff3fa80d200, opid=410) at legion_analysis.cc:22804:28 [opt]
    frame #7: 0x0000000101e7c82b liblegion.1.dylib`Legion::Internal::VersionManager::perform_versioning_analysis(this=0x00007ff402729440, context=0x00007ff403822200, version_info=0x000060000067c200, region_node=0x00007ff402845000, version_mask=<unavailable>, op=0x00007ff3f7876800, index=0, parent_req_index=0, ready_events=size=1, output_region_ready=0x0000000000000000, collective_rendezvous=<unavailable>) at legion_analysis.cc:23849:11 [opt]
    frame #8: 0x00000001022718dc liblegion.1.dylib`Legion::Internal::RegionTreeForest::perform_versioning_analysis(Legion::Internal::Operation*, unsigned int, Legion::RegionRequirement const&, Legion::Internal::VersionInfo&, std::__1::set<Legion::Internal::RtEvent, std::__1::less<Legion::Internal::RtEvent>, std::__1::allocator<Legion::Internal::RtEvent>>&, Legion::Internal::RtEvent*, bool) [inlined] Legion::Internal::RegionNode::perform_versioning_analysis(this=0x00007ff402845000, ctx=1, parent_ctx=0x00007ff403822200, version_info=0x000060000067c200, mask=0x000070000746d630, op=0x00007ff3f7876800, index=0, parent_req_index=0, applied=size=1, output_region_ready=<unavailable>, collective_rendezvous=<unavailable>) at region_tree.cc:17354:15 [opt]
    frame #9: 0x0000000102271880 liblegion.1.dylib`Legion::Internal::RegionTreeForest::perform_versioning_analysis(this=<unavailable>, op=0x00007ff3f7876800, index=0, req=<unavailable>, version_info=0x000060000067c200, ready_events=size=1, output_region_ready=0x0000000000000000, collective_rendezvous=<unavailable>) at region_tree.cc:1703:20 [opt]
    frame #10: 0x000000010212dfbf liblegion.1.dylib`Legion::Internal::SingleTask::perform_versioning_analysis(this=0x00007ff3f7876800, post_mapper=false) at legion_tasks.cc:2683:28 [opt]
    frame #11: 0x0000000102136aa6 liblegion.1.dylib`Legion::Internal::SingleTask::map_all_regions(this=0x00007ff3f7876800, must_epoch_op=0x0000000000000000, defer_args=0x0000000000000000) at legion_tasks.cc:4158:15 [opt]
    frame #12: 0x0000000102147599 liblegion.1.dylib`Legion::Internal::PointTask::perform_mapping(this=0x00007ff3f7876800, must_epoch_owner=0x0000000000000000, args=0x0000000000000000) at legion_tasks.cc:7297:32 [opt]
    frame #13: 0x0000000102160251 liblegion.1.dylib`Legion::Internal::SliceTask::map_and_launch(this=0x00007ff3fa81e600) at legion_tasks.cc:11513:18 [opt]
    frame #14: 0x0000000102380d72 liblegion.1.dylib`Legion::Internal::Runtime::legion_runtime_task(args=0x00007ff4030fda48, arglen=<unavailable>, userdata=<unavailable>, userlen=<unavailable>, p=<unavailable>) at runtime.cc:32353:31 [opt]
    frame #15: 0x00000001036e12f9 librealm.1.dylib`Realm::LocalTaskProcessor::execute_task(this=<unavailable>, func_id=4, task_args=0x000070000746dca8) at proc_impl.cc:1176:5 [opt]
    frame #16: 0x000000010371ec80 librealm.1.dylib`Realm::Task::execute_on_processor(this=0x00007ff4030fd910, p=(id = 2089670227099910144)) at tasks.cc:326:40 [opt]
    frame #17: 0x0000000103724f16 librealm.1.dylib`Realm::KernelThreadTaskScheduler::execute_task(this=<unavailable>, task=<unavailable>) at tasks.cc:1421:11 [opt]
    frame #18: 0x0000000103723563 librealm.1.dylib`Realm::ThreadedTaskScheduler::scheduler_loop(this=0x00007ff403078010) at tasks.cc:1160:6 [opt]

lightsighter commented 3 months ago

Try with the fix for #1664 and see if it still reproduces.

elliottslaughter commented 3 months ago

I cannot reproduce it anymore. So I guess this is resolved.