StanfordLegion / legion

The Legion Parallel Programming System
https://legion.stanford.edu
Apache License 2.0

Fuzzer: pending_equivalence_sets is null in finalize_manager #1664

Closed by elliottslaughter 3 months ago

elliottslaughter commented 3 months ago

It's harder to find crashes now, but I hit this one:

Assertion failed: (pending_equivalence_sets == NULL), function finalize_manager, file legion_analysis.cc, line 23931.

This is on the Legion branch that has both fixinvalidation and fixvirtualinit merged together.

Fuzzer at version https://github.com/StanfordLegion/fuzzer/commit/3057d03ee7e7284337ca22515bef9ba02ce98f45

Command line:

./fuzzer -fuzz:seed 367 -fuzz:ops 16 -fuzz:skip 3 -level 4

The higher seed number means it took me longer to find it. 😃
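For anyone who wants to hunt for similar crashes, a seed sweep along these lines may help. This is only a rough sketch: the flags are the ones from the command line above, but the helper names (`fuzzer_command`, `first_failing_seed`), the `./fuzzer` path, and the assumption that any non-zero exit status counts as a failure are illustrative and not part of the fuzzer itself.

```python
import subprocess
from typing import Callable, Iterable, List, Optional


def fuzzer_command(seed: int, ops: int = 16, skip: int = 3, level: int = 4,
                   binary: str = "./fuzzer") -> List[str]:
    """Build the command line from this report for a given seed.

    The binary path and default values are assumptions taken from the
    reproducer above; adjust them for your own build.
    """
    return [binary, "-fuzz:seed", str(seed), "-fuzz:ops", str(ops),
            "-fuzz:skip", str(skip), "-level", str(level)]


def first_failing_seed(seeds: Iterable[int],
                       make_cmd: Callable[[int], List[str]] = fuzzer_command,
                       timeout: Optional[float] = None) -> Optional[int]:
    """Run one process per seed and return the first seed that fails.

    A run is treated as failing when its exit status is non-zero, which
    includes an abort raised by a failed assertion. Returns None if every
    seed exits cleanly.
    """
    for seed in seeds:
        result = subprocess.run(make_cmd(seed),
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL,
                                timeout=timeout)
        if result.returncode != 0:
            return seed
    return None
```

Calling `first_failing_seed(range(1000))` would then try seeds 0 through 999 in order and stop at the first one that crashes. Passing a different `make_cmd` lets the same loop drive other flag combinations.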

Backtrace:

  * frame #4: 0x0000000101e7c690 liblegion.1.dylib`Legion::Internal::VersionManager::finalize_manager(this=0x00007fa3abf18b60) at reservation.inl:0:15 [opt]
    frame #5: 0x00000001022ba7da liblegion.1.dylib`Legion::Internal::RegionNode::notify_local(this=0x00007fa3f1019800) at region_tree.cc:16927:44 [opt]
    frame #6: 0x0000000101da0ee0 liblegion.1.dylib`Legion::Internal::DistributedCollectable::perform_downgrade(this=0x00007fa3f1019800, gc=0x0000700002134680) at garbage_collection.cc:893:7 [opt]
    frame #7: 0x0000000101d9df28 liblegion.1.dylib`Legion::Internal::DistributedCollectable::remove_gc_reference(this=0x00007fa3f1019800, cnt=1) at garbage_collection.cc:153:16 [opt]
    frame #8: 0x00000001022bf83c liblegion.1.dylib`Legion::Internal::PartitionNode::notify_local() [inlined] Legion::Internal::DistributedCollectable::remove_nested_gc_ref(this=<unavailable>, source=<unavailable>, cnt=1) at garbage_collection.h:652:14 [opt]
    frame #9: 0x00000001022bf80c liblegion.1.dylib`Legion::Internal::PartitionNode::notify_local(this=<unavailable>) at region_tree.cc:18015:21 [opt]
    frame #10: 0x0000000101da0ee0 liblegion.1.dylib`Legion::Internal::DistributedCollectable::perform_downgrade(this=0x00007fa3fd038400, gc=0x0000700002134720) at garbage_collection.cc:893:7 [opt]
    frame #11: 0x0000000101d9df28 liblegion.1.dylib`Legion::Internal::DistributedCollectable::remove_gc_reference(this=0x00007fa3fd038400, cnt=1) at garbage_collection.cc:153:16 [opt]
    frame #12: 0x000000010229764f liblegion.1.dylib`Legion::Internal::PartitionTracker::remove_partition_reference() [inlined] Legion::Internal::DistributedCollectable::remove_base_gc_ref(this=0x00007fa3fd038400, source=REGION_TREE_REF, cnt=1) at garbage_collection.h:623:14 [opt]
    frame #13: 0x0000000102297621 liblegion.1.dylib`Legion::Internal::PartitionTracker::remove_partition_reference(this=<unavailable>) at region_tree.cc:17937:33 [opt]
    frame #14: 0x00000001022ba6c2 liblegion.1.dylib`Legion::Internal::RegionNode::notify_local(this=0x00007fa3fd020600) at region_tree.cc:16921:22 [opt]
    frame #15: 0x0000000101da0ee0 liblegion.1.dylib`Legion::Internal::DistributedCollectable::perform_downgrade(this=0x00007fa3fd020600, gc=0x0000700002134810) at garbage_collection.cc:893:7 [opt]
    frame #16: 0x0000000101d9df28 liblegion.1.dylib`Legion::Internal::DistributedCollectable::remove_gc_reference(this=0x00007fa3fd020600, cnt=1) at garbage_collection.cc:153:16 [opt]
    frame #17: 0x000000010226c6ff liblegion.1.dylib`Legion::Internal::RegionTreeForest::destroy_logical_region(this=<unavailable>, handle=LogicalRegion @ 0x00007000021348a0, applied=size=0, mapping=<unavailable>) at region_tree.cc:0 [opt]
    frame #18: 0x0000000102023d8a liblegion.1.dylib`Legion::Internal::DeletionOp::trigger_complete(this=0x00007fa3f430d900) at legion_ops.cc:10942:30 [opt]
    frame #19: 0x0000000101ff4966 liblegion.1.dylib`Legion::Internal::Operation::complete_execution(this=0x00007fa3f430d900, wait_on=RtEvent @ 0x0000700002134938) at legion_ops.cc:0 [opt]
    frame #20: 0x0000000102023c1c liblegion.1.dylib`Legion::Internal::DeletionOp::trigger_mapping(this=0x00007fa3f430d900) at legion_ops.cc:0 [opt]
    frame #21: 0x000000010237fa44 liblegion.1.dylib`Legion::Internal::Runtime::legion_runtime_task(args=0x00007fa3fbf14be8, arglen=<unavailable>, userdata=<unavailable>, userlen=<unavailable>, p=<unavailable>) at runtime.cc:32345:31 [opt]
    frame #22: 0x00000001036df2f9 librealm.1.dylib`Realm::LocalTaskProcessor::execute_task(this=<unavailable>, func_id=4, task_args=0x0000700002134ca8) at proc_impl.cc:1176:5 [opt]
    frame #23: 0x000000010371cc80 librealm.1.dylib`Realm::Task::execute_on_processor(this=0x00007fa3fbf14ab0, p=(id = 2089670227099910144)) at tasks.cc:326:40 [opt]
    frame #24: 0x0000000103722f16 librealm.1.dylib`Realm::KernelThreadTaskScheduler::execute_task(this=<unavailable>, task=<unavailable>) at tasks.cc:1421:11 [opt]
    frame #25: 0x0000000103721563 librealm.1.dylib`Realm::ThreadedTaskScheduler::scheduler_loop(this=0x00007fa3fc878010) at tasks.cc:1160:6 [opt]
    frame #26: 0x0000000103724ffe librealm.1.dylib`void Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop_wlock()>(void*) [inlined] Realm::ThreadedTaskScheduler::scheduler_loop_wlock(this=0x00007fa3fc878010) at tasks.cc:1272:5 [opt]
    frame #27: 0x0000000103724fea librealm.1.dylib`void Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop_wlock()>(obj=0x00007fa3fc878010) at threads.inl:97:5 [opt]
    frame #28: 0x0000000103728cce librealm.1.dylib`Realm::KernelThread::pthread_entry(data=0x00006000037b8160) at threads.cc:831:5 [opt]

lightsighter commented 3 months ago

I'm just going to keep pushing commits to this branch while the CI is slow: https://gitlab.com/StanfordLegion/legion/-/commit/d2166b12a92910dda9c5b339b78425fdfa0301e6

elliottslaughter commented 3 months ago

Yes, this is resolved.