StanfordLegion / legion

The Legion Parallel Programming System
https://legion.stanford.edu
Apache License 2.0

[BUG] Legion Multinode Crash UBSAN Error #1726

Open Jacobfaib opened 1 month ago

Jacobfaib commented 1 month ago

In the RemoteContext constructor, the remote_task member is used in the InnerContext base-class initializer before remote_task itself is initialized:

RemoteContext::RemoteContext(DistributedID id, Runtime *rt,
                                 CollectiveMapping *mapping)
      : InnerContext(rt, NULL, -1, false/*full inner*/, remote_task.regions, // <<< used here
                     remote_task.output_regions, local_parent_req_indexes, // <<< and here
                     local_virtual_mapped, ApEvent::NO_AP_EVENT, id,
                     false, false, false, mapping),
        parent_ctx(NULL), shard_manager(NULL), provenance(NULL),
        top_level_context(false), remote_task(RemoteTask(this)), // <<< initialized here
        remote_uid(0), repl_id(0)
    //--------------------------------------------------------------------------
    {
    }

legion-src/runtime/legion/legion_context.cc:22561:57: runtime error: member access within address 0x000110b55c00 which does not point to an object of type 'Legion::Task'
0x000110b55c00: note: object has invalid vptr
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^~~~~~~~~~~~~~~~~~~~~~~
              invalid vptr
    #0 0x1257bb650 in Legion::Internal::RemoteContext::RemoteContext(unsigned long long, Legion::Internal::Runtime*, Legion::Internal::CollectiveMapping*) (/Users/jfaibussowit/soft/nv/legate.core.internal/arch-darwin-debug/cmake_build/_deps/legion-build/lib/liblegion.1.dylib:arm64+0x1ba7650)
    #1 0x1257bc1f0 in Legion::Internal::RemoteContext::RemoteContext(unsigned long long, Legion::Internal::Runtime*, Legion::Internal::CollectiveMapping*) (/Users/jfaibussowit/soft/nv/legate.core.internal/arch-darwin-debug/cmake_build/_deps/legion-build/lib/liblegion.1.dylib:arm64+0x1ba81f0)
    #2 0x1257db05c in Legion::Internal::RemoteContext::handle_context_response(Legion::Deserializer&, Legion::Internal::Runtime*) (/Users/jfaibussowit/soft/nv/legate.core.internal/arch-darwin-debug/cmake_build/_deps/legion-build/lib/liblegion.1.dylib:arm64+0x1bc705c)
    #3 0x1285c2400 in Legion::Internal::Runtime::handle_remote_context_response(Legion::Deserializer&) (/Users/jfaibussowit/soft/nv/legate.core.internal/arch-darwin-debug/cmake_build/_deps/legion-build/lib/liblegion.1.dylib:arm64+0x49ae400)
    #4 0x1285ad3c8 in Legion::Internal::VirtualChannel::handle_messages(unsigned int, Legion::Internal::Runtime*, unsigned int, char const*, unsigned long) const (/Users/jfaibussowit/soft/nv/legate.core.internal/arch-darwin-debug/cmake_build/_deps/legion-build/lib/liblegion.1.dylib:arm64+0x49993c8)
    #5 0x1285a6fb0 in Legion::Internal::VirtualChannel::process_message(void const*, unsigned long, Legion::Internal::Runtime*, unsigned int) (/Users/jfaibussowit/soft/nv/legate.core.internal/arch-darwin-debug/cmake_build/_deps/legion-build/lib/liblegion.1.dylib:arm64+0x4992fb0)
    #6 0x1285d5e74 in Legion::Internal::MessageManager::receive_message(void const*, unsigned long) (/Users/jfaibussowit/soft/nv/legate.core.internal/arch-darwin-debug/cmake_build/_deps/legion-build/lib/liblegion.1.dylib:arm64+0x49c1e74)
    #7 0x128745a88 in Legion::Internal::Runtime::process_message_task(void const*, unsigned long) (/Users/jfaibussowit/soft/nv/legate.core.internal/arch-darwin-debug/cmake_build/_deps/legion-build/lib/liblegion.1.dylib:arm64+0x4b31a88)
    #8 0x1287ebc34 in Legion::Internal::Runtime::legion_runtime_task(void const*, unsigned long, void const*, unsigned long, Realm::Processor) (/Users/jfaibussowit/soft/nv/legate.core.internal/arch-darwin-debug/cmake_build/_deps/legion-build/lib/liblegion.1.dylib:arm64+0x4bd7c34)
    #9 0x13a6dc540 in Realm::LocalTaskProcessor::execute_task(unsigned int, Realm::ByteArrayRef const&) (/Users/jfaibussowit/soft/nv/legate.core.internal/arch-darwin-debug/cmake_build/_deps/legion-build/lib/librealm.1.dylib:arm64+0x2a64540)
    #10 0x13aadc9e4 in Realm::Task::execute_on_processor(Realm::Processor) (/Users/jfaibussowit/soft/nv/legate.core.internal/arch-darwin-debug/cmake_build/_deps/legion-build/lib/librealm.1.dylib:arm64+0x2e649e4)
    #11 0x13ab0bd98 in Realm::KernelThreadTaskScheduler::execute_task(Realm::Task*) (/Users/jfaibussowit/soft/nv/legate.core.internal/arch-darwin-debug/cmake_build/_deps/legion-build/lib/librealm.1.dylib:arm64+0x2e93d98)
    #12 0x13ab0137c in Realm::ThreadedTaskScheduler::scheduler_loop() (/Users/jfaibussowit/soft/nv/legate.core.internal/arch-darwin-debug/cmake_build/_deps/legion-build/lib/librealm.1.dylib:arm64+0x2e8937c)
    #13 0x13ab06070 in Realm::ThreadedTaskScheduler::scheduler_loop_wlock() (/Users/jfaibussowit/soft/nv/legate.core.internal/arch-darwin-debug/cmake_build/_deps/legion-build/lib/librealm.1.dylib:arm64+0x2e8e070)
    #14 0x13ab8d6bc in void Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop_wlock()>(void*) (/Users/jfaibussowit/soft/nv/legate.core.internal/arch-darwin-debug/cmake_build/_deps/legion-build/lib/librealm.1.dylib:arm64+0x2f156bc)
    #15 0x13aba8340 in Realm::KernelThread::pthread_entry(void*) (/Users/jfaibussowit/soft/nv/legate.core.internal/arch-darwin-debug/cmake_build/_deps/legion-build/lib/librealm.1.dylib:arm64+0x2f30340)
    #16 0x18f19ef90 in _pthread_start (/usr/lib/system/libsystem_pthread.dylib:arm64e+0x6f90)
    #17 0x18f199d30 in thread_start (/usr/lib/system/libsystem_pthread.dylib:arm64e+0x1d30)
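
For context on the ordering involved: C++ initializes base classes before non-static data members, regardless of the order written in the constructor's initializer list, so the InnerContext arguments above are evaluated before remote_task is constructed. One common way to restructure such code is the base-from-member idiom, where the member is moved into a helper base listed first. The following is a minimal sketch with hypothetical names (TaskLike, ContextBase, TaskHolder, RemoteLike); it is not Legion code and not a proposed patch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a polymorphic member such as remote_task.
struct TaskLike {
  virtual ~TaskLike() = default;
  std::vector<int> regions;
};

// Hypothetical stand-in for the base context: it keeps a reference to the
// region list supplied by the derived class.
class ContextBase {
public:
  explicit ContextBase(const std::vector<int>& regions) : regions_(regions) {}
  std::size_t region_count() const { return regions_.size(); }

private:
  const std::vector<int>& regions_;
};

// Helper base that owns the task. Bases are constructed in the order they
// appear in the base-specifier list, so listing this one first guarantees
// that `task` is fully constructed before ContextBase's constructor runs.
struct TaskHolder {
  TaskLike task;
};

class RemoteLike : private TaskHolder, public ContextBase {
public:
  // `task` already exists here, so `task.regions` names a live object.
  RemoteLike() : TaskHolder(), ContextBase(task.regions) {}

  void add_region(int r) { task.regions.push_back(r); }
};
```

With this layout, `RemoteLike ctx; ctx.add_region(7);` leaves `ctx.region_count()` equal to 1, and the base class never sees a reference into an unconstructed object.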

lightsighter commented 1 month ago

This is one of those false positives that I hate from UBSAN. Just because I give out a pointer to an object before it is done being constructed does not mean that I'm using the object before initialization.

I'm not planning on fixing this.
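
The distinction drawn above, between handing out a pointer early and actually using the object early, can be sketched as follows. This is an illustration with hypothetical names (PolyTask, PtrContext, DeferredRemote), not Legion code. Note that it passes the address of the member itself rather than a reference to one of its sub-members, which is a narrower case than the constructor quoted in the report, and no claim is made here about what UBSan would print for it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical polymorphic member type.
struct PolyTask {
  virtual ~PolyTask() = default;
  std::vector<int> regions;
};

// Hypothetical base: the constructor only stores the pointer and does not
// dereference it. The pointee is read later, in region_count().
class PtrContext {
public:
  explicit PtrContext(PolyTask* task) : task_(task) {}
  std::size_t region_count() const { return task_->regions.size(); }

private:
  PolyTask* task_;
};

class DeferredRemote : public PtrContext {
public:
  // &task is taken before `task` is constructed (bases are initialized
  // first), but nothing reads through the pointer until construction ends.
  DeferredRemote() : PtrContext(&task), task() {}

  void add_region(int r) { task.regions.push_back(r); }

private:
  PolyTask task;
};
```

Here every dereference of the stored pointer happens after `DeferredRemote` has been fully constructed, for example `DeferredRemote d; d.add_region(3); d.region_count();`.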