StanfordLegion / legion

The Legion Parallel Programming System
https://legion.stanford.edu
Apache License 2.0

Realm: LoggingWrapper seg fault #1392

Open syamajala opened 1 year ago

syamajala commented 1 year ago

I am seeing a seg fault when using the LoggingWrapper. Here is a stack trace:

#0  0x00001555528f4238 in nanosleep () from /lib64/libc.so.6
#1  0x00001555528f413e in sleep () from /lib64/libc.so.6
#2  0x000015554a5aa8d6 in Realm::realm_freeze (signal=11) at /lustre/scratch/vsyamaj/legion_s3d_viz/legion/runtime/realm/runtime_impl.cc:183
#3  <signal handler called>
#4  0x00001555528ae78c in _int_free () from /lib64/libc.so.6
#5  0x000015554e8ace60 in __gnu_cxx::new_allocator<Realm::Memory>::deallocate (this=0x4532f10, __p=0x154a082a2100) at /cm/local/apps/gcc/9.2.0/include/c++/9.2.0/ext/new_allocator.h:128
#6  0x000015554e8ab2e6 in std::allocator_traits<std::allocator<Realm::Memory> >::deallocate (__a=..., __p=0x154a082a2100, __n=1) at /cm/local/apps/gcc/9.2.0/include/c++/9.2.0/bits/alloc_traits.h:470
#7  0x000015554e8a97de in std::_Vector_base<Realm::Memory, std::allocator<Realm::Memory> >::_M_deallocate (this=0x4532f10, __p=0x154a082a2100, __n=1) at /cm/local/apps/gcc/9.2.0/include/c++/9.2.0/bits/stl_vector.h:351
#8  0x000015554cbe3189 in std::vector<Realm::Memory, std::allocator<Realm::Memory> >::_M_realloc_insert<Realm::Memory const&> (this=0x4532f10, __position=..., __args#0=...) at /cm/local/apps/gcc/9.2.0/include/c++/9.2.0/bits/vector.tcc:500
#9  0x000015554cbdf654 in std::vector<Realm::Memory, std::allocator<Realm::Memory> >::push_back (this=0x4532f10, __x=...) at /cm/local/apps/gcc/9.2.0/include/c++/9.2.0/bits/stl_vector.h:1195
#10 0x000015554a532dc2 in Realm::MemoryQueryImpl::mutated_cached_query (this=0x154a0829edb0, after=...) at /lustre/scratch/vsyamaj/legion_s3d_viz/legion/runtime/realm/machine_impl.cc:2739
#11 0x000015554a532757 in Realm::MemoryQueryImpl::cached_query (this=0x154a0829edb0, m=..., mval=...) at /lustre/scratch/vsyamaj/legion_s3d_viz/legion/runtime/realm/machine_impl.cc:2649
#12 0x000015554a533886 in Realm::MemoryQueryImpl::cache_next (this=0x154a0829edb0, after=...) at /lustre/scratch/vsyamaj/legion_s3d_viz/legion/runtime/realm/machine_impl.cc:2924
#13 0x000015554a52e379 in Realm::Machine::MemoryQuery::next (this=0x154a25ffbd00, after=...) at /lustre/scratch/vsyamaj/legion_s3d_viz/legion/runtime/realm/machine_impl.cc:1415
#14 0x000015554cc55df6 in Realm::MachineQueryIterator<Realm::Machine::MemoryQuery, Realm::Memory>::operator++ (this=0x154a25ffbd00) at /lustre/scratch/vsyamaj/legion_s3d_viz/legion/runtime/realm/machine.inl:124
#15 0x000015554cc53202 in Legion::Mapping::LoggingWrapper::LoggingWrapper (this=0x154a08040900, mapper=0x154a082a2050, _logger=0x0) at /lustre/scratch/vsyamaj/legion_s3d_viz/legion/runtime/mappers/logging_wrapper.cc:103
#16 0x000015555482178a in S3DRank::perform_rank_registration(Realm::Machine, Legion::Runtime*, std::set<Realm::Processor, std::less<Realm::Processor>, std::allocator<Realm::Processor> > const&) () from /lustre/scratch/vsyamaj/legion_s3d_viz/Ammonia_PTJ_Cases/pwave_x_1_ammonia/librhsf.so
#17 0x000015554d2f8f55 in Legion::Internal::Runtime::perform_registration_callback (this=0x6675010, callback=0x155554821680 <S3DRank::perform_rank_registration(Realm::Machine, Legion::Runtime*, std::set<Realm::Processor, std::less<Realm::Processor>, std::allocator<Realm::Processor> > const&)>, buffer=0x0, buffer_size=0, withargs=false, global=false, preregistered=true, deduplicate=true, dedup_tag=0) at /lustre/scratch/vsyamaj/legion_s3d_viz/legion/runtime/legion/runtime.cc:17759
#18 0x000015554d2f84a2 in Legion::Internal::Runtime::initialize_runtime (this=0x6675010) at /lustre/scratch/vsyamaj/legion_s3d_viz/legion/runtime/legion/runtime.cc:17580
#19 0x000015554d326afa in Legion::Internal::Runtime::initialize_runtime_task (args=0x0, arglen=0, userdata=0x1554ebfe8ca0, userlen=8, p=...) at /lustre/scratch/vsyamaj/legion_s3d_viz/legion/runtime/legion/runtime.cc:31374
#20 0x000015554a585df4 in Realm::LocalTaskProcessor::execute_task (this=0xa2e88d0, func_id=1, task_args=...) at /lustre/scratch/vsyamaj/legion_s3d_viz/legion/runtime/realm/proc_impl.cc:1135
#21 0x000015554a5f20ee in Realm::Task::execute_on_processor (this=0x8ee25a0, p=...) at /lustre/scratch/vsyamaj/legion_s3d_viz/legion/runtime/realm/tasks.cc:302
#22 0x000015554a5f5e86 in Realm::KernelThreadTaskScheduler::execute_task (this=0xa2e8bf0, task=0x8ee25a0) at /lustre/scratch/vsyamaj/legion_s3d_viz/legion/runtime/realm/tasks.cc:1366
#23 0x000015554a5f4d05 in Realm::ThreadedTaskScheduler::scheduler_loop (this=0xa2e8bf0) at /lustre/scratch/vsyamaj/legion_s3d_viz/legion/runtime/realm/tasks.cc:1105
#24 0x000015554a5f5328 in Realm::ThreadedTaskScheduler::scheduler_loop_wlock (this=0xa2e8bf0) at /lustre/scratch/vsyamaj/legion_s3d_viz/legion/runtime/realm/tasks.cc:1217
#25 0x000015554a5fc484 in Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop_wlock> (obj=0xa2e8bf0) at /lustre/scratch/vsyamaj/legion_s3d_viz/legion/runtime/realm/threads.inl:97
#26 0x000015554a608347 in Realm::KernelThread::pthread_entry (data=0x66de200) at /lustre/scratch/vsyamaj/legion_s3d_viz/legion/runtime/realm/threads.cc:781
#27 0x0000155552bf62de in start_thread () from /lib64/libpthread.so.0
#28 0x0000155552927e83 in clone () from /lib64/libc.so.6

syamajala commented 1 year ago

When you are running with multiple mappers do you make a LoggingWrapper for each one or just one of the mappers?

manopapad commented 1 year ago

> When you are running with multiple mappers do you make a LoggingWrapper for each one or just one of the mappers?

You'll want to wrap at least those mappers that you want to get logging output from. If you're creating multiple mappers per Legion process (e.g. a separate mapper per processor), then you'll most likely want to wrap all of them.

It looks like something is going wrong with the MemoryQuery that the LoggingWrapper does at initialization time. Can you print the whole machine model that the LoggingWrapper can see (all processors and memories), before it starts doing its queries?