ICLDisco / parsec

PaRSEC is a generic framework for architecture-aware scheduling and management of micro-tasks on distributed, GPU-accelerated, many-core heterogeneous architectures. PaRSEC assigns computation threads to cores and GPU accelerators, overlaps communication with computation, and uses a dynamic, fully distributed scheduler based on architectural features such as NUMA nodes and algorithmic features such as data reuse.

Profiling 'set_scheduler' segfaults on startup (Saturn jenkins config) #211

Closed by abouteiller 5 years ago

abouteiller commented 5 years ago

Original report by me.



Thread 1 "touch_ex" received signal SIGSEGV, Segmentation fault.
0x00007ffff7aa8446 in parsec_list_push_back (list=0x7ffff7dd7f60 <global_informations>, item=0x769060) at /home/bouteill/parsec/master/parsec/parsec/class/list.h:1064
1064        _TAIL(list)->list_next = item;
(gdb) list
1059    {
1060        PARSEC_ITEM_ATTACH(list, item);
1061        item->list_next = _GHOST(list);
1062        parsec_atomic_lock(&list->atomic_lock);
1063        item->list_prev = _TAIL(list);
1064        _TAIL(list)->list_next = item;
1065        _TAIL(list) = item;
1066        parsec_atomic_unlock(&list->atomic_lock);
1067    }
1068
(gdb) bt
#0    0x00007ffff7aa8446 in parsec_list_push_back (list=0x7ffff7dd7f60 <global_informations>, item=0x769060)
                         at /home/bouteill/parsec/master/parsec/parsec/class/list.h:1064
#1    0x00007ffff7aa93c3 in parsec_profiling_add_information (key=0x7ffff7b89519 "sched", value=0x7ffff7dcbd58 <parsec_sched_lfq_component+56> "lfq")
                         at /home/bouteill/parsec/master/parsec/parsec/profiling_otf2.c:171
#2    0x00007ffff7aac565 in profiling_save_sinfo (key=0x7ffff7b89519 "sched", svalue=0x7ffff7dcbd58 <parsec_sched_lfq_component+56> "lfq")
                         at /home/bouteill/parsec/master/parsec/parsec/profiling_otf2.c:1066
#3    0x00007ffff7ab80e6 in parsec_set_scheduler (parsec=0x736f90) at /home/bouteill/parsec/master/parsec/parsec/scheduling.c:271
#4    0x00007ffff7a9a5f8 in parsec_init (nb_cores=1, pargc=0x7fffffffa9ec, pargv=0x7fffffffa9e0)
                         at /home/bouteill/parsec/master/parsec/parsec/parsec.c:727
#5    0x000000000040293c in main (argc=1, argv=0x7fffffffaaf8) at /home/bouteill/parsec/master/parsec/tests/touch_ex.c:42
(gdb) up
#1  0x00007ffff7aa93c3 in parsec_profiling_add_information (key=0x7ffff7b89519 "sched", value=0x7ffff7dcbd58 <parsec_sched_lfq_component+56> "lfq")
    at /home/bouteill/parsec/master/parsec/parsec/profiling_otf2.c:171
171         parsec_list_push_back(&global_informations, &new_info->super);
(gdb) up
#2  0x00007ffff7aac565 in profiling_save_sinfo (key=0x7ffff7b89519 "sched", svalue=0x7ffff7dcbd58 <parsec_sched_lfq_component+56> "lfq")
    at /home/bouteill/parsec/master/parsec/parsec/profiling_otf2.c:1066
1066        parsec_profiling_add_information(key, svalue);
(gdb) up
#3  0x00007ffff7ab80e6 in parsec_set_scheduler (parsec=0x736f90) at /home/bouteill/parsec/master/parsec/parsec/scheduling.c:271
271         PROFILING_SAVE_sINFO("sched", (char *)current_scheduler->component->base_version.mca_component_name);
(gdb) up
#4  0x00007ffff7a9a5f8 in parsec_init (nb_cores=1, pargc=0x7fffffffa9ec, pargv=0x7fffffffa9e0) at /home/bouteill/parsec/master/parsec/parsec/parsec.c:727
727         if( 0 == parsec_set_scheduler( context ) ) {
(gdb) down
#3  0x00007ffff7ab80e6 in parsec_set_scheduler (parsec=0x736f90) at /home/bouteill/parsec/master/parsec/parsec/scheduling.c:271
271         PROFILING_SAVE_sINFO("sched", (char *)current_scheduler->component->base_version.mca_component_name);
(gdb)

abouteiller commented 5 years ago

Original comment by George Bosilca (Bitbucket: bosilca, GitHub: bosilca).


parsec_profiling_add_information was missing a check that profiling had been initialized. With the OTF2 profiling driver, this caused a segfault when the scheduler was changed while profiling was not enabled.

I checked that all other exposed functions perform the same check or an equivalent one.
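
For readers who want to see the shape of such a guard, here is a minimal sketch. It is not the code from the changeset: every name in it (`info_list_t`, `profiling_initialized`, `sketch_profiling_init`, `sketch_add_information`, `sketch_info_count`) is hypothetical, and it assumes the crash comes from pushing onto a list whose sentinel links were never set up, which is consistent with the fault at `_TAIL(list)->list_next = item` in the trace but is not confirmed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the PaRSEC types or API. */
typedef struct info_item {
    struct info_item *next, *prev;
    const char *key, *value;   /* borrowed pointers; caller keeps them alive */
} info_item_t;

typedef struct {
    info_item_t ghost;         /* sentinel node; its links are NULL until init */
} info_list_t;

static info_list_t global_info;        /* zero-initialized static storage */
static int profiling_initialized = 0;  /* hypothetical "init was called" flag */

static void info_list_init(info_list_t *l)
{
    /* An empty circular list: the sentinel points at itself. */
    l->ghost.next = l->ghost.prev = &l->ghost;
}

static void info_list_push_back(info_list_t *l, info_item_t *it)
{
    /* Precondition: the list was constructed, so the tail is never NULL. */
    assert(l->ghost.prev != NULL);
    it->next = &l->ghost;
    it->prev = l->ghost.prev;
    l->ghost.prev->next = it;   /* would dereference NULL on an unconstructed list */
    l->ghost.prev = it;
}

int sketch_profiling_init(void)
{
    info_list_init(&global_info);
    profiling_initialized = 1;
    return 0;
}

/* Returns 0 on success, -1 if profiling is not initialized or allocation fails. */
int sketch_add_information(const char *key, const char *value)
{
    if (!profiling_initialized)
        return -1;              /* the guard: never touch the list before init */

    info_item_t *it = malloc(sizeof *it);
    if (it == NULL)
        return -1;
    it->key = key;
    it->value = value;
    info_list_push_back(&global_info, it);
    return 0;
}

/* Number of stored entries; 0 when profiling was never initialized. */
size_t sketch_info_count(void)
{
    size_t n = 0;
    if (!profiling_initialized)
        return 0;
    for (const info_item_t *it = global_info.ghost.next;
         it != &global_info.ghost; it = it->next)
        n++;
    return n;
}
```

In this sketch, calling `sketch_add_information("sched", "lfq")` before `sketch_profiling_init()` returns -1 and leaves the list untouched; the same call after initialization appends one entry. Returning an error code rather than silently succeeding is a design choice of the sketch, not a statement about what the actual fix returns.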

Fixes issue #211

→ <<cset 0f9a3299aeae (bb)>>