ICLDisco / dplasma

DPLASMA is a highly optimized, accelerator-aware implementation of a dense linear algebra package for distributed heterogeneous systems. It is designed to deliver sustained performance on distributed systems where each node features multiple sockets of multicore processors and, if available, accelerators, using the PaRSEC runtime as a backend.

PaRSEC profiles READ tasks in GEMM #71

Open josephjohnjj opened 1 year ago

josephjohnjj commented 1 year ago

Describe the bug

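PaRSEC tries to profile READ tasks in GEMM even though profiling is turned off for them. In tracing mode the run aborts on the assertion `key >= 2` in parsec_profiling_trace_flags (profiling.c:990), which is called with key=-1 for a READ_A task: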

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff8dfff700 (LWP 189514)]
0x00007ffff49cb387 in raise () from /usr/lib64/libc.so.6
(gdb) bt
#0  0x00007ffff49cb387 in raise () from /usr/lib64/libc.so.6
#1  0x00007ffff49cca78 in abort () from /usr/lib64/libc.so.6
#2  0x00007ffff49c41a6 in __assert_fail_base () from /usr/lib64/libc.so.6
#3  0x00007ffff49c4252 in __assert_fail () from /usr/lib64/libc.so.6
#4  0x00007ffff6d62b40 in parsec_profiling_trace_flags (context=0x7ffe00000cb0, key=-1, event_id=40, taskpool_id=8, info=0x0, flags=0)
    at /home/mannaparambil/parsec/parsec/profiling.c:990
#5  0x00007ffff6d94c93 in task_profiler_exec_count_begin (es=0x7ffe00000950, task=0x7ffdfc20a4d0, cb_data=0x7ffe00000930)
    at /home/mannaparambil/parsec/parsec/mca/pins/task_profiler/pins_task_profiler_module.c:240
#6  0x00007ffff6d8d1f9 in parsec_pins_instrument (es=0x7ffe00000950, method_flag=EXEC_BEGIN, task=0x7ffdfc20a4d0) at /home/mannaparambil/parsec/parsec/mca/pins/pins.c:26
#7  0x00007ffff6d4baec in __parsec_execute (es=0x7ffe00000950, task=0x7ffdfc20a4d0) at /home/mannaparambil/parsec/parsec/scheduling.c:162
#8  0x00007ffff6d4c82e in __parsec_task_progress (es=0x7ffe00000950, task=0x7ffdfc20a4d0, distance=6) at /home/mannaparambil/parsec/parsec/scheduling.c:503
#9  0x00007ffff6d4cd28 in __parsec_context_wait (es=0x7ffe00000950) at /home/mannaparambil/parsec/parsec/scheduling.c:668
#10 0x00007ffff6d2ed5b in __parsec_thread_init (startup=0xbcf7c0) at /home/mannaparambil/parsec/parsec/parsec.c:348
#11 0x00007fffc9699ea5 in start_thread () from /usr/lib64/libpthread.so.0
#12 0x00007ffff4a93b0d in clone () from /usr/lib64/libc.so.6
(gdb) frame 4
#4  0x00007ffff6d62b40 in parsec_profiling_trace_flags (context=0x7ffe00000cb0, key=-1, event_id=40, taskpool_id=8, info=0x0, flags=0)
    at /home/mannaparambil/parsec/parsec/profiling.c:990
990     assert( key >= 2 );
(gdb) frame 7
#7  0x00007ffff6d4baec in __parsec_execute (es=0x7ffe00000950, task=0x7ffdfc20a4d0) at /home/mannaparambil/parsec/parsec/scheduling.c:162
162     PARSEC_PINS(es, EXEC_BEGIN, task);
(gdb) p *task->task_class
$1 = {name = 0x7ffff7928cc7 "READ_A", flags = 33, task_class_id = 0 '\000', nb_flows = 1 '\001', nb_parameters = 2 '\002', nb_locals = 3 '\003', task_class_type = 1 '\001',
  dependencies_goal = 1, params = {0x7ffff7c9f4e0 <symb_dgemm_NN_summa_READ_A_m>, 0x7ffff7c9f460 <symb_dgemm_NN_summa_READ_A_k>, 0x0 <repeats 18 times>}, locals = {
    0x7ffff7c9f4e0 <symb_dgemm_NN_summa_READ_A_m>, 0x7ffff7c9f460 <symb_dgemm_NN_summa_READ_A_k>, 0x7ffff7c9f400 <symb_dgemm_NN_summa_READ_A_loc_A>, 0x0 <repeats 17 times>},
  in = {0x7ffff7ca0be0 <flow_of_dgemm_NN_summa_READ_A_for_A>, 0x0 <repeats 19 times>}, out = {0x7ffff7ca0be0 <flow_of_dgemm_NN_summa_READ_A_for_A>, 0x0 <repeats 19 times>},
  priority = 0x0, properties = 0x7ffff792acd0 <properties_of_dgemm_NN_summa_READ_A>, initial_data = 0x7ffff72974ce <affinity_of_dgemm_NN_summa_READ_A>,
  final_data = 0x7ffff72974ce <affinity_of_dgemm_NN_summa_READ_A>, data_affinity = 0x7ffff72974ce <affinity_of_dgemm_NN_summa_READ_A>,
  key_functions = 0x7ffff7dd2a80 <__jdf2c_key_fns_READ_A>, make_key = 0x7ffff72977bb <__jdf2c_make_key_READ_A>, task_snprintf = 0x7ffff6d33175 <parsec_task_snprintf>,
  get_datatype = 0x7ffff729755c <datatype_lookup_of_dgemm_NN_summa_READ_A>, prepare_input = 0x7ffff729d18e <data_lookup_of_dgemm_NN_summa_READ_A>, incarnations = 0x11d0c570,
  prepare_output = 0x0, find_deps = 0x7ffff6d31de8 <parsec_hash_find_deps>, update_deps = 0x7ffff6d320c0 <parsec_update_deps_with_mask>,
  iterate_successors = 0x7ffff72982e5 <iterate_successors_of_dgemm_NN_summa_READ_A>, iterate_predecessors = 0x0,
  release_deps = 0x7ffff729c232 <release_deps_of_dgemm_NN_summa_READ_A>, complete_execution = 0x7ffff729c657 <complete_hook_of_dgemm_NN_summa_READ_A>, new_task = 0x0,
  release_task = 0x7ffff7298c12 <release_task_of_dgemm_NN_summa_READ_A>, fini = 0x0}

Profiling is set to 'off' in the JDF for this task class:

/**************************************************
 *                       READ_A                   *
 **************************************************/
READ_A(k, m)  [profile = off] 
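
For discussion, here is a minimal standalone sketch of the kind of guard I would expect somewhere on the EXEC_BEGIN path. It is not PaRSEC code: every `demo_*` name, the struct layout, and the idea that a non-profiled task class carries a key of -1 are assumptions made for illustration (the -1 is simply what the backtrace shows for READ_A). I don't know whether the real fix belongs in the task_profiler PINS module, in the generated JDF code, or elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task class; only the fields this sketch needs. */
typedef struct {
    const char *name;
    int profiling_enabled; /* assumed 0 when the JDF says [profile = off] */
    int begin_key;         /* assumed -1 when no profiling key was registered */
} demo_task_class_t;

/* Smallest key accepted by the assertion seen at profiling.c:990 (key >= 2). */
#define DEMO_MIN_VALID_KEY 2

/* Counts how many events the demo tracer recorded. */
int demo_trace_calls = 0;

/* Stand-in for the tracing call; mirrors the assertion from the backtrace. */
void demo_trace(int key)
{
    assert(key >= DEMO_MIN_VALID_KEY);
    demo_trace_calls++;
}

/* Returns 1 only when the task class is profiled and has a usable key. */
int demo_should_trace(const demo_task_class_t *tc)
{
    if (tc == NULL || !tc->profiling_enabled) {
        return 0;
    }
    return tc->begin_key >= DEMO_MIN_VALID_KEY;
}

/* Stand-in for the EXEC_BEGIN callback: skip tracing instead of aborting.
 * Returns 1 if an event was traced, 0 if the task was skipped. */
int demo_exec_begin(const demo_task_class_t *tc)
{
    if (!demo_should_trace(tc)) {
        return 0;
    }
    demo_trace(tc->begin_key);
    return 1;
}
```

With a guard like this, a task class described as `{"READ_A", 0, -1}` would be skipped, while a profiled class with a key of 2 or more would still be traced.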

To Reproduce

Steps to reproduce the behavior:

  1. Check out the skew_distribution branch: https://github.com/josephjohnjj/parsec/tree/skew_distribution
  2. Build using an external PaRSEC
  3. Run in tracing mode

Expected behavior

The program runs to completion without profiling READ tasks.

Environment (please complete the following information):

Additional context

READ_C and WRITE_C tasks were introduced to adapt the code for inter-node task migration. Changes were also made to the data flow types.