ICLDisco / parsec

PaRSEC is a generic framework for architecture-aware scheduling and management of micro-tasks on distributed, GPU-accelerated, many-core heterogeneous architectures. PaRSEC assigns computation threads to the cores and GPU accelerators, overlaps communication and computation, and uses a dynamic, fully distributed scheduler based on architectural features such as NUMA nodes and algorithmic features such as data reuse.

Memory leaks with the DTD interface #640

Open BrieucNicolas opened 6 months ago

BrieucNicolas commented 6 months ago

Describe the bug

Two types of memory leak cause PaRSEC to segfault sporadically when it is compiled in Debug mode.

The first one produces the following output under valgrind:

==196556== 1,376 (168 direct, 1,208 indirect) bytes in 1 blocks are definitely lost in loss record 270 of 288
==196556==    at 0x488A828: posix_memalign (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==196556==    by 0x49C01BF: parsec_lifo_item_alloc (lifo.h:603)
==196556==    by 0x49C0547: parsec_thread_mempool_allocate_when_empty (mempool.c:82)
==196556==    by 0x49DED7B: parsec_thread_mempool_allocate (mempool.h:128)
==196556==    by 0x49E132F: parsec_dtd_tile_of (insert_function.c:1288)
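For context on the allocation path in this trace: the frames point to a per-thread memory pool that recycles items and calls posix_memalign only when its free list is empty. The minimal sketch below illustrates that general pattern, assuming a simple LIFO free list. All names in it are hypothetical and are not PaRSEC's API. It shows why an item that is never returned to the pool is reported at its posix_memalign site.

```c
#define _POSIX_C_SOURCE 200112L  /* exposes posix_memalign */
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy pool, loosely modeled on the call chain in the trace
 * (allocate -> allocate_when_empty -> posix_memalign). NOT PaRSEC's API. */
typedef struct toy_item {
    struct toy_item *next;  /* free-list link, used only while pooled */
    char payload[160];      /* stand-in for a tile or task object */
} toy_item_t;

typedef struct {
    toy_item_t *free_list;  /* LIFO of items ready for reuse */
    size_t outstanding;     /* items handed out and not yet returned */
} toy_mempool_t;

/* Called only when the free list is empty: grow the pool by one item. */
static toy_item_t *toy_allocate_when_empty(void)
{
    void *mem = NULL;
    if (posix_memalign(&mem, 64, sizeof(toy_item_t)) != 0)
        return NULL;
    return (toy_item_t *)mem;
}

toy_item_t *toy_mempool_allocate(toy_mempool_t *pool)
{
    toy_item_t *item = pool->free_list;
    if (item != NULL)
        pool->free_list = item->next;      /* reuse a pooled item */
    else if ((item = toy_allocate_when_empty()) == NULL)
        return NULL;
    pool->outstanding++;
    return item;
}

void toy_mempool_release(toy_mempool_t *pool, toy_item_t *item)
{
    assert(item != NULL && pool->outstanding > 0);
    item->next = pool->free_list;          /* push back for reuse */
    pool->free_list = item;
    pool->outstanding--;
}

/* Frees only the items sitting on the free list. Returns how many items
 * were never released: once their last pointer is dropped, valgrind
 * reports them as "definitely lost" at the posix_memalign call. */
size_t toy_mempool_destroy(toy_mempool_t *pool)
{
    while (pool->free_list != NULL) {
        toy_item_t *next = pool->free_list->next;
        free(pool->free_list);
        pool->free_list = next;
    }
    return pool->outstanding;
}
```

In this toy design, the pool owns only what is on its free list. Anything still checked out at shutdown is invisible to it, so the leak is attributed to the allocation site rather than to the code that failed to release the item.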

I believe this has already been identified in issue #178.

The other "leak" produces the following output:

==196556== 2,960 (2,416 direct, 544 indirect) bytes in 2 blocks are definitely lost in loss record 281 of 288
==196556==    at 0x488A828: posix_memalign (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==196556==    by 0x49C01BF: parsec_lifo_item_alloc (lifo.h:603)
==196556==    by 0x49C0547: parsec_thread_mempool_allocate_when_empty (mempool.c:82)
==196556==    by 0x49DED7B: parsec_thread_mempool_allocate (mempool.h:128)
==196556==    by 0x49E5497: parsec_dtd_create_and_initialize_task (insert_function.c:2663)
==196556==    by 0x49DCB6B: parsec_dtd_insert_flush_task (parsec_dtd_data_flush.c:266)
==196556==    by 0x49DCD27: parsec_dtd_insert_flush_task_pair (parsec_dtd_data_flush.c:333)
==196556==    by 0x49DCDB3: parsec_internal_dtd_data_flush (parsec_dtd_data_flush.c:352)
==196556==    by 0x4995E23: parsec_hash_table_for_all (parsec_hash_table.c:641)
==196556==    by 0x49DCE77: parsec_dtd_data_flush_all (parsec_dtd_data_flush.c:389)
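This second trace ends in a task allocated for a data flush. In task runtimes, objects like this are typically freed or returned to their pool only when a reference count reaches zero, so one unmatched retain keeps the task alive forever. The sketch below illustrates that generic pattern with hypothetical names. It uses malloc/free instead of a pool and is not PaRSEC's DTD code. It shows one possible mechanism behind a trace like this one, not a diagnosis of this issue.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted task; NOT PaRSEC's DTD task type. */
typedef struct toy_task {
    int refcount;   /* number of owners still holding the task */
    int id;
} toy_task_t;

static int toy_tasks_alive = 0;   /* live-object counter for the demo */

toy_task_t *toy_task_new(int id)
{
    toy_task_t *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->refcount = 1;   /* the creator holds the first reference */
    t->id = id;
    toy_tasks_alive++;
    return t;
}

/* Another owner (e.g. a successor or the runtime) keeps a pointer. */
void toy_task_retain(toy_task_t *t)
{
    t->refcount++;
}

/* Drops one reference and frees the task when the last one goes away.
 * Returns 1 if the task was freed, 0 otherwise. */
int toy_task_release(toy_task_t *t)
{
    assert(t->refcount > 0);
    if (--t->refcount == 0) {
        free(t);
        toy_tasks_alive--;
        return 1;
    }
    return 0;
}
```

If a retain has no matching release, the task survives every release the creator performs. After the structure that pointed to it (here, possibly the hash table walked by the flush) is torn down, nothing references the task, and valgrind classifies it as definitely lost.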

I have not encountered it with DPLASMA, so it may be a mistake on my part. If so, I would appreciate any insight into what the problem could be.


Additional context

Here, PaRSEC is used by the Chameleon application, which provides the same functionality as DPLASMA but through the DTD interface.

abouteiller commented 6 months ago

Thanks, Brieuc. Could you provide the configure command line as well as the test command needed to reproduce this?

bosilca commented 6 months ago

On OS X, run any DPLASMA DTD test with leaks --atExit -- ***

BrieucNicolas commented 6 months ago

You can build Chameleon from the following Git repository: https://gitlab.inria.fr/bnicolas/chameleon

Configure it with the command:

cmake /path/to/chameleon -DCHAMELEON_SCHED=PARSEC

If PaRSEC is built in Debug mode, you should observe segfaults with:

ctest -V -I 29,29,1