bsc-performance-tools / extrae

Instrumentation framework to generate execution traces of the most used parallel runtimes.
https://tools.bsc.es/extrae
GNU Lesser General Public License v2.1

MPI test failures #98

Open bkmgit opened 7 months ago

bkmgit commented 7 months ago

When testing Extrae 4.0.6 on Fedora 39 with MPICH 4.1.2, the following tests fail:

==============================================================
   Extrae 4.0.6: tests/functional/tracer/MPI/test-suite.log
==============================================================

# TOTAL: 21
# PASS:  18
# SKIP:  0
# XFAIL: 0
# FAIL:  3
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: mpi_sendirecv_c.sh
========================

Welcome to Extrae 4.0.6
Extrae: Parsing the configuration file (extrae.xml) begins
Extrae: Tracing package is located on /home/harald/aplic/extrae/3.3.0rc
Extrae: Generating intermediate files for Paraver traces.
Extrae: MPI routines will NOT collect HW counters information.
Extrae: Dynamic memory instrumentation is disabled.
Extrae: Basic I/O memory instrumentation is disabled.
Extrae: System calls instrumentation is disabled.
Extrae: Parsing the configuration file (extrae.xml) has ended
Extrae: Intermediate traces will be stored in /home/fedora/extrae-4.0.6-mpich-self-install/tests/functional/tracer/MPI
Extrae: Tracing mode is set to: Detail.
Extrae: Successfully initiated with 1 tasks and 1 threads

Extrae: Successfully initiated with 1 tasks and 1 threads

Assertion failed in file src/mpi/datatype/typerep/src/typerep_yaksa_pack.c at line 315: FALSE
memcpy argument memory ranges overlap, dst_=0xffffca743540 src_=0xffffca743540 len_=4

Abort(1) on node 0: Internal error
Extrae: Intermediate raw trace file created : /home/fedora/extrae-4.0.6-mpich-self-install/tests/functional/tracer/MPI/set-0/TRACE@ip-172-31-79-137.ec2.internal.0000475462000000000000.mpit
Extrae: Intermediate raw sym file created : /home/fedora/extrae-4.0.6-mpich-self-install/tests/functional/tracer/MPI/set-0/TRACE@ip-172-31-79-137.ec2.internal.0000475462000000000000.sym
Extrae: Deallocating memory.
Extrae: Application has ended. Tracing has been terminated.
merger: Output trace format is: Paraver
merger: Extrae 4.0.6
mpi2prv: Assigned nodes < ip-172-31-79-137.ec2.internal >
mpi2prv: Assigned size per processor < <1 Mbyte >
mpi2prv: File /home/fedora/extrae-4.0.6-mpich-self-install/tests/functional/tracer/MPI/set-0/TRACE@ip-172-31-79-137.ec2.internal.0000475462000000000000.mpit is object 1.1.1 on node ip-172-31-79-137.ec2.internal assigned to processor 0
mpi2prv: Time synchronization has been turned off
mpi2prv: Checking for target directory existence... exists, ok!
mpi2prv: Selected output trace format is Paraver
mpi2prv: Stored trace format is Paraver
mpi2prv: Enabling Time Synchronization (Node).
mpi2prv: Circular buffer enabled at tracing time? NO
mpi2prv: Parsing intermediate files
mpi2prv: Progress 1 of 2 ... 5% 11% 17% 23% 29% 35% 41% 47% 52% 58% 64% 70% 76% 82% 88% 94% 100% done
mpi2prv: Processor 0 succeeded to translate its assigned files
mpi2prv: Elapsed time translating files: 0 hours 0 minutes 0 seconds
mpi2prv: Elapsed time sorting addresses: 0 hours 0 minutes 0 seconds
mpi2prv: Generating tracefile (intermediate buffers of 6710784 events)
         This process can take a while. Please, be patient.
mpi2prv: Progress 2 of 2 ... 12% 15% 21% 33% 36% 42% 54% 57% 60% 66% 72% 75% 81% 87% 90% 100% done
mpi2prv: Elapsed time merge step: 0 hours 0 minutes 0 seconds
mpi2prv: Resulting tracefile occupies 828 bytes
mpi2prv: Removing temporal files... done
mpi2prv: Elapsed time removing temporal files: 0 hours 0 minutes 0 seconds
mpi2prv: Congratulations! ./mpi_sendirecv_c.prv has been generated.
Error! Could not find 'MPI_Wait'.
FAIL mpi_sendirecv_c.sh (exit status: 1)

FAIL: mpi_isendirecv_c.sh
=========================

Welcome to Extrae 4.0.6
Extrae: Parsing the configuration file (extrae.xml) begins
Extrae: Tracing package is located on /home/harald/aplic/extrae/3.3.0rc
Extrae: Generating intermediate files for Paraver traces.
Extrae: MPI routines will NOT collect HW counters information.
Extrae: Dynamic memory instrumentation is disabled.
Extrae: Basic I/O memory instrumentation is disabled.
Extrae: System calls instrumentation is disabled.
Extrae: Parsing the configuration file (extrae.xml) has ended
Extrae: Intermediate traces will be stored in /home/fedora/extrae-4.0.6-mpich-self-install/tests/functional/tracer/MPI
Extrae: Tracing mode is set to: Detail.
Extrae: Successfully initiated with 1 tasks and 1 threads

Extrae: Successfully initiated with 1 tasks and 1 threads

Assertion failed in file src/mpi/datatype/typerep/src/typerep_yaksa_pack.c at line 315: FALSE
memcpy argument memory ranges overlap, dst_=0xffffd2915ac4 src_=0xffffd2915ac4 len_=4

Abort(1) on node 0: Internal error
Extrae: Intermediate raw trace file created : /home/fedora/extrae-4.0.6-mpich-self-install/tests/functional/tracer/MPI/set-0/TRACE@ip-172-31-79-137.ec2.internal.0000475490000000000000.mpit
Extrae: Intermediate raw sym file created : /home/fedora/extrae-4.0.6-mpich-self-install/tests/functional/tracer/MPI/set-0/TRACE@ip-172-31-79-137.ec2.internal.0000475490000000000000.sym
Extrae: Deallocating memory.
Extrae: Application has ended. Tracing has been terminated.
merger: Output trace format is: Paraver
merger: Extrae 4.0.6
mpi2prv: Assigned nodes < ip-172-31-79-137.ec2.internal >
mpi2prv: Assigned size per processor < <1 Mbyte >
mpi2prv: File /home/fedora/extrae-4.0.6-mpich-self-install/tests/functional/tracer/MPI/set-0/TRACE@ip-172-31-79-137.ec2.internal.0000475490000000000000.mpit is object 1.1.1 on node ip-172-31-79-137.ec2.internal assigned to processor 0
mpi2prv: Time synchronization has been turned off
mpi2prv: Checking for target directory existence... exists, ok!
mpi2prv: Selected output trace format is Paraver
mpi2prv: Stored trace format is Paraver
mpi2prv: Enabling Time Synchronization (Node).
mpi2prv: Circular buffer enabled at tracing time? NO
mpi2prv: Parsing intermediate files
mpi2prv: Progress 1 of 2 ... 5% 11% 17% 23% 29% 35% 41% 47% 52% 58% 64% 70% 76% 82% 88% 94% 100% done
mpi2prv: Processor 0 succeeded to translate its assigned files
mpi2prv: Elapsed time translating files: 0 hours 0 minutes 0 seconds
mpi2prv: Elapsed time sorting addresses: 0 hours 0 minutes 0 seconds
mpi2prv: Generating tracefile (intermediate buffers of 6710784 events)
         This process can take a while. Please, be patient.
mpi2prv: Progress 2 of 2 ... 11% 17% 20% 32% 35% 41% 52% 55% 61% 67% 70% mpi2prv: Error! Found unmatched communication! Continuing...
76% 82% 85% 91% 100% done
mpi2prv: Error! Found 1 unmatched communications. Resulting tracefile may be inconsistent.
mpi2prv: Elapsed time merge step: 0 hours 0 minutes 0 seconds
mpi2prv: Resulting tracefile occupies 830 bytes
mpi2prv: Removing temporal files... done
mpi2prv: Elapsed time removing temporal files: 0 hours 0 minutes 0 seconds
mpi2prv: Congratulations! ./mpi_isendirecv_c.prv has been generated.
Error! Could not find 'MPI_Wait'.
FAIL mpi_isendirecv_c.sh (exit status: 1)

FAIL: mpi_isendirecvwaitall_c.sh
================================

Welcome to Extrae 4.0.6
Extrae: Parsing the configuration file (extrae.xml) begins
Extrae: Tracing package is located on /home/harald/aplic/extrae/3.3.0rc
Extrae: Generating intermediate files for Paraver traces.
Extrae: MPI routines will NOT collect HW counters information.
Extrae: Dynamic memory instrumentation is disabled.
Extrae: Basic I/O memory instrumentation is disabled.
Extrae: System calls instrumentation is disabled.
Extrae: Parsing the configuration file (extrae.xml) has ended
Extrae: Intermediate traces will be stored in /home/fedora/extrae-4.0.6-mpich-self-install/tests/functional/tracer/MPI
Extrae: Tracing mode is set to: Detail.
Extrae: Successfully initiated with 1 tasks and 1 threads

Extrae: Successfully initiated with 1 tasks and 1 threads

Assertion failed in file src/mpi/datatype/typerep/src/typerep_yaksa_pack.c at line 315: FALSE
memcpy argument memory ranges overlap, dst_=0xffffe214f37c src_=0xffffe214f37c len_=4

Abort(1) on node 0: Internal error
Extrae: Intermediate raw trace file created : /home/fedora/extrae-4.0.6-mpich-self-install/tests/functional/tracer/MPI/set-0/TRACE@ip-172-31-79-137.ec2.internal.0000475518000000000000.mpit
Extrae: Intermediate raw sym file created : /home/fedora/extrae-4.0.6-mpich-self-install/tests/functional/tracer/MPI/set-0/TRACE@ip-172-31-79-137.ec2.internal.0000475518000000000000.sym
Extrae: Deallocating memory.
Extrae: Application has ended. Tracing has been terminated.
merger: Output trace format is: Paraver
merger: Extrae 4.0.6
mpi2prv: Assigned nodes < ip-172-31-79-137.ec2.internal >
mpi2prv: Assigned size per processor < <1 Mbyte >
mpi2prv: File /home/fedora/extrae-4.0.6-mpich-self-install/tests/functional/tracer/MPI/set-0/TRACE@ip-172-31-79-137.ec2.internal.0000475518000000000000.mpit is object 1.1.1 on node ip-172-31-79-137.ec2.internal assigned to processor 0
mpi2prv: Time synchronization has been turned off
mpi2prv: Checking for target directory existence... exists, ok!
mpi2prv: Selected output trace format is Paraver
mpi2prv: Stored trace format is Paraver
mpi2prv: Enabling Time Synchronization (Node).
mpi2prv: Circular buffer enabled at tracing time? NO
mpi2prv: Parsing intermediate files
mpi2prv: Progress 1 of 2 ... 5% 11% 16% 22% 27% 33% 38% 44% 50% 55% 61% 66% 72% 77% 83% 88% 94% 100% done
mpi2prv: Processor 0 succeeded to translate its assigned files
mpi2prv: Elapsed time translating files: 0 hours 0 minutes 0 seconds
mpi2prv: Elapsed time sorting addresses: 0 hours 0 minutes 0 seconds
mpi2prv: Generating tracefile (intermediate buffers of 6710784 events)
         This process can take a while. Please, be patient.
mpi2prv: Progress 2 of 2 ... 14% 17% 22% 34% 37% 42% 54% 57% 62% 65% 71% mpi2prv: Error! Found unmatched communication! Continuing...
77% 82% 85% 91% 100% done
mpi2prv: Error! Found 1 unmatched communications. Resulting tracefile may be inconsistent.
mpi2prv: Elapsed time merge step: 0 hours 0 minutes 0 seconds
mpi2prv: Resulting tracefile occupies 841 bytes
mpi2prv: Removing temporal files... done
mpi2prv: Elapsed time removing temporal files: 0 hours 0 minutes 0 seconds
mpi2prv: Congratulations! ./mpi_isendirecvwaitall_c.prv has been generated.
Error! Could not find 'MPI_Waitall'.
FAIL mpi_isendirecvwaitall_c.sh (exit status: 1)

When testing Extrae 4.0.6 on Fedora 39 with Open MPI 5.0.0, the following test fails:

==============================================================
   Extrae 4.0.6: tests/functional/tracer/MPI/test-suite.log
==============================================================

# TOTAL: 21
# PASS:  20
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: mpi_commranksize_f_1proc.sh
=================================

Welcome to Extrae 4.0.6
Extrae: Parsing the configuration file (extrae.xml) begins
Extrae: Tracing package is located on /home/harald/aplic/extrae/3.3.0rc
Extrae: Generating intermediate files for Paraver traces.
Extrae: MPI routines will NOT collect HW counters information.
Extrae: Dynamic memory instrumentation is disabled.
Extrae: Basic I/O memory instrumentation is disabled.
Extrae: System calls instrumentation is disabled.
Extrae: Parsing the configuration file (extrae.xml) has ended
Extrae: Intermediate traces will be stored in /home/fedora/extrae-4.0.6-openmpi-self-install/tests/functional/tracer/MPI
Extrae: Tracing mode is set to: Detail.
Extrae: Successfully initiated with 1 tasks and 1 threads

Extrae: Intermediate raw trace file created : /home/fedora/extrae-4.0.6-openmpi-self-install/tests/functional/tracer/MPI/set-0/TRACE@ip-172-31-79-137.ec2.internal.0000521208000000000000.mpit
Extrae: Intermediate raw sym file created : /home/fedora/extrae-4.0.6-openmpi-self-install/tests/functional/tracer/MPI/set-0/TRACE@ip-172-31-79-137.ec2.internal.0000521208000000000000.sym
Extrae: Deallocating memory.
Extrae: Application has ended. Tracing has been terminated.
*** The MPI_Allreduce() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[ip-172-31-79-137.ec2.internal:521208] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
merger: Output trace format is: Paraver
merger: Extrae 4.0.6
mpi2prv: Assigned nodes < ip-172-31-79-137.ec2.internal >
mpi2prv: Assigned size per processor < <1 Mbyte >
mpi2prv: File /home/fedora/extrae-4.0.6-openmpi-self-install/tests/functional/tracer/MPI/set-0/TRACE@ip-172-31-79-137.ec2.internal.0000521208000000000000.mpit is object 1.1.1 on node ip-172-31-79-137.ec2.internal assigned to processor 0
mpi2prv: Time synchronization has been turned off
mpi2prv: Checking for target directory existence... exists, ok!
mpi2prv: Selected output trace format is Paraver
mpi2prv: Stored trace format is Paraver
mpi2prv: Enabling Time Synchronization (Node).
mpi2prv: Circular buffer enabled at tracing time? NO
mpi2prv: Parsing intermediate files
mpi2prv: Progress 1 of 2 ... 12% 25% 37% 50% 62% 75% 87% 100% done
mpi2prv: Processor 0 succeeded to translate its assigned files
mpi2prv: Elapsed time translating files: 0 hours 0 minutes 0 seconds
mpi2prv: Elapsed time sorting addresses: 0 hours 0 minutes 0 seconds
mpi2prv: Generating tracefile (intermediate buffers of 6710784 events)
         This process can take a while. Please, be patient.
mpi2prv: Progress 2 of 2 ... 5% 21% 26% 31% 36% 57% 63% 68% 73% 78% 84% 89% 100% done
mpi2prv: Elapsed time merge step: 0 hours 0 minutes 0 seconds
mpi2prv: Resulting tracefile occupies 477 bytes
mpi2prv: Removing temporal files... done
mpi2prv: Elapsed time removing temporal files: 0 hours 0 minutes 0 seconds
mpi2prv: Congratulations! ./mpi_commranksize_f_1proc.prv has been generated.
Error! Could not find 'MPI_Init'.
FAIL mpi_commranksize_f_1proc.sh (exit status: 1)