OpenSpeedShop / openspeedshop

Open|SpeedShop is a community effort by The Krell Institute with current direct funding from DOE’s NNSA and Office of Science. It builds on a broad list of community infrastructures, most notably Dyninst and MRNet from UW, libmonitor from Rice, and PAPI from UTK. Open|SpeedShop is an open source, multi-platform Linux performance tool that supports performance analysis of applications running on both single-node and large-scale Intel, AMD, ARM, Intel Phi, PPC, and GPU processor-based systems, as well as on Blue Gene and Cray platforms.
https://www.openspeedshop.org

No cbtf mode experiments work with MPI cuda executables at NASA #5

Closed: jgalarowicz closed this issue 6 years ago

jgalarowicz commented 6 years ago

On NASA pfe I can't get any cbtf mode experiments to run with the MPI cuda executables (shoc GEMM).

However, the non-MPI matrixmul example works on NASA pfe (see the end of this post). cbtf mode experiments work with nbody but not with a cuda application.
This is a different problem because it seems to be related to hooking up the MRNet network.

HWC:

[openss]: hwc using user specified threshold: 100000000.
[openss]: hwc using default papi event: "PAPI_TOT_CYC".
Creating topology file for PBS frontend node r313i0n5
Generated topology file: ./cbtfAutoTopology
Running hwc collector.
Program: mpiexec_mpt -n 4 ./GEMM
Number of mrnet backends: 4
Topology file used: ./cbtfAutoTopology
executing mpi program: mpiexec_mpt -n 4 cbtfrun --mpi --mrnet -c hwc ./GEMM
MPT: Launch to r313i3n5 failed (262147), retrying...
MPT: Launch to r313i3n6 failed (262147), retrying...
MPT: Launch to r313i3n7 failed (262147), retrying...
232213056.239003: (XPlat) SocketUtils.c[100] XPlat_SocketUtils_Connect - gethostbyname() failed with 'Success'
232213056.239028: utils_lightweight.c[48] connectHost - failed to connect to r313i0n5.p7.nas.nasa.gov:34553
./GEMM: connectHost() failed: Unknown error 10002
232213056.238638: (XPlat) SocketUtils.c[100] XPlat_SocketUtils_Connect - gethostbyname() failed with 'Success'
232213056.238664: utils_lightweight.c[48] connectHost - failed to connect to r313i0n5.p7.nas.nasa.gov:34553
232213056.238410: (XPlat) SocketUtils.c[100] XPlat_SocketUtils_Connect - gethostbyname() failed with 'Success'
./GEMM: connectHost() failed: Unknown error 10003
232213056.238434: utils_lightweight.c[48] connectHost - failed to connect to r313i0n5.p7.nas.nasa.gov:34553
./GEMM: connectHost() failed: Unknown error 10001
MPT ERROR: MPI_COMM_WORLD rank 3 has terminated without calling MPI_Finalize()
aborting job
Terminated
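To see at a glance which frontend endpoint the backends could not reach, the `connectHost` messages in the HWC output can be tallied. The following is only a minimal sketch: the function name `failed_targets` and the regular expression are illustrative assumptions based on the message text shown in this post, not part of Open|SpeedShop, CBTF, or MRNet.

```python
import re
from collections import Counter

# Assumed message shape, taken from the log in this post:
#   "... connectHost - failed to connect to <host>:<port>"
CONNECT_FAILURE = re.compile(
    r"failed to connect to (?P<host>[\w.-]+):(?P<port>\d+)"
)


def failed_targets(log_text):
    """Count connection failures per (host, port) target found in log_text."""
    return Counter(
        (match["host"], int(match["port"]))
        for match in CONNECT_FAILURE.finditer(log_text)
    )
```

Applied to the HWC output above, this would report a single target, r313i0n5.p7.nas.nasa.gov:34553, which suggests the backends all fail while resolving the same frontend name rather than hitting unrelated errors.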

CUDA:

[openss]: cuda using configuration from OPENSS_CUDA_CONFIG.
[openss]: cuda configuration: "interval=10000000,PAPI_FP_OPS,flop_count_sp"
Creating topology file for PBS frontend node r313i0n4
Generated topology file: ./cbtfAutoTopology
Running cuda collector.
Program: mpiexec_mpt -n 4 ./GEMM
Number of mrnet backends: 4
Topology file used: ./cbtfAutoTopology
/nobackupnfs2/jgalarow/openspeedshop-externals/BUILD/pfe23/mrnet-20180825/xplat/src/NetUtils.C[83]: getaddrinfo(r313i0n4.p7.nas.nasa.gov): Name or service not known
232213616.565104: CP(r313i0n13:0)(0x7fffedad9740): (XPlat) SocketUtils.C[104] Connect - failed to convert name to network address
232213616.583866: EDT(r313i0n13:0)(0x7fffec87f700): (XPlat) SocketUtils.C[104] Connect - failed to convert name to network address
232213616.583957: EDT(r313i0n13:0)(0x7fffec87f700): (XPlat) SocketUtils-unix.C[116] Send - Error: writev() failed with 'Bad file descriptor'
terminate called after throwing an instance of 'std::runtime_error'
what(): Unable to create the MRNet network.
/nobackupnfs2/jgalarow/OSS/sles12/osscbtf_v2.4.0/bin/osscuda: line 1914: 90287 Aborted (core dumped) osscollect $topology_opt $cbtf_offline_opt --program "$1" --collector $collector
cat attachBE_connections
cat: attachBE_connections: No such file or directory

Non-MPI application with osscuda works:

osscuda ./matrixmul
[openss]: cuda using configuration from OPENSS_CUDA_CONFIG.
[openss]: cuda configuration: "interval=10000000,PAPI_FP_OPS,flop_count_sp"
Creating topology file for PBS frontend node r313i0n0
Generated topology file: ./cbtfAutoTopology
Running cuda collector.
Program: ./matrixmul
Number of mrnet backends: 1
Topology file used: ./cbtfAutoTopology
executing sequential program: cbtfrun -c cuda --mrnet ./matrixmul
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "Tesla K40m" with compute capability 3.5

Naive CPU (Golden Reference)
Processing time: 400.890717 (ms), GFLOPS: 0.251099
threads: x=16 y=16 grid: x=24 y=16
Naive GPU
Processing time: 30.669985 (ms), GFLOPS: 3.282144
Total Errors = 0
Tiling GPU
Processing time: 28.113760 (ms), GFLOPS: 3.580570
Total Errors = 0
Global mem coalescing GPU
Processing time: 22.634912 (ms), GFLOPS: 4.447258
Total Errors = 0
Remove shared mem bank conflict GPU
Processing time: 22.098751 (ms), GFLOPS: 4.555157
Total Errors = 0
Threads perform computation optimization GPU
Processing time: 29.271008 (ms), GFLOPS: 3.439010
Total Errors = 0
Loop unrolling GPU
Processing time: 29.170689 (ms), GFLOPS: 3.450837
Total Errors = 0
Prefetching GPU
Processing time: 29.127647 (ms), GFLOPS: 3.455936
Total Errors = 0
default view for /home4/jgalarow/openspeedshop-exercises/cuda/matrixMul/matrixmul-cuda-1.openss

[openss]: The restored experiment identifier is: -x 1
Performance data spans 3.769904 seconds from 2018/10/24 15:02:01 to 2018/10/24 15:02:05

Exclusive  % of       Exclusive  Function (defining location)
Time (ms)  Total      Count
           Exclusive
           Time
17.922700  16.008430  1          matrixMul_unroll(float, float, float, int, int) (matrixmul: matrixMul_unroll.cuh,32)
17.910860  15.997854  1          matrixMul_prefetch(float, float, float, int, int) (matrixmul: matrixMul_prefetch.cuh,31)
17.873611  15.964584  1          matrixMul_compOpt(float, float, float, int, int) (matrixmul: matrixMul_compOpt.cuh,31)
14.784696  13.205587  1          matrixMul_naive(float, float, float, int, int) (matrixmul: matrixMul_naive.cuh,17)
14.515512  12.965153  1          matrixMul_tiling(float, float, float, int, int) (matrixmul: matrixMul_tiling.cuh,31)
14.485463  12.938314  1          matrixMul_coalescing(float, float, float, int, int) (matrixmul: matrixMul_coalescing.cuh,31)
14.465047  12.920078  1          matrixMul_noBankConflict(float, float, float*, int, int) (matrixmul: matrixMul_noBankConflict.cuh,32)

cat attachBE_connections
r313i0n0.p7.nas.nasa.gov 55856 0 0
cat cbtfAutoTopology
r313i0n0:0 => r313i0n0:1;
```
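For anyone checking which hosts the generated topology file wires together, the `cbtfAutoTopology` line shown above (`r313i0n0:0 => r313i0n0:1;`) can be read with a few lines of Python. This is a rough sketch under the assumption that each statement has the form `parent:index => child:index ... ;`, as in the single line shown here; `parse_topology` is a hypothetical helper, not an MRNet or CBTF API.

```python
def _node(token):
    """Split 'host:index' into (host, index); assumes the last ':' separates them."""
    host, _, index = token.strip().rpartition(":")
    return host, int(index)


def parse_topology(text):
    """Map each parent (host, index) to its list of child (host, index) nodes."""
    edges = {}
    for statement in text.split(";"):
        statement = statement.strip()
        if not statement:
            continue  # skip the empty piece after the final ';'
        parent, _, children = statement.partition("=>")
        edges[_node(parent)] = [_node(child) for child in children.split()]
    return edges
```

For the working non-MPI run, this yields one frontend process on r313i0n0 with one child on the same host, which matches "Number of mrnet backends: 1" in the output.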

jgalarowicz commented 6 years ago

Must have been a system glitch w.r.t. getaddrinfo. Doing another set of test runs succeeds. Closing this issue. jeg
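If the problem shows up again, a quick name-resolution check run on a compute node can help separate a transient resolver glitch from a tool problem. The sketch below uses Python's standard `socket.getaddrinfo`; the helper name `can_resolve` and its `resolver` parameter are assumptions made for illustration and are not part of Open|SpeedShop.

```python
import socket


def can_resolve(host, port=0, resolver=socket.getaddrinfo):
    """Return (True, sorted addresses) if host resolves, else (False, error text).

    The resolver argument exists only so the lookup can be swapped out in tests.
    """
    try:
        infos = resolver(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        return False, str(exc)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    addresses = sorted({info[4][0] for info in infos})
    return True, addresses
```

For example, calling `can_resolve("r313i0n5.p7.nas.nasa.gov", 34553)` from a backend node would show whether the frontend name from the HWC log resolves at that moment; a failure here points at the system resolver rather than at MRNet.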