Closed dionhaefner closed 5 years ago
With the recent master, I get something like this:
$ BH_OPENMP_PROF=1 BH_OPENMP_VERBOSE=1 python bh_test.py
Write source "/var/folders/ld/ylk19q555951g5k6t0vhz6740000gp/T/bh_ede7/src/94091604c79445bc_291c74f0ebb735a3.c"
compile command: "/usr/local/opt/llvm/bin/clang -x c -dynamiclib -arch x86_64 -Wno-expansion-to-defined -O3 -Wno-pass-failed -march=native -Werror -fopenmp=libomp -I/Users/dion/.local/share/bohrium/include /var/folders/ld/ylk19q555951g5k6t0vhz6740000gp/T/bh_ede7/src/94091604c79445bc_291c74f0ebb735a3.c -o /var/folders/ld/ylk19q555951g5k6t0vhz6740000gp/T/bh_ede7/obj/94091604c79445bc_291c74f0ebb735a3.so"
Write source "/var/folders/ld/ylk19q555951g5k6t0vhz6740000gp/T/bh_ede7/src/94091604c79445bc_2a1e823edf5c7601.c"
compile command: "/usr/local/opt/llvm/bin/clang -x c -dynamiclib -arch x86_64 -Wno-expansion-to-defined -O3 -Wno-pass-failed -march=native -Werror -fopenmp=libomp -I/Users/dion/.local/share/bohrium/include /var/folders/ld/ylk19q555951g5k6t0vhz6740000gp/T/bh_ede7/src/94091604c79445bc_2a1e823edf5c7601.c -o /var/folders/ld/ylk19q555951g5k6t0vhz6740000gp/T/bh_ede7/obj/94091604c79445bc_2a1e823edf5c7601.so"
Write source "/var/folders/ld/ylk19q555951g5k6t0vhz6740000gp/T/bh_ede7/src/94091604c79445bc_74c00300d3e68155.c"
compile command: "/usr/local/opt/llvm/bin/clang -x c -dynamiclib -arch x86_64 -Wno-expansion-to-defined -O3 -Wno-pass-failed -march=native -Werror -fopenmp=libomp -I/Users/dion/.local/share/bohrium/include /var/folders/ld/ylk19q555951g5k6t0vhz6740000gp/T/bh_ede7/src/94091604c79445bc_74c00300d3e68155.c -o /var/folders/ld/ylk19q555951g5k6t0vhz6740000gp/T/bh_ede7/obj/94091604c79445bc_74c00300d3e68155.so"
Write source "/var/folders/ld/ylk19q555951g5k6t0vhz6740000gp/T/bh_ede7/src/94091604c79445bc_de8967f5baab9a56.c"
compile command: "/usr/local/opt/llvm/bin/clang -x c -dynamiclib -arch x86_64 -Wno-expansion-to-defined -O3 -Wno-pass-failed -march=native -Werror -fopenmp=libomp -I/Users/dion/.local/share/bohrium/include /var/folders/ld/ylk19q555951g5k6t0vhz6740000gp/T/bh_ede7/src/94091604c79445bc_de8967f5baab9a56.c -o /var/folders/ld/ylk19q555951g5k6t0vhz6740000gp/T/bh_ede7/obj/94091604c79445bc_de8967f5baab9a56.so"
Write source "/var/folders/ld/ylk19q555951g5k6t0vhz6740000gp/T/bh_ede7/src/94091604c79445bc_437f5925862da730.c"
compile command: "/usr/local/opt/llvm/bin/clang -x c -dynamiclib -arch x86_64 -Wno-expansion-to-defined -O3 -Wno-pass-failed -march=native -Werror -fopenmp=libomp -I/Users/dion/.local/share/bohrium/include /var/folders/ld/ylk19q555951g5k6t0vhz6740000gp/T/bh_ede7/src/94091604c79445bc_437f5925862da730.c -o /var/folders/ld/ylk19q555951g5k6t0vhz6740000gp/T/bh_ede7/obj/94091604c79445bc_437f5925862da730.so"
Write source "/var/folders/ld/ylk19q555951g5k6t0vhz6740000gp/T/bh_ede7/src/94091604c79445bc_5eb4209930c8cdeb.c"
compile command: "/usr/local/opt/llvm/bin/clang -x c -dynamiclib -arch x86_64 -Wno-expansion-to-defined -O3 -Wno-pass-failed -march=native -Werror -fopenmp=libomp -I/Users/dion/.local/share/bohrium/include /var/folders/ld/ylk19q555951g5k6t0vhz6740000gp/T/bh_ede7/src/94091604c79445bc_5eb4209930c8cdeb.c -o /var/folders/ld/ylk19q555951g5k6t0vhz6740000gp/T/bh_ede7/obj/94091604c79445bc_5eb4209930c8cdeb.so"
Write source "/var/folders/ld/ylk19q555951g5k6t0vhz6740000gp/T/bh_ede7/src/94091604c79445bc_ec89bb1025fbda.c"
compile command: "/usr/local/opt/llvm/bin/clang -x c -dynamiclib -arch x86_64 -Wno-expansion-to-defined -O3 -Wno-pass-failed -march=native -Werror -fopenmp=libomp -I/Users/dion/.local/share/bohrium/include /var/folders/ld/ylk19q555951g5k6t0vhz6740000gp/T/bh_ede7/src/94091604c79445bc_ec89bb1025fbda.c -o /var/folders/ld/ylk19q555951g5k6t0vhz6740000gp/T/bh_ede7/obj/94091604c79445bc_ec89bb1025fbda.so"
Press Enter to quit
[OpenMP] Profiling:
Fuse cache hits: 202/207 (97.5845%)
Codegen cache hits: 0/6 (0%)
Compilation cache hits: 100/107 (93.4579%)
Array contractions: 17/37 (45.9459%)
Outer-fusion ratio: 11/47 (23.4043%)
Malloc cache hits: 100/112 (89.2857%)
Max memory usage: 548 MB
Syncs to NumPy: 3
Total Work: 147938400 operations
Throughput: 4.22934e+07ops
Work below par-threshold (1000): 0%
Wall clock: 3.49791s
Total Execution: 1.43557s
Pre-fusion: 0.000214091s
Fusion: 0.000527781s
Codegen: 0.000578563s
Compilation: 1.37932s
Exec: 1.58708s
Copy2dev: 0s
Copy2host: 0s
Offload: 0s
Other: -1.53215s
Ext-method: 0.0248146s
Unaccounted for (wall - total): 2.06233s
Per-kernel Profiling:
Kernel filename Calls Total time Max time Min time
94091604c79445bc_4d50d89dddbed345.c 101 1.46e+00s 1.89e-02s 1.36e-02s
94091604c79445bc_291c74f0ebb735a3.c 1 5.27e-02s 5.27e-02s 5.27e-02s
94091604c79445bc_2a1e823edf5c7601.c 1 1.93e-02s 1.93e-02s 1.93e-02s
94091604c79445bc_74c00300d3e68155.c 1 1.96e-02s 1.96e-02s 1.96e-02s
94091604c79445bc_5eb4209930c8cdeb.c 1 1.40e-02s 1.40e-02s 1.40e-02s
94091604c79445bc_437f5925862da730.c 1 1.17e-02s 1.17e-02s 1.17e-02s
94091604c79445bc_ec89bb1025fbda.c 1 1.14e-02s 1.14e-02s 1.14e-02s
Observations:
The user kernel source isn't written to the same cache directory as the other kernels:
$ ls /var/folders/ld/ylk19q555951g5k6t0vhz6740000gp/T/bh_ede7/src
94091604c79445bc_291c74f0ebb735a3.c 94091604c79445bc_74c00300d3e68155.c
94091604c79445bc_2a1e823edf5c7601.c 94091604c79445bc_de8967f5baab9a56.c
94091604c79445bc_437f5925862da730.c 94091604c79445bc_ec89bb1025fbda.c
94091604c79445bc_5eb4209930c8cdeb.c
Only outstanding issue is the incorrect "Total Execution" counter. That is because calling a user kernel bypasses EngineCPU::handleExecution
which increments the total time counter. I didn't feel comfortable enough to propose a different architecture though.
Fixed by #591
Running the script from #572 with
BH_OPENMP_PROF=true
gives something likeMost time is spent inside the user kernel, but it only shows up as "unaccounted for".