JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.44k stars 5.46k forks source link

BLAS calls not assigned to callers in profiler #33605

Open atbug opened 4 years ago

atbug commented 4 years ago

I posted the issue on discourse (https://discourse.julialang.org/t/inaccurate-sampling-of-profiler/30030) but was not getting any attention. Since this is possibly a bug, I report it here with a better illustration:

using Profile, LinearAlgebra

BLAS.set_num_threads(1)

A = rand(ComplexF64, 1000, 1000)
B = rand(ComplexF64, 1000, 1000)

function foo(A, B)
    for i in 1:10
        1.0*A*B
    end
    return nothing
end

function bar(A, B)
    for i in 1:10
        A*B
    end
end

function foobar(A, B)
    @time foo(A, B)
    @time bar(A, B)
    return nothing
end

@profile foobar(A, B) # compilation

Profile.clear()

@profile foobar(A, B)

Profile.print(mincount=50)

The @time macro reports the execution time for foo and bar is

  1.996414 seconds (40 allocations: 305.177 MiB, 0.46% gc time)
  1.760597 seconds (20 allocations: 152.589 MiB, 0.57% gc time)

but the output of Profile.print is instead

177 ./task.jl:268; (::getfield(Atom, Symbol("##19#21")){Array{Any,1}})()
 177 /home/wangc/.julia/packages/Atom/X8fAI/src/comm.jl:164; handlemsg(::Dict{String,Any}, ::Dict{String,Any})
  177 ...angc/.julia/packages/Revise/439di/src/Revise.jl:1123; (::getfield(Revise, Symbol("##89#90")){getfield(Atom, Symbol("##124#129")...
   177 /home/wangc/.julia/packages/Atom/X8fAI/src/eval.jl:86; (::getfield(Atom, Symbol("##124#129")))(::Dict{String,Any})
    177 ...angc/.julia/packages/Media/ItEPc/src/dynamic.jl:24; macro expansion
     177 /home/wangc/.julia/packages/Atom/X8fAI/src/eval.jl:91; macro expansion
      177 /home/wangc/.julia/packages/Atom/X8fAI/src/repl.jl:85; hideprompt(::getfield(Atom, Symbol("##125#130")){String,Int64,String,Bool})
       177 ...e/wangc/.julia/packages/Atom/X8fAI/src/eval.jl:92; #125
        177 ./logging.jl:491; with_logger
         177 ./logging.jl:395; with_logstate(::getfield(Atom, Symbol("##126#131")){String,Int64,Stri...
          177 .../wangc/.julia/packages/Atom/X8fAI/src/eval.jl:93; #126
           177 .../wangc/.julia/packages/Atom/X8fAI/src/eval.jl:47; withpath
            177 ....julia/packages/CodeTools/xGemk/src/utils.jl:30; withpath(::getfield(Atom, Symbol("##127#132")){String,Int64,String,B...
             177 ...wangc/.julia/packages/Atom/X8fAI/src/eval.jl:94; (::getfield(Atom, Symbol("##127#132")){String,Int64,String,Bool})()
              177 ....julia/packages/CodeTools/xGemk/src/eval.jl:30; include_string(::Module, ::String, ::String, ::Int64)
               177 ./loading.jl:1064; include_string(::Module, ::String, ::String)
                115 ...c/FILES/Workspace/Hop.jl/playground/perm.jl:22; foobar(::Array{Complex{Float64},2}, ::Array{Complex{Float64},2})
                 115 ./util.jl:156; macro expansion
                  115 .../FILES/Workspace/Hop.jl/playground/perm.jl:10; foo(::Array{Complex{Float64},2}, ::Array{Complex{Float64},2})
                   115 ./operators.jl:529; *
                    57 ./arraymath.jl:52; *
                     57 ./broadcast.jl:752; broadcast(::typeof(*), ::Float64, ::Array{Complex{Float64},2})
                    58 ...a/stdlib/v1.2/LinearAlgebra/src/matmul.jl:143; *
                     58 ...a/stdlib/v1.2/LinearAlgebra/src/matmul.jl:145; mul!
                      58 .../stdlib/v1.2/LinearAlgebra/src/matmul.jl:464; gemm_wrapper!(::Array{Complex{Float64},2}, ::Char, ::Char, ::A...
                       58 ...a/stdlib/v1.2/LinearAlgebra/src/blas.jl:1131; gemm!(::Char, ::Char, ::Complex{Float64}, ::Array{Complex{Flo...
                62  ...c/FILES/Workspace/Hop.jl/playground/perm.jl:23; foobar(::Array{Complex{Float64},2}, ::Array{Complex{Float64},2})
                 60 ./util.jl:156; macro expansion
                  60 .../FILES/Workspace/Hop.jl/playground/perm.jl:17; bar
                   60 ...a/stdlib/v1.2/LinearAlgebra/src/matmul.jl:143; *
                    52 ...a/stdlib/v1.2/LinearAlgebra/src/matmul.jl:145; mul!
                     52 ...a/stdlib/v1.2/LinearAlgebra/src/matmul.jl:464; gemm_wrapper!(::Array{Complex{Float64},2}, ::Char, ::Char, ::Ar...
                      52 ...ia/stdlib/v1.2/LinearAlgebra/src/blas.jl:1131; gemm!(::Char, ::Char, ::Complex{Float64}, ::Array{Complex{Flo...

which clearly indicates that the function foo is taking roughly twice the time of bar. Therefore the two method of performance measurement is inconsistent.

julia> versioninfo()
Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 1

Edit: I realized I rebuilt system image and was using revise, but I can reproduce the issue above with the official binary and without revise in REPL.

vtjnash commented 4 years ago

Can you check the output with Profile.print(C=true)? Sometimes backtraces get truncated, which can lead to misleading effects in the counts.

atbug commented 4 years ago

This time, the output of @time is

  2.325471 seconds (40 allocations: 305.177 MiB, 8.40% gc time)
  1.798511 seconds (20 allocations: 152.589 MiB, 0.29% gc time)

and the Profile.print(C=true, mincount=20) gives

347  ...ldworker/worker/package_linux64/build/src/task.c:596; start_task
 347 ...dworker/worker/package_linux64/build/src/julia.h:1614; jl_apply
  347 /buildworker/worker/package_linux64/build/src/gf.c:2191; jl_apply_generic
347 unknown stackframe
    347 ./task.jl:268; (::getfield(Atom, Symbol("##19#21")){Array{Any,1}})()
     347 ...ker/worker/package_linux64/build/src/builtins.c:563; jl_f__apply
      347 ...orker/worker/package_linux64/build/src/julia.h:1614; jl_apply
       347 ...dworker/worker/package_linux64/build/src/gf.c:2191; jl_apply_generic
347 unknown stackframe
         347 .../wangc/.julia/packages/Atom/X8fAI/src/comm.jl:164; handlemsg(::Dict{String,Any}, ::Dict{String,Any})
          347 ...worker/worker/package_linux64/build/src/gf.c:2191; jl_apply_generic
           347 ...c/.julia/packages/Revise/439di/src/Revise.jl:1123; (::getfield(Revise, Symbol("##89#90")){getfield(Atom, Symbol("##124...
            347 ...wangc/.julia/packages/Atom/X8fAI/src/eval.jl:86; (::getfield(Atom, Symbol("##124#129")))(::Dict{String,Any})
             347 ...c/.julia/packages/Media/ItEPc/src/dynamic.jl:24; macro expansion
              347 ...angc/.julia/packages/Atom/X8fAI/src/eval.jl:91; macro expansion
               347 ...rker/worker/package_linux64/build/src/gf.c:2191; jl_apply_generic
                347 ...angc/.julia/packages/Atom/X8fAI/src/repl.jl:85; hideprompt(::getfield(Atom, Symbol("##125#130")){String,Int64,Stri...
                 347 ...ngc/.julia/packages/Atom/X8fAI/src/eval.jl:92; #125
                  347 ./logging.jl:491; with_logger
                   347 ./logging.jl:395; with_logstate(::getfield(Atom, Symbol("##126#131")){String,Int6...
                    347 ...gc/.julia/packages/Atom/X8fAI/src/eval.jl:93; #126
                     347 ...gc/.julia/packages/Atom/X8fAI/src/eval.jl:47; withpath
                      347 ...ia/packages/CodeTools/xGemk/src/utils.jl:30; withpath(::getfield(Atom, Symbol("##127#132")){String,Int64,St...
                       347 ...c/.julia/packages/Atom/X8fAI/src/eval.jl:94; (::getfield(Atom, Symbol("##127#132")){String,Int64,String,Bo...
                        347 ...r/worker/package_linux64/build/src/gf.c:2191; jl_apply_generic
347 unknown stackframe
                          347 ...ia/packages/CodeTools/xGemk/src/eval.jl:30; include_string(::Module, ::String, ::String, ::Int64)
                           347 ./loading.jl:1064; include_string(::Module, ::String, ::String)
                            347 ...orker/package_linux64/build/src/ast.c:873; jl_parse_eval_all
                             347 .../package_linux64/build/src/toplevel.c:815; jl_toplevel_eval_flex
                              347 ...ckage_linux64/build/src/interpreter.c:893; jl_interpret_toplevel_thunk
347 unknown stackframe
347 unknown stackframe
347 unknown stackframe
                                  347 ...age_linux64/build/src/interpreter.c:884; jl_interpret_toplevel_thunk_callback
                                   347 ...age_linux64/build/src/interpreter.c:699; eval_body
                                    347 ...ge_linux64/build/src/interpreter.c:635; eval_body
                                     347 ...ge_linux64/build/src/interpreter.c:411; eval_value
                                      347 ...e_linux64/build/src/interpreter.c:323; do_call
                                       347 ...er/package_linux64/build/src/gf.c:2191; jl_apply_generic
                                        285 ...rkspace/Hop.jl/playground/perm.jl:22; foobar(::Array{Complex{Float64},2}, ::Array{Complex...
                                         285 ./util.jl:156; macro expansion
                                          285 ...kspace/Hop.jl/playground/perm.jl:10; foo(::Array{Complex{Float64},2}, ::Array{Complex{F...
                                           285 ./operators.jl:529; *
                                            125 ./arraymath.jl:52; *
                                             125 ./broadcast.jl:752; broadcast(::typeof(*), ::Float64, ::Array{Comple...
                                              96 ./boot.jl:406; materialize
                                               96 ...kage_linux64/build/src/array.c:427; jl_alloc_array_2d
                                                96 ...kage_linux64/build/src/array.c:160; _new_array
                                                 96 ...age_linux64/build/src/array.c:112; _new_array_
                                                  96 ...64/build/src/julia_internal.h:274; jl_gc_alloc_
                                                   96 ...ckage_linux64/build/src/gc.c:1111; jl_gc_pool_alloc
                                                    96 ...ckage_linux64/build/src/gc.c:2912; jl_gc_collect
                                                     80 ...kage_linux64/build/src/gc.c:2729; _jl_gc_collect
                                                      39 ...kage_linux64/build/src/gc.c:1999; gc_mark_loop
                                                       30 ...kage_linux64/build/src/gc.c:1732; gc_mark_scan_obj8
                                              29 ./broadcast.jl:798; materialize
                                               29 ./broadcast.jl:818; copy
                                                29 ./broadcast.jl:842; copyto!
                                                 29 ./broadcast.jl:887; copyto!
                                                  25 ./simdloop.jl:77; macro expansion
                                                   25 ./broadcast.jl:888; macro expansion
                                                    20 ./broadcast.jl:558; getindex
                                                     20 ./broadcast.jl:598; _broadcast_getindex
                                                      20 ./broadcast.jl:625; _broadcast_getindex_evalf
                                                       20 ./complex.jl:312; *
                                                        20 ./float.jl:399; *
                                            100 ./boot.jl:406; *
                                             100 ...ckage_linux64/build/src/array.c:427; jl_alloc_array_2d
                                              100 ...kage_linux64/build/src/array.c:160; _new_array
                                               100 ...kage_linux64/build/src/array.c:112; _new_array_
                                                100 ...64/build/src/julia_internal.h:274; jl_gc_alloc_
                                                 100 ...ackage_linux64/build/src/gc.c:1111; jl_gc_pool_alloc
                                                  100 ...ckage_linux64/build/src/gc.c:2912; jl_gc_collect
                                                   68 ...ckage_linux64/build/src/gc.c:2729; _jl_gc_collect
                                                    26 ...ckage_linux64/build/src/gc.c:1999; gc_mark_loop
                                                     22 ...kage_linux64/build/src/gc.c:1732; gc_mark_scan_obj8
                                            60  ...1.2/LinearAlgebra/src/matmul.jl:143; *
                                             60 ...1.2/LinearAlgebra/src/matmul.jl:145; mul!
                                              60 ...1.2/LinearAlgebra/src/matmul.jl:464; gemm_wrapper!(::Array{Complex{Float64},2}, ::Cha...
                                               60 ...v1.2/LinearAlgebra/src/blas.jl:1131; gemm!(::Char, ::Char, ::Complex{Float64}, ::Ar...
                                                60 .../../lib/julia/libopenblas64_.so:?; zgemm_64_
                                                 60 ...../lib/julia/libopenblas64_.so:?; zgemm_nn
                                                  21 ...../lib/julia/libopenblas64_.so:?; zgemm_itcopy_HASWELL
                                                  39 ...../lib/julia/libopenblas64_.so:?; zgemm_oncopy_HASWELL
                                        62  ...rkspace/Hop.jl/playground/perm.jl:23; foobar(::Array{Complex{Float64},2}, ::Array{Complex...
                                         62 ./util.jl:156; macro expansion
                                          62 ...rkspace/Hop.jl/playground/perm.jl:17; bar
                                           62 ...v1.2/LinearAlgebra/src/matmul.jl:143; *
                                            57 ...1.2/LinearAlgebra/src/matmul.jl:145; mul!
                                             57 ...1.2/LinearAlgebra/src/matmul.jl:464; gemm_wrapper!(::Array{Complex{Float64},2}, ::Cha...
                                              57 ...v1.2/LinearAlgebra/src/blas.jl:1131; gemm!(::Char, ::Char, ::Complex{Float64}, ::Arr...
                                               57 .../../lib/julia/libopenblas64_.so:?; zgemm_64_
                                                57 .../../lib/julia/libopenblas64_.so:?; zgemm_nn
                                                 38 ...../lib/julia/libopenblas64_.so:?; zgemm_oncopy_HASWELL
57   ...s/julia-1.2.0/bin/../lib/julia/libopenblas64_.so:?; zgemm_beta_HASWELL
3720 ...s/julia-1.2.0/bin/../lib/julia/libopenblas64_.so:?; zgemm_kernel_n_HASWELL

A little bit messy. foo takes 285 and bar takes 62, if read this thing right.

vtjnash commented 4 years ago

Actually, that last lines says they took almost 2k (about 2 seconds), but had slightly different setup times (approximately 250 ms vs 50 ms)

atbug commented 4 years ago

Actually, that last lines says they took almost 2k (about 2 seconds), but had slightly different setup times (approximately 250 ms vs 50 ms)

Sorry, I don't understand how you get this number, the last line is 3720. Did you divide by two assuming both functions take roughly the same time in blas?

vtjnash commented 4 years ago

yes

atbug commented 4 years ago

Is there any workaround for this? It is not really possible to manually assign these BLAS calls to different functions in a complex program.

Keno commented 4 years ago

OoenBlas doesn't provide correct unwind information for their assembly kernels. It's something that could be fixed, but nobody has tried yet.