JuliaArrays / PaddedViews.jl

Add virtual padding to the edges of an array
Other
49 stars 9 forks source link

propagate inbounds in getindex #39

Closed johnnychen94 closed 4 years ago

johnnychen94 commented 4 years ago

I'm not that confident about the benchmark result, though. Probably I'm just benchmarking the cached performance here... but I guess it still explains some differences.

julia> A = reshape(collect(1:6), 2, 3);

julia> Ap = PaddedView(0, A, (0:4, 0:4));

julia> @btime getindex($A, 1, 1);
  1.421 ns (0 allocations: 0 bytes)

julia> @btime getindex($Ap, 1, 1);
  1.975 ns (0 allocations: 0 bytes)

julia> @btime getindex($Ap, 0, 0);
  1.696 ns (0 allocations: 0 bytes)

julia> @btime @inbounds getindex($Ap, 1, 1);
  1.420 ns (0 allocations: 0 bytes) # PR
  1.696 ns (0 allocations: 0 bytes) # master

julia> @btime @inbounds getindex($Ap, 0, 0);
  1.421 ns (0 allocations: 0 bytes) # PR
  1.703 ns (0 allocations: 0 bytes) # master

Unsure the cause here, adding T notation gives a better performance for missing case:

julia> A = reshape(collect(1:6), 2, 3);

julia> Ap = PaddedView(missing, A, (0:4, 0:4));

julia> @btime getindex($Ap, 1, 1);
  2.462 ns (0 allocations: 0 bytes) # PR
  3.009 ns (0 allocations: 0 bytes) # master

julia> @btime getindex($Ap, 0, 0);
  2.196 ns (0 allocations: 0 bytes) # PR
  2.193 ns (0 allocations: 0 bytes) # master

julia> @btime @inbounds getindex($Ap, 1, 1);
  1.697 ns (0 allocations: 0 bytes) # PR
  2.193 ns (0 allocations: 0 bytes) # master

julia> @btime @inbounds getindex($Ap, 0, 0);
  1.697 ns (0 allocations: 0 bytes) # PR
  1.652 ns (0 allocations: 0 bytes) # master
timholy commented 4 years ago

If you come up with a more "inspectable" benchmark (e.g., a manually-written loop) that still shows the same weird effect, then we can analyze why it happens and if necessary report it as a Julia issue.

johnnychen94 commented 4 years ago

I'm extremely unsure of it being a benchmark noise, in my local macOS

# without T
julia> for _ in 1:10
                  t= @belapsed getindex($Ap, 1, 1);
                  println(t)
              end
2.473e-9
3.292e-9
3.294e-9
3.289e-9
3.289e-9
3.292e-9
3.298e-9
3.283e-9
3.291e-9
3.291e-9

# with T
julia> for _ in 1:10
                  t= @belapsed getindex($Ap, 1, 1);
                  println(t)
              end
2.4710000000000003e-9
2.476e-9
2.476e-9
2.4740000000000003e-9
2.473e-9
2.4710000000000003e-9
2.476e-9
2.4780000000000003e-9
2.476e-9
2.476e-9

and in my remote linux server:

julia> versioninfo()
Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, haswell)
Environment:
  JULIA_DEPOT_PATH = /home/jc/local/.julia
  JULIA_EDITOR = vim
  JULIA_NUM_THREADS = 4
  JULIA_PKG_SERVER = https://mirrors.lflab.cn/julia

# without
julia> for _ in 1:10
                  t= @belapsed getindex($Ap, 1, 1);
                  println(t)
              end
4.037e-9
3.672e-9
3.675e-9
3.675e-9
3.671e-9
3.672e-9
3.674e-9
3.669e-9
3.674e-9
3.907e-9

# with
julia> for _ in 1:10
                  t= @belapsed getindex($Ap, 1, 1);
                  println(t)
              end
2.7690000000000003e-9
2.762e-9
2.766e-9
2.758e-9
2.774e-9
2.76e-9
2.9409999999999997e-9
2.759e-9
2.768e-9
2.7570000000000003e-9

I've tested it on 1.0.5, 1.1.1, 1.2.0, 1.3.1, 1.4.2, 1.5.2 and 1.6.0-DEV.1177. The only version I can't see a difference is 1.0.5.

I'll try and see if I can make more reproducible results on other examples.

johnnychen94 commented 4 years ago

Still unknown to me and is not reproducible in other similar cases.

Running out of time, might just leave it as noise and not investigate anymore.

timholy commented 4 years ago

Sounds good. Perhaps I should have explained that the reason this didn't make sense to me is that a return-type annotation causes Julia's lowering to change foo(args..) to convert(T, foo(args...))::T. In cases where foo is poorly-inferred, this can be very helpful because later code can be specialized on ::T rather than having to be crafted to allow ::Any (for example, later calls can be made by compile-time dispatch rather than runtime-dispatch).

But when possible, generally a much better strategy is to make foo itself inferrable. When foo is inferred to return ::T---the same T as in the return-type annotation---then the optimization phase of the compiler should just elide the convert and type-assert. Which, of course, is the same as not having the return-type annotation at all.