RelationalAI-oss / MultithreadingBenchmarks.jl

MIT License

Reenable all-threads compiling benchmark, now that the bug is fixed. #8

Closed: NHDaly closed this 2 years ago

NHDaly commented 2 years ago

Reenable the benchmark now that https://github.com/JuliaLang/julia/issues/33183 is fixed. 😊 Haha it's been fixed since like julia 1.4 or 1.5 or something, but i haven't looked at these benchmarks in a long time! thanks for running them again, @d-netto!

Can you review this PR? I can't add you as a reviewer yet because you haven't contributed.

Also, CC: @kpamnany - can you review as well?

NHDaly commented 2 years ago

Here is the result from running the benchmark on my machine:

------ JULIA VERSIONINFO ------
Julia Version 1.7.3-pre.8
Commit a690e381c0* (2022-04-05 13:49 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.6.0)
  CPU: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 4
  JULIA_LOAD_PATH = @:/var/folders/fd/hymwsmhj1zd27lwlftv_yb2r0000gn/T/jl_M3CJZF
------ JULIA CPU INFO ------
Sys.CPU_THREADS = 12
Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz:
          speed         user         nice          sys         idle          irq
#1-12  2900 MHz    2997967 s          0 s    1241982 s   33823807 s          0 s
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running benchmark for /Users/nathandaly/.julia/dev/MultithreadingBenchmarks/src/../bench/all_tasks_compiling.jl
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
------ TEST PARAMETERS ------
nthreads_to_test = [1, 3, 6, 9, 12]
nqueries = 5
num_ops = 10
warmup
NUM_QUERIES: 1
WORK_SIZE: 10
5 threads:
0.019208049 secs
6 threads:
0.019770852 secs
Results (omitted printing latencies column):
2×6 DataFrame
 Row │ nthreads  time_secs  allocs  memory  gctime_secs  thread_counts
     │ Int64     Float64    Int64   Int64   Float64      Vector{Int64}
─────┼──────────────────────────────────────────────────────────────────────
   1 │        5  0.019208     8644  540364          0.0  [1, 0, 0, 0, 0]
   2 │        6  0.0197709    8644  540396          0.0  [0, 1, 0, 0, 0, 0]
Processed:
2×6 DataFrame
 Row │ nthreads  time_secs  speedup_factor  marginal_speedup  utilization  latency_quantiles_ms
     │ Int64     Float64    Float64         Float64           Float64      Tuple…
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │        5  0.019208         1.0              1.0           0.2       (0.1=>23.0107, 0.5=>23.0107, 0.9…
   2 │        6  0.0197709        0.971534        -0.0284663     0.161922  (0.1=>26.9172, 0.5=>26.9172, 0.9…
run benchmark
NUM_QUERIES: 5
WORK_SIZE: 10
1 threads:
0.111896811 secs
3 threads:
0.09173288 secs
6 threads:
0.098779298 secs
9 threads:
0.098101715 secs
12 threads:
0.097688362 secs
Results (omitted printing latencies column):
5×6 DataFrame
 Row │ nthreads  time_secs  allocs  memory   gctime_secs  thread_counts
     │ Int64     Float64    Int64   Int64    Float64      Vector{Int64}
─────┼──────────────────────────────────────────────────────────────────────────────────────
   1 │        1  0.111897    43147  2698108          0.0  [5]
   2 │        3  0.0917329   43152  2698300          0.0  [2, 1, 2]
   3 │        6  0.0987793   43152  2698364          0.0  [1, 1, 1, 1, 1, 0]
   4 │        9  0.0981017   43152  2698396          0.0  [1, 1, 1, 1, 1, 0, 0, 0, 0]
   5 │       12  0.0976884   43152  2698460          0.0  [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0…
Processed:
5×6 DataFrame
 Row │ nthreads  time_secs  speedup_factor  marginal_speedup  utilization  latency_quantiles_ms
     │ Int64     Float64    Float64         Float64           Float64      Tuple…
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │        1  0.111897          1.0            1.0           1.0        (0.1=>29.9244, 0.5=>31.3406, 0.9…
   2 │        3  0.0917329         1.21981        0.109906      0.406604   (0.1=>44.1762, 0.5=>45.2292, 0.9…
   3 │        6  0.0987793         1.1328        -0.0290051     0.188799   (0.1=>83.2134, 0.5=>92.4317, 0.9…
   4 │        9  0.0981017         1.14062        0.00260805    0.126736   (0.1=>84.305, 0.5=>94.1237, 0.9=…
   5 │       12  0.0976884         1.14545        0.00160879    0.0954539  (0.1=>83.5484, 0.5=>93.8798, 0.9…
plot results
-- saving figures in /Users/nathandaly/.julia/dev/MultithreadingBenchmarks/test --
     Testing MultithreadingBenchmarks tests passed
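
For anyone reading the "Processed" table: the derived columns appear to follow from `time_secs` and `nthreads` alone. The sketch below reproduces them from the numbers printed in the log. The formulas are inferred from the printed values, not taken from the package source, so treat them as an assumption.

```julia
# Values copied from the "Results" table of the main benchmark run.
nthreads  = [1, 3, 6, 9, 12]
time_secs = [0.111897, 0.0917329, 0.0987793, 0.0981017, 0.0976884]

# Assumed definition: speedup relative to the first row.
speedup_factor = time_secs[1] ./ time_secs

# Assumed definition: speedup divided by the number of threads used.
utilization = speedup_factor ./ nthreads

# Assumed definition: change in speedup per additional thread since the previous row.
marginal_speedup = similar(speedup_factor)
marginal_speedup[1] = 1.0  # the log reports 1.0 for the baseline row
for k in 2:length(nthreads)
    marginal_speedup[k] = (speedup_factor[k] - speedup_factor[k - 1]) /
                          (nthreads[k] - nthreads[k - 1])
end
```

Under these assumed formulas, a utilization near 0.1 at 12 threads means the extra threads are contributing almost nothing to throughput.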

As expected, the time remains roughly constant with increasing number of threads, since julia currently maintains a global lock around compilation. 👍
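
To make the "all tasks compiling" shape concrete, here is a minimal sketch of that kind of workload. It is not the code in `bench/all_tasks_compiling.jl`; `compile_fresh` and `run_compiling_tasks` are hypothetical names, and the approach (defining a brand-new function per task with `@eval`, then calling it once) is just one assumed way to force compilation inside each task.

```julia
# Hypothetical helper: define a new function and call it once, so the call
# has to compile a method that no other task has compiled yet.
function compile_fresh(i)
    name = Symbol("fresh_fn_", i)
    f = @eval $name(x) = x * $i + 1
    # invokelatest is needed because f was defined after this function started running.
    return Base.invokelatest(f, 2)
end

# Spawn one task per "query"; each task spends its time in the compiler.
function run_compiling_tasks(ntasks)
    tasks = [Threads.@spawn compile_fresh(i) for i in 1:ntasks]
    return fetch.(tasks)
end

t0 = time_ns()
results = run_compiling_tasks(5)
elapsed_secs = (time_ns() - t0) / 1e9
println("5 compiling tasks on ", Threads.nthreads(), " threads: ", elapsed_secs, " secs")
```

If compilation is serialized behind a single lock, running this with `JULIA_NUM_THREADS` set to 1 or to 12 should give similar `elapsed_secs`, which matches the flat `time_secs` column in the log. Whether that still holds depends on the Julia version.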

kpamnany commented 2 years ago

Looks fine to me, and it's a useful test for us. 👍

kpamnany commented 2 years ago

BTW, may want to clean up this.

Also BTW, it'd be interesting to see what happens to these compiling tasks when GC is triggered...
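
One rough way to poke at that question is sketched below. This is not part of the package: `compile_and_call` and `compile_with_gc` are hypothetical names, and forcing a collection with `GC.gc()` from a separate task is only an assumed stand-in for GC being triggered naturally by allocation pressure.

```julia
# Hypothetical workload: define a new function and call it once to force compilation.
function compile_and_call(i)
    name = Symbol("gc_probe_fn_", i)
    f = @eval $name(x) = x + $i
    return Base.invokelatest(f, 1)
end

# Run the compiling tasks while another task forces a full collection,
# and report wall time and GC time for the whole region.
function compile_with_gc(ntasks)
    stats = @timed begin
        workers = [Threads.@spawn compile_and_call(i) for i in 1:ntasks]
        collector = Threads.@spawn GC.gc()  # assumption: a forced GC approximates a triggered one
        results = fetch.(workers)
        wait(collector)
        results
    end
    return stats.value, stats.time, stats.gctime
end

values, wall_secs, gc_secs = compile_with_gc(5)
println("wall: ", wall_secs, " secs, gc: ", gc_secs, " secs")
```

Comparing `wall_secs` against a run without the `collector` task would give a first impression of how much a collection delays tasks that are waiting on the compiler.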

NHDaly commented 2 years ago

agreed on both counts! but i don't really have the bandwidth now to maintain this package any more than this. :'(