JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Tsetlin.jl performance benchmark degrades on Julia 1.11 RC3 #55702

Open · BooBSD opened this issue 2 months ago

BooBSD commented 2 months ago

Hello,

I recently discovered that the MNIST inference benchmark performance for Tsetlin.jl degrades by approximately 15-16% on Julia 1.11 RC3 compared to version 1.10.5.

Julia 1.10.5:

julia> versioninfo()
Julia Version 1.10.5
Commit 6f3fdf7b362 (2024-08-27 14:19 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × AMD Ryzen 9 7950X3D 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 32 virtual cores)
boo@rig:/tmp/Tsetlin.jl/examples$ /home/boo/julia-1.10.5/bin/julia --project=. -O3 -t 32 mnist_benchmark_inference.jl
Loading model from ./models/tm_optimized_72.tm... Done.

CPU: AMD Ryzen 9 7950X3D 16-Core Processor
Running in 32 threads.
Preparing input data for benchmark... Done. Elapsed 18.243 seconds.
Warm-up started... Done. Elapsed 2.109 seconds.
Benchmark for TMClassifierCompiled model in batch mode (batch size = 64) started... Done.
64000000 predictions processed in 1.250 seconds.
Performance: 51194475 predictions per second.
Throughput: 28.089 GB/s.
Input data size: 35.115 GB.
Parameters during training: 3386880.
Parameters after training and compilation: 10499.
Accuracy: 98.10%.

Julia 1.11 RC3:

julia> versioninfo()
Julia Version 1.11.0-rc3
Commit 616e45539db (2024-08-26 15:46 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × AMD Ryzen 9 7950X3D 16-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver4)
Threads: 1 default, 0 interactive, 1 GC (on 32 virtual cores)
boo@rig:/tmp/Tsetlin.jl/examples$ /home/boo/julia-1.11.0-rc3/bin/julia --project=. -O3 -t 32 mnist_benchmark_inference.jl
Loading model from ./models/tm_optimized_72.tm... Done.

CPU: AMD Ryzen 9 7950X3D 16-Core Processor
Running in 32 threads.
Preparing input data for benchmark... Done. Elapsed 20.227 seconds.
Warm-up started... Done. Elapsed 1.721 seconds.
Benchmark for TMClassifierCompiled model in batch mode (batch size = 64) started... Done.
64000000 predictions processed in 1.447 seconds.
Performance: 44227347 predictions per second.
Throughput: 24.256 GB/s.
Input data size: 35.100 GB.
Parameters during training: 3386880.
Parameters after training and compilation: 10499.
Accuracy: 98.10%.

As you can see, I get about 51 million predictions per second on 1.10.5 compared to 44 million on 1.11 RC3. Why is Julia 1.11 significantly slower than 1.10, and how can I fix it?

giordano commented 2 months ago

You want to profile your code and identify which part got slower. Not infrequently, changes in performance are due to using a different LLVM version, but it's hard to say whether that's the culprit here without a profile. Alternatively, you can git bisect to find where your code got noticeably slower (this can take a while, but it can probably pinpoint the culprit more precisely).
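
For reference, a minimal sketch of such a profiling run might look like the following; predict, model, and x_test are placeholders standing in for the actual Tsetlin.jl inference call, not its real API:

using Profile

# Hypothetical stand-in for the real inference hot path.
predict(model, x) = sum(x)
model, x_test = nothing, rand(Float32, 784, 10_000)

predict(model, x_test)                     # warm up so compilation is not profiled
@profile for _ in 1:1_000
    predict(model, x_test)
end
Profile.print(format = :flat, sortedby = :count)  # compare the flat views on 1.10.5 vs 1.11

Running the same script on both versions and diffing where the samples land is usually enough to spot which function regressed.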

oscardssmith commented 2 months ago

My guess is that this will be the same as https://github.com/JuliaLang/julia/issues/55009.

BooBSD commented 2 months ago

It looks like the issue is related to the new Memory type and getindex / setindex!. After optimizing my code, I achieve 77-78 million MNIST predictions per second with Julia 1.10.5 compared to 74-76 million with Julia 1.11 RC3. The difference is now about 4%.
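
A minimal sketch of the kind of micro-benchmark that could isolate this, assuming the suspected hot path is scalar getindex / setindex! on a Memory-backed Vector (copy_loop! and the BenchmarkTools usage are illustrative, not code from Tsetlin.jl):

using BenchmarkTools

# Tight indexing loop: setindex! of getindex, the suspected hot path.
function copy_loop!(dst::Vector{Int}, src::Vector{Int})
    @inbounds for i in eachindex(src)
        dst[i] = src[i]
    end
    return dst
end

src = rand(Int, 10_000)
dst = similar(src)
@btime copy_loop!($dst, $src)   # run on 1.10.5 and 1.11 RC3 and compare minimum times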

oscardssmith commented 2 months ago

Do you know where in the code the regression is? It's a lot easier to see if there's something to fix with a relatively small MWE.

giordano commented 2 months ago

Also #55009 is about getindex: https://github.com/JuliaLang/julia/issues/55009#issuecomment-2206637798