Closed: aminya closed this pull request 4 years ago.
I haven't run the benchmark yet. I am trying to install MKL on my Windows machine.
I kept dimensions to 10 just for the sake of fast testing. To run the benchmark for different dimensions, we need to change dimensions to something like this:
# Dimensions
oneArgDims = [10 100 1000 10000]
twoArgDims = [10 100 1000 10000; # arg1
10 100 1000 10000] # arg2
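As a rough illustration of what sweeping those sizes looks like, here is a sketch using plain BenchmarkTools (this is not the AcuteBenchmark API, just an equivalent hand-rolled loop):

```julia
# Hedged sketch: time a one-argument function over each size in oneArgDims.
using BenchmarkTools

oneArgDims = [10, 100, 1000, 10000]

for n in oneArgDims
    a = rand(n)
    t = @belapsed sin.($a)  # minimum elapsed time in seconds
    println("sin, n = $n: $(t * 1e6) μs")
end
```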
Could someone test the benchmark?
I am going to check it out soon.
However, I would very strongly prefer to keep AcuteBenchmarks out of the dependencies (Project.toml). It is not necessary for running the package itself. I personally run things on servers with a Docker image, which I try to keep as small as possible.
I would prefer a small check at the beginning of the benchmark files prompting the user to install AcuteBenchmarks if they want to benchmark on their machine and don't have it yet, but I suspect that will only be a small part of the userbase.
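Such a check could be a few lines at the top of the benchmark script. A minimal sketch, assuming the package is registered under the name AcuteBenchmark:

```julia
import Pkg

# Install AcuteBenchmark on demand, so it never has to live in the
# package's own Project.toml.
if Base.find_package("AcuteBenchmark") === nothing
    @info "AcuteBenchmark is not installed; installing it to run the benchmarks."
    Pkg.add("AcuteBenchmark")
end

using AcuteBenchmark
```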
> I am going to check it out soon.
> However, I would very strongly prefer to keep AcuteBenchmarks out of the dependencies (Project.toml). It is not necessary to run the package itself. I personally run things on servers with a Docker image, which I try to keep as small as possible.
> I would prefer a small check at the beginning of the benchmark files prompting the user to install AcuteBenchmarks if they want to benchmark on their machine and don't have it yet, but I suspect that will only be a small part of the userbase.
I ran the benchmark for the array size of 100. Check the result in the OP.
I didn't add AcuteBenchmark to IntelVectorMath. It is in a separate folder with its own Project.toml.
Ah, I apologize, I saw the email on my phone. The plots look great.
Other than the Complex benchmark, which needs an update to Distributions (https://discourse.julialang.org/t/complex-uniform-distribution-between-1-0im-and-2-0im/32831/4), this is ready to be merged.
We can run the benchmark for more dimension sets too. I added a GitHub Action that runs the benchmark and creates a PR with the results. The current data goes up to an array size of 100.
I looked at the GitHub Action, and I am not fully clear on what it does. What does it generate that the pull request is created from? Is the idea to run this action on every commit?
That is an experiment. I want to offload the benchmarking to a bot. It will make a separate branch; then we can decide to reference some of the results on master.
However, setting up the environment doesn't work yet. I used something similar to the Travis script in it. I guess the repo name should be changed.
OK, sounds good. I am looking into GitHub Actions right now, and will replace the Travis testing with it today.
I like that it is more granular in its results, and that you can schedule jobs, which I will use to check against the nightly build once a month.
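A rough illustration of such a scheduled job (the workflow name, cron schedule, and action versions below are assumptions, not the actual workflow file):

```yaml
# Hedged sketch: run the test suite against Julia nightly once a month.
name: nightly-check
on:
  schedule:
    - cron: '0 0 1 * *'  # first day of every month
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: julia-actions/setup-julia@v1
        with:
          version: nightly
      - uses: julia-actions/julia-runtest@v1
```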
I also found this: https://bheisler.github.io/post/benchmarking-in-the-cloud/ So results from GitHub Actions might have to be taken with a grain of salt.
I think, in addition, we can add a Dockerfile in the benchmarking folder that people can use to run the entire thing reproducibly.
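A minimal sketch of what such a Dockerfile could look like, assuming the benchmark folder ships its own Project.toml and a Benchmark.jl entry script (the base image tag is an assumption):

```dockerfile
# Hedged sketch: reproducible benchmark environment.
FROM julia:1.3

# Copy the benchmark folder (with its own Project.toml) into the image.
COPY benchmark /benchmark
WORKDIR /benchmark

# Instantiate the benchmark environment, then run the suite on start.
RUN julia --project=. -e 'using Pkg; Pkg.instantiate()'
CMD ["julia", "--project=.", "Benchmark.jl"]
```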
So let me remove the benchmark yaml and merge this PR. I agree with the link you sent. The result may not be reliable.
I added Complex numbers support too.
The result for the first few functions is surprising. I was using my laptop a little during that time. We might need to run the benchmark again for Complex numbers.
Once I finish updating the testing I will have a look. I can adapt my dockerfile and run the benchmark on a server based on your GitHub Actions file.
How would you like to proceed here? I think I can run the benchmark tomorrow on a server, and add updated images to this PR. I assume I only need to run Benchmark.jl?
Afterwards we can merge. I think the next step would then be to uncomment all the functions languishing in the package, rerun the benchmarks, and keep the ones that are meaningfully faster than Base?
I finally could push force by configuring SSH. I squashed the commits. 😄
This can be merged now. Later we can run the benchmark another time for the dimensions that I have added in the code.
I may add more plot types to AcuteBenchmark (like GFLOPS) and add them to this repo later.
I reran the Complex benchmark and got the same surprising result. I think there is some problem with the Complex functions of IVM, probably too much overhead in the Complex handling.
Anyway, let's merge this.
I think the last thing missing is a README in the benchmark folder, which we can link to in the main README. This should include all the images, and maybe a few words on how to read them.
Also, when I tinkered with benchmarking I made these plots. They are similar to your dimplots, but show only the performance ratio over array size for all functions. Maybe a similar thing makes sense now or later.
In terms of benchmarking: as long as all it takes is running benchmark.jl, I can run it overnight.
I messed up something in the squash; the images are gone.
I still have the images locally, I can add them back in tomorrow and write the readme (essentially finally solving https://github.com/JuliaMath/VML.jl/issues/1).
> I think the last thing missing is a README in the benchmark folder, which we can link to in the main README. This should include all the images, and maybe a few words on how to read them.
> Also, when I tinkered with benchmarking I made these plots. They are similar to your dimplots, but only show the performance ratio for functions over array size for all functions. Maybe a similar thing makes sense now/later.
> In terms of benchmarking, as long as all it takes is running benchmark.jl, I can run it overnight.
I plan to add more plots to AcuteBenchmark. I can add this type of plot too!
> I still have the images locally, I can add them back in tomorrow and write the readme (essentially finally solving #1).
I fixed it! Fortunately, I had a backup 🚀
> I still have the images locally, I can add them back in tomorrow and write the readme (essentially finally solving #1).
I can do that.
Naah, it was too early to celebrate. The Real bar images are gone. I will run the benchmark now for a couple of hours. Thank you!
I added the results for different dimensions, up to an array size of 10000. As you can see, the Complex calculations behave very differently from the Real ones.
I had a look at your plots, sorry for making more comments. Your updated Complex plots look very different, and a relative speed of 10^8 is almost unbelievable. Also, I think the number of images is too high; the size of the Benchmark folder in your commit is over 16 MB, which I believe is too much.
I vote for 2, maybe 3, bar plots, and also 2-3 dimplots. The latter could possibly contain multiple lines.
Maybe the solution to keep this repo lightweight could be for you to have an "AcuteBenchmark Results" repository, where you store the results for all the things you used the package for.
> I had a look at your plots, sorry for making more comments. Your updated complex plots look very different, and relative speed of 10^8 is almost unbelievable.
Why do you think this is the case? This would mean that Base glitches when running Complex operations.
> Also, I think the number of images is too high, the size of the Benchmark folder in your commit is over 16mb, which I believe is too high.
> I vote for 2, maybe 3, bar plots, and also 2-3 dimplots. The latter could possibly contain multiple lines.
> Maybe the solution to keep this repo lightweight could be for you to have an "AcuteBenchmark Results" repository, where you store the results for all the things you used the package for.
I just uploaded the images for the record. Before merging, I will remove them. As you said, a couple of bar plots and dimplots are enough (maybe ~2-3 MB).
I just ran the following quick test:
julia> using BenchmarkTools
julia> a = rand(100) + im*rand(100);
julia> @benchmark log.(a)
BenchmarkTools.Trial:
memory estimate: 1.83 KiB
allocs estimate: 4
--------------
minimum time: 3.604 μs (0.00% GC)
median time: 4.081 μs (0.00% GC)
mean time: 4.654 μs (0.58% GC)
maximum time: 98.969 μs (94.95% GC)
--------------
samples: 10000
evals/sample: 8
julia> @benchmark VML.log(a)
BenchmarkTools.Trial:
memory estimate: 1.77 KiB
allocs estimate: 1
--------------
minimum time: 1.407 μs (0.00% GC)
median time: 1.564 μs (0.00% GC)
mean time: 1.866 μs (1.15% GC)
maximum time: 56.081 μs (96.53% GC)
--------------
samples: 10000
evals/sample: 10
which suggests a performance ratio of about 2.6, whereas your bar plot in bench-dims-set4-relative.png suggests 1.5×10^8, which is an extremely large number.
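For reference, the 2.6 figure is simply the ratio of the median times from the quick test above:

```julia
# Median times from the BenchmarkTools runs above (in microseconds).
base_median = 4.081  # log.(a)
ivm_median  = 1.564  # VML.log(a)

println(base_median / ivm_median)  # prints roughly 2.61
```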
Also, what do you think about having a Benchmark Results Repo? That way we don't need any images in this repo, but can link as many as we want.
Running the benchmark only for size 500 gives this, which is different from what we got in the previous results. I think the problem is that memory-overflow issue you experienced before.
I will run the benchmark only for Complex numbers to see what happens.
I can back up this PR on a branch in my fork, then remove the images and only put the links in the readme. Does that sound good?
That does sound good.
To make it more 'formal', I would continue to propose an "AcuteBenchmark Results" repo, with a folder for this package, "soon" to be joined by others. In fact, I have two other packages in mind that would also benefit from AcuteBenchmark, and I will look into them as soon as I have time.
Each package folder should contain the .jl files needed to recreate the results, but not the results themselves (possibly explicitly excluded with a .gitignore file).
In the README of this repo, we can link all the pictures from the "Results" repo.
OK, I will create a repo like that. We can also use git submodules to actually insert the content optionally (for whoever inits the submodules). I will probably add the results repo as a submodule of the main AcuteBenchmark repo.
Yes, I was thinking about submodules. If you know how to set them up properly, I am all for it.
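For reference, the setup discussed above can be sketched with two local stand-in repositories (all names and paths here are hypothetical, used only so the example runs anywhere):

```shell
# Create a stand-in for the "AcuteBenchmark Results" repo.
tmp=$(mktemp -d)
cd "$tmp"
git init -q results
git -C results -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "initial results"

# Create the main repo and add the results repo as a submodule.
git init -q main
cd main
git -c protocol.file.allow=always submodule add "$tmp/results" benchmark-results
git -c user.email=demo@example.com -c user.name=demo \
    commit -q -m "Add benchmark results as a submodule"

# Whoever clones the main repo later opts in to the content with:
#   git submodule update --init
git submodule status
```

The `protocol.file.allow` override is only needed because this demo clones from a local path; a normal remote URL does not require it.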
I set up the repo and submodules. I will merge this now.
This PR changes the benchmark system to AcuteBenchmark.
Fixes https://github.com/JuliaMath/VML.jl/issues/25
IntelVectorMath/Base Performance Comparison
Performance over dimensions