JuliaMath / IntelVectorMath.jl

Julia bindings for the Intel Vector Math Library

using AcuteBenchmark #30

Closed · aminya closed 4 years ago

aminya commented 4 years ago

This PR changes the benchmark system to AcuteBenchmark.

Fixes https://github.com/JuliaMath/VML.jl/issues/25

[Images: IntelVectorMath/Base performance comparison (bar plots); performance over dimensions (dimplots)]

aminya commented 4 years ago

I haven't run the benchmark yet. I am trying to install MKL on my Windows machine.

I kept the dimensions at 10 just for the sake of fast testing. To run the benchmark for different sizes, we need to change the dimensions to something like this:

```julia
# Dimensions
oneArgDims = [10 100 1000 10000]

twoArgDims = [10 100 1000 10000;  # arg1
              10 100 1000 10000]  # arg2
```
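
For orientation, here is a rough sketch of how dimension sets like these could plug into an AcuteBenchmark configuration. Every name and argument below (`Funb`, `benchmark!`, the limits and types) is an assumption based on AcuteBenchmark's README at the time, not the exact code in this PR:

```julia
using AcuteBenchmark

# Hypothetical configuration: one single-argument and one two-argument
# function, benchmarked over the dimension sets defined above.
config = Funb(
    fun    = [sin, atan],
    limits = [[(-1.0, 1.0)], [(-1.0, 1.0), (-1.0, 1.0)]],  # input range per argument
    types  = [Float32, Float64],
    dims   = [oneArgDims, twoArgDims],
)

benchmark!(config)  # runs the trials and stores the timings in `config`
```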

Could someone test the benchmark?

Crown421 commented 4 years ago

I am going to check it out soon.

However, I would very strongly prefer to keep AcuteBenchmark out of the dependencies (Project.toml). It is not necessary for running the package itself, and I personally run things on servers with a Docker image that I try to keep as small as possible.

I would prefer a small check at the beginning of the benchmark files that prompts the user to install AcuteBenchmark if they want to benchmark on their machine and don't have it yet, though I suspect that will only be a small part of the user base.
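
A minimal sketch of such a guard, assuming the benchmark script lives at benchmark/benchmark.jl (the prompt wording and file layout are illustrative):

```julia
# Guard at the top of the benchmark script, so AcuteBenchmark stays out of
# the package's own Project.toml and is only installed on demand.
using Pkg

if Base.find_package("AcuteBenchmark") === nothing
    @info "Running the benchmarks requires the AcuteBenchmark package."
    print("Install AcuteBenchmark now? [y/N] ")
    if lowercase(strip(readline())) == "y"
        Pkg.add("AcuteBenchmark")
    else
        error("AcuteBenchmark not installed; aborting the benchmark run.")
    end
end

using AcuteBenchmark
```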

aminya commented 4 years ago

> I am going to check it out soon.
>
> However, I would very strongly prefer to keep AcuteBenchmark out of the dependencies (Project.toml). It is not necessary for running the package itself, and I personally run things on servers with a Docker image that I try to keep as small as possible.
>
> I would prefer a small check at the beginning of the benchmark files that prompts the user to install AcuteBenchmark if they want to benchmark on their machine and don't have it yet, though I suspect that will only be a small part of the user base.

I ran the benchmark for an array size of 100; check the results in the OP.

I didn't add AcuteBenchmark to IntelVectorMath itself. It is in a separate folder with its own Project.toml.
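
With that layout, the benchmarks can be run in their own environment without touching the package's dependencies; a sketch from the repository root (the file name Benchmark.jl is assumed from later in this thread):

```julia
# Activate the benchmark folder's own environment, resolve its dependencies
# (including AcuteBenchmark), and run the benchmark script.
using Pkg
Pkg.activate("benchmark")
Pkg.instantiate()
include(joinpath("benchmark", "Benchmark.jl"))
```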

Crown421 commented 4 years ago

Ah, I apologize, I saw the email on my phone. The plots look great.

aminya commented 4 years ago

Other than the Complex benchmark, which needs an update to Distributions (https://discourse.julialang.org/t/complex-uniform-distribution-between-1-0im-and-2-0im/32831/4), this is ready to be merged.

We can run the benchmark for more dimension sets too. I added a GitHub Action that runs the benchmark and creates a PR with the results. The current data goes up to an array size of 100.

Crown421 commented 4 years ago

I looked at the GitHub Action, and I am not fully clear on what it does. What does it generate that the pull request is created from? Is the idea to run this action on every commit?

aminya commented 4 years ago

That is an experiment: I want to offload the benchmarking to a bot. It will create a separate branch, and then we can decide which of the results to reference on master.

However, setting up the environment doesn't work yet. I used something similar to the Travis script in it; I guess the repo name needs to be changed.

Crown421 commented 4 years ago

Ok, sounds good. I am looking into GitHub Actions right now and will replace the Travis testing with it today.

I like that it is more granular in its results, and that you can schedule jobs, which I will use to check against the nightly build once a month.

Crown421 commented 4 years ago

I also found this: https://bheisler.github.io/post/benchmarking-in-the-cloud/ So results from GitHub Actions might have to be taken with a grain of salt.

I think, in addition, we can add a Dockerfile in the benchmarking folder that people can use to run the entire thing reproducibly.

aminya commented 4 years ago

So let me remove the benchmark YAML and merge this PR. I agree with the link you sent; the results may not be reliable.

aminya commented 4 years ago

I added Complex number support too. [Image: IntelVectorMath Complex performance comparison]

The results for the first few functions are surprising. I was using my laptop a little during that time, so we might need to run the benchmark again for Complex numbers.

Crown421 commented 4 years ago

Once I finish updating the testing I will have a look. I can adapt my dockerfile and run the benchmark on a server based on your GitHub Actions file.

Crown421 commented 4 years ago

How would you like to proceed here? I think I can run the benchmark tomorrow on a server, and add updated images to this PR. I assume I only need to run Benchmark.jl?

Afterwards we can merge. I think the next step would then be to uncomment all the functions languishing in the package, rerun the benchmarks, and keep the ones that are meaningfully faster than Base.

aminya commented 4 years ago

I finally managed to force-push by configuring SSH. I squashed the commits. 😄

This can be merged now. Later we can run the benchmark again for the dimensions that I have added in the code.

I may add more plot types to AcuteBenchmark (like GFLOPS) that I can add to this repo later.

aminya commented 4 years ago

I reran the Complex benchmark and got the same surprising result. I think there is some problem with using the Complex functions of IVM; probably too much overhead in the Complex handling.

Anyway, let's merge this.

Crown421 commented 4 years ago

I think the last thing missing is a README in the benchmark folder, which we can link to from the main README. It should include all the images, and maybe a few words on how to read them.

Also, when I tinkered with benchmarking I made these plots. They are similar to your dimplots, but only show the performance ratio over array size for all functions. Maybe a similar thing makes sense now or later.

In terms of benchmarking, as long as all it takes is running benchmark.jl, I can run it overnight.

aminya commented 4 years ago

I messed up something in the squash; the images are gone.

Crown421 commented 4 years ago

I still have the images locally; I can add them back in tomorrow and write the README (essentially finally solving https://github.com/JuliaMath/VML.jl/issues/1).

aminya commented 4 years ago

> I think the last thing missing is a README in the benchmark folder, which we can link to from the main README. It should include all the images, and maybe a few words on how to read them.
>
> Also, when I tinkered with benchmarking I made these plots. They are similar to your dimplots, but only show the performance ratio over array size for all functions. Maybe a similar thing makes sense now or later.
>
> In terms of benchmarking, as long as all it takes is running benchmark.jl, I can run it overnight.

I plan to add more plot types to AcuteBenchmark, so I can add this type of plot too!

> I still have the images locally; I can add them back in tomorrow and write the README (essentially finally solving #1).

I fixed it! Fortunately, I had a backup 🚀

> I still have the images locally; I can add them back in tomorrow and write the README (essentially finally solving #1).

I can do that.

aminya commented 4 years ago

Naah, it was too early to celebrate: the bar images for Real numbers are gone. I will run the benchmark now for a couple of hours. Thank you!

aminya commented 4 years ago

I added the results for different dimensions up to an array size of 10000. As you can see, the Complex behavior is definitely different from the Real one.

Crown421 commented 4 years ago

I had a look at your plots; sorry for making more comments. Your updated Complex plots look very different, and a relative speed of 10^8 is almost unbelievable. Also, I think the number of images is too high: the benchmark folder in your commit is over 16 MB, which I believe is too much.

I vote for 2, maybe 3, bar plots, and also 2-3 dimplots. The latter could possibly contain multiple lines.

Maybe the solution to keeping this repo lightweight could be for you to have an AcuteBenchmark Results repository, where you store the results for all the things you use the package for.

aminya commented 4 years ago

> I had a look at your plots; sorry for making more comments. Your updated Complex plots look very different, and a relative speed of 10^8 is almost unbelievable.

Why do you think that is the case? It would mean that Base glitches when running Complex operations.

> Also, I think the number of images is too high: the benchmark folder in your commit is over 16 MB, which I believe is too much.
>
> I vote for 2, maybe 3, bar plots, and also 2-3 dimplots. The latter could possibly contain multiple lines.
>
> Maybe the solution to keeping this repo lightweight could be for you to have an AcuteBenchmark Results repository, where you store the results for all the things you use the package for.

I just uploaded the images for the record; for merging, I will remove them. As you said, a couple of bar plots and dimplots are enough (maybe ~2-3 MB).

Crown421 commented 4 years ago

I just ran the following quick test:

```julia
julia> using BenchmarkTools

julia> a = rand(100) + im*rand(100);

julia> @benchmark log.(a)
BenchmarkTools.Trial:
  memory estimate:  1.83 KiB
  allocs estimate:  4
  --------------
  minimum time:     3.604 μs (0.00% GC)
  median time:      4.081 μs (0.00% GC)
  mean time:        4.654 μs (0.58% GC)
  maximum time:     98.969 μs (94.95% GC)
  --------------
  samples:          10000
  evals/sample:     8

julia> @benchmark VML.log(a)
BenchmarkTools.Trial:
  memory estimate:  1.77 KiB
  allocs estimate:  1
  --------------
  minimum time:     1.407 μs (0.00% GC)
  median time:      1.564 μs (0.00% GC)
  mean time:        1.866 μs (1.15% GC)
  maximum time:     56.081 μs (96.53% GC)
  --------------
  samples:          10000
  evals/sample:     10
```

which suggests a performance ratio of about 2.6, whereas your bar plot in bench-dims-set4-relative.png suggests 1.5x10^8, which is an extremely large number.
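
The same ratio can be read off programmatically from the two trials (a sketch; `VML.log` follows the session above and assumes the module is already loaded, and BenchmarkTools times are in nanoseconds):

```julia
using BenchmarkTools, Statistics

a = rand(100) + im * rand(100)

base_trial = @benchmark log.($a)     # broadcast Base log over the array
vml_trial  = @benchmark VML.log($a)  # MKL-backed vectorized log

# Ratio of median times; with the numbers above, 4.081 μs / 1.564 μs ≈ 2.6.
ratio = median(base_trial).time / median(vml_trial).time
```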

Also, what do you think about having a Benchmark Results Repo? That way we don't need any images in this repo, but can link as many as we want.

aminya commented 4 years ago

Running the benchmark only for size 500 gives the result below, which is different from what we got previously. I think the problem is that memory-overflow issue you experienced before. [Image: bench-dims-set1-relative]

I will run the benchmark only for Complex numbers to see what happens.

aminya commented 4 years ago

I can back up a branch from this PR on my fork, then remove the images and only put the links in the README. Does that sound good?

Crown421 commented 4 years ago

That does sound good. To make it more 'formal' I would continue to propose an "AcuteBenchmark Results" repo, with a folder for this package, "soon" to be joined by others. In fact, I have two other packages in mind that would also benefit from AcuteBenchmark, which I will look into as soon as I have time. Each package folder should contain the .jl files needed to recreate the results, but not the results themselves (possibly explicitly excluded with a .gitignore file).

In the README of this repo, we can link all the pictures from the "Results" repo.

aminya commented 4 years ago

OK, I will create a repo like that. We can also use git submodules to optionally pull in the actual content (for whoever initializes the submodules).

I will probably add the results repo as a submodule of the main AcuteBenchmark repo.

Crown421 commented 4 years ago

Yes, I was thinking about submodules. If you know how to set them up properly, I am all for it.

aminya commented 4 years ago

I set up the repo and submodules. I will merge this now.