how to reproduce performance graphs?

loveshack commented 5 years ago

The performance documentation should say what to run to reproduce or extend the results. It's not obvious what that is.

Also, they don't report page sizes. Isn't that relevant for large sizes (TLB pressure)?

fgvanzee commented 5 years ago

@loveshack I've added the directory containing the source code for the performance drivers, but I'm wary of trying to put together a step-by-step, fool-proof guide for how to reproduce the graphs. Not sure which you had in mind.

As for the page size, it's 4096 bytes on most/all of the machines I've reported on. I've added that information to the documents, too.

loveshack commented 5 years ago

> @loveshack I can add the directory containing the source code for the performance drivers, but I'm wary of trying to put together a step-by-step, fool-proof guide for how to reproduce the graphs. Not sure which you had in mind.

At least information about what to build, and possibly how to run the examples (similarly to openblas).

> As for the page size, it's 4096 bytes on most/all of the machines I've reported on. I'm happy to add that information to the documents.

With the default transparent huge pages turned off? (I'm happy to believe it doesn't make a difference, but the TLB is relevant, isn't it?)

I also wonder about the variance of repetitions, particularly if core binding wasn't done.
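
To make the variance question concrete, here is a minimal sketch of how run-to-run spread could be measured with and without core binding. It is not the BLIS test drivers: `placeholder_workload` is a hypothetical stand-in for whatever kernel is being timed, and the helper names are made up for illustration. Binding uses `os.sched_setaffinity`, which is only available on Linux, so the sketch falls back to running unpinned elsewhere.

```python
import os
import statistics
import time


def pin_to_core(core):
    """Bind this process to a single core where supported; return True on success."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # not available on this platform
    try:
        os.sched_setaffinity(0, {core})
        return True
    except OSError:
        return False  # e.g. the core is outside the allowed cpuset


def time_repetitions(workload, reps):
    """Run workload() reps times and return the wall-clock time of each run."""
    times = []
    for _ in range(reps):
        start = time.perf_counter()
        workload()
        times.append(time.perf_counter() - start)
    return times


def summarize(times):
    """Return min, mean, and sample standard deviation of the timings."""
    stdev = statistics.stdev(times) if len(times) > 1 else 0.0
    return {"min": min(times), "mean": statistics.mean(times), "stdev": stdev}


def placeholder_workload():
    # Hypothetical stand-in for the operation under test (not a BLIS call).
    sum(i * i for i in range(200_000))


if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0)) if hasattr(os, "sched_getaffinity") else []
    pinned = pin_to_core(allowed[0]) if allowed else False
    stats = summarize(time_repetitions(placeholder_workload, 10))
    print("pinned:", pinned)
    print("min %.6f s, mean %.6f s, stdev %.6f s" % (stats["min"], stats["mean"], stats["stdev"]))
```

Comparing the reported standard deviation between a pinned and an unpinned run gives a rough idea of how much binding matters on a given machine.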

fgvanzee commented 5 years ago

> At least information about what to build, and possibly how to run the examples (similarly to openblas).

I suppose we stop just short of that. I do provide all of the information (reasonably) needed to approximate (if not reproduce) the performance experiments, especially on the software side. And I now explicitly state the directory with the source code. At risk of sounding pretentious, if you can't figure out how to run the experiments from this information, and from studying the Makefile and runme.sh script (and matlab/octave code) in the source directory, then you probably don't have the background necessary to run the drivers in the first place.

Reproducing the graphs requires not just configuring, compiling, linking against BLIS correctly, and correctly running the resultant binaries, but also doing so for MKL, OpenBLAS, Eigen, and any other implementations the graphs may happen to compare against at the moment. (Don't even get me started on how much time and effort it took to properly build Eigen.) This is a non-trivial task. I sincerely wish it were easier. (If an entity with deep pockets wanted a step-by-step benchmarking guide badly enough, I think we would be happy to invest the time into that guide in exchange for funding.)

> With the default transparent huge pages turned off? (I'm happy to believe it doesn't make a difference, but the TLB is relevant, isn't it?)

I'm not familiar with that setting / kernel parameter / terminology. I merely check the PAGE_SIZE returned via getconf -a. If there is more that I can do to check / characterize page sizes, please let me know (and how).

jeffhammond commented 5 years ago

I studied page size sensitivity of BLIS in extraordinary detail last fall and the answer is that it is not sensitive to it, at least on an architecture where MKL was sensitive to page size (because it’s closer to peak than BLIS).

loveshack commented 5 years ago

> I suppose we stop just short of that. I do provide all of the information (reasonably) needed to approximate (if not reproduce) the performance experiments, especially on the software side. And I now explicitly state the directory with the source code. At risk of sounding pretentious, if you can't figure out how to run the experiments from this information, and from studying the Makefile and runme.sh script (and matlab/octave code) in the source directory, then you probably don't have the background necessary to run the drivers in the first place.

That's the sort of info I was missing -- which is fairly clear for OpenBLAS. I don't mean to be stroppy, but I wanted to know how to make more measurements (and, for instance, I could probably access Thunder X2 if you no longer can).

> > With the default transparent huge pages turned off? (I'm happy to believe it doesn't make a difference, but the TLB is relevant, isn't it?)

> I'm not familiar with that setting / kernel parameter / terminology. I merely check the PAGE_SIZE returned via getconf -a. If there is more that I can do to check / characterize page sizes, please let me know (and how).

From what Jeff said, it's not relevant, but see vm/transhuge.txt in the Linux documentation, or maybe https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/s-memory-transhuge. Transparent huge pages are typically on by default. (However, reclamation was problematic -- at least at one time -- and HPC people have tended to turn it off entirely, probably for no good reason now.)
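
For anyone who wants to record both values alongside a benchmark run, a minimal sketch follows. It assumes a Linux system that exposes the transparent huge page mode at the usual sysfs path `/sys/kernel/mm/transparent_hugepage/enabled`, where the active mode is the bracketed word (for example `always [madvise] never`). The function names are hypothetical, chosen only for this illustration.

```python
import os
from pathlib import Path

# Usual Linux sysfs location of the THP mode; absent on kernels built without THP.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text):
    """Return the bracketed (active) mode from a string like 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def base_page_size():
    """Base page size in bytes, the value getconf reports as PAGE_SIZE."""
    return os.sysconf("SC_PAGE_SIZE")


def thp_mode(path=THP_ENABLED):
    """Active THP mode, or None if the file is missing or unreadable."""
    try:
        return parse_thp_mode(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("base page size:", base_page_size(), "bytes")
    print("transparent huge pages:", thp_mode() or "unavailable")
```

Note that the base page size stays at 4096 bytes on typical x86_64 systems whether or not transparent huge pages are enabled, which is why the second check is reported separately.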

loveshack commented 5 years ago

> I studied page size sensitivity of BLIS in extraordinary detail last fall and the answer is that it is not sensitive to it, at least on an architecture where MKL was sensitive to page size (because it’s closer to peak than BLIS).

I think that's worth documenting, particularly as Goto was always supposed to be concerned with TLB misses. I may be able to check on non-x86_64 if that might be different.

fgvanzee commented 5 years ago

> That's the sort of info I was missing -- which is fairly clear for OpenBLAS. I don't mean to be stroppy, but I wanted to know how to make more measurements (and, for instance, I could probably access Thunder X2 if you no longer can).

@loveshack Just to clarify, we're happy to assist experts such as yourself when benchmarking / reproducing our performance graphs. (I think I misunderstood originally what you had in mind; I thought you wanted me to put something together so that Joe Rando could produce graphs. I didn't realize you were looking for specific one-on-one help.)

Now that you know the directory with the source code, you probably have a lead on how to get started. I recommend studying the Makefile (paying attention to the supported make targets and the libraries, such as OpenBLAS, BLASFEO, MKL, etc., that are assumed to have been built and installed, and their locations). Then play around with running one executable by hand. Then take a look at the runme.sh script when you're ready to run them en masse. Finally, the octave directory contains Matlab/Octave code that can be used to turn a collection of output files into a single PDF of graphs. (Also important: the runme.m script in the octave directory, despite its name, should not be run. It's more of a scratchpad that I use to copy and paste invocations of the plot_panel_trxsh() graphing function from. Definitely run these by hand, since things can easily go wrong.)

loveshack commented 5 years ago

> @loveshack Just to clarify, we're happy to assist experts such as yourself when benchmarking / reproducing our performance graphs. (I think I misunderstood originally what you had in mind; I thought you wanted me to put something together so that Joe Rando could produce graphs. I didn't realize you were looking for specific one-on-one help.)

I'm not. I mainly wanted to know what the actual code to build and run is, which is obvious in openblas, for instance. I'd expect someone with a clue or two to be able to go from there. Research Software Engineer lore would expect some sort of R or Python runnable notebook thing (in a container?), but some of us experimentalists have a different view of replicating results.

> Now that you know the directory with the source code, you probably have a lead on how to get started.

Sure, thanks. It does take a fairly close look to spot the pointer to the source, though; it might be worth a hyperlink.

fgvanzee commented 5 years ago

@loveshack I've committed 80e6c10, which includes a short blurb on reproducing performance (similar to my response above) in both Performance.md and PerformanceSmall.md. The blurb includes the directory (with link). Hopefully you find this satisfactory.

flame / blis

how to reproduce performance graphs? #325