update roofline for high order

devitocodes / opesci-fd

A framework for automatically generating finite difference models from a high-level description of the model equations.

http://opesci.org

update roofline for high order #48

Open ggorman opened 8 years ago

ggorman commented 8 years ago

Repeat benchmarks on SENAI machine (Xeon and Xeon Phi) for different spatial orders (2,4,6,8,10,12).

Need the OI and peak flops for both so we can update roofline plot.
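For reference, the roofline bound that the plot is built from can be sketched as below. This is a minimal illustration, not code from opesci-fd: the function name is hypothetical, and the peak and bandwidth figures are placeholders rather than measurements from the SENAI machine.

```python
def attainable_gflops(oi, peak_gflops, bandwidth_gbs):
    """Roofline bound: the lower of the compute roof and the memory roof.

    oi            -- operational intensity in flop/byte
    peak_gflops   -- peak floating-point rate of the machine (GFLOP/s)
    bandwidth_gbs -- sustained memory bandwidth (GB/s)
    """
    return min(peak_gflops, oi * bandwidth_gbs)


# Placeholder machine numbers, NOT measured values for Xeon or Xeon Phi.
PEAK_GFLOPS = 500.0
BANDWIDTH_GBS = 50.0

for oi in (0.5, 1.0, 2.0, 4.0, 8.0, 16.0):
    bound = attainable_gflops(oi, PEAK_GFLOPS, BANDWIDTH_GBS)
    regime = "memory-bound" if oi * BANDWIDTH_GBS < PEAK_GFLOPS else "compute-bound"
    print(f"OI={oi:5.2f} flop/byte -> {bound:7.1f} GFLOP/s ({regime})")
```

Each spatial order contributes one (OI, measured GFLOP/s) point to compare against this bound.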

ggorman commented 8 years ago

@tj-sun - can you help @felippezacarias get your branch running to do the benchmarking?

@felippezacarias - can you run with a domain of 512**3 so that we can be sure most of the problem is not sitting in L3? Also, to ensure you are not messing up the alignment, can you carefully set the domain size, n, such that n + boundary_depth*2 == 512?
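A quick back-of-the-envelope check of why a 512**3 domain cannot sit in L3, assuming single-precision values (4 bytes each). The helper name is hypothetical, and the number of fields the kernels actually touch is not taken from the code, so only the per-field footprint is shown.

```python
MIB = 1024 ** 2


def field_bytes(points_per_dim, bytes_per_value=4):
    """Memory footprint in bytes of one cubic 3D field."""
    return points_per_dim ** 3 * bytes_per_value


per_field_mib = field_bytes(512) / MIB
print(f"one 512**3 float field: {per_field_mib:.0f} MiB")
```

A single field already takes 512 MiB, while L3 caches are typically measured in tens of MiB, so the working set has to stream from main memory.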

tj-sun commented 8 years ago

Please use the feature_higher_spatial_order branch. Use grid.set_accuracy() to set the order. From the command line, you can run python tests/eigenwave3d.py -so n, where n is the spatial order divided by 2, so -so 2 gives 4th order. Note that because the boundary conditions are implemented differently for different orders, the errors are not comparable between orders. But for now I think we should just focus on the kernel performance.

tj-sun commented 8 years ago

I also just added the kernel AI to the output when you run python tests/eigenwave3d.py.

tj-sun commented 8 years ago

Also note that the number of ghost cells equals the spatial order. So for 4th order, if you set grid size=100, you will have 105 grid points in total (one more because both sides end with grid points), i.e. make sure that grid_size + n*2 + 1 = 512, where n is the number you pass in with -so.
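The relation above can be inverted to pick the grid size for each run. This is a small sketch with a hypothetical helper name; it assumes the -so convention described in this comment, where n is the spatial order divided by 2.

```python
def grid_size_for_total(total_points, so):
    """Solve grid_size + 2*so + 1 == total_points for grid_size."""
    grid_size = total_points - 2 * so - 1
    if grid_size <= 0:
        raise ValueError("total_points is too small for this -so value")
    return grid_size


# -so 1..6 corresponds to spatial orders 2, 4, ..., 12 under this convention.
for so in range(1, 7):
    print(f"-so {so} (order {2 * so}): grid_size = {grid_size_for_total(512, so)}")
```

For the worked example in the comment, grid_size_for_total(105, 2) gives back the grid size of 100.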

tj-sun commented 8 years ago

I've just made some amendments to our AI calculation in the new commit. Currently I see a weighted AI of 1.46 for 4th order and 2.74 for 8th order, which I think is about right for float. (The article below seems to be using doubles?) I guess we will see when we get some results.

https://redmine.scorec.rpi.edu/attachments/111/roofline_for_FastMath.pdf

felippezacarias commented 8 years ago

@ggorman should I use the --profiling flag and get the Mflops and walltime from PAPI, or instrument the velocity and stress kernels with time measurements like we did before?

@tj-sun I generated the code for different orders here, but it seems that no matter what grid size or order I use, dim1, dim2 and dim3 always come out as grid_size + 5. Is that correct?

tj-sun commented 8 years ago

Hi, the dimension should change to gridsize + 1 + 2*margin, where margin should equal the order. If that's not what you see, I will take a look when I'm back home.

ggorman commented 8 years ago

Why don't you do both (PAPI + hand instrumentation) and compare? If there is a big difference we will want to know why.
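One way to make that comparison concrete is sketched below. All names are hypothetical, the flop count and the counter-based rate are placeholder numbers, and no PAPI calls are made here; the counter-based figure is assumed to come from the --profiling output.

```python
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def mflops(flop_count, seconds):
    """Convert a flop count and a wall time into MFLOP/s."""
    return flop_count / seconds / 1e6


def relative_difference(a, b):
    """Symmetric relative difference between two positive rates."""
    return abs(a - b) / max(abs(a), abs(b))


# Placeholder workload standing in for a velocity or stress kernel.
_, elapsed = timed(sum, range(1_000_000))
hand_rate = mflops(1_000_000, elapsed)  # assumes 1 flop per element (illustrative)
counter_rate = 1234.0                   # placeholder for the profiler-reported MFLOP/s

print(f"hand-timed: {hand_rate:.1f} MFLOP/s, counters: {counter_rate:.1f} MFLOP/s")
print(f"relative difference: {relative_difference(hand_rate, counter_rate):.1%}")
```

A large relative difference would point at either the assumed flop count per grid point or the timed region not matching what the counters cover.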

tj-sun commented 8 years ago

@felippezacarias - you are absolutely right on the grid_size. I didn't recalculate the grid_size after setting new order. It's fixed now.

ggorman commented 8 years ago

@tj-sun going back to your comments above "Currently I see 4th order weighted AI=1.46 and 8th order 2.74. Which I think is about right for float. (The article below seems to be using doubles?) "

This is not making sense to me. Previously we estimated that AI for 4th order was ~0.8 --- remember that initially @felippezacarias reported 1.7 and then you pointed out that this has to be divided by two to take into account floats. I could buy that figure because it was consistent with the figure of 0.94 reported in roofline_for_FastMath.pdf (BTW - your suggestion that the article was talking about double would imply that the AI for floats would be twice that again).

Can we focus on getting this right, as it is a key metric?
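The precision argument can be checked with a small sketch. It assumes AI is defined as flops divided by bytes moved, and that switching from double to float halves the bytes per value while leaving the flop count unchanged. The function name and the per-point counts are illustrative, not taken from the kernels.

```python
def arithmetic_intensity(flops, values_moved, bytes_per_value):
    """AI in flop/byte for a kernel moving `values_moved` values per point."""
    return flops / (values_moved * bytes_per_value)


# Illustrative per-grid-point counts, not measured from the generated code.
FLOPS_PER_POINT = 30
VALUES_PER_POINT = 4

ai_double = arithmetic_intensity(FLOPS_PER_POINT, VALUES_PER_POINT, 8)
ai_float = arithmetic_intensity(FLOPS_PER_POINT, VALUES_PER_POINT, 4)
print(f"double: {ai_double:.3f} flop/byte, float: {ai_float:.3f} flop/byte")
```

Under this definition the float AI is exactly twice the double AI, so a double-precision figure of 0.94 would correspond to 1.88 in single precision.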

tj-sun commented 8 years ago

I read the article again yesterday, and I think the 0.94 in the article is for double precision, so I began to think our AI is too low. I checked again and found that the overall calculation earlier was done incorrectly. I also added boundary conditions and ghost cell adjustments (according to page 31 of the article).

tj-sun commented 8 years ago

@felippezacarias please note that in the new commit f337943a3e9ffd2aeed89ba02a4d86e6c8bdf1e7 the behaviour of setting the spatial order has changed. Now -so=4 will set 4th order instead of 8th order. This is to address issue #41.
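For benchmark scripts that need to work on both sides of this change, the two conventions can be captured in a small helper. This is a sketch with a hypothetical function name; "legacy" refers to the earlier behaviour in this thread, where -so was the spatial order divided by 2.

```python
def so_flag(order, legacy=False):
    """Return the value to pass to -so for a given (even) spatial order.

    legacy=True  -- older convention: -so is the order divided by 2
    legacy=False -- convention after the commit above: -so is the order itself
    """
    if order <= 0 or order % 2:
        raise ValueError("spatial order must be a positive even integer")
    return order // 2 if legacy else order


for order in (2, 4, 6, 8, 10, 12):
    print(f"order {order}: legacy -so {so_flag(order, legacy=True)}, new -so {so_flag(order)}")
```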