OP-DSL / OP2-Common

OP2: open-source framework for the execution of unstructured grid applications on clusters of GPUs or multi-core CPUs
https://op-dsl.github.io

Performance degradation on Aero and Airfoil if I pass -DCMAKE_BUILD_TYPE=Release #152

Closed octurion closed 3 years ago

octurion commented 5 years ago

Hi folks,

If I pass -DCMAKE_BUILD_TYPE=Release to the cmake.local script corresponding to the apps directory in the repo, for some reason I am getting performance degradation (and that shows up on two different machines I've run these benchmarks on).

More specifically, if I run airfoil_dp_seq and aero_dp_seq (with the input files new_grid.dat and FE_grid.dat that are generated during building, respectively), I am getting the following numbers:

| Benchmark | Machine 1 CPU time (s) | Machine 2 CPU time (s) |
|---|---|---|
| Aero with no explicit build type | 67.979 | 43.322 |
| Aero with Release build type | 108.219 | 84.534 |
| Airfoil with no explicit build type | 865.889 | 1170.140 |
| Airfoil with Release build type | 1007.317 | 1317.661 |

Any idea on what could be causing this?

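Machine 1 has:

An Intel Core i5-3230M CPU (3.20 GHz, L3 size: 3 MB)
8 GB of DDR3 RAM at 1333 MHz

Machine 2 has:

An Intel Core i7-6700 CPU (3.40 GHz, L3 size: 8 MB)
16 GB of DDR4 RAM at 2133 MHz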

EDIT: I'm using gcc 5.4.0 on Ubuntu 16.04
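
To put a number on the degradation, here is a small sketch that divides each Release time by the corresponding default-build time from the table above. The `timings` dictionary and the `slowdown` helper are names made up for this illustration; only the CPU times come from the reported measurements.

```python
# CPU times (s) copied from the table above: (default build, Release build).
# The dictionary layout and the helper name are illustrative assumptions.
timings = {
    ("Aero", "Machine 1"): (67.979, 108.219),
    ("Aero", "Machine 2"): (43.322, 84.534),
    ("Airfoil", "Machine 1"): (865.889, 1007.317),
    ("Airfoil", "Machine 2"): (1170.140, 1317.661),
}


def slowdown(default_s, release_s):
    """Return the Release time as a multiple of the default-build time."""
    return release_s / default_s


for (bench, machine), (default_s, release_s) in timings.items():
    factor = slowdown(default_s, release_s)
    print(f"{bench:8s} {machine}: Release takes {factor:.2f}x the default time")
```

On these figures Aero is hit much harder (roughly 1.6x to 1.95x) than Airfoil (roughly 1.13x to 1.16x).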

reguly commented 5 years ago

Hi Alexandros,

Thanks for this - which compilers were you using? I assume gcc, if so, which version?

Thanks, István

On 2018. Dec 17., at 5:01, Alexandros Tasos notifications@github.com wrote:

Machine 1 has:

An Intel Core i5-3230M CPU (3.20 GHz, L3 size: 3 MB)
8 GB of DDR3 RAM at 1333 MHz

Machine 2 has:

An Intel Core i7-6700 CPU (3.40 GHz, L3 size: 8 MB)
16 GB of DDR4 RAM at 2133 MHz

octurion commented 5 years ago

gcc 5.4.0 on Ubuntu 16.04.

I'm going to give clang a try and report back.

octurion commented 5 years ago

Alright, here's what I'm getting if I use clang++-3.9 (by passing -DOP2_WITH_CUDA=OFF -DOP2_WITH_OPENMP=OFF -DOP2_WITH_MPI=OFF -DCMAKE_CXX_COMPILER=clang++-3.9 to both OP2 itself and the apps):

| Benchmark | Machine 1 CPU time (s) |
|---|---|
| Aero with RelWithDebInfo build type | 63.070 |
| Aero with Release build type | 63.608 |
| Airfoil with RelWithDebInfo build type | 812.365 |
| Airfoil with Release build type | 809.982 |

And here's what I'm getting if I explicitly specify the C++ compiler to be g++-5 (-DOP2_WITH_CUDA=OFF -DOP2_WITH_OPENMP=OFF -DOP2_WITH_MPI=OFF -DCMAKE_CXX_COMPILER=g++-5 to both OP2 itself and the apps):

| Benchmark | Machine 1 CPU time (s) |
|---|---|
| Aero with RelWithDebInfo build type | 61.652 |
| Aero with Release build type | 108.918 |
| Airfoil with RelWithDebInfo build type | 813.177 |
| Airfoil with Release build type | 1000.759 |

octurion commented 5 years ago

Just retested with GCC 7 and I'm now getting these numbers:

| Benchmark | Machine 1 CPU time (s) |
|---|---|
| Aero with RelWithDebInfo build type | 61.409 |
| Aero with Release build type | 61.479 |
| Airfoil with RelWithDebInfo build type | 813.882 |
| Airfoil with Release build type | 807.264 |

This indicates that at least GCC 5.4.0 probably has a bug that causes a pessimisation when -O3 is used; GCC 7 does not show it.
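
The reasoning can be checked against the Machine 1 numbers from the three tables above. The sketch below computes the Release / RelWithDebInfo ratio per compiler and flags any compiler where the ratio exceeds a threshold. The 5% threshold is an assumption picked here to separate run-to-run noise from a real slowdown, and the `results` layout and function name are illustrative only.

```python
# Machine 1 CPU times (s) from the tables above: (RelWithDebInfo, Release).
results = {
    "clang++-3.9": {"Aero": (63.070, 63.608), "Airfoil": (812.365, 809.982)},
    "g++-5": {"Aero": (61.652, 108.918), "Airfoil": (813.177, 1000.759)},
    "GCC 7": {"Aero": (61.409, 61.479), "Airfoil": (813.882, 807.264)},
}

# Assumed noise threshold: a Release build more than 5% slower counts as a regression.
THRESHOLD = 1.05


def regressing_compilers(results, threshold=THRESHOLD):
    """Return compilers whose Release build is slower than RelWithDebInfo
    by more than `threshold` on at least one benchmark."""
    flagged = []
    for compiler, benches in results.items():
        ratios = [release / relwithdebinfo
                  for relwithdebinfo, release in benches.values()]
        if max(ratios) > threshold:
            flagged.append(compiler)
    return flagged


print(regressing_compilers(results))
```

With these measurements only g++-5 is flagged: its Release builds take about 1.77x (Aero) and 1.23x (Airfoil) the RelWithDebInfo time, while clang++-3.9 and GCC 7 stay within about 1% either way.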

reguly commented 5 years ago

This doesn't happen with the Intel or the PGI compilers either...

gihanmudalige commented 3 years ago

Closing this, as the issue seems to be fixed.