Kernel Function in Fortran or C++?

AMReX-Codes / amrex

AMReX: Software Framework for Block Structured AMR

https://amrex-codes.github.io/amrex

Kernel Function in Fortran or C++? #572

Open GumpXiaoli opened 4 years ago

GumpXiaoli commented 4 years ago

I have used AMReX for some time. Currently, almost all kernel functions are written in Fortran and called from the C++ main program. I have also noticed that a large portion of the kernel functions in the AMReX core have been converted to C++, and in the tutorial you write:

Although Fortran has native multi-dimensional array, we recommend writing kernels in C++ because of performance portability for CPU and GPU.

My question is which is better when considering only performance (i.e., running time), setting aside portability between CPU and GPU.

drummerdoc commented 4 years ago

Performance and portability for CPU and GPU are closely linked, since on many of the larger machines you unfortunately can't get much performance without using the GPU effectively. Weiqun will no doubt have a more thorough take on this, but with modern optimizing compilers and code that exploits all the important things, I don't think one can say that either C++ or Fortran generates faster code in practice. One key observation, though, is that inlining complex code is extremely important, since it allows the compiler to see more things to exploit. In converting many of the underlying AMReX functions to C++, great care was taken to inline as much as possible, and this has led to a style, and supporting classes such as Array4, that support inlining.
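To illustrate the inlining point, here is a minimal sketch of a lightweight 3D view whose index operator is defined in the class body, so the compiler can see the address arithmetic inside the loop. `View3D` and `smooth_x` are hypothetical names made up for this example; this is not AMReX's actual Array4, only an illustration of the style.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a lightweight 3D view; NOT AMReX's Array4.
// It stores only a pointer and two strides, so copying it is cheap.
struct View3D {
    double* p;               // first element of the contiguous data
    std::ptrdiff_t jstride;  // distance between consecutive j rows (nx)
    std::ptrdiff_t kstride;  // distance between consecutive k planes (nx*ny)

    // Defined in the class body, hence implicitly inline: the index
    // arithmetic is visible to the compiler at every call site.
    double& operator()(int i, int j, int k) const {
        return p[i + j * jstride + k * kstride];
    }
};

// Stencil-like kernel: three-point average along i for interior points.
// Boundary points (i = 0 and i = nx-1) are left untouched.
inline void smooth_x(View3D const& out, View3D const& in,
                     int nx, int ny, int nz) {
    assert(out.p != in.p);  // the stencil must be applied out of place
    for (int k = 0; k < nz; ++k)
        for (int j = 0; j < ny; ++j)
            for (int i = 1; i < nx - 1; ++i)
                out(i, j, k) =
                    (in(i - 1, j, k) + in(i, j, k) + in(i + 1, j, k)) / 3.0;
}
```

With the operator visible at the call site, a compiler is free to hoist the `j` and `k` parts of the offset out of the inner loop, which is the kind of optimization that a non-inlined accessor call would block. Whether a given compiler actually does so should be checked on the target machine.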

drummerdoc commented 4 years ago

Oh, also, one can certainly do similar things in any language, including Fortran. C++ was chosen because it is more compatible with the future of the software stacks on the target machines.

GumpXiaoli commented 4 years ago

In my understanding, Array4 is used to imitate the multi-dimensional array in Fortran. However, Array4 requires extra address computation. I don't know whether the compiler will optimize this address computation away, or whether its cost is negligible. If this cost is the price of code readability, would it be better to treat the array as a one-dimensional array when the data is contiguous and the data manipulation is purely local?

drummerdoc commented 4 years ago

You can certainly get the raw pointer and move through the data any way you like, and can make that decision on a case-by-case basis. Yes, it's not always useful to cast to an Array4; typically that is useful only when stencil-like operations are needed. There are other looping constructs provided in AMReX for pointwise data, etc. The bottom line is that if you have a specific use case, and it's a key kernel in your code, you should optimize the access/syntax in whatever way works best for you. The Array4, and other classes and functions, are provided as helper classes only. If you need access to the data in a way that is not provided, let us know.
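As a rough sketch of the raw-pointer option for a purely pointwise operation, the two functions below apply the same update to contiguous data, once through a flat 1D loop and once with explicit (i,j,k) indexing. The names `scale_flat` and `scale_indexed` are hypothetical and not part of AMReX, and the sketch assumes every element of the allocation (including any ghost cells it may contain) should be updated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pointwise update through a flat 1D loop over contiguous storage.
// No (i,j,k) address arithmetic is needed because each element is
// updated independently of its neighbours.
inline void scale_flat(double* p, std::size_t n, double factor) {
    assert(p != nullptr || n == 0);
    for (std::size_t idx = 0; idx < n; ++idx) {
        p[idx] *= factor;
    }
}

// The same update written with explicit column-major (i,j,k) indexing,
// shown only for comparison.
inline void scale_indexed(double* p, int nx, int ny, int nz, double factor) {
    assert(p != nullptr || nx * ny * nz == 0);
    const std::ptrdiff_t sx = nx;  // row length along i
    const std::ptrdiff_t sy = ny;  // number of rows per k plane
    for (int k = 0; k < nz; ++k)
        for (int j = 0; j < ny; ++j)
            for (int i = 0; i < nx; ++i)
                p[i + sx * (j + sy * k)] *= factor;
}
```

Both versions touch the same memory in the same order, so any difference in running time comes down to how well the compiler simplifies the index arithmetic, and that is worth measuring on the kernel in question rather than assuming.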

WeiqunZhang commented 4 years ago

Explicit-shape array arguments in Fortran routines are just C pointers; otherwise the routines could not be called from C/C++. There is no hardware support for Fortran multi-dimensional arrays, so the compiler has to generate address computation code for them, similar to what we do.
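For concreteness, the address computation in question can be sketched as follows. For an array declared in Fortran as `a(lo1:hi1, lo2:hi2, lo3:hi3)`, the element `a(i,j,k)` sits at a column-major offset from the base pointer. The function name `fortran_offset` is hypothetical and exists only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Offset (in elements) of a(i,j,k) from the base pointer of a column-major
// array declared in Fortran as
//   a(lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2])
// The first index varies fastest in memory.
inline std::ptrdiff_t fortran_offset(int i, int j, int k,
                                     const int lo[3], const int hi[3]) {
    assert(i >= lo[0] && i <= hi[0]);
    assert(j >= lo[1] && j <= hi[1]);
    assert(k >= lo[2] && k <= hi[2]);
    const std::ptrdiff_t n0 = hi[0] - lo[0] + 1;  // extent of dimension 1
    const std::ptrdiff_t n1 = hi[1] - lo[1] + 1;  // extent of dimension 2
    return (i - lo[0]) + n0 * ((j - lo[1]) + n1 * (k - lo[2]));
}
```

A C++ wrapper that stores the base pointer and the extents has to evaluate the same kind of expression, so the arithmetic cost is comparable in the two languages once the accessor is inlined.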