AdaptiveCpp / AdaptiveCpp

Implementation of SYCL and C++ standard parallelism for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications adapt themselves to all the hardware in the system - even at runtime!
https://adaptivecpp.github.io/
BSD 2-Clause "Simplified" License

Question about library-only compilation flow. #1053

Open Momellouky opened 1 year ago

Momellouky commented 1 year ago

Hi,

When hipSYCL acts as a library for third-party compilers (an OpenMP compiler, for example), what exactly happens? I assume there is a layer that translates SYCL constructs into OpenMP pragmas so that the OpenMP compiler can compile the code. So far I have been unable to inspect the generated code, so I decided to ask here.

Thank you in advance

cheers,

fodinabor commented 1 year ago

In library-only flows, there's not really a translation. Instead, as the name suggests, OpenSYCL acts as a library that internally uses the constructs provided by the underlying programming models. For OpenMP, for example, you can see the OpenMP pragmas used for the various kernel modes in: https://github.com/OpenSYCL/OpenSYCL/blob/develop/include/hipSYCL/glue/omp/omp_kernel_launcher.hpp#LL148C70-L148C70 While you have to jump through a few hoops to work out what each piece means, you can see, for example, that parallel_for_kernel, which implements parallel_for(range<N>), uses iterate_range_omp_for internally.
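To make the "library, not translator" point concrete, here is a minimal sketch of the idea. It is not the code from omp_kernel_launcher.hpp: toy_parallel_for and toy_vector_add are hypothetical names made up for illustration. The pragma is simply part of the library's own header source, so the host compiler sees ordinary C++ and no intermediate code is generated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a SYCL-style parallel_for over a 1D range.
// This is NOT AdaptiveCpp's implementation, only the general pattern.
template <class Kernel>
void toy_parallel_for(std::size_t num_items, Kernel kernel) {
  // The pragma lives in the library source itself. A compiler with OpenMP
  // enabled parallelizes the loop; otherwise the pragma is ignored and the
  // loop runs sequentially.
#pragma omp parallel for
  for (long long i = 0; i < static_cast<long long>(num_items); ++i) {
    kernel(static_cast<std::size_t>(i));
  }
}

// Example "kernel": element-wise vector addition.
std::vector<int> toy_vector_add(const std::vector<int>& a,
                                const std::vector<int>& b) {
  assert(a.size() == b.size());
  std::vector<int> out(a.size());
  // Each iteration writes a distinct element, so there is no data race.
  toy_parallel_for(a.size(), [&](std::size_t i) { out[i] = a[i] + b[i]; });
  return out;
}
```

Calling toy_vector_add on two equally sized vectors gives the same result whether or not the translation unit is compiled with OpenMP enabled; only the degree of parallelism changes.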

Momellouky commented 1 year ago

Hi @fodinabor

Thank you, your answer is very precise and useful. I am still wondering about _OPENMP. I guess it is defined if we enable the OpenMP backend at build time through cmake -DWITH_CPU_BACKEND=ON. So if -DWITH_CPU_BACKEND is set to OFF, the parallel_for will be "translated" to a simple for loop that runs on the CPU.

Thank you.

fodinabor commented 1 year ago

Yes. The _OPENMP macro is usually defined by the compiler whenever -fopenmp (or a similar flag that enables compilation with OpenMP) is provided. When using the OpenMP backend as you describe, this flag is passed automatically by Open SYCL / hipSYCL. If the CPU/OpenMP backend is deactivated, the flag will not be passed. I am unsure, though, to what extent the sequential flow is actually able to execute all kinds of kernels correctly. @illuhad should know more :)
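If you want to check what your compiler invocation actually did, a small probe like the following can help. This is only a sketch: built_with_openmp and openmp_version_macro are hypothetical helper names, not part of hipSYCL. It relies only on the fact that an OpenMP-enabled compiler defines _OPENMP to a yyyymm date of the OpenMP specification it supports.

```cpp
// Hypothetical helpers (not part of hipSYCL / Open SYCL) that report
// whether this translation unit was compiled with OpenMP enabled.
constexpr bool built_with_openmp() {
#ifdef _OPENMP
  return true;
#else
  return false;
#endif
}

// Returns the value of _OPENMP (a yyyymm date), or 0 if OpenMP is disabled.
constexpr long openmp_version_macro() {
#ifdef _OPENMP
  return _OPENMP;
#else
  return 0;
#endif
}
```

Compiling the same file once with and once without -fopenmp (or your compiler's equivalent) should flip the result of built_with_openmp.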

illuhad commented 1 year ago

> I am unsure, though, as to what extent the sequential flow is actually able to correctly execute all kinds of kernels.

The sequential flow (i.e. if the OpenMP backend is not enabled) should be able to execute all programs in a semantically correct way. This is because OpenMP is only responsible for the parallelization across work groups, while the parallelism within a work group is handled by fibers or compiler transformations. Consequently, the only effect that you get when not enabling OpenMP is that work groups are executed sequentially. So things are slower, but still correct :)
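A rough sketch of that split, under simplifying assumptions: toy_nd_launch and toy_global_ids are hypothetical names, and the work items inside a group are run by a plain inner loop, which is only valid for kernels without barriers. The real implementation uses fibers or compiler transformations for that part. The point is only that the OpenMP pragma sits on the loop over work groups, so dropping OpenMP changes how the groups are scheduled and not what each group computes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical nd_range-style launcher, NOT the actual hipSYCL code.
// Outer loop: work groups, parallelized by OpenMP when it is enabled.
// Inner loop: work items of one group, run sequentially in this sketch
// (sufficient only for barrier-free kernels).
template <class Kernel>
void toy_nd_launch(std::size_t num_groups, std::size_t group_size,
                   Kernel kernel) {
#pragma omp parallel for
  for (long long g = 0; g < static_cast<long long>(num_groups); ++g) {
    for (std::size_t l = 0; l < group_size; ++l) {
      kernel(static_cast<std::size_t>(g), l);
    }
  }
}

// Example kernel: every work item stores its own global id.
std::vector<std::size_t> toy_global_ids(std::size_t num_groups,
                                        std::size_t group_size) {
  std::vector<std::size_t> out(num_groups * group_size);
  toy_nd_launch(num_groups, group_size, [&](std::size_t g, std::size_t l) {
    const std::size_t gid = g * group_size + l;
    assert(gid < out.size());
    out[gid] = gid;  // distinct element per work item, so no data race
  });
  return out;
}
```

Without OpenMP the outer loop simply visits the groups one after another, which matches the "slower, but still correct" behavior described above.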