firemodels / fds

Fire Dynamics Simulator
https://pages.nist.gov/fds-smv/

Add OpenCL support for faster parallel computing #9457

Closed ywgATustcbbs closed 3 years ago

ywgATustcbbs commented 3 years ago

Years ago, someone discussed the possibility of using OpenCL to accelerate computation. However, it seems the proposal was declined.

Reasons to support OpenCL:

1. OpenCL is currently a stable, widely used API.
2. OpenCL is supported on many platforms and kinds of hardware: x86/x64/Arm/Arm64 CPUs, NVIDIA/AMD/Intel GPUs, and FPGAs.
3. GPUs and FPGAs programmed through OpenCL outperform CPUs and CPU clusters using OpenMP on matrix operations.
4. Today's GPUs and GPU clusters have enough memory to run FDS models. At the very least, it is possible to use MPI and switch between smaller meshes. Loading data from RAM to VRAM is fast enough with PCIe 3.0, and faster still with PCIe 4.0. The faster matrix operations will compensate for the loading cost.

I suggest either replacing OpenMP with OpenCL or adding OpenCL support alongside it.
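The claim in point 4, that faster device computation compensates for the RAM-to-VRAM loading cost, can be checked with a back-of-envelope estimate. The sketch below is a rough model and not a measurement: the bandwidth constants are the theoretical peaks of x16 links (real transfers achieve less), and the function names and all timings passed in are hypothetical.

```cpp
#include <cassert>

// Theoretical peak bandwidth of an x16 link in GB/s (1 GB = 1e9 bytes).
// Measured throughput is lower; treat these as optimistic upper bounds.
constexpr double kPcie3x16GBps = 15.75;
constexpr double kPcie4x16GBps = 31.5;

// Seconds needed to move `bytes` over a link that sustains `gbps` GB/s.
double transfer_seconds(double bytes, double gbps) {
    assert(gbps > 0.0);
    return bytes / (gbps * 1e9);
}

// Offloading one step pays off only if the device compute time plus the
// upload and download times is shorter than the host compute time.
bool offload_pays_off(double host_s, double device_s,
                      double bytes_in, double bytes_out, double gbps) {
    const double total = device_s
                       + transfer_seconds(bytes_in, gbps)
                       + transfer_seconds(bytes_out, gbps);
    return total < host_s;
}
```

For example, moving 1.575 GB each way over PCIe 3.0 x16 costs about 0.1 s per direction at peak, so a step that takes 0.1 s on the device must take more than about 0.3 s on the host before offloading helps under this model.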

mcgratta commented 3 years ago

We do not have anyone on the development team that knows OpenCL. Could you clone the fds repository, replace the OpenMP directives in one of the source code files, and then show that OpenCL is significantly faster than OpenMP?

ywgATustcbbs commented 3 years ago

I would like to help, but I only have some experience writing OpenCL code in C++ (I am not an expert, and I have never used it from Fortran). In fact, I have no knowledge of the Fortran language at all.

mcgratta commented 3 years ago

That's the problem. We would need someone who is very knowledgeable in OpenCL who also has a decent familiarity with Fortran. It is possible that someone could comment out the OpenMP directives and replace them with the equivalent OpenCL, but that means one also has to know OpenMP.

ywgATustcbbs commented 3 years ago

A related paper: Krishnahari Thouti and S. R. Sathe, "Comparison of OpenMP & OpenCL Parallel Processing Technologies," International Journal of Advanced Computer Science and Applications, Volume 3, Issue 4, 2012, pp. 56-61. https://arxiv.org/abs/1211.2038v1

One of its conclusions was that OpenCL generally performed better than OpenMP.

ywgATustcbbs commented 3 years ago

OpenMP is simple. Removing the OMP lines doesn't affect the code. The really difficult part is how to call the OCL API from Fortran. OCL is written in C/C++. Perhaps there is a way to wrap the OCL functions in C/C++ and call them from Fortran? I found an unofficial OpenCL wrapper for Fortran on GitHub (Focal), but I am not sure whether it is usable.
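The wrapping idea can be sketched as follows, assuming the standard Fortran 2003 `iso_c_binding` interoperability. The function name `fds_demo_scale_add` and its signature are hypothetical and belong to neither FDS, OpenCL, nor Focal. The loop is a CPU stand-in for the place where a real wrapper would make its OpenCL calls, so this only shows the calling convention and says nothing about performance.

```cpp
#include <cassert>

// extern "C" turns off C++ name mangling, so a Fortran compiler can link to
// this symbol. A matching Fortran interface (hypothetical) would look like:
//
//   interface
//     function fds_demo_scale_add(x, y, a, n) &
//         bind(C, name="fds_demo_scale_add") result(ierr)
//       use iso_c_binding
//       real(c_double), intent(in)    :: x(*)
//       real(c_double), intent(inout) :: y(*)
//       real(c_double), value         :: a
//       integer(c_int), value         :: n
//       integer(c_int)                :: ierr
//     end function
//   end interface
//
// Computes y(i) = y(i) + a * x(i). Returns 0 on success, 1 on bad arguments.
extern "C" int fds_demo_scale_add(const double* x, double* y, double a, int n) {
    if (x == nullptr || y == nullptr || n < 0) return 1;
    // Stand-in for the device path: a real wrapper would copy x and y to
    // device buffers, launch a kernel, and copy y back before returning.
    for (int i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
    return 0;
}
```

With this layout the Fortran side only ever sees one plain function call per offloaded loop, and all OpenCL-specific code stays in the C/C++ translation unit. The build would still need to compile and link that extra file, which is the kind of additional build step the maintainers would have to accept.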

mcgratta commented 3 years ago

I just did a man ifort and then also man ifx on my linux computer with the oneAPI Intel toolkits installed. I searched for OpenCL and found nothing, but OpenMP has many citations. This is not a good sign. If we were to commit to OpenCL, we would first need assurance that we could implement it in the Fortran code with no special compiling gymnastics. We are not experts in OpenMP or MPI, but we implemented these features based on the assumption that they are fully supported by the Intel compiling tools, as well as all the other Fortran compilers (there are only a few left in existence). But support for OpenCL is only the first step in our decision process. Next, we would want to see a demonstration that OpenCL would significantly increase our speed. Then, we would need to commit someone to the task of replacing the current OpenMP commands with OpenCL because I would not want to support both. We don't have such a person here, and it's questionable whether someone else is going to give up three months of their lives to make this happen.

I think you see where I am heading. We've had many people suggest various types of parallel processing techniques besides MPI and OpenMP, but I don't think these people understand what an undertaking it is to take a code that is currently used by thousands of people and add a whole new parallelization framework. It's not a compiler option that we can just switch on. It would be great if you could just demonstrate that you could parallelize one loop in FDS, compile it, and run it. That would at least convince us that this is possible. At the moment, I don't even know if it's possible.

ywgATustcbbs commented 3 years ago

I understand. I will try to write sample code. However, I will write it in C++ because I am not familiar with Fortran; C/C++ is generally believed to be as efficient as Fortran. It could demonstrate the difference between OMP and OCL.

By the way, I use MSVC and an AMD CPU. The pre-built version of FDS performs extremely poorly with OpenMP on AMD CPUs (worse than MPI), probably because the Intel compiler optimizes for Intel's own CPUs. Because I don't have an Intel Visual Fortran (IVF) compiler to rebuild it, I have to use MPI instead.

mcgratta commented 3 years ago

MPI will always outperform OpenMP in FDS. With MPI, one literally divides the computational domain into multiple meshes, and each CPU works independently on each mesh. With OpenMP, multiple CPUs work on the same mesh, which is less efficient.
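The structural difference between the two approaches can be sketched in a few lines. This is a toy illustration and not FDS code: `update_cell` is a hypothetical stand-in for solver work, the "meshes" are processed one after another instead of by real MPI processes, and nothing here measures speed. The OpenMP pragma is simply ignored when the compiler is run without OpenMP enabled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-cell update standing in for real solver work.
inline double update_cell(double v) { return 0.5 * v + 1.0; }

// MPI style: the domain is cut into n_meshes contiguous blocks. Each block is
// copied into its own local array, as if owned by a separate process, updated
// independently, and gathered back into the full result.
std::vector<double> update_by_meshes(const std::vector<double>& domain,
                                     std::size_t n_meshes) {
    assert(n_meshes > 0);
    const std::size_t n = domain.size();
    std::vector<double> result(n);
    for (std::size_t m = 0; m < n_meshes; ++m) {
        const std::size_t begin = m * n / n_meshes;
        const std::size_t end = (m + 1) * n / n_meshes;
        std::vector<double> local(domain.begin() + begin, domain.begin() + end);
        for (double& v : local) v = update_cell(v);
        std::copy(local.begin(), local.end(), result.begin() + begin);
    }
    return result;
}

// OpenMP style: one shared mesh; threads divide the iterations of one loop
// and synchronize again when the loop ends.
std::vector<double> update_shared_mesh(const std::vector<double>& domain) {
    std::vector<double> result(domain.size());
    const long n = static_cast<long>(domain.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        result[k] = update_cell(domain[k]);
    }
    return result;
}
```

Both functions produce the same values. What differs is ownership: in the first, each block works on private data for the whole update, while in the second, every parallel loop is a separate fork and join over shared data.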

As for OpenCL---I have no doubt that you could demonstrate that it is faster than OpenMP for some simple C++ code of your own. That does not really help us. There are many newer frameworks for parallelization that are faster than MPI and/or OpenMP, but these newer frameworks still require a considerable effort to implement, and even then there is no guarantee that they will be as universally accepted as MPI and OpenMP.