SimFlowCFD / RapidCFD-dev

RapidCFD is an OpenFOAM fork running fully on the CUDA platform. Brought to you by
https://sim-flow.com

RapidCFD-container #97

Open jiaqiwang969 opened 2 years ago

jiaqiwang969 commented 2 years ago

Hi everyone, I am trying to make a Dockerfile for RapidCFD-dev. It is in progress. The idea is to modify the original openfoam-2.3.x Dockerfile.

Any suggestions?

https://github.com/jiaqiwang969/openfoam-docker/tree/main/OpenFOAM/openfoam-org

Advantages:

Install and run

Tips:

New solver implementation

TonkomoLLC commented 2 years ago

Hi, the link above did not work, but I can see from your GitHub site that you already have a good start. For example, you have a Dockerfile for OF9 + CUDA. But when I look inside this Dockerfile, it appears to be set up not for OpenFOAM 9 but rather for OpenFOAM 2.3.x. The CUDA version is 10.2.

If I understand correctly, you seem to be on the right path. The next step will be to wget the RapidCFD repo, modify it for the type of GPU and so on per issue #93. Then you should be able to compile RapidCFD within the Dockerfile.

I have not tried these steps personally, so I do not know what problems await. But to reiterate, you seem to be on the right path.

Good luck!

jiaqiwang969 commented 2 years ago

Oh, I have modified it and changed the path: https://github.com/jiaqiwang969/openfoam-docker/tree/main/OpenFOAM/openfoam-org

Now everything is OK except for the MPI path when I compile it.

I just want to do the same thing as openfoam-2.3.x, which uses "export WM_MPLIB=SYSTEMMPI", but that does not seem to work with RapidCFD.

I am trying to solve this problem.

jiaqiwang969 commented 2 years ago

Right now I have built two MPI variants: OpenMPI and MPICH. I am not sure which one is OK, or which version is suitable. Any ideas?

TonkomoLLC commented 2 years ago

I have only used openmpi with RapidCFD, so that should work. I do not have any feedback on mpich.

jiaqiwang969 commented 2 years ago

Progress update:

Problem 1: With CUDA 11.4 or 11.5, the dynamicFvMesh problem occurs as mentioned in #92. If I comment it out, the solvers that depend on it fail to link:

/usr/bin/ld: cannot find -ldynamicFvMesh

collect2: error: ld returned 1 exit status

/opt/OpenFOAM/RapidCFD-dev/wmake/Makefile:149: recipe for target '/opt/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/bin/pimpleDyMFoam' failed

make[1]: *** [/opt/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/bin/pimpleDyMFoam] Error 1

/opt/OpenFOAM/RapidCFD-dev/wmake/MakefileApps:39: recipe for target 'pimpleFoam' failed

Problem 2:

Singularity> icoFoam

Error at lnInclude/DeviceStreamI.H:5

CUDA driver version is insufficient for CUDA runtime version

TonkomoLLC commented 2 years ago

Thanks for the progress update. Glad that you can get RapidCFD to compile.

Problem 1. Yep, understood. Some apps that require libdynamicFvMesh.so will not compile. I do not have a workaround except to use an earlier version of CUDA. I do not know the solution to the error discussed in #92. Even though pimpleDyMFoam failed to build, hopefully pimpleFoam still built OK.

Problem 2. I think this error is caused by the CUDA driver on your local machine being too old for the CUDA software version. You are using CUDA 11.4 according to the Docker build log. According to this NVIDIA page, you need an NVIDIA GPU hardware driver version >= 450.80.02. You can confirm the driver version on your Linux machine using some ideas here.

I hope these ideas keep you moving forward.
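As a cross-check, the driver/runtime pairing can also be queried from a small standalone program using the CUDA runtime API. This is a generic diagnostic sketch, independent of RapidCFD; the file name is arbitrary:

// check_cuda_versions.cu - generic diagnostic sketch, not part of RapidCFD
// Build (assuming nvcc is on the PATH): nvcc check_cuda_versions.cu -o check_cuda_versions
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int driverVersion = 0;
    int runtimeVersion = 0;

    // Latest CUDA version supported by the installed driver
    // (0 if no driver is visible, e.g. a container started without GPU access).
    cudaDriverGetVersion(&driverVersion);

    // CUDA runtime version this binary was built against.
    cudaRuntimeGetVersion(&runtimeVersion);

    std::printf("driver  : %d.%d\n", driverVersion / 1000, (driverVersion % 1000) / 10);
    std::printf("runtime : %d.%d\n", runtimeVersion / 1000, (runtimeVersion % 1000) / 10);

    // "CUDA driver version is insufficient for CUDA runtime version" corresponds
    // to driverVersion < runtimeVersion (or driverVersion == 0 in a container
    // that cannot see the NVIDIA driver).
    return (driverVersion != 0 && driverVersion >= runtimeVersion) ? 0 : 1;
}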

jiaqiwang969 commented 2 years ago

pimpleFoam also links to libdynamicFvMesh, so maybe it should also be modified.

jiaqiwang969 commented 2 years ago

Actually, my driver version is up to date, and the CUDA version is the same as in the Dockerfile.

nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.129.06   Driver Version: 470.129.06   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |

TonkomoLLC commented 2 years ago

Hi, thanks for the continued discussion.

I cannot verify that pimpleFoam links to libdynamicFvMesh based on the options file. Nor do I see links to the .H file for the dynamic mesh in the pimpleFoam code. The error you showed above said that pimpleDyMFoam failed:

/opt/OpenFOAM/RapidCFD-dev/wmake/Makefile:149: recipe for target '/opt/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/bin/pimpleDyMFoam' failed

Maybe you can check if pimpleFoam appears in the bin directory (e.g., /opt/RapidCFD-dev/platforms/linux64NvccDPOpt/bin)? If yes, then maybe everything is OK with pimpleFoam, while the fact that pimpleDyMFoam does not appear in the bin directory is understood per #92.

As to the CUDA error, please see this issue on another GitHub site. There, the person wrote:

I've found the problem, thank you very much!
I didn't run the docker image with '--runtime=nvidia'. So the container couldn't load the nvidia driver. For anyone who faced the similar problem, you can run nvidia-smi in the container to see if the container can access to the driver.

Another person wrote:

You can also set the nvidia runtime as default, by adding "default-runtime": "nvidia", to your /etc/docker/daemon.json as described here: https://docs.nvidia.com/dgx/nvidia-container-runtime-upgrade/index.html#using-nv-container-runtime

This may be helpful if you mostly use nvidia docker images.

I have not personally tried CUDA from within a Docker image, so I cannot speak from first-hand experience that this is indeed your problem. Therefore, if this is not the solution, kindly let me know.

jiaqiwang969 commented 2 years ago

Yeah, I also found this solution and solved it at the same time. Please see the referenced link.

I use Singularity, which works much like Docker; I added the "--nv" flag and it works.

"RapidCFD-2.3.x-wjq.sif" is made by "singularity build xxx xxx.def".

What I input:

(base) [medgm@gpu01 ~]$ singularity exec --nv RapidCFD-2.3.x-wjq.sif rhoCentralFoam
/*---------------------------------------------------------------------------*\
| RapidCFD by simFlow (sim-flow.com)                                          |
\*---------------------------------------------------------------------------*/
Build  : dev-964f11d713c6
Exec   : /opt/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/bin/rhoCentralFoam
Date   : Jul 26 2022
Time   : 00:57:20
Host   : "xxx"
PID    : 2498006
Case   : xxx
nProcs : 1
sigFpe : Floating point exception trapping - not supported on this platform
fileModificationChecking : Monitoring run-time modified files using timeStampMaster
allowSystemOperations : Allowing user-supplied system call operations

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

--> FOAM FATAL IO ERROR: 
cannot find file

Problem 3:

"sigFpe : Floating point exception trapping - not supported on this platform
fileModificationChecking "

How and why?

For problem1, I will test it with cases.

TonkomoLLC commented 2 years ago

Great! Glad you figured out the Docker + NVIDIA driver issue. As to Problem 3: hmm, I noticed this before and was not sure of the reason. I thought it was due to such FPE traps not being supported, as described here, specifically this part:

Trap handlers for floating point exceptions are not supported. On the GPU there is no status flag to indicate when calculations have overflowed, underflowed, or have involved inexact arithmetic

I have the same sigFPE notice on my machine when I run a RapidCFD solver. In sum, I suspect that this is a normal message from a RapidCFD solver.

jiaqiwang969 commented 2 years ago

Thanks for your help! I will go on testing basic cases and also multi-GPU. Basically, I run RapidCFD on an HPC cloud. Can you give me any instructions for further testing? My final goal is to implement a new solver in it, for my DLR-buffet case. For that case the mesh has ~15,000,000 cells, so I think RapidCFD may help a lot.

TonkomoLLC commented 2 years ago

Ah, good luck with the DLR-buffet case with so many cells. Impressive.

To move forward with multi-GPU:

  1. You will need to add a ThirdParty-dev like the one found here. The magic to make this work with RapidCFD/GPUs is in the Allwmake file:
    # Add CUDA
    configOpt="--with-cuda"

I think you will need to recompile all of RapidCFD after installing ThirdParty-dev.

  2. You can read detailed instructions on running in parallel in issue #57.

I hope this advice helps. Good luck with the next steps.

jiaqiwang969 commented 2 years ago

What is the difference from the original ThirdParty? I have already installed a ThirdParty in my Dockerfile. Does this version work?

 && git clone https://github.com/OpenFOAM/ThirdParty-2.3.x.git
TonkomoLLC commented 2 years ago

I think there is one key difference from the basic 2.3.x ThirdParty distribution:

The Allwmake in the ThirdParty directory has the two added lines noted above. I again refer to: https://github.com/TonkomoLLC/ThirdParty-dev/blob/master/Allwmake#L87

jiaqiwang969 commented 2 years ago

Progress update:

Already in use for my case:

nvidia-smi:

|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:CA:00.0 Off |                    0 |
| N/A   48C    P0   211W / 400W |  25828MiB / 40536MiB |     91%      Default |
|                               |                      |             Disabled |

fvSolution:

solvers
{
    "(rho|rhoU|rhoE)"
    {
        solver          diagonal;
    }
    "(U|e|k|omega|Ret|im|nuTilda)"
    {
        solver          smoothSolver;
        smoother        GaussSeidel;
        nSweeps         2;
        tolerance       1e-6;
        relTol          0.01;
    }

    "(U|h|k|omega|Ret|im|nuTilda)Final"
    {
        $nuTilda;
        reltol          0;
    }

    yPsi
    {
        solver          GAMG;
        smoother        GaussSeidel;
        cacheAgglomeration true;
        nCellsInCoarsestLevel 10;
        agglomerator    faceAreaPair;
        mergeLevels     1;
        tolerance       1e-5;
        relTol          0;
    }
}

relaxationFactors
{
    equations
    {
        nuTilda         0.5;
        k               0.5;
        omega           0.5;
        Ret             0.5;
        im   0.5;
    }
}

Any ideas for speeding up? Right now, with one A100, memory is sufficient and utilization is almost 100%, for 12,925,824 cells.

Do I need to use multiple GPUs? I think I have not tapped the full potential of RapidCFD yet.

TonkomoLLC commented 2 years ago

Offhand, I see nothing alarming about your solver settings.

To get some ideas, you can also look at the solver settings for sample cases found here.

Unfortunately, I don't have specific recommendations.

There are some discussions of speedup on CFD-Online. I also wrote in issue #58

My general experience is that 1 GPU ~= 16 cores, but this can fluctuate depending on the case. However, it seems to match the results at the RapidCFD website. Specifically, the way I read the bar graph at that referenced web site is that 1x K20 is about twice as fast as an 8-core CPU. Note that in this case, the test is with 4 million cells.

To be clear, the above was written based on experience with an older K20 on a 2012-era 8-core CPU. You have much more modern equipment at your disposal.

For sure, I think it would be a good idea to try multiple GPUs and see what happens. As reported at the RapidCFD website, I anticipate that eventually the performance will level out as more GPUs are added to solve a problem of a given number of cells. This said, my guess is that you will see some performance improvement if you add more GPUs and run in parallel.

To help frame the issue, what kind of speedup are you seeing right now with one GPU relative to one CPU of some number of cores? For example, on the RapidCFD website referenced above, a 4-million-cell case is about 2x as fast with one K20 GPU vs. one 8-core Intel CPU. Then, how many cores do you normally run with when you use CPU-based OpenFOAM? Thanks for this feedback.

jiaqiwang969 commented 2 years ago

One A100 GPU is about the same as 128 CPU cores in this case. On the CPU side I use 2 nodes, each with 64 cores. I will check it in more detail and report back in later testing.

jiaqiwang969 commented 2 years ago

NEXT STEP

Question: how to compile a new solver in RapidCFD.

I have already prepared a version of the solver that compiles with openfoam-2.3.x.

But when I do the same thing in the RapidCFD environment, i.e. just run "wmake", some errors occur:

Singularity> wmake
Making dependency list for source file caaFoam.C
SOURCE=caaFoam.C ;  nvcc -Xptxas -dlcm=cg -std=c++11 -m64 -arch=sm_70 -Dlinux64 -DWM_DP -Xcompiler -Wall -Xcompiler -Wextra -Xcompiler -Wno-unused-parameter -Xcompiler -Wno-vla -Xcudafe "--diag_suppress=null_reference" -Xcudafe "--diag_suppress=subscript_out_of_range" -Xcudafe "--diag_suppress=extra_semicolon" -Xcudafe "--diag_suppress=partial_override" -Xcudafe "--diag_suppress=implicit_return_from_non_void_function" -Xcudafe "--diag_suppress=virtual_function_decl_hidden" -O3  -DNoRepository -IBCs/lnInclude -I/opt/OpenFOAM/RapidCFD-dev/src/finiteVolume/lnInclude -I/opt/OpenFOAM/RapidCFD-dev/src/thermophysicalModels/basic/lnInclude -I/opt/OpenFOAM/RapidCFD-dev/src/thermophysicalModels/specie/lnInclude -I/opt/OpenFOAM/RapidCFD-dev/src/turbulenceModels/compressible/turbulenceModel -I/opt/OpenFOAM/RapidCFD-dev/src/dynamicMesh/lnInclude -I/opt/OpenFOAM/RapidCFD-dev/src/meshTools/lnInclude  -IlnInclude -I. -I/opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude -I/opt/OpenFOAM/RapidCFD-dev/src/OSspecific/POSIX/lnInclude   -Xcompiler -fPIC -x cu -D__HOST____DEVICE__='__host__ __device__' -o Make/linux64NvccDPOpt/caaFoam.o -c $SOURCE
In file included from /usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/detail/config/config.h:27:0,
                 from /usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/detail/config.h:23,
                 from /usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/device_ptr.h:24,
                 from /opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/gpuList.H:6,
                 from /opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/labelList.H:49,
                 from /opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/UPstream.H:42,
                 from /opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/Pstream.H:42,
                 from /opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/parRun.H:35,
                 from /opt/OpenFOAM/RapidCFD-dev/src/finiteVolume/lnInclude/fvCFD.H:4,
                 from caaFoam.C:43:
/usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/detail/config/cpp_dialect.h:131:13: warning: Thrust requires at least C++14. C++11 is deprecated but still supported. C++11 support will be removed in a future release. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.
      THRUST_COMPILER_DEPRECATION_SOFT(C++14, C++11);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/local/cuda/bin/../targets/x86_64-linux/include/cub/util_arch.cuh:36:0,
                 from /usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/util.h:32,
                 from /usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/malloc_and_free.h:29,
                 from /usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/system/detail/adl/malloc_and_free.h:42,
                 from /usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/system/detail/generic/memory.inl:20,
                 from /usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/system/detail/generic/memory.h:69,
                 from /usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/detail/reference.h:23,
                 from /usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/memory.h:25,
                 from /usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/device_ptr.h:25,
                 from /opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/gpuList.H:6,
                 from /opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/labelList.H:49,
                 from /opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/UPstream.H:42,
                 from /opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/Pstream.H:42,
                 from /opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/parRun.H:35,
                 from /opt/OpenFOAM/RapidCFD-dev/src/finiteVolume/lnInclude/fvCFD.H:4,
                 from caaFoam.C:43:
/usr/local/cuda/bin/../targets/x86_64-linux/include/cub/util_cpp_dialect.cuh:142:13: warning: CUB requires at least C++14. C++11 is deprecated but still supported. C++11 support will be removed in a future release. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.
      CUB_COMPILER_DEPRECATION_SOFT(C++14, C++11);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
caaFoam.C(55): error: identifier "mag" is undefined

rhoBoundaryTypes.H(8): error: name followed by "::" must be a class or namespace name

rhoBoundaryTypes.H(12): error: name followed by "::" must be a class or namespace name

createFields.H(137): error: identifier "scalarList" is undefined

createFields.H(137): error: expected a ";"

createFields.H(139): error: identifier "rkCoeff" is undefined

sensor.H(8): error: no operator "[]" matches these operands
            operand types are: Foam::volScalarField [ Foam::label ]

....

sensor.H(14): error: identifier "polyPatch" is undefined

sensor.H(20): error: no operator "[]" matches these operands
            operand types are: Foam::fvPatchField<Foam::scalar> [ Foam::label ]

...

AUSM.H(99): error: no operator "[]" matches these operands
            operand types are: Foam::fvsPatchField<Foam::scalar> [ Foam::label ]

AUSM_conv.H(5): error: identifier "labelUList" is undefined

....

Error limit reached.
100 errors detected in the compilation of "caaFoam.C".
Compilation terminated.
caaFoam.dep:677: recipe for target 'Make/linux64NvccDPOpt/caaFoam.o' failed
make: *** [Make/linux64NvccDPOpt/caaFoam.o] Error 1
bash: autojump_add_to_database: command not found
TonkomoLLC commented 2 years ago

Thank you for confirming that you are finding one A100 GPU to be roughly equivalent to 128 CPU cores of computational capability. I am not sure what the conversion between GPU and CPU should be for modern CPUs and GPUs, but for sure this is not horrible. I was afraid it was a lot worse. Yes, I would try some different solver settings along with multiple GPUs. I'll get to your compilation question in a moment.

TonkomoLLC commented 2 years ago

As to the solver compilation, I am not sure if I have enough information to understand what the reason is for the 100 detected errors.

However, I did notice that the solver is called caaFoam.C. Is that perhaps related to the same solver mentioned in #87 with the CPU starting point of https://github.com/vdalessa/caafoam?

If these connections are correct, do you think vdalessa can help?

Beyond that I am not sure how to assist. Debugging the solver with RapidCFD should be similar to debugging a CPU-based OpenFOAM solver for v2.3.x. One key thing to watch out for is that some of OpenFOAM is not implemented in RapidCFD, so if you need a feature that is part of CPU OpenFOAM but was left out of RapidCFD for various reasons, then the compilation will fail. As an example, chemical reactions are left out of RapidCFD, I am guessing because chemical reaction source terms do not lend themselves well to GPU computation, so compiling a solver like reactingFoam will fail.

Sorry I do not have specific advice here but maybe you have some ideas on a way forward based on this note, especially if caaFoam is the same as that discussed by vdalessa.

jiaqiwang969 commented 2 years ago

Yes, and basically it is from https://github.com/davidem88/rhoEnergyFoam; I just modified a few small things as 'caafoam'.

jiaqiwang969 commented 2 years ago

Thanks for the connection. I have emailed vdalessa; maybe he has solved it. I guess these errors are easy to solve, like "error: identifier "mag" is undefined", perhaps because the plain "mag" function is not found in RapidCFD? The same goes for labelUList, scalarList, and polyPatch.

I just changed it to "Foam::mag", and that solved it.

jiaqiwang969 commented 2 years ago

Progress update on bug fixing:

  1. Solved the identifier errors just by adding "Foam::".

Problem 4:

Not sure how to solve it yet. Same problem as vdalessa in #87.

// Internal field
   forAll(U,icell)
   {
    ducSensor[icell] = max(-divU[icell]/Foam::sqrt(divU2[icell] + rotU2[icell] + epsilon),0.) ;
   }

log:

sensor.H(8): error: no operator "[]" matches these operands
            operand types are: Foam::volScalarField [ Foam::label ]

Solved:

    ducSensor = max(-divU/Foam::sqrt(divU2 + rotU2 + epsilon),0.) ;

For this easy type, I just use whole-field (matrix-style) operations instead of "forAll", and the errors disappear. But why? For more complex cases, such as Problem 5, I still have a headache.
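
The compile error above suggests that in RapidCFD the field data lives on the device, so element access by label from host code (ducSensor[icell]) is not exposed; whole-field expressions dispatch GPU kernels instead, which is why the rewrite compiles. The same per-cell operation can also be written explicitly with thrust. Below is a standalone sketch with plain thrust::device_vector standing in for RapidCFD's field classes (names are borrowed from the snippet above); it is not the actual RapidCFD API:

// Standalone thrust sketch (not the RapidCFD API): the same per-cell sensor as an
// element-wise GPU kernel, with device_vector standing in for the gpu-backed fields.
#include <math.h>
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>

struct ducrosSensorFunctor
{
    double epsilon;

    template<typename Tuple>
    __host__ __device__
    double operator()(Tuple t) const
    {
        const double divU  = thrust::get<0>(t);
        const double divU2 = thrust::get<1>(t);
        const double rotU2 = thrust::get<2>(t);
        const double s = -divU / sqrt(divU2 + rotU2 + epsilon);
        return s > 0.0 ? s : 0.0;   // max(..., 0)
    }
};

void computeDucrosSensor
(
    const thrust::device_vector<double>& divU,
    const thrust::device_vector<double>& divU2,
    const thrust::device_vector<double>& rotU2,
    thrust::device_vector<double>& ducSensor,
    double epsilon
)
{
    // One kernel over all cells; no per-element host access is needed.
    thrust::transform
    (
        thrust::make_zip_iterator(thrust::make_tuple(divU.begin(), divU2.begin(), rotU2.begin())),
        thrust::make_zip_iterator(thrust::make_tuple(divU.end(), divU2.end(), rotU2.end())),
        ducSensor.begin(),
        ducrosSensorFunctor{epsilon}
    );
}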

Problem 5:

//     Loop on all cells
       forAll(own,iface)
       {
        if(duc[iface] > ducLevelPress)
        {
         //  Left and Right state
         scalar pl = p_L[iface] ;
         scalar pr = p_R[iface] ;
         scalar ml = M_L[iface] ;
         scalar mr = M_R[iface] ;
         scalar ul = U_L[iface] ;
         scalar ur = U_R[iface] ;
         scalar dl = rho_L[iface] ;
         scalar dr = rho_R[iface] ;
//
         scalar fa = m0[iface]*(2.-m0[iface]);
         scalar alpha = 3./16.*(-4.+5.*fa*fa);
//
         scalar p5p = p5 (ml , 1 , alpha) ;
         scalar p5m = p5 (mr ,-1 , alpha) ;
//
         scalar dpr = p5 (mr , 1 , alpha) - p5m ;
         scalar dpl = p5p  - p5(ml, -1, alpha)  ;
//
         scalar pu = -ku*p5p*p5m*(dl + dr)*c12[iface]*fa*(ur-ul) ;
//
         scalar dp12 = pr*dpr - pl*dpl ;
//
         //Update p
         pave[iface] += duc[iface]*(- 0.5*(dp12) + pu)  ; // Pressure dissipation proportional to Ducros sensor
        }
       }

Still, I do not clearly understand the operand errors. For the lines above, it seems hard to modify them in the same way.

TonkomoLLC commented 2 years ago

Great! You are on your way.

As to the operand errors, please see #38 for hints. Maybe you can also compare the thrust operations in magneticFoam in both RapidCFD and OpenFOAM 2.3.1. I have not personally faced this problem before so I do not have more information.

And of course you can try to contact vdalessa if needed. He reported that he solved the problem of conversion for caafoam from CPU OpenFOAM to RapidCFD.

Good luck with these conversions and troubleshooting!

jiaqiwang969 commented 2 years ago

That is valuable information: I will compare the thrust operations in magneticFoam in both RapidCFD and OpenFOAM 2.3.1.

jiaqiwang969 commented 2 years ago

I have emailed vdalessa; actually, he has not solved this problem yet. He just commented it out.

TonkomoLLC commented 2 years ago

Thanks, Jiaqi. If you have time to help me and vdalessa out, can you try adding libforces.so to this cavity case? Then use the attached controlDict, which adds libForces. Since you are running CUDA >= 11.2, you can check whether the parallel_for error happens on your GPU.

controlDict.txt

This was the error that vdalessa was facing, and it appears on my system starting with CUDA 11.2 (but not with CUDA 11.1 and earlier). The resulting error is "cudaErrorInvalidDeviceFunction" which can appear if the requested device function is not compiled for the proper device architecture, which makes it possible that the error is hardware dependent. You have access to a very modern GPU so if the error occurs for you that is helpful information for troubleshooting in the future, I think.

The test should not take long to run because the error appears immediately.

If you have time this is greatly appreciated. Thank you.
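
A generic way to probe the architecture angle of this error (whether that is actually the cause here is an open question) is to print each visible device's compute capability and compare it against the -arch/-gencode flags used by wmake; the nvcc line quoted earlier in this thread shows -arch=sm_70, for example, while an A100 reports compute capability 8.0. This is a standalone CUDA sketch, not RapidCFD code:

// query_arch.cu - generic diagnostic sketch, not part of RapidCFD
// Build: nvcc query_arch.cu -o query_arch
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0)
    {
        std::printf("No CUDA device visible (check --nv / --runtime=nvidia)\n");
        return 1;
    }

    for (int d = 0; d < deviceCount; ++d)
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);

        // cudaErrorInvalidDeviceFunction can show up when no kernel image (SASS
        // or PTX) embedded in the binary matches this compute capability, i.e.
        // the -arch/-gencode flags do not cover prop.major.prop.minor.
        std::printf("Device %d: %s, compute capability %d.%d\n",
                    d, prop.name, prop.major, prop.minor);
    }
    return 0;
}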

jiaqiwang969 commented 2 years ago

parallel_for

Do you mean running the multi-GPU version? Right now I have not gotten to that step yet, because of ThirdParty. I can only access a single GPU (CUDA 11.5), as you know. I'm happy to help if I can.

TonkomoLLC commented 2 years ago

Sorry for the confusion; this is an issue that can appear on a single GPU. I believe it has to do with a loop over cells or faces (i.e., a for loop). Thanks so much.

jiaqiwang969 commented 2 years ago

Log:

Create time

Overriding DebugSwitches according to controlDict
Create mesh for time = 0

Reading transportProperties

Reading field p

Reading field U

Reading/calculating face flux field phi

Starting time loop

forceCoeffs forces:
    Not including porosity effects

Time = 0.005

Courant Number mean: 0 max: 0
smoothSolver:  Solving for Ux, Initial residual = 1, Final residual = 9.2192e-06, No Iterations 79
smoothSolver:  Solving for Uy, Initial residual = 0, Final residual = 0, No Iterations 0
AINVPCG:  Solving for p, Initial residual = 1, Final residual = 9.24999e-07, No Iterations 53
time step continuity errors : sum local = 6.15138e-09, global = -4.61898e-19, cumulative = -4.61898e-19
AINVPCG:  Solving for p, Initial residual = 0.523589, Final residual = 6.26273e-07, No Iterations 51
time step continuity errors : sum local = 6.93534e-09, global = -7.34536e-20, cumulative = -5.35351e-19
ExecutionTime = 14.46 s  ClockTime = 17 s

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  parallel_for failed: cudaErrorInvalidDeviceFunction: invalid device function
Aborted
TonkomoLLC commented 2 years ago

Thanks for confirming the issue exists for you on the A100! That is very helpful.

jiaqiwang969 commented 2 years ago

You're welcome!

jiaqiwang969 commented 2 years ago

Clues for Problem 5:

I found similar code in "fvcSimpleReconstruct.C"

Singularity> vim ./finiteVolume/finiteVolume/fvc/fvcSimpleReconstruct.C

    const labelUList& owner = mesh.owner();
    forAll(owner, facei)
    {
        label own = owner[facei];
        label nei = neighbour[facei];

        rf[own] += (Cf[facei] - C[own])*ssf[facei];
        rf[nei] -= (Cf[facei] - C[nei])*ssf[facei];
    }

With "labelUList" type, does it run in CPU or GPU? For my poor knowledge, I know GPU type is "labelgpuList".

With similar setting, in Problem 5:

const Foam::labelUList& own = mesh.owner();

The error:

AUSM.H(16): error: no suitable user-defined conversion from "const Foam::labelgpuList" to "const Foam::labelUList" exists

Idea:

       const Foam::labelgpuList& owner = mesh.owner();

The owner type is "Foam::labelgpuList"; how can I transform it into "Foam::labelUList"? Would that be the most efficient solution?
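
For what it is worth, a host-side copy is mechanically possible. Here is a minimal standalone sketch where thrust::device_vector<int> stands in for labelgpuList (the actual RapidCFD accessors may differ), with the caveat that the device-to-host transfer itself costs time:

// Standalone sketch, not the RapidCFD API: pulling a device-resident label list
// back to the host. thrust::device_vector<int> stands in for labelgpuList here.
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/copy.h>

thrust::host_vector<int> copyOwnerToHost(const thrust::device_vector<int>& ownerOnGpu)
{
    // One device-to-host transfer of the whole list. Doing this inside the time
    // loop adds PCIe traffic and can eat into the GPU speedup, which is why
    // keeping the face loop on the GPU is usually preferred.
    thrust::host_vector<int> ownerOnHost(ownerOnGpu.size());
    thrust::copy(ownerOnGpu.begin(), ownerOnGpu.end(), ownerOnHost.begin());
    return ownerOnHost;
}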

TonkomoLLC commented 2 years ago

For sure give this a try and see if you can achieve a speedup with this change. I believe the code will compile with this change, but I have no idea if there will be an error upon running.

My knowledge is mainly around getting RapidCFD to compile and run cases. My CUDA skills are weaker, so please weigh the following advice accordingly.

I have noticed that there is directionally less speedup with RapidCFD when there is memory transfer between the CPU and GPU, and whenever possible it is preferable to do calculations on the GPU rather than on the CPU.

That said, I believe a more impressive speed-up is possible if the example of magneticFoam is followed, where the field manipulations are accomplished with thrust, and not in the standard way on the CPU.

I hope this reply makes sense. Those who are better versed at GPU programming are welcomed to comment.

jiaqiwang969 commented 2 years ago

What has me stumped is how to use the GPU to process content with “if” statements.

TonkomoLLC commented 2 years ago

I was just looking over RapidCFD solvers. There are not many uses of thrust (only in magneticFoam), but there are many uses of fields set up as some sort of gpu field. Perhaps setting up fields as gpu types will ensure that field operations are done on the GPU. I think that was your point. I am more optimistic about this idea after looking over some solvers.

jiaqiwang969 commented 2 years ago

I agree the GPU is better; I am just searching for a method to deal with the "if" statement.

//     Loop on all cells
       forAll(owner,iface)
       {
        if(duc[iface] > ducLevelPress)
          { 
              ....
          }
       }

Maybe it is like what magneticFoam did:

label magnetZonei = mesh.faceZones().findZoneID(magnets[i].name());

I need to "find" the ID which "duc[iface] > ducLevelPress" is true. Things to be hard now.

TonkomoLLC commented 2 years ago

I think the line you listed in magneticFoam is making a list of faces that are located in the magnet face zone. I do not use this solver, so I am not 100% certain. If I am correct in my interpretation of this line,

label magnetZonei = mesh.faceZones().findZoneID(magnets[i].name());

then I think this line from magneticFoam is different from the loop over faces that you want to accomplish.

I did a quick look through the source code, and I think loops over cells may be like normal. Please look at GAMGAgglomeration.C, as an example, where I have linked to a loop over faces with an if block that looks a lot like CPU code.

I reiterate that this is not my specialty and I am learning alongside you, so please treat this comment accordingly.

jiaqiwang969 commented 2 years ago

Thank you for your quick reply. I have been trying to solve this challenge as efficiently as I can and with all my might, so I may have interrupted you too much.

jiaqiwang969 commented 2 years ago

Yes, I am confused about it. It looks like CPU-style code, but here the owner list is a GPU type. On the CPU it is easy to deal with the "if" statement.

If I sacrifice some time by transforming back to the CPU type, things will be easier, but I am also not sure how to do that.

jiaqiwang969 commented 2 years ago

This sounds like an alternative to the "if" statement?

Ref: ./TurbulenceModels/turbulenceModels/RAS/derivedFvPatchFields/wallFunctions/epsilonWallFunctions/epsilonWallFunction/epsilonWallFunctionFvPatchScalarField.C

typename gpuList<label>::iterator end =
        thrust::copy_if
        (
            faceCells.begin(),
            faceCells.end(),
            weights.begin(),
            constraintCells.begin(),
            epsilonWallFunctionGraterThanToleranceFunctor(tolerance_)
        );
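
Applied to the sensor test above, the same pattern could look roughly like this. It is a standalone sketch with thrust::device_vector standing in for the gpu fields (not RapidCFD's actual types): gather the flagged face indices once, then run the per-face update only over that compacted list.

#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/iterator/counting_iterator.h>

struct greaterThanLevel
{
    double level;
    __host__ __device__
    bool operator()(double x) const { return x > level; }
};

// Returns the indices of the faces where duc > ducLevelPress.
thrust::device_vector<int> flaggedFaces
(
    const thrust::device_vector<double>& duc,
    double ducLevelPress
)
{
    thrust::device_vector<int> faces(duc.size());

    // duc acts as the "stencil": index i is kept only where pred(duc[i]) is true,
    // much like faceCells/weights in the epsilonWallFunction snippet above.
    thrust::device_vector<int>::iterator end = thrust::copy_if
    (
        thrust::counting_iterator<int>(0),
        thrust::counting_iterator<int>(int(duc.size())),
        duc.begin(),                  // stencil
        faces.begin(),
        greaterThanLevel{ducLevelPress}
    );

    faces.resize(end - faces.begin());
    return faces;
}

The compacted index list could then drive a thrust::for_each (or a gather/scatter) for the actual pressure-dissipation update.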
TonkomoLLC commented 2 years ago

Yeah, I think you are on to an idea that looks helpful.

Out of curiosity, what kind of speedup are you seeing right now over CPUs, before making changes to use gpu fields and functors to do the mathematical operations on the GPU?

jiaqiwang969 commented 2 years ago

Unfortunately, I have not succeeded in compiling it for the GPU yet.

Let me ask another question in advance.

The original CPU code additionally includes "processor" handling (for multi-CPU runs, where the mesh is split with decomposePar):

[please see link for more detail]

//     Add artificial dissipation on the processor boundaries 
       forAll( mesh.boundaryMesh(), iPatch )
       {
        const Foam::polyPatch& patch = mesh.boundaryMesh()[iPatch] ;
        if ((patch.type()=="processor"))
        {
         forAll( patch , iface )
         {
            ....(this part is same as above)
          }
        }

Do you think it should be kept or deleted when implementing this on the GPU? How does multi-GPU really work? Is it the same as multi-CPU, with "processor" boundary conditions?

TonkomoLLC commented 2 years ago

I am pretty sure the processor boundary still exists with a multi-GPU case.

That is, you still decompose the case with a CPU version of decomposePar before running with a RapidCFD solver and multiple GPU's. And, decomposePar will create processor boundaries when preparing processor directories.

As a result, I believe a multi-GPU case requires the ability to handle processor boundaries.

With respect to compiling: did vdalessa's original RapidCFD code compile before you started making modifications? Thanks for this clarification.

jiaqiwang969 commented 2 years ago

Yes, it compiled. However, he fixed it by commenting out these lines:

//#include "AUSM.H"
//#include "AUSM_conv.H"

which are the source of the bug, i.e. no operator "[]" matches these operands.

It means he has not actually solved it in the GPU version, because when I uncomment them, the same errors appear.

TonkomoLLC commented 2 years ago

Got it. Then some significant work remains to enable the AUSM scheme. I understand now.

jiaqiwang969 commented 2 years ago

In the CPU version, I have set up 4 cases:

Case        Cells        CPU cores   dt             Speed
2D coarse   97664        16          3.31766e-09 s  7.0542e-07 s/min
2D fine     315264       16          1.61622e-09 s  1.8748e-07 s/min
3D coarse   97664*41     128         6.22965e-09 s  2.5542e-07 s/min
3D fine     315264*41    128         1.59084e-09 s  2.3863e-08 s/min
TonkomoLLC commented 2 years ago

That is truly interesting. And this is OpenFOAM 2.3.x?