Open jiaqiwang969 opened 2 years ago
Hi, the link above did not work, but I can see from your GitHub site that you already have a good start. For example, you have a Dockerfile for OF9 + CUDA. But when I look inside this Dockerfile, it appears to be set up not for OpenFOAM 9 but rather for OpenFOAM 2.3.x. The CUDA version is 10.2.
If I understand correctly, you seem to be on the right path. The next step will be to wget the RapidCFD repo, modify it for the type of GPU and so on per issue #93 . Then you should be able to compile RapidCFD within the Dockerfile.
I have not tried these steps personally, so I do not know what problems await. But to reiterate you seem to be on the right path.
Good luck!
Oh, I have modified it and changed the path: https://github.com/jiaqiwang969/openfoam-docker/tree/main/OpenFOAM/openfoam-org
Now everything is OK except the MPI path when I compile it.
I just want to do the same thing as openfoam-2.3.x, which uses "export WM_MPLIB=SYSTEMMPI". But that does not seem to work with RapidCFD.
I am trying to solve this problem.
Right now, I have built two versions of MPI: openmpi and mpich. I am not sure which one is OK, or which version is suitable. Any idea?
I have only used openmpi with RapidCFD, so that should work. I do not have any feedback on mpich.
Progress update:
Problem 1: For cuda-11.4 or 11.5, the dynamicFvMesh problem occurs as mentioned in #92. If I comment it out, some solvers that depend on it fail:
/usr/bin/ld: cannot find -ldynamicFvMesh
collect2: error: ld returned 1 exit status
/opt/OpenFOAM/RapidCFD-dev/wmake/Makefile:149: recipe for target '/opt/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/bin/pimpleDyMFoam' failed
make[1]: *** [/opt/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/bin/pimpleDyMFoam] Error 1
/opt/OpenFOAM/RapidCFD-dev/wmake/MakefileApps:39: recipe for target 'pimpleFoam' failed
Problem 2:
Singularity> icoFoam
Error at lnInclude/DeviceStreamI.H:5
CUDA driver version is insufficient for CUDA runtime version
Thanks for the progress update. Glad that you can get RapidCFD to compile.
Problem 1. Yep, understood. Some apps that require libdynamicFvMesh.so will not compile. I do not have a workaround except to use an earlier version of CUDA. I do not know the solution to the error discussed in #92. Even though pimpleDyMFoam failed to build, hopefully pimpleFoam still built OK.
Problem 2. I think this error is caused by the CUDA driver on your local machine being too old for the CUDA software version. So you are using CUDA 11.4 according to the docker build log. According to this NVIDIA page, you need to be using an NVIDIA GPU hardware driver version >= 450.80.02. You can confirm the driver version on your Linux machine using some ideas here.
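If it helps, a small generic check program (plain CUDA runtime API, nothing RapidCFD-specific, and not taken from the linked page) can be compiled and run inside the container to show what driver and runtime versions the container actually sees:

// versionCheck.cu -- generic diagnostic only; compile with: nvcc versionCheck.cu -o versionCheck
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int driverVersion = 0, runtimeVersion = 0;
    cudaDriverGetVersion(&driverVersion);     // e.g. 11040 means CUDA 11.4
    cudaRuntimeGetVersion(&runtimeVersion);
    std::printf("Driver API version : %d\n", driverVersion);
    std::printf("Runtime version    : %d\n", runtimeVersion);

    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess)
    {
        // Same class of failure as the icoFoam error above
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("Visible CUDA devices: %d\n", deviceCount);
    return 0;
}

If the driver version prints as 0 or cudaGetDeviceCount fails, the container is not seeing the host driver at all.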
I hope these ideas keep you moving forward.
pimpleFoam also links to it; maybe it should also be modified.
Actually, my driver version is up to date, and the CUDA version is the same as in the Dockerfile.
nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.129.06 Driver Version: 470.129.06 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
Hi, thanks for the continued discussion.
I cannot verify that pimpleFoam links to libdynamicFvMesh based on the options file. Nor do I see links to the .H
file for the dynamic mesh in the pimpleFoam code. The error you showed above said that pimpleDyMFoam failed:
/opt/OpenFOAM/RapidCFD-dev/wmake/Makefile:149: recipe for target '/opt/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/bin/pimpleDyMFoam' failed
Maybe you can check if pimpleFoam appears in the bin directory (e.g., /opt/RapidCFD-dev/platforms/linux64NvccDPOpt/bin)? If yes, then maybe everything is OK with pimpleFoam, while the fact that pimpleDyMFoam does not appear in the bin directory is understood per #92.
As to the CUDA error, please see this issue on another GitHub site. There, the person wrote:
I've found the problem, thank you very much!
I didn't run the docker image with '--runtime=nvidia'. So the container couldn't load the nvidia driver. For anyone who faced the similar problem, you can run nvidia-smi in the container to see if the container can access to the driver.
Another person wrote:
You can also set the nvidia runtime as default, by adding "default-runtime": "nvidia", to your /etc/docker/daemon.json as described here: https://docs.nvidia.com/dgx/nvidia-container-runtime-upgrade/index.html#using-nv-container-runtime
This may be helpful if you mostly use nvidia docker images.
I have not personally tried CUDA from within a Docker image, so I cannot speak with first hand experience that this is indeed your problem. Therefore, if this is not the solution kindly let me know.
Yeah, I also found this solution and solved it at the same time. Please see the linked reference.
I use Singularity, which is similar to Docker. I added the "--nv" flag and it works.
"RapidCFD-2.3.x-wjq.sif" was built with "singularity build xxx xxx.def".
What I input:
(base) [medgm@gpu01 ~]$ singularity exec --nv RapidCFD-2.3.x-wjq.sif rhoCentralFoam
/*---------------------------------------------------------------------------*\
| RapidCFD by simFlow (sim-flow.com) |
\*---------------------------------------------------------------------------*/
Build : dev-964f11d713c6
Exec : /opt/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/bin/rhoCentralFoam
Date : Jul 26 2022
Time : 00:57:20
Host : "xxx"
PID : 2498006
Case : xxx
nProcs : 1
sigFpe : Floating point exception trapping - not supported on this platform
fileModificationChecking : Monitoring run-time modified files using timeStampMaster
allowSystemOperations : Allowing user-supplied system call operations
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time
--> FOAM FATAL IO ERROR:
cannot find file
Problem 3:
"sigFpe : Floating point exception trapping - not supported on this platform
fileModificationChecking "
How and why does this happen?
For Problem 1, I will test it with some cases.
Great! Glad you figured out the docker + NVIDIA driver issue. As to Problem 3: hmm, I noticed this before and was not sure of the reason. I thought it was due to such FPE traps not being supported, as described here, specifically this part:
Trap handlers for floating point exceptions are not supported. On the GPU there is no status flag to indicate when calculations have overflowed, underflowed, or have involved inexact arithmetic
I have the same sigFPE notice on my machine when I run a RapidCFD solver. In sum, I suspect that this is a normal message from a RapidCFD solver.
Thanks for your help! I will go on testing basic cases, and also multi-GPU. Basically, I run RapidCFD on an HPC cloud. Can you give me any instructions for further testing? My final goal is to implement a new solver in it, for my DLR-buffet cases. For this case the mesh has ~15000000 cells, so I think RapidCFD may help a lot.
Ah, good luck with the DLR-buffet case with so many cells. Impressive.
To move forward with multi-GPU:
- You will need to add a ThirdParty-dev like the one found here. The magic to make this work with RapidCFD/GPU's is in the Allwmake file:
# Add CUDA
configOpt="--with-cuda"
I think you will need to recompile all of RapidCFD after installing ThirdParty-dev.
- You can read detailed instructions on running in parallel in issue #57 (problem with multi GPU execution).
I hope this advice helps. Good luck with the next steps.
What is the difference from the original ThirdParty? I have installed ThirdParty-dev in my Dockerfile. Does this version work?
&& git clone https://github.com/OpenFOAM/ThirdParty-2.3.x.git
I think there is one key difference with the basic 2.3.x ThirdParty distribution:
The Allwmake in the ThirdParty directory has the two added lines noted above. I again refer to: https://github.com/TonkomoLLC/ThirdParty-dev/blob/master/Allwmake#L87
Already in use in my case:
nvidia-smi:
|===============================+======================+======================|
| 0 NVIDIA A100-SXM... On | 00000000:CA:00.0 Off | 0 |
| N/A 48C P0 211W / 400W | 25828MiB / 40536MiB | 91% Default |
| | | Disabled |
fvSolution:
solvers
{
"(rho|rhoU|rhoE)"
{
solver diagonal;
}
"(U|e|k|omega|Ret|im|nuTilda)"
{
solver smoothSolver;
smoother GaussSeidel;
nSweeps 2;
tolerance 1e-6;
relTol 0.01;
}
"(U|h|k|omega|Ret|im|nuTilda)Final"
{
$nuTilda;
relTol 0;
}
yPsi
{
solver GAMG;
smoother GaussSeidel;
cacheAgglomeration true;
nCellsInCoarsestLevel 10;
agglomerator faceAreaPair;
mergeLevels 1;
tolerance 1e-5;
relTol 0;
}
}
relaxationFactors
{
equations
{
nuTilda 0.5;
k 0.5;
omega 0.5;
Ret 0.5;
im 0.5;
}
}
Any ideas for speeding up? Right now, with one A100, memory is enough and usage is almost 100%. The mesh has 12925824 cells.
Do I need to use multiple GPUs? I think I have not tapped the full potential of RapidCFD yet.
Offhand, I see nothing alarming about your solver settings.
To get some ideas, you can also look at the solver settings for sample cases found here.
Unfortunately, I don't have specific recommendations.
There are some discussions of speedup on CFD-Online. I also wrote about this in issue #58.
My general experience is that 1 GPU ~= 16 cores, but this can fluctuate depending on the case. However, it seems to match the results at the RapidCFD website. Specifically, the way I read the bar graph at that referenced web site is that 1x K20 is about twice as fast as an 8-core CPU. Note that in this case, the test is with 4 million cells.
To be clear, the above was written based on experience with an older K20 on a 2012 era 8-core CPU. You have much more modern equipment at your disposal.
For sure, I think it would be a good idea to try multiple GPU's and see what happens. As reported at the RapidCFD website, I anticipate that eventually the performance will level out as more GPU's are added to solve a problem of a given # of cells. This said, my guess is that you will see some performance improvement if you add more GPU's and run in parallel.
To help frame the issue, what kind of speedup are you seeing right now with one GPU relative to one CPU of some number of cores? For example, at the RapidCFD website referenced above, a 4MM cell case is about 2x as fast with one K20 GPU vs. one 8-core Intel CPU. Then, how many cores do you normally run with when you use CPU-based OpenFOAM? Thanks for this feedback.
One A100 GPU is about the same as 128 CPU cores in this case. On the CPU I use 2 nodes, each with 64 cores. I will check it in more detail and report back in later testing.
I have already prepared a version of the solver that compiles with openfoam-2.3.x.
But when I do the same thing in the RapidCFD environment, i.e. just run "wmake", some errors occur:
Singularity> wmake
Making dependency list for source file caaFoam.C
SOURCE=caaFoam.C ; nvcc -Xptxas -dlcm=cg -std=c++11 -m64 -arch=sm_70 -Dlinux64 -DWM_DP -Xcompiler -Wall -Xcompiler -Wextra -Xcompiler -Wno-unused-parameter -Xcompiler -Wno-vla -Xcudafe "--diag_suppress=null_reference" -Xcudafe "--diag_suppress=subscript_out_of_range" -Xcudafe "--diag_suppress=extra_semicolon" -Xcudafe "--diag_suppress=partial_override" -Xcudafe "--diag_suppress=implicit_return_from_non_void_function" -Xcudafe "--diag_suppress=virtual_function_decl_hidden" -O3 -DNoRepository -IBCs/lnInclude -I/opt/OpenFOAM/RapidCFD-dev/src/finiteVolume/lnInclude -I/opt/OpenFOAM/RapidCFD-dev/src/thermophysicalModels/basic/lnInclude -I/opt/OpenFOAM/RapidCFD-dev/src/thermophysicalModels/specie/lnInclude -I/opt/OpenFOAM/RapidCFD-dev/src/turbulenceModels/compressible/turbulenceModel -I/opt/OpenFOAM/RapidCFD-dev/src/dynamicMesh/lnInclude -I/opt/OpenFOAM/RapidCFD-dev/src/meshTools/lnInclude -IlnInclude -I. -I/opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude -I/opt/OpenFOAM/RapidCFD-dev/src/OSspecific/POSIX/lnInclude -Xcompiler -fPIC -x cu -D__HOST____DEVICE__='__host__ __device__' -o Make/linux64NvccDPOpt/caaFoam.o -c $SOURCE
In file included from /usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/detail/config/config.h:27:0,
from /usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/detail/config.h:23,
from /usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/device_ptr.h:24,
from /opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/gpuList.H:6,
from /opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/labelList.H:49,
from /opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/UPstream.H:42,
from /opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/Pstream.H:42,
from /opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/parRun.H:35,
from /opt/OpenFOAM/RapidCFD-dev/src/finiteVolume/lnInclude/fvCFD.H:4,
from caaFoam.C:43:
/usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/detail/config/cpp_dialect.h:131:13: warning: Thrust requires at least C++14. C++11 is deprecated but still supported. C++11 support will be removed in a future release. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.
THRUST_COMPILER_DEPRECATION_SOFT(C++14, C++11);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/local/cuda/bin/../targets/x86_64-linux/include/cub/util_arch.cuh:36:0,
from /usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/util.h:32,
from /usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/malloc_and_free.h:29,
from /usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/system/detail/adl/malloc_and_free.h:42,
from /usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/system/detail/generic/memory.inl:20,
from /usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/system/detail/generic/memory.h:69,
from /usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/detail/reference.h:23,
from /usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/memory.h:25,
from /usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/device_ptr.h:25,
from /opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/gpuList.H:6,
from /opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/labelList.H:49,
from /opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/UPstream.H:42,
from /opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/Pstream.H:42,
from /opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/parRun.H:35,
from /opt/OpenFOAM/RapidCFD-dev/src/finiteVolume/lnInclude/fvCFD.H:4,
from caaFoam.C:43:
/usr/local/cuda/bin/../targets/x86_64-linux/include/cub/util_cpp_dialect.cuh:142:13: warning: CUB requires at least C++14. C++11 is deprecated but still supported. C++11 support will be removed in a future release. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.
CUB_COMPILER_DEPRECATION_SOFT(C++14, C++11);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
caaFoam.C(55): error: identifier "mag" is undefined
rhoBoundaryTypes.H(8): error: name followed by "::" must be a class or namespace name
rhoBoundaryTypes.H(12): error: name followed by "::" must be a class or namespace name
createFields.H(137): error: identifier "scalarList" is undefined
createFields.H(137): error: expected a ";"
createFields.H(139): error: identifier "rkCoeff" is undefined
sensor.H(8): error: no operator "[]" matches these operands
operand types are: Foam::volScalarField [ Foam::label ]
....
sensor.H(14): error: identifier "polyPatch" is undefined
sensor.H(20): error: no operator "[]" matches these operands
operand types are: Foam::fvPatchField<Foam::scalar> [ Foam::label ]
...
AUSM.H(99): error: no operator "[]" matches these operands
operand types are: Foam::fvsPatchField<Foam::scalar> [ Foam::label ]
AUSM_conv.H(5): error: identifier "labelUList" is undefined
....
Error limit reached.
100 errors detected in the compilation of "caaFoam.C".
Compilation terminated.
caaFoam.dep:677: recipe for target 'Make/linux64NvccDPOpt/caaFoam.o' failed
make: *** [Make/linux64NvccDPOpt/caaFoam.o] Error 1
Thank you for confirming that you are finding one A100 GPU to be roughly equivalent to 128 CPU cores of computational capability. I am not sure what the conversion between GPU and CPU should be for modern CPUs and GPU's, but for sure this is not horrible. I was afraid it was a lot worse. Yes, I would try some different solver settings along with multiple GPU's. I'll get to your compilation question in a moment.
As to the solver compilation, I am not sure if I have enough information to understand what the reason is for the 100 detected errors.
However, I did notice that the solver is called caaFoam.C. Is that perhaps related to the same solver mentioned in #87, with the CPU starting point of https://github.com/vdalessa/caafoam?
If these connections are correct, do you think vdalessa can help?
Beyond that I am not sure how to assist. Debugging the solver with RapidCFD should be similar to debugging a CPU-based OpenFOAM solver for v2.3.x. One key thing to watch out for is that some of OpenFOAM is not implemented in RapidCFD, so if you need a feature that is part of CPU OpenFOAM but was left out for various reasons in RapidCFD, then the compilation will fail. As an example, chemical reactions are left out of RapidCFD, I am guessing because chemical reaction source terms don't lend themselves so well to GPU computation, so compiling a solver like reactingFoam will fail.
Sorry I do not have specific advice here, but maybe you have some ideas on a way forward based on this note, especially if caaFoam is the same as that discussed by vdalessa.
Yes, and basically it is from https://github.com/davidem88/rhoEnergyFoam; I just modified a few things as 'caafoam'.
Thanks for the connections. I have emailed vdalessa; maybe he has solved it. I guess errors like "error: identifier "mag" is undefined" are easy to solve, because I have not found the "mag" function in RapidCFD? The same goes for labelUList, scalarList, and polyPatch.
I just changed it to "Foam::mag", and that solved it.
Not sure how to solve this one yet. It is the same problem vdalessa had in #87.
// Internal field
forAll(U,icell)
{
ducSensor[icell] = max(-divU[icell]/Foam::sqrt(divU2[icell] + rotU2[icell] + epsilon),0.) ;
}
log:
sensor.H(8): error: no operator "[]" matches these operands
operand types are: Foam::volScalarField [ Foam::label ]
ducSensor = max(-divU/Foam::sqrt(divU2 + rotU2 + epsilon),0.) ;
For this simple type, I just used whole-field ("matrix") operations instead of "forAll", and the errors disappeared. But why? And for a more complex one, such as Problem 5 below, I still have a headache.
// Loop on all cells
forAll(own,iface)
{
if(duc[iface] > ducLevelPress)
{
// Left and Right state
scalar pl = p_L[iface] ;
scalar pr = p_R[iface] ;
scalar ml = M_L[iface] ;
scalar mr = M_R[iface] ;
scalar ul = U_L[iface] ;
scalar ur = U_R[iface] ;
scalar dl = rho_L[iface] ;
scalar dr = rho_R[iface] ;
//
scalar fa = m0[iface]*(2.-m0[iface]);
scalar alpha = 3./16.*(-4.+5.*fa*fa);
//
scalar p5p = p5 (ml , 1 , alpha) ;
scalar p5m = p5 (mr ,-1 , alpha) ;
//
scalar dpr = p5 (mr , 1 , alpha) - p5m ;
scalar dpl = p5p - p5(ml, -1, alpha) ;
//
scalar pu = -ku*p5p*p5m*(dl + dr)*c12[iface]*fa*(ur-ul) ;
//
scalar dp12 = pr*dpr - pl*dpl ;
//
//Update p
pave[iface] += duc[iface]*(- 0.5*(dp12) + pu) ; // Pressure dissipation proportional to Ducros sensor
}
}
Still, I do not clearly understand why the operand errors occur. For the lines above, it seems hard to modify them in the way suggested below.
Great! You are on your way.
As to the operand errors, please see #38 for hints. Maybe you can also compare the thrust operations in magneticFoam in both RapidCFD and OpenFOAM 2.3.1. I have not personally faced this problem before so I do not have more information.
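That said, purely as a generic illustration (plain thrust with made-up names, not RapidCFD field types, and untested in your solver), the usual pattern is to move the per-face "if" inside a functor and let thrust::transform apply it to every element:

// Standalone sketch only; compile with: nvcc -std=c++14 sketch.cu
#include <thrust/device_vector.h>
#include <thrust/transform.h>

struct pressureDissipationFunctor
{
    double ducLevelPress;

    __host__ __device__
    double operator()(const double& duc, const double& pave) const
    {
        // The "if" lives inside the functor; each GPU thread evaluates it
        if (duc > ducLevelPress)
        {
            return pave + 0.5*duc;   // placeholder for the real dp12/pu terms
        }
        return pave;
    }
};

int main()
{
    thrust::device_vector<double> duc(4);
    thrust::device_vector<double> pave(4, 1.0);
    duc[0] = 0.1; duc[1] = 0.9; duc[2] = 0.3; duc[3] = 0.8;

    // pave[i] = f(duc[i], pave[i]) for every face, in parallel on the GPU
    thrust::transform
    (
        duc.begin(), duc.end(),   // first input range
        pave.begin(),             // second input range
        pave.begin(),             // result written in place
        pressureDissipationFunctor{0.5}
    );
    return 0;
}

The real AUSM loop reads many per-face fields at once, which thrust can handle with zip iterators (thrust::make_zip_iterator), similar to the magneticFoam example; again, treat this only as a sketch of the idea.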
And of course you can try to contact vdalessa if needed. He reported that he solved the problem of converting caafoam from CPU OpenFOAM to RapidCFD.
Good luck with these conversions and troubleshooting!
That is valuable information. Comparing magneticFoam in both RapidCFD and OpenFOAM 2.3.1:
CPU version:
forAll(magnets, i)
{
label magnetZonei = mesh.faceZones().findZoneID(magnets[i].name());
const labelList& faces = mesh.faceZones()[magnetZonei];
const scalar muri = magnets[i].mur();
const scalar Mri = magnets[i].Mr().value();
const vector& orientationi = magnets[i].orientation();
const surfaceVectorField& Sf = mesh.Sf();
forAll(faces, i)
{
label facei = faces[i];
murf[facei] = muri;
Mrf[facei] = Mri*(orientationi & Sf[facei]);
}
}
GPU version:
forAll(magnets, i)
{
label magnetZonei = mesh.faceZones().findZoneID(magnets[i].name());
const labelgpuList& faces = mesh.faceZones()[magnetZonei].getList();
const scalar muri = magnets[i].mur();
const scalar Mri = magnets[i].Mr().value();
const vector& orientationi = magnets[i].orientation();
const surfaceVectorField& Sf = mesh.Sf();
thrust::fill
(
thrust::make_permutation_iterator
(
murf.getField().begin(),
faces.begin()
),
thrust::make_permutation_iterator
(
murf.getField().begin(),
faces.end()
),
muri
);
thrust::transform
(
thrust::make_constant_iterator(Mri),
thrust::make_constant_iterator(Mri)+faces.size(),
thrust::make_transform_iterator
(
thrust::make_permutation_iterator
(
Sf.getField().begin(),
faces.begin()
),
Foam::dotOperatorSFFunctor<vector,vector,scalar>(orientationi)
),
thrust::make_permutation_iterator
(
Mrf.getField().begin(),
faces.begin()
),
Foam::multiplyOperatorFunctor<scalar,scalar,scalar>()
);
}
I have emailed vdalessa; actually he has not solved this problem yet. He just commented it out.
Thanks, Jiaqi.
If you have time to help me and vdalessa out, can you try adding libforces.so to this cavity case? Then use the attached controlDict, which adds libForces. Since you are running >= CUDA 11.2, you can check if the parallel_for error happens on your GPU.
This was the error that vdalessa was facing, and it appears on my system starting with CUDA 11.2 (but not with CUDA 11.1 and earlier). The resulting error is "cudaErrorInvalidDeviceFunction", which can appear if the requested device function is not compiled for the proper device architecture, which makes it possible that the error is hardware dependent. You have access to a very modern GPU, so if the error occurs for you, that is helpful information for troubleshooting in the future, I think.
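As an aside, one generic way to compare the compute capability the GPU reports against the -arch value used at compile time (sm_70 in your build log above) is a few lines of the CUDA runtime API; this is only a diagnostic sketch, not RapidCFD code:

// archCheck.cu -- generic diagnostic only; compile with: nvcc archCheck.cu -o archCheck
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    for (int d = 0; d < deviceCount; ++d)
    {
        cudaDeviceProp prop{};
        cudaGetDeviceProperties(&prop, d);
        // e.g. an A100 reports compute capability 8.0
        std::printf("Device %d: %s, compute capability %d.%d\n",
                    d, prop.name, prop.major, prop.minor);
    }
    return 0;
}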
The test should not take long to run because the error appears immediately.
If you have time this is greatly appreciated. Thank you.
Regarding parallel_for: do you mean running it in the multi-GPU version? Right now I have not gotten to that step yet, because of ThirdParty. I can only access a single GPU (cuda-11.5), as you know. I'm happy to help if I can.
Sorry for the confusion, this is an issue that can appear on a single GPU. I believe it has to do with a loop over cells or faces (i.e., a for loop) Thanks so much.
Create time
Overriding DebugSwitches according to controlDict
Create mesh for time = 0
Reading transportProperties
Reading field p
Reading field U
Reading/calculating face flux field phi
Starting time loop
forceCoeffs forces:
Not including porosity effects
Time = 0.005
Courant Number mean: 0 max: 0
smoothSolver: Solving for Ux, Initial residual = 1, Final residual = 9.2192e-06, No Iterations 79
smoothSolver: Solving for Uy, Initial residual = 0, Final residual = 0, No Iterations 0
AINVPCG: Solving for p, Initial residual = 1, Final residual = 9.24999e-07, No Iterations 53
time step continuity errors : sum local = 6.15138e-09, global = -4.61898e-19, cumulative = -4.61898e-19
AINVPCG: Solving for p, Initial residual = 0.523589, Final residual = 6.26273e-07, No Iterations 51
time step continuity errors : sum local = 6.93534e-09, global = -7.34536e-20, cumulative = -5.35351e-19
ExecutionTime = 14.46 s ClockTime = 17 s
terminate called after throwing an instance of 'thrust::system::system_error'
what(): parallel_for failed: cudaErrorInvalidDeviceFunction: invalid device function
Aborted
Thanks for confirming the issue exists for you on the A100! That is very helpful.
You're welcome!
I found similar code in "fvcSimpleReconstruct.C"
Singularity> vim ./finiteVolume/finiteVolume/fvc/fvcSimpleReconstruct.C
const labelUList& owner = mesh.owner();
forAll(owner, facei)
{
label own = owner[facei];
label nei = neighbour[facei];
rf[own] += (Cf[facei] - C[own])*ssf[facei];
rf[nei] -= (Cf[facei] - C[nei])*ssf[facei];
}
With "labelUList" type, does it run in CPU or GPU? For my poor knowledge, I know GPU type is "labelgpuList".
With a similar setting, in Problem 5:
const Foam::labelUList& own = mesh.owner();
The error:
AUSM.H(16): error: no suitable user-defined conversion from "const Foam::labelgpuList" to "const Foam::labelUList" exists
const Foam::labelgpuList& owner = mesh.owner();
The owner type is "Foam::labelgpuList"; how can I convert it into "Foam::labelUList"? I think that may be the most efficient solution?
For sure give this a try and see if you can achieve a speedup with this change. I believe the code will compile, but I have no idea whether there will be an error at run time.
My knowledge is mainly around getting RapidCFD to compile and run cases. My CUDA skills are weaker, so please weigh the following advice accordingly.
I have noticed that there is directionally less speedup with RapidCFD when there is memory transfer between the CPU and GPU, so whenever possible/feasible it is preferable to do calculations on the GPU rather than on the CPU.
This said, I believe the opportunity for a more impressive speed-up is possible if the example of magneticFoam is followed, where the field manipulations are accomplished with thrust, and not in the standard way on the CPU.
I hope this reply makes sense. Those who are better versed at GPU programming are welcomed to comment.
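To make the memory-transfer point concrete: converting a labelgpuList back to a host-side list essentially means a device-to-host copy. A generic sketch of that copy in plain thrust (I do not know whether RapidCFD exposes such a conversion directly, so the names below are only illustrative) looks like:

// Generic illustration of a device-to-host copy; not RapidCFD API
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/copy.h>

int main()
{
    // stand-in for a GPU label list (e.g. the owner addressing)
    thrust::device_vector<int> owner(1000000, 0);

    // every element crosses the PCIe bus; if this happens each time step,
    // the transfer (plus any copy back) can erase much of the GPU speedup
    thrust::host_vector<int> ownerHost(owner.size());
    thrust::copy(owner.begin(), owner.end(), ownerHost.begin());

    return 0;
}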
What has me stumped is how to use the GPU to process content with “if” statements.
I was just looking over RapidCFD solvers. There are not many uses of thrust (only in magneticFoam), but there are many uses of fields setup as some sort of gpu field. Perhaps setting up fields as gpu types will ensure that field operations are done on the GPU. I think that was your point. I am more optimistic about this idea after looking over some solvers.
I agree the GPU is better; I am just searching for a method to deal with the "if" statement.
// Loop on all cells
forAll(owner,iface)
{
if(duc[iface] > ducLevelPress)
{
....
}
}
Maybe it is something like what magneticFoam did:
label magnetZonei = mesh.faceZones().findZoneID(magnets[i].name());
I need to "find" the ID which "duc[iface] > ducLevelPress" is true. Things to be hard now.
I think the line you listed in magneticFoam is making a list of faces that are located in the magnet face zone. I do not use this solver, so I am not 100% certain. If I am correct in my interpretation of this line,
label magnetZonei = mesh.faceZones().findZoneID(magnets[i].name());
then I think this line from magneticFoam is different from the loop over faces that you want to accomplish.
I did a quick look through the source code, and I think loops over cells may be written as normal. Please look at GAMGAgglomeration.C as an example, where I have linked to a loop over faces with an if block that looks a lot like CPU code.
I reiterate this is not my specialty so I am learning alongside you, so please treat this comment accordingly.
Thank you for your quick reply. I have been trying to solve this challenge with the utmost efficiency and with all my might, so I may have interrupted you too much.
Yes, I am confused about it. It looks like CPU-style code, but here the owner list is a GPU type. On the CPU it is easy to deal with the "if" statement.
If I sacrifice some time by transforming back to a CPU type, things would be easier, but I am also not sure how to do that.
This sounds like an alternative to the "if" statement:
Ref:./TurbulenceModels/turbulenceModels/RAS/derivedFvPatchFields/wallFunctions/epsilonWallFunctions/epsilonWallFunction/epsilonWallFunctionFvPatchScalarField.C
typename gpuList<label>::iterator end =
thrust::copy_if
(
faceCells.begin(),
faceCells.end(),
weights.begin(),
constraintCells.begin(),
epsilonWallFunctionGraterThanToleranceFunctor(tolerance_)
);
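If I understand this pattern, a standalone version of the same idea (plain thrust with hypothetical names, outside OpenFOAM; just my sketch to check my understanding) would be:

// copyIfSketch.cu -- collect face indices where duc[iface] > ducLevelPress
// compile with: nvcc -std=c++14 copyIfSketch.cu
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/iterator/counting_iterator.h>

struct greaterThanLevel
{
    double level;

    __host__ __device__
    bool operator()(const double& duc) const
    {
        return duc > level;
    }
};

int main()
{
    thrust::device_vector<double> duc(6);
    duc[0] = 0.1; duc[1] = 0.9; duc[2] = 0.3;
    duc[3] = 0.8; duc[4] = 0.2; duc[5] = 0.7;

    // counting iterator supplies the indices, duc acts as the stencil
    thrust::device_vector<int> flaggedFaces(duc.size());
    auto end = thrust::copy_if
    (
        thrust::counting_iterator<int>(0),
        thrust::counting_iterator<int>(int(duc.size())),
        duc.begin(),                 // stencil
        flaggedFaces.begin(),
        greaterThanLevel{0.5}
    );
    flaggedFaces.resize(end - flaggedFaces.begin());   // here: indices 1, 3, 5

    return 0;
}

Then the expensive update could perhaps be applied only to the flagged faces through a permutation iterator, like murf/Mrf in magneticFoam.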
Yeah, I think you are on to an idea that looks helpful.
Out of curiosity, what kind of speed up are you seeing right now over CPU's, before making changes to use gpu fields and functors to conduct mathematical operations on the GPU?
Unfortunately, I have not succeeded in compiling it with the GPU yet.
Let me ask another question in advance.
The original CPU code additionally includes "processor" code (for the multi-CPU case, where the mesh is decomposed with decomposePar):
[please see link for more detail]
// Add artificial dissipation on the processor boundaries
forAll( mesh.boundaryMesh(), iPatch )
{
    const Foam::polyPatch& patch = mesh.boundaryMesh()[iPatch] ;
    if ((patch.type()=="processor"))
    {
        forAll( patch , iface )
        {
            ....(this part is same as above)
        }
    }
}
Do you think this part should be kept or deleted when implementing it on the GPU? How does multi-GPU really work? Is it the same as multi-CPU with the "processor" boundary condition?
I am pretty sure the processor boundary still exists with a multi-GPU case.
That is, you still decompose the case with a CPU version of decomposePar before running with a RapidCFD solver and multiple GPU's. And, decomposePar will create processor boundaries when preparing processor directories.
As a result, I believe a multi-GPU case requires the ability to handle processor boundaries.
With respect to compiling: did vdalessa's original RapidCFD code compile before you started making modifications? Thanks for this clarification.
Yes, it compiled. However, he fixed it by commenting out these lines:
//#include "AUSM.H"
//#include "AUSM_conv.H"
which are the source of the bug, i.e. no operator "[]" matches these operands.
This means he has not actually solved it in the GPU version, because when I uncomment them, the same errors appear.
Got it. Then some significant work remains to enable the AUSM scheme. I understand now.
In the CPU version, I have set up 4 cases:
Case | cells number | CPU cores | dt | speed |
---|---|---|---|---|
2D | coarse: 97664 | 16 cores | 3.31766e-09 s | 7.0542e-07 s/min |
2D | fine: 315264 | 16 cores | 1.61622e-09 s | 1.8748e-07 s/min |
3D | coarse: 97664*41 | 128 cores | 6.22965e-09 s | 2.5542e-07 s/min |
3D | fine: 315264*41 | 128 cores | 1.59084e-09 s | 2.3863e-08 s/min |
That is truly interesting. And this is OpenFOAM 2.3.x?
Hi everyone, I am trying to make a Dockerfile for RapidCFD-dev. It is in progress. The idea is to modify the original openfoam-2.3.x Dockerfile.
Any suggestions?
https://github.com/jiaqiwang969/openfoam-docker/tree/main/OpenFOAM/openfoam-org
Advantages:
Install and run
Tips:
New solver implementation