jiaqiwang969 opened 2 years ago
That is truly interesting. And this is OpenFOAM 2.3.x?
No, in foam-v6
scalar pl = p_L[iface] ;
scalar pr = p_R[iface] ;
scalar ml = M_L[iface] ;
scalar mr = M_R[iface] ;
scalar ul = U_L[iface] ;
scalar ur = U_R[iface] ;
scalar dl = rho_L[iface] ;
scalar dr = rho_R[iface] ;
Each of these needs a thrust::copy_if call:
// Gather rho_R at the owner faces and copy only the faces whose
// stencil value duc passes the isA(ducLevelPress) predicate into dr
thrust::copy_if
(
    thrust::make_permutation_iterator
    (
        rho_R.getField().begin(),
        owner.begin()
    ),
    thrust::make_permutation_iterator
    (
        rho_R.getField().begin(),
        owner.end()
    ),
    duc,                 // stencil
    dr.begin(),          // output
    isA(ducLevelPress)   // predicate
);
fa = M0*(2.-M0);
For this simple equation we need to use thrust arithmetic functors such as thrust::minus.
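Or, more compactly, a placeholder expression; a minimal self-contained sketch (thrust::device_vector is used here only so the snippet compiles on its own — in RapidCFD the iterators would come from the gpuFields M0 and fa):

#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/functional.h>

int main()
{
    // Stand-in device fields; in the solver these would be gpuField ranges
    thrust::device_vector<double> M0(4, 0.5);
    thrust::device_vector<double> fa(4);

    using namespace thrust::placeholders;

    // fa = M0*(2. - M0), evaluated element-wise on the GPU
    thrust::transform(M0.begin(), M0.end(), fa.begin(), _1*(2.0 - _1));

    return 0;
}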
p5p = p5 (ml , 1 , alpha) ;
where p5 is a function:
float p5 (float rm, float sgn, float alpha) // Return p_5
{
    float r;
    if (fabsf(rm) < 1.f)        // |M| < 1 branch
    {
        r = m2(rm, sgn)*( (sgn*2.f - rm) - sgn*16.f*alpha*rm*m2(rm, -sgn) );
    }
    else                        // |M| >= 1 branch
    {
        r = 1.f/rm*m1(rm, sgn);
    }
    return r;
}
Can this be written as a lambda or functor of the thrust type? And how?
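One possible way — a minimal sketch of p5 as a __host__ __device__ functor for thrust::transform; it assumes m1 and m2 from the original solver are also ported as __host__ __device__ functions, and the range names in the usage comment are only placeholders:

#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <math.h>

// Assumed: the original m1/m2 helpers are ported the same way
// (their definitions are not shown here)
__host__ __device__ float m1(float rm, float sgn);
__host__ __device__ float m2(float rm, float sgn);

// p5 wrapped as a functor so thrust::transform can run it on the GPU
struct p5Functor
{
    float sgn;
    float alpha;

    p5Functor(float sgn_, float alpha_) : sgn(sgn_), alpha(alpha_) {}

    __host__ __device__
    float operator()(float rm) const
    {
        float r;
        if (fabsf(rm) < 1.f)
        {
            r = m2(rm, sgn)*( (sgn*2.f - rm) - sgn*16.f*alpha*rm*m2(rm, -sgn) );
        }
        else
        {
            r = 1.f/rm*m1(rm, sgn);
        }
        return r;
    }
};

// Usage over the whole field in one call (placeholder range names):
// thrust::transform(M_L.begin(), M_L.end(), p5p.begin(), p5Functor(1.f, alpha));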
I believe another strategy is better, i.e. ignore the if-statement at first, just do the calculation over the whole field on the GPU, and then copy_if with the final result.
Hi TonkomoLLC, I need to clear up a few confusions, mainly about how storage is handled between the CPU and GPU. Please clarify these for me:
The “forAll” error: fixed by rewriting the loop with "thrust".
labelgpuList vs labelUList: what is the main difference? Does it mean labelgpuList stores its data on the GPU, while labelUList stores it on the CPU?
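My current mental model, as a minimal sketch with plain thrust containers (the mapping to labelUList/labelgpuList is my assumption, not something I verified in the RapidCFD sources):

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>

int main()
{
    // Lives in CPU RAM, like (I assume) the storage behind labelUList
    thrust::host_vector<int> ownerHost(10, 0);

    // Copies the data into GPU memory, like (I assume) a labelgpuList
    thrust::device_vector<int> ownerGpu = ownerHost;

    // Reading one element back on the host triggers a device-to-host copy
    int first = ownerGpu[0];
    (void)first;

    return 0;
}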
In createFields.H we build many new objects, such as:
volScalarField rho
(
IOobject
(
"rho",
runTime.timeName(),
mesh,
IOobject::NO_READ,
IOobject::AUTO_WRITE
),
thermo.rho(),
rhoBoundaryTypes
);
surfaceScalarField neg
(
    IOobject
    (
        "neg",
        runTime.timeName(),
        mesh
    ),
    mesh,
    dimensionedScalar("neg", dimless, -1.0)
);
Where are these stored? Why are they inter-operable? Does it mean we create them and store them in CPU memory? How are they transformed into gpuFields?
I received valuable information from Kurashov, whose contribution is the paper "Modification of RhoCentralFoam Solvers Based on OpenFOAM and RapidCFD to Calculate High-Speed Flows: Speed Comparison of GPU and CPU Solvers". I believe his answer may help other people who are newcomers like me.
He wrote:
RapidCFD uses the thrust library for GPU field operations. There are two options for field loop operations:
1) Using a high-level abstract class (volScalarField or surfaceScalarField), like rhoCentralFoam (https://github.com/Atizar/RapidCFD-dev/blob/master/applications/solvers/compressible/rhoCentralFoam/rhoCentralFoam.C), for example line 139: surfaceScalarField a_pos("a_pos", ap/(ap - am));
2) Using the thrust::transform function with a functor, like the thermo library, for example line 98 of https://github.com/Atizar/RapidCFD-dev/blob/master/src/thermophysicalModels/basic/psiThermo/hePsiThermo.C:
thrust::transform(hCells.begin(),hCells.end(),
thrust::make_zip_iterator(thrust::make_tuple( pCells.begin(),
TCells.begin())),
thrust::make_zip_iterator(thrust::make_tuple(TCells.begin(),
psiCells.begin(),
muCells.begin(),
alphaCells.begin()
)),
hePsiThermoCalculateFunctor<typename MixtureType::thermoType>(this->cellMixture(0)));
The functor hePsiThermoCalculateFunctor contains the full algorithm operating on the values (label, scalar or vector) of the fields.
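To make the zip_iterator pattern concrete for myself, here is a minimal self-contained sketch of the same shape of call; the math inside demoFunctor is a placeholder and is not the real hePsiThermoCalculateFunctor:

#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>

// Placeholder stand-in for a thermo functor: one plain input (h),
// one zipped input (p, T) and one zipped output (T, psi)
struct demoFunctor
{
    __host__ __device__
    thrust::tuple<double, double> operator()
    (
        const double& h,
        const thrust::tuple<double, double>& pT
    ) const
    {
        const double p = thrust::get<0>(pT);
        const double T = thrust::get<1>(pT);
        return thrust::make_tuple(T + 0.0*h, 1.0/T + 0.0*p); // dummy math only
    }
};

int main()
{
    thrust::device_vector<double> h(8, 1.0), p(8, 2.0), T(8, 300.0);
    thrust::device_vector<double> Tout(8), psi(8);

    // Same shape as the hePsiThermo call quoted above
    thrust::transform
    (
        h.begin(), h.end(),
        thrust::make_zip_iterator(thrust::make_tuple(p.begin(), T.begin())),
        thrust::make_zip_iterator(thrust::make_tuple(Tout.begin(), psi.begin())),
        demoFunctor()
    );

    return 0;
}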
Please excuse my delayed reply. There were a lot of topics for me to catch up on this morning. I only found the Russian version of the referenced Kurashov paper and did not translate it yet, but I am happy you found a helpful reference.
I follow your points but have limited experience with CUDA programming, so I do not have specific advice for you. I will try to summarize what I think is happening.
I suspect, but do not know for sure, that operations on gpuField types are handled internally by RapidCFD so that they happen on the GPU. I think this is why, for so many solvers, with the exception of magneticFoam, the main changes are from CPU-type fields to gpu-type fields.
I do not fully know what happens when some fields are not gpu fields, like your example above from rhoCentralFoam, where at line 139 of rhoCentralFoam you found the following: surfaceScalarField a_pos("a_pos", ap/(ap - am));. That is, I am not sure whether this line of code is executed on the CPU or the GPU. I am suspicious that it happens on the CPU, though.
If you must cycle through cells or faces to perform the calculation on the GPU, then I believe you need to use Thrust transformations. Some basic operations like copying values can be done directly in the Thrust call. More complicated math requires a functor to be defined. Examples of functors in RapidCFD generally have __host__ __device__ function type qualifiers (I hope that is the correct terminology!). This lines up with your example from hePsiThermo.C, where you found that the hePsiThermoCalculateFunctor contains the mathematical function that is being evaluated.
To summarize, I think I generally agree with your observations, but I am not in a position to speak from authority. I believe you are on the right path and wish you good luck with further coding and experimentation.
Hi friends, I believe I understand what Kurashov was telling me about, and maybe I have solved the issues in #87.
The key is to use method 1 from https://github.com/Atizar/RapidCFD-dev/issues/97#issuecomment-1196674467 above.
// Loop over all internal faces
forAll(own, iface)
{
    if (duc[iface] > ducLevelPress)
    {
        // Update p
        pave[iface] += duc[iface]*(-0.5*(dp12) + pu);
    }
}
In the RapidCFD environment, this should change to:
pave += Foam::pos(duc-ducLevelPress)*duc*(-0.5*(dp12) + pu);
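For comparison, the same masked update could also be written in the functor style (method 2). A minimal sketch, assuming duc, dp12 and pu are all per-face device ranges (if dp12 or pu are uniform scalars they would be captured as functor members like ducLevelPress instead):

#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>

// Hypothetical functor reproducing the original if-branch per face
struct paveUpdateFunctor
{
    double ducLevelPress;

    explicit paveUpdateFunctor(double lvl) : ducLevelPress(lvl) {}

    __host__ __device__
    double operator()
    (
        const double& pave,
        const thrust::tuple<double, double, double>& t
    ) const
    {
        const double duc  = thrust::get<0>(t);
        const double dp12 = thrust::get<1>(t);
        const double pu   = thrust::get<2>(t);

        return (duc > ducLevelPress) ? pave + duc*(-0.5*dp12 + pu) : pave;
    }
};

// Usage (in-place update of pave on the device):
// thrust::transform
// (
//     pave.begin(), pave.end(),
//     thrust::make_zip_iterator(thrust::make_tuple(duc.begin(), dp12.begin(), pu.begin())),
//     pave.begin(),
//     paveUpdateFunctor(ducLevelPress)
// );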
My confusion is: does this field-algebra form manipulate the data on the GPU?
As to your point of confusion: are all the fields some sort of gpu-type field? If so, I think the field manipulation will occur on the GPU, but I am not 100% certain.
I am seeing the same break-up problem as in #86.
Time = 2.01734661999e-06
AINVPBiCGStab: Solving for k, Initial residual = -nan, Final residual = -nan, No Iterations 1001
AINVPBiCGStab: Solving for k, Initial residual = -nan, Final residual = -nan, No Iterations 1001
As shown below, the divergence initially starts from the shock wave. It is probably more related to the scheme.
As you can see, the picture above shows the result: a comparison between CPU (left) and GPU (right), both with the same settings and the same solver.
It appears in the first and second loops.
I found that with the smoothSolver, the GPU version's residual does not reach the same level as on the CPU. Have you seen or compared this before? If not, how can I set double precision on the GPU?
Usually, the price of a GPU is much higher than that of a CPU. As a user, I found GPU prices to be 50 times higher than CPUs. This means that a GPU should perform 50 times faster than a CPU to prove its value.
In RapidCFD-dev/etc/bashrc you will see two settings, WM_ARCH_OPTION and WM_PRECISION_OPTION. By default they are set to 64-bit and DP (double precision), respectively. I do not know what other settings are available to enforce double precision calculations on the GPU.
With respect to the differences in the residuals, I do not know why they are slightly different from the CPU. Your image is included below so it is embedded in the discussion on GitHub.
The fact is that it is very good: the residual is reduced much further than with the smoothSolver (O(1e-8)); with PBiCGStab it reaches O(1e-12).
However, problem 07 still exists, which means it is not a problem of the matrix solver or GPU precision. But it is good, no doubt.
Usually, the price of a GPU is much higher than that of a CPU. As a user, I found GPU prices to be 50 times higher than CPUs. This means that a GPU should perform 50 times faster than a CPU to prove its value.
Yes, this makes a lot of sense. You are seeing a 128x improvement per your note above for a standard RapidCFD solver. If so, then so far the economics are on target.
Also, thanks for the test of RapidCFD-PBiCGStab - that is really good news!
I do not know the reason for or a solution to your observations regarding problem 07.
Very rough testing so far, but it gives an idea of what to expect. The bug is hidden deep, and I think I have once again hit a wall, because I have no idea how to debug it. I have checked and compared the two versions of the code at least 10 times.
Hi,
I pulled the image with
docker pull ghcr.io/jiaqiwang969/openfoam-2.3.x-cuda-mpich-base-18.04-11.4:sha-6c5b3aa
then sourced /opt/OpenFOAM/RapidCFD-dev/etc/bashrc and ran /opt/OpenFOAM/RapidCFD-dev/Allwmake in the directory /opt/OpenFOAM/RapidCFD-dev, but I got some errors in multiSolidBodyMotionFvMesh.C, that is:
solidBodyMotionFvMesh/multiSolidBodyMotionFvMesh.C: In constructor 'Foam::multiSolidBodyMotionFvMesh::multiSolidBodyMotionFvMesh(const Foam::IOobject&)':
solidBodyMotionFvMesh/multiSolidBodyMotionFvMesh.C:104:6: error: reference to 'const_iterator' is ambiguous
forAllConstIter(dictionary, dynamicMeshCoeffs_, iter)
^~~~~~~~~~~~~~
/opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/DLListBase.H:229:7: note: candidates are: class Foam::DLListBase::const_iterator
class const_iterator
^ ~~~~~~~~~~~~
/opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/UILList.H:218:7: note: class Foam::UILList<Foam::DLListBase, Foam::entry>::const_iterator
class const_iterator
^ ~~~~~~~~~~~~
/opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/DLListBase.H:229:7: note: class Foam::DLListBase::const_iterator
class const_iterator
^ ~~~~~~~~~~~~
/opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/UILList.H:218:7: note: class Foam::UILList<Foam::DLListBase, Foam::entry>::const_iterator
class const_iterator
^ ~~~~~~~~~~~~
/opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/DLListBase.H:229:7: note: class Foam::DLListBase::const_iterator
class const_iterator
^ ~~~~~~~~~~~~
/opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/UILList.H:218:7: note: class Foam::UILList<Foam::DLListBase, Foam::entry>::const_iterator
class const_iterator
^ ~~~~~~~~~~~~
/opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/HashTable.H:471:7: note: class Foam::HashTable<Foam::regIOobject*>::const_iterator
class const_iterator
^ ~~~~~~~~~~~~
solidBodyMotionFvMesh/multiSolidBodyMotionFvMesh.C:104:59: error: 'iter' was not declared in this scope
forAllConstIter(dictionary, dynamicMeshCoeffs_, iter)
^
solidBodyMotionFvMesh/multiSolidBodyMotionFvMesh.C:104:59: note: suggested alternative: 'xfer'
forAllConstIter(dictionary, dynamicMeshCoeffs_, iter)
^
xfer
I guess this is about namespaces, but I couldn't fix it.
Any ideas?
Hi, I saw this error with CUDA 11.5 (note that I have not tested above CUDA 11.5).
See issue #92 and the discussion there around CUDA 11.5. I do not have a solution that fully fixes the problem, just a workaround in that issue.
Hi everyone, I am trying to make a Dockerfile for RapidCFD-dev. It is in progress. The idea is to modify the original openfoam-2.3.x Dockerfile.
Any suggestions?
https://github.com/jiaqiwang969/openfoam-docker/tree/main/OpenFOAM/openfoam-org
Advantage: install and run directly.
Tips: new solver implementation.