SimFlowCFD / RapidCFD-dev

RapidCFD is an OpenFOAM fork running fully on the CUDA platform. Brought to you by https://sim-flow.com

RapidCFD-container #97

Open jiaqiwang969 opened 2 years ago

jiaqiwang969 commented 2 years ago

Hi everyone, I am trying to make a Dockerfile for RapidCFD-dev. It is in progress. The idea is to modify the original openfoam-2.3.x Dockerfile.

Any suggestions?

https://github.com/jiaqiwang969/openfoam-docker/tree/main/OpenFOAM/openfoam-org

Advantage:

Install and run

Tips:

New solver implementation

jiaqiwang969 commented 2 years ago

> That is truly interesting. And this is OpenFOAM 2.3.x?

No, it is based on foam-v6.

jiaqiwang969 commented 2 years ago

Progress update

Ex1:

        scalar pl = p_L[iface] ;
        scalar pr = p_R[iface] ;
        scalar ml = M_L[iface] ;
        scalar mr = M_R[iface] ;
        scalar ul = U_L[iface] ;
        scalar ur = U_R[iface] ;
        scalar dl = rho_L[iface] ;
        scalar dr = rho_R[iface] ;

each one needs a copy_if call:

        // thrust::copy_if(first, last, stencil, result, pred):
        // gather rho_R at the owner cell of each face and copy it into dr,
        // but only for faces whose duc value satisfies the predicate
        thrust::copy_if
        (
            thrust::make_permutation_iterator
            (
                rho_R.getField().begin(),
                owner.begin()
            ),
            thrust::make_permutation_iterator
            (
                rho_R.getField().begin(),
                owner.end()
            ),
            duc.begin(),           // stencil: the per-face duc values
            dr.begin(),            // output: here dr has to be a device (gpu) field
            isA(ducLevelPress)     // predicate evaluated on the stencil values
        );

Ex2:

fa = M0*(2.-M0);

For this simple equation we would need thrust::minus (for the 2. - M0 part) and then thrust::multiplies.
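Rather than chaining thrust::minus and thrust::multiplies through temporary fields, the whole expression can also go into one functor and a single thrust::transform pass. A minimal sketch, with assumed field names (M0, fa) and double precision:

    #include <thrust/transform.h>
    #include <thrust/device_vector.h>

    // Elementwise fa = M0*(2 - M0): one functor, one pass over the field.
    struct faFunctor
    {
        __host__ __device__
        double operator()(double m0) const
        {
            return m0*(2.0 - m0);
        }
    };

    // Usage sketch, assuming M0 and fa are device fields of the same size:
    //   thrust::transform(M0.begin(), M0.end(), fa.begin(), faFunctor());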

Ex3:

        p5p = p5 (ml , 1 , alpha) ;

where p5 is a function:

    float p5 (float rm, float sgn, float alpha)   // Return p_5
    {
        float r;
        if (abs(rm) < 1.)
        {
            r = m2(rm,sgn)*((sgn*2 - rm) - sgn*16*alpha*rm*m2(rm,-sgn));
        }
        else
        {
            r = 1./rm*m1(rm,sgn);
        }
        return r;
    }

Can this be written as a lambda function (or functor) of the thrust type? And how?
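One possible answer, as a sketch rather than a definitive implementation: package p5 as a functor with __host__ __device__ qualifiers (or, if nvcc is invoked with --extended-lambda, as a device lambda) and apply it per face with thrust::transform. The m1/m2 helpers below are placeholders for the usual split Mach-number functions and should be adapted to the solver's actual definitions:

    #include <thrust/transform.h>
    #include <cmath>

    // Placeholder split Mach-number helpers (stand-ins for the solver's own m1/m2).
    __host__ __device__ inline float m1(float m, float sgn)
    {
        return 0.5f*(m + sgn*fabsf(m));
    }

    __host__ __device__ inline float m2(float m, float sgn)
    {
        return sgn*0.25f*(m + sgn)*(m + sgn);
    }

    // p5 packaged as a functor so thrust can evaluate it on the GPU for every face.
    struct p5Functor
    {
        float sgn, alpha;

        p5Functor(float sgn_, float alpha_) : sgn(sgn_), alpha(alpha_) {}

        __host__ __device__
        float operator()(float rm) const
        {
            if (fabsf(rm) < 1.f)
            {
                return m2(rm,sgn)*((sgn*2.f - rm) - sgn*16.f*alpha*rm*m2(rm,-sgn));
            }
            return 1.f/rm*m1(rm,sgn);
        }
    };

    // Usage sketch: p5p[i] = p5(ml[i], 1, alpha) for every face
    //   thrust::transform(ml.begin(), ml.end(), p5p.begin(), p5Functor(1.f, alpha));

An extended lambda such as [=] __host__ __device__ (float rm) { ... } would work the same way, but the named functor matches the style already used in RapidCFD.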

jiaqiwang969 commented 2 years ago

NEW IDEAS

I believe another strategy is better: ignore the if-statement at first, just do the calculation over the whole field on the GPU, and then use "copy_if" with the final result (see the sketch at the end of this comment).

Question:

  1. how to ..
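For illustration only, with hypothetical names rather than the solver's real variables, such a whole-field pass could compute the candidate update on every face and apply the threshold inside one functor as a mask (a masking variant of the copy_if idea above):

    #include <thrust/transform.h>

    // Hypothetical sketch of "compute everywhere, then select": the per-face update
    // is evaluated on all faces and zeroed where duc does not exceed the threshold,
    // so no scalar if-branch runs outside the GPU kernel.
    struct maskedUpdateFunctor
    {
        double ducLevelPress;
        maskedUpdateFunctor(double level) : ducLevelPress(level) {}

        __host__ __device__
        double operator()(double duc, double rhs) const   // rhs stands for (-0.5*dp12 + pu)
        {
            return (duc > ducLevelPress) ? duc*rhs : 0.0;
        }
    };

    // Usage sketch, assuming duc, rhs and dPave are device fields of equal size:
    //   thrust::transform(duc.begin(), duc.end(), rhs.begin(), dPave.begin(),
    //                     maskedUpdateFunctor(ducLevelPress));
    //   // then: pave += dPave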
jiaqiwang969 commented 2 years ago

Confusions

Hi TonkomoLLC, I need to clear up a few confusions, mainly about storage on the CPU vs. the GPU. Please clarify for me:

  1. “forAll” error:

    • Explanation: it manipulates GPU memory outside GPU kernel execution. For example, with duc[iface] I understand that duc is stored in GPU memory and iface is an index that also lives on the GPU, but we send the commands via the CPU.
  2. “forAll” error fixed by "thrust":

    • Does it mean that we now send the commands via the GPU?
  3. labelgpuList vs. labelUList: what is the main difference? Does it mean labelgpuList stores its data on the GPU and labelUList stores it on the CPU?

  4. In createField.H we construct lots of field objects, such as:

    
    volScalarField rho
    (
        IOobject
        (
            "rho",
            runTime.timeName(),
            mesh,
            IOobject::NO_READ,
            IOobject::AUTO_WRITE
        ),
        thermo.rho(),
        rhoBoundaryTypes
    );

    surfaceScalarField neg
    (
        IOobject
        (
            "neg",
            runTime.timeName(),
            mesh
        ),
        mesh,
        dimensionedScalar("neg", dimless, -1.0)
    );


Where are these stored? Why are they inter-operable? Does it mean we create them and store them in CPU memory? How are they transformed into a gpuField?
jiaqiwang969 commented 2 years ago

How to deal with field loop operation in RapidCFD?

I received valuable information from Kurashov, the author of the paper "Modification of RhoCentralFoam Solvers Based on OpenFOAM and RapidCFD to Calculate High-Speed Flows Speed Comparison of GPU and CPU Solvers". I believe his answer may help other people who are newbies like me.

He wrote:

RapidCFD uses the Thrust library for GPU field operations. There are two options for field loop operations.

1) Using a high-level abstract class (volScalarField or surfaceScalarField), as in rhoCentralFoam (https://github.com/Atizar/RapidCFD-dev/blob/master/applications/solvers/compressible/rhoCentralFoam/rhoCentralFoam.C); for example, line 139: surfaceScalarField a_pos("a_pos", ap/(ap - am));

2) Using the thrust::transform function with a functor, as in the thermo library; for example, line 98 of https://github.com/Atizar/RapidCFD-dev/blob/master/src/thermophysicalModels/basic/psiThermo/hePsiThermo.C:

thrust::transform(hCells.begin(),hCells.end(),
                      thrust::make_zip_iterator(thrust::make_tuple( pCells.begin(),
                                                                    TCells.begin())),
                      thrust::make_zip_iterator(thrust::make_tuple(TCells.begin(),
                                                                   psiCells.begin(),
                                                                   muCells.begin(),
                                                                   alphaCells.begin()
                                                                   )),
                      hePsiThermoCalculateFunctor<typename MixtureType::thermoType>(this->cellMixture(0)));

The functor hePsiThermoCalculateFunctor contains the full algorithm acting on the per-element values (label, scalar or vector) of the fields.
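For orientation, here is a stripped-down sketch of that zip-iterator pattern; this is not the real hePsiThermoCalculateFunctor, and the constant-Cp formulas inside are placeholders. The functor receives the enthalpy directly and the remaining inputs as a tuple, and writes its outputs back through another zip iterator, all in __host__ __device__ code so thrust can run it on the GPU:

    #include <thrust/transform.h>
    #include <thrust/iterator/zip_iterator.h>
    #include <thrust/tuple.h>
    #include <thrust/device_vector.h>

    // Placeholder functor: (h, (p, Told)) -> (T, psi).  The real RapidCFD functor
    // wraps the full mixture thermodynamics instead of these toy formulas.
    struct calcFunctor
    {
        template<class Tuple>
        __host__ __device__
        thrust::tuple<double,double> operator()(double h, const Tuple& pT) const
        {
            const double p    = thrust::get<0>(pT);
            const double Told = thrust::get<1>(pT);

            const double T   = h/1005.0;        // toy: T from h with constant Cp
            const double psi = 1.0/(287.0*T);   // toy: psi = 1/(R*T)

            (void)p; (void)Told;                // a real functor would use these too
            return thrust::make_tuple(T, psi);
        }
    };

    void calculate
    (
        thrust::device_vector<double>& hCells,
        thrust::device_vector<double>& pCells,
        thrust::device_vector<double>& TCells,
        thrust::device_vector<double>& psiCells
    )
    {
        thrust::transform
        (
            hCells.begin(), hCells.end(),
            thrust::make_zip_iterator(thrust::make_tuple(pCells.begin(), TCells.begin())),
            thrust::make_zip_iterator(thrust::make_tuple(TCells.begin(), psiCells.begin())),
            calcFunctor()
        );
    }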

TonkomoLLC commented 2 years ago

Please excuse my delayed reply. There were a lot of topics for me to catch up on this morning. I only found the Russian version of the referenced Kurashov paper and did not translate it yet, but I am happy you found a helpful reference.

I follow your points but have limited experience with CUDA programming, so I do not have specific advice for you. I will try to summarize what I think is happening.

I suspect, but do not know for sure, that operations on gpufield are handled internally by RapidCFD to happen on the GPU. I think this is why for so many solvers, with the exception of magneticFoam, the main changes are from CPU type fields to gpu type fields.

I do not fully know what happens when some fields are not gpu fields, like your example above from rhoCentralFoam, where at line 139 of rhoCentralFoam you found the following: surfaceScalarField a_pos("a_pos", ap/(ap - am));. That is, I am not sure whether this line of code is executed on the CPU or the GPU. I suspect it happens on the CPU, though.

If you must cycle through cells or faces and such to perform the calculation on the GPU, then I believe you need to use Thrust transformations. Some basic operations like copying values can be done directly in the Thrust block. More complicated math requires a functor to be defined. Examples of functors in RapidCFD generally have __host__ __device__ function type qualifiers (I hope that is the correct terminology!). This lines up with your example from hePsiThermo.C, where you found that hePsiThermoCalculateFunctor contains the mathematical function being evaluated.
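As a small illustration of that distinction (assumed names, not RapidCFD source): a plain copy needs no user-defined code, while anything with real arithmetic goes through a functor marked __host__ __device__ so its body can run inside the kernel that Thrust launches:

    #include <thrust/copy.h>
    #include <thrust/transform.h>
    #include <thrust/device_vector.h>

    // Basic operation: copying values works directly, no functor needed.
    //   thrust::copy(src.begin(), src.end(), dst.begin());

    // More complicated math: wrap it in a __host__ __device__ functor.
    struct squareMinusOne
    {
        __host__ __device__
        double operator()(double x) const { return x*x - 1.0; }
    };

    //   thrust::transform(src.begin(), src.end(), dst.begin(), squareMinusOne());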

To summarize, I think I generally agree with your observations, but I am not in a position to speak from authority. I believe you are on the right path and wish you good luck with further coding and experimentation.

jiaqiwang969 commented 2 years ago

Progress update:

Hi friends. I believe I understand what Kurashov told me about, and I may have solved the issue in #87.

The key is to use method 1 from the comment above: https://github.com/Atizar/RapidCFD-dev/issues/97#issuecomment-1196674467

Example:

//     Loop over all internal faces
       forAll(own,iface)
       {
        if(duc[iface] > ducLevelPress)
        {
         //Update p
         pave[iface] += duc[iface]*(- 0.5*(dp12) + pu)  ; 
        }
       }

In the RapidCFD environment it should change to the line below, where Foam::pos() returns 1 for a non-negative argument and 0 otherwise, so it acts as the mask that replaces the per-face if-condition:

pave +=  Foam::pos(duc-ducLevelPress)*duc*(-0.5*(dp12) + pu);

Confusion

My confusion is: does this expression manipulate the data on the GPU?

TonkomoLLC commented 2 years ago

As to your point of confusion: Are all the fields some sort of gpu type field? If so, I think the field manipulation will occur on the GPU, but I am not 100% certain.

jiaqiwang969 commented 2 years ago

Problem 07

I have met the same breakdown problem as in #86.

Time = 2.01734661999e-06

AINVPBiCGStab:  Solving for k, Initial residual = -nan, Final residual = -nan, No Iterations 1001
AINVPBiCGStab:  Solving for k, Initial residual = -nan, Final residual = -nan, No Iterations 1001

As shown below, the divergence initially originates from the shock wave. It is maybe more related to the scheme.

Testing update

https://cdn.mathpix.com/snip/images/MVOUnPAXKvpqxQKA1HECK04sLeDosdHQLov6kx3Z0Ok.original.fullsize.png

As you can see, the picture above shows the result: a comparison between CPU (left) and GPU (right), both with the same settings and the same solver.

It shows up already in the first and second loops.

I found that with the smoothSolver, the GPU version's residual does not reach the same level as on the CPU. Have you seen or compared this before? If not, how can I set double precision on the GPU?

jiaqiwang969 commented 2 years ago

Price Comparison

Usually the price of a GPU is much higher than that of a CPU. As a user, I found GPU prices to be about 50 times higher than CPUs. This means the GPU should perform about 50 times faster than the CPU to prove its value.

TonkomoLLC commented 2 years ago

In RapidCFD-dev/opt/bashrc you will see two settings, WM_ARCH_OPTION and WM_PRECISION_OPTION. By default they are set to 64-bit and DP (double precision), respectively. I do not know what other settings are available to enforce double precision calculations on the GPU.

With respect to the differences in the residuals, I do not know why they are slightly different from the CPU. Your image is included below so it is embedded in the discussion on GitHub.

(embedded image: CPU vs. GPU residual comparison)

jiaqiwang969 commented 2 years ago

Update by using new matrix solver: RapidCFD-PBiCGStab

The fact is that it is very good: the residual is reduced much further than with the smoothSolver (O(1e-8)); PBiCGStab reaches O(1e-12).

However, problem 07 still exists, which means it is not a problem of the matrix solver or of GPU precision. But it is a good result, no doubt.

TonkomoLLC commented 2 years ago

> Usually the price of a GPU is much higher than that of a CPU. As a user, I found GPU prices to be about 50 times higher than CPUs. This means the GPU should perform about 50 times faster than the CPU to prove its value.

Yes, this makes a lot of sense. You are seeing 128x improvement per your note above for a standard RapidCFD solver. If so, then so far the economics are on target.

Also, thanks for the test of RapidCFD-PBiCGStab - that is really good news!

I do not know the reason for or a solution to your observations regarding problem 07.

jiaqiwang969 commented 2 years ago

> Usually the price of a GPU is much higher than that of a CPU. As a user, I found GPU prices to be about 50 times higher than CPUs. This means the GPU should perform about 50 times faster than the CPU to prove its value.
>
> Yes, this makes a lot of sense. You are seeing 128x improvement per your note above for a standard RapidCFD solver. If so, then so far the economics are on target.
>
> Also, thanks for the test of RapidCFD-PBiCGStab - that is really good news!
>
> I do not know the reason for or a solution to your observations regarding problem 07.

Very rough testing so far, but it gives an idea of what to expect. The bug is hidden deep, and I think I have once again met my match, because I have no idea how to debug it. I have checked and compared the two versions of the code at least 10 times.

nima555 commented 1 year ago

Hi,

I pulled the image with docker pull ghcr.io/jiaqiwang969/openfoam-2.3.x-cuda-mpich-base-18.04-11.4:sha-6c5b3aa, then sourced /opt/OpenFOAM/RapidCFD-dev/etc/bashrc and ran /opt/OpenFOAM/RapidCFD-dev/Allwmake in the /opt/OpenFOAM/RapidCFD-dev directory,

but I got some errors in multiSolidBodyMotionFvMesh.C:

solidBodyMotionFvMesh/multiSolidBodyMotionFvMesh.C: In constructor 'Foam::multiSolidBodyMotionFvMesh::multiSolidBodyMotionFvMesh(const Foam::IOobject&)':
solidBodyMotionFvMesh/multiSolidBodyMotionFvMesh.C:104:6: error: reference to 'const_iterator' is ambiguous
     forAllConstIter(dictionary, dynamicMeshCoeffs_, iter)
      ^~~~~~~~~~~~~~
/opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/DLListBase.H:229:7: note: candidates are: class Foam::DLListBase::const_iterator
         class const_iterator
       ^ ~~~~~~~~~~~~
/opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/UILList.H:218:7: note:                 class Foam::UILList<Foam::DLListBase, Foam::entry>::const_iterator
         class const_iterator
       ^ ~~~~~~~~~~~~
/opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/DLListBase.H:229:7: note:                 class Foam::DLListBase::const_iterator
         class const_iterator
       ^ ~~~~~~~~~~~~
/opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/UILList.H:218:7: note:                 class Foam::UILList<Foam::DLListBase, Foam::entry>::const_iterator
         class const_iterator
       ^ ~~~~~~~~~~~~
/opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/DLListBase.H:229:7: note:                 class Foam::DLListBase::const_iterator
         class const_iterator
       ^ ~~~~~~~~~~~~
/opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/UILList.H:218:7: note:                 class Foam::UILList<Foam::DLListBase, Foam::entry>::const_iterator
         class const_iterator
       ^ ~~~~~~~~~~~~
/opt/OpenFOAM/RapidCFD-dev/src/OpenFOAM/lnInclude/HashTable.H:471:7: note:                 class Foam::HashTable<Foam::regIOobject*>::const_iterator
         class const_iterator
       ^ ~~~~~~~~~~~~
solidBodyMotionFvMesh/multiSolidBodyMotionFvMesh.C:104:59: error: 'iter' was not declared in this scope
     forAllConstIter(dictionary, dynamicMeshCoeffs_, iter)
                                                           ^   
solidBodyMotionFvMesh/multiSolidBodyMotionFvMesh.C:104:59: note: suggested alternative: 'xfer'
     forAllConstIter(dictionary, dynamicMeshCoeffs_, iter)
                                                           ^   
                                                           xfer

I guess this is about namespaces, but I couldn't fix it.

Any idea??

TonkomoLLC commented 1 year ago

Hi, I saw this error with CUDA 11.5 (note that I have not tested above CUDA 11.5).

See issue #92 and the discussion there around CUDA 11.5.

I do not have a solution that fully fixes the problem, just a workaround in that issue.