SimFlowCFD / RapidCFD-dev

RapidCFD is an OpenFOAM fork running fully on the CUDA platform. Brought to you by
https://sim-flow.com

how large a test case is required to see speedup #58

Open manasi-t24 opened 5 years ago

manasi-t24 commented 5 years ago

Hi,

I am running various cases on RapidCFD and comparing them against OpenFOAM runs on CPUs. I have two Tesla K20s, CentOS 6.10, and CUDA 9.2. On the CPU side, I have a processor with 16 cores.

I have run the following cases (both with a single GPU): 1) the cavity case (400 cells) with the icoFoam solver -

Moreover, am I doing something wrong, given that I am not able to see a speedup with the 2k-cell tets case either?

Thanks and regards, Manasi

TonkomoLLC commented 5 years ago

wyldckat discussed this topic on CFD-Online here.

My personal experience is that any of the standard OpenFOAM tutorials with a few thousand cells will run far slower on RapidCFD than on CPU-based OpenFOAM. I typically need at least a few million cells to begin to see a speedup. My general experience is that 1 GPU ~= 16 cores, although this fluctuates with the case. That also seems to match the benchmark on the RapidCFD website: the way I read the bar graph there, 1x K20 is about twice as fast as an 8-core CPU. Note that that test uses 4 million cells.

As the cases get larger and more attractive for GPU computing with RapidCFD, more GPU memory is required. If you are using an older GPU card (like the K20) with ~4 GB of RAM, you need to start working with multiple GPUs just to have enough memory to run the case at all.

The RapidCFD website shows some leveling off in speedup (for their test case, at least) beyond ~4 GPUs. I do not have extensive experience scaling to many GPUs, so I am not sure whether this is a communication/bandwidth issue or whether the cases have to become even larger to benefit as more GPUs are added.

Unfortunately you're going to have to struggle with large test cases. I posted the damBreak case that you're looking at because I didn't see any other test case posted where someone gave all the details needed to reproduce a speed test.

Good luck with your calculations.

manasi-t24 commented 5 years ago

Hi, Thanks for your detailed reply. Can you please provide me with some large test cases apart from the damBreak case for RapidCFD execution?

Thanks, Manasi

TonkomoLLC commented 5 years ago

Hello,

Unfortunately, I am not able to provide additional test cases of my own, since they are not published. However, you should be able to take many of the OpenFOAM tutorials, like the damBreak example, and increase the number of cells in blockMesh until the case has a significantly larger cell count; a rough sketch of this workflow is below. Sorry I do not have more to offer, but this strategy should help you get more test cases.

Best, Eric
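P.S. As a rough sketch of that workflow (the paths assume an OpenFOAM-2.3.x tutorial layout, the case name damBreak-large is just an example, and the scale-up factors are arbitrary):

```
# Copy a tutorial into a working directory (path is illustrative)
cp -r $FOAM_TUTORIALS/multiphase/interFoam/laminar/damBreak damBreak-large
cd damBreak-large

# In constant/polyMesh/blockMeshDict, scale up the cell counts in each
# hex (...) (nx ny nz) ... entry until the total reaches a few million cells

blockMesh       # build the refined mesh
decomposePar    # split it into subdomains, one per GPU / MPI rank
mpirun -np 2 interFoam -parallel -devices '(0 1)'   # -devices as shown in the RapidCFD log
```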

manasi-t24 commented 5 years ago

Hi Eric, thanks for being so helpful. I extended the cavity case to 3D and 5M cells using the following entries in the blockMeshDict file:

FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      blockMeshDict;
}

convertToMeters 0.1;

vertices
(
    (0 0 0)
    (12500 0 0)
    (12500 12500 0)
    (0 12500 0)
    (0 0 1250)
    (12500 0 1250)
    (12500 12500 1250)
    (0 12500 1250)
);

blocks
(
    hex (0 1 2 3 4 5 6 7) (500 100 100) simpleGrading (1 1 1)
);

edges
(
);

boundary
(
    movingWall
    {
        type wall;
        faces
        (
            (3 7 6 2)
        );
    }
    fixedWalls
    {
        type wall;
        faces
        (
            (0 4 7 3)
            (2 6 5 1)
            (1 5 4 0)
        );
    }
    frontAndBack
    {
        type wall;
        faces
        (
            (0 3 2 1)
            (4 5 6 7)
        );
    }
);

mergePatchPairs
(
);

I decomposed it using the following decomposeParDict:

FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    location    "system";
    object      decomposeParDict;
}

numberOfSubdomains 2;

method          simple;

simpleCoeffs
{
    n           (2 1 1);
    delta       0.001;
}

hierarchicalCoeffs
{
    n           (1 1 1);
    delta       0.001;
    order       xyz;
}

manualCoeffs
{
    dataFile    "";
}

distributed     no;

roots           ( );

After running myicofoam (a variation of icoFoam with extra timing statements inserted into it), I get the following output:
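(For reference, the run was launched roughly as follows; this is a sketch, the exact mpirun options depend on the MPI installation, and the -devices list matches the Exec line in the log below:)

```
blockMesh                                            # build the ~5M cell mesh
decomposePar                                         # 2 subdomains, per decomposeParDict
mpirun -np 2 myicofoam -parallel -devices '(0 1)'    # one MPI rank per Tesla K20
```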

/*---------------------------------------------------------------------------*\
| RapidCFD by simFlow (sim-flow.com)                                          |
\*---------------------------------------------------------------------------*/
Build   : dev-f3775ac96129
Exec    : myicofoam -parallel -devices (0 1)
Date    : Apr 15 2019
Time    : 06:53:36
Host    : "marskepler"
PID     : 20451
Case    : /home/manasi/OpenFOAM/OpenFOAM-2.3.0/tutorials/incompressible/icoFoam/cavity-large
nProcs  : 2
Slaves  : 1("marskepler.20452")
Pstream initialized with:
    floatTransfer      : 0
    nProcsSimpleSum    : 0
    commsType          : nonBlocking
    polling iterations : 0
sigFpe  : Floating point exception trapping - not supported on this platform
fileModificationChecking : Monitoring run-time modified files using timeStampMaster
allowSystemOperations : Allowing user-supplied system call operations

Create time

ExecutionTime = 0 s ClockTime = 0 s

Create mesh for time = 0

ExecutionTime = 12.79 s ClockTime = 13 s

Reading transportProperties

Reading field p

Reading field U

Reading/calculating face flux field phi

ExecutionTime = 13.08 s ClockTime = 13 s

ExecutionTime = 13.08 s ClockTime = 13 s

Starting time loop

Time = 0.005

Courant Number mean: 0 max: 0
smoothSolver: Solving for Ux, Initial residual = 1, Final residual = 1.00431e-06, No Iterations 6
smoothSolver: Solving for Uy, Initial residual = 0, Final residual = 0, No Iterations 0
smoothSolver: Solving for Uz, Initial residual = 0, Final residual = 0, No Iterations 0
AINVPCG: Solving for p, Initial residual = 1, Final residual = 0.0275077, No Iterations 1001
time step continuity errors : sum local = 1.53373e-15, global = 1.4962e-30, cumulative = 1.4962e-30
AINVPCG: Solving for p, Initial residual = 0.0136118, Final residual = 0.0157013, No Iterations 1001
time step continuity errors : sum local = 1.63048e-15, global = 2.10877e-30, cumulative = 3.60497e-30
ExecutionTime = 42.78 s ClockTime = 43 s

Time = 0.01

Courant Number mean: 2.56427e-11 max: 1.25432e-09
smoothSolver: Solving for Ux, Initial residual = 0.333189, Final residual = 3.34383e-06, No Iterations 5
smoothSolver: Solving for Uy, Initial residual = 0.333333, Final residual = 3.34532e-06, No Iterations 5
smoothSolver: Solving for Uz, Initial residual = 0.332773, Final residual = 3.3349e-06, No Iterations 5
AINVPCG: Solving for p, Initial residual = 0.250862, Final residual = 0.000104895, No Iterations 1001
time step continuity errors : sum local = 1.59452e-17, global = -7.83209e-30, cumulative = -4.22712e-30
AINVPCG: Solving for p, Initial residual = 9.17532e-05, Final residual = 5.86066e-05, No Iterations 1001
time step continuity errors : sum local = 9.97479e-18, global = 6.51035e-30, cumulative = 2.28323e-30
ExecutionTime = 72.49 s ClockTime = 74 s

Time = 0.015

Courant Number mean: 5.12454e-11 max: 2.50843e-09
smoothSolver: Solving for Ux, Initial residual = 0.199715, Final residual = 2.00431e-06, No Iterations 5
smoothSolver: Solving for Uy, Initial residual = 0.199987, Final residual = 2.00706e-06, No Iterations 5
smoothSolver: Solving for Uz, Initial residual = 0.939471, Final residual = 9.41684e-06, No Iterations 5
AINVPCG: Solving for p, Initial residual = 0.106592, Final residual = 3.25747e-05, No Iterations 1001
time step continuity errors : sum local = 6.74438e-18, global = 4.83889e-30, cumulative = 7.12212e-30
AINVPCG: Solving for p, Initial residual = 3.26304e-05, Final residual = 2.1293e-05, No Iterations 1001
time step continuity errors : sum local = 4.35182e-18, global = 1.56253e-30, cumulative = 8.68465e-30
ExecutionTime = 102.25 s ClockTime = 105 s

Time = 0.02

Courant Number mean: 7.68496e-11 max: 3.76263e-09
smoothSolver: Solving for Ux, Initial residual = 0.142677, Final residual = 1.43188e-06, No Iterations 5
smoothSolver: Solving for Uy, Initial residual = 0.143058, Final residual = 1.43573e-06, No Iterations 5
smoothSolver: Solving for Uz, Initial residual = 0.621613, Final residual = 6.2198e-06, No Iterations 5
AINVPCG: Solving for p, Initial residual = 0.0523755, Final residual = 1.24012e-06, No Iterations 1001
time step continuity errors : sum local = 2.8177e-19, global = -1.0358e-29, cumulative = -1.67338e-30
AINVPCG: Solving for p, Initial residual = 2.43752e-06, Final residual = 9.93868e-07, No Iterations 21
time step continuity errors : sum local = 2.39838e-19, global = 1.58922e-29, cumulative = 1.42188e-29
ExecutionTime = 122.79 s ClockTime = 126 s

Time = 0.025

Courant Number mean: 1.02446e-10 max: 5.01683e-09
smoothSolver: Solving for Ux, Initial residual = 0.110968, Final residual = 1.11365e-06, No Iterations 5
smoothSolver: Solving for Uy, Initial residual = 0.111305, Final residual = 1.11705e-06, No Iterations 5
smoothSolver: Solving for Uz, Initial residual = 0.439472, Final residual = 4.39523e-06, No Iterations 5
AINVPCG: Solving for p, Initial residual = 0.0268281, Final residual = 1.18238e-06, No Iterations 1001
time step continuity errors : sum local = 3.00382e-19, global = -3.72512e-30, cumulative = 1.04937e-29
AINVPCG: Solving for p, Initial residual = 2.3321e-06, Final residual = 9.93058e-07, No Iterations 20
time step continuity errors : sum local = 2.60171e-19, global = 1.53723e-29, cumulative = 2.5866e-29
ExecutionTime = 143.34 s ClockTime = 147 s

Time = 0.03

Courant Number mean: 1.28035e-10 max: 6.27103e-09
smoothSolver: Solving for Ux, Initial residual = 0.0907959, Final residual = 9.1056e-06, No Iterations 4
smoothSolver: Solving for Uy, Initial residual = 0.0910849, Final residual = 9.13469e-06, No Iterations 4
smoothSolver: Solving for Uz, Initial residual = 0.283356, Final residual = 2.83376e-06, No Iterations 5
AINVPCG: Solving for p, Initial residual = 0.0298858, Final residual = 4.69836e-06, No Iterations 1001
time step continuity errors : sum local = 1.17045e-18, global = 9.14367e-31, cumulative = 2.67803e-29
AINVPCG: Solving for p, Initial residual = 5.21609e-06, Final residual = 8.87629e-06, No Iterations 1001
time step continuity errors : sum local = 2.20171e-18, global = -1.04351e-29, cumulative = 1.63452e-29
ExecutionTime = 172.85 s ClockTime = 177 s

and so on.....

But as you can see, the solver spends hardly any time solving Ux, Uy, and Uz, while for p it always reaches the maximum number of iterations (1001), i.e. it is not converging on its own. I believe this is because the mesh is not proper, but I could not find any tutorials on extending the case, and I am new to OpenFOAM as well as RapidCFD. If you could help me construct a proper large mesh for this simulation, I would be very grateful.

Regards, Manasi

TonkomoLLC commented 5 years ago

Hello,

I apologize, but my schedule today does not allow me to put together a new blockMeshDict and a tested case. However, I do have a few pointers that I hope will help.

First, in blockMeshDict:

  1. convertToMeters 0.1; together with vertices like (12500 0 0) means the domain is 1250 m x 1250 m x 125 m (in Z, 1250 * 0.1 = 125 m). I do not think you intended a km-sized grid in the X-Y directions. Maybe you meant convertToMeters 1.0; and vertices like (1.250 0 0), etc. Please double check.

  2. Your cell counts in the X, Y, and Z directions are 500, 100, and 100, from the blockMeshDict line that reads hex (0 1 2 3 4 5 6 7) (500 100 100) simpleGrading (1 1 1). I suggest using cubic (equal-sided) cells while debugging your case. So, for example, if your domain is 1.250 m x 1.250 m x 0.125 m, maybe use 400 x 400 x 40 cells, which yields a 6.4 million cell grid; I know this is more than your target of 5 million cells. A sketch combining points 1 and 2 follows after this list.
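Here is how points 1 and 2 might look when combined, assuming the intended domain really is 1.250 m x 1.250 m x 0.125 m (adjust the numbers to whatever geometry you actually want):

```
convertToMeters 1.0;

vertices
(
    (0     0     0)
    (1.25  0     0)
    (1.25  1.25  0)
    (0     1.25  0)
    (0     0     0.125)
    (1.25  0     0.125)
    (1.25  1.25  0.125)
    (0     1.25  0.125)
);

blocks
(
    // 400 x 400 x 40 = 6.4M cells, all cubes of edge 1.25/400 = 0.003125 m
    hex (0 1 2 3 4 5 6 7) (400 400 40) simpleGrading (1 1 1)
);
```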

Ideally, after addressing points 1 and 2, you can run checkMesh and confirm that the dimensions, cell sizes, etc. are reasonable for your case (for example, as shown below). If the mesh looks fine and is correctly dimensioned, then hopefully you can rule out the grid while troubleshooting further.
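For example (nothing RapidCFD-specific here):

```
blockMesh
checkMesh | tee log.checkMesh
# In the output, verify the "Overall domain bounding box" and the
# min/max cell volume report match the domain and cell sizes you intended
```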

I sincerely hope that getting your grid dimensions and cell spacing correct will solve your runtime problems. If not, please remember that the iterations continue until the residual tolerance is reached; if your residual tolerance is too tight, you will hit the maximum-iteration limit. A question to ask yourself is whether the residual tolerance is tighter than it needs to be. If you set the solver tolerance to a larger value (in system/fvSolution, under p, change tolerance 1e-06; to something larger), the p equation may converge. Similarly, if you add a maxIter entry in the p section of the fvSolution file, you can increase (or decrease) the maximum number of iterations from 1,000 to whatever number you like; see the sketch below.
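As an illustration only (the AINVPCG prefix in your log suggests PCG with RapidCFD's AINV preconditioner, but keep whatever solver settings your case already uses), the p block in system/fvSolution might look like:

```
p
{
    solver          PCG;      // AINVPCG in the log = PCG solver...
    preconditioner  AINV;     // ...with the AINV preconditioner (assumption from the log prefix)
    tolerance       1e-05;    // loosened from 1e-06
    relTol          0;
    maxIter         2000;     // raise or lower the iteration cap as needed
}
```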

As an aside, since you mentioned that you are new to OpenFOAM, I highly recommend this presentation on OpenFOAM tips and tricks. As Dr. Guerrero rightly notes in this referenced presentation:

Residuals are not your solution. Low residuals do not automatically mean a correct solution, and high residuals do not automatically mean a wrong solution.

In other words, if the solution is physically correct, "high residuals" and/or reaching the max iterations may not be a problem.

All this said... I think there are some issues in the blockMeshDict for you to consider first.

I hope this advice is helpful for your endeavors.

Best regards,

Eric

manasi-t24 commented 5 years ago

Thank you, Eric. Will try this!

manasi-t24 commented 5 years ago

Hi Eric, I changed the blockMeshDict to 400 x 400 x 40 cells so that the mesh has 6.4M cells. After decomposing the mesh across two processors with decomposePar, I ran icoFoam on both of my GPU devices and got the following error:

Create time

Create mesh for time = 0

terminate called after throwing an instance of 'thrust::system::system_error'
what(): copy:: H->D: failed: invalid argument
[marskepler:09493] Process received signal
[marskepler:09493] Signal: Aborted (6)
[marskepler:09493] Signal code: (-6)
[marskepler:09493] [ 0] /lib64/libc.so.6[0x323fc32570]
[marskepler:09493] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x323fc324f5]
[marskepler:09493] [ 2] /lib64/libc.so.6(abort+0x175)[0x323fc33cd5]
[marskepler:09493] [ 3] /usr/local/lib64/libstdc++.so.6(_ZN9__gnu_cxx27verbose_terminate_handlerEv+0x125)[0x7f50484c1425]
[marskepler:09493] [ 4] /usr/local/lib64/libstdc++.so.6(+0x8f1f6)[0x7f50484bf1f6]
[marskepler:09493] [ 5] /usr/local/lib64/libstdc++.so.6(+0x8f241)[0x7f50484bf241]
[marskepler:09493] [ 6] /usr/local/lib64/libstdc++.so.6(+0x8f483)[0x7f50484bf483]
[marskepler:09493] [ 7] /home/manasi/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/lib/libsampling.so(+0x182609)[0x7f504ab9f609]
[marskepler:09493] [ 8] /home/manasi/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/lib/libsampling.so(_ZN6thrust8cuda_cub6copy19cross_system_copy_nINS_6system3cpp6detail3tagENS0_3tagEPKN4Foam6VectorIdEElNS_6detail15normal_iteratorINS_10device_ptrISA_EEEEEET3_RNS5_16execution_policyIT_EERNS0_16execution_policyIT0_EET1_T2_SI_NSD_17integral_constantIbLb0EEE+0xc7)[0x7f504abad537]
[marskepler:09493] [ 9] /home/manasi/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/lib/libsampling.so(_ZN4Foam8gpuFieldINS_6VectorIdEEEC2ERKNS_5FieldIS2_EE+0xbe)[0x7f504abad68e]
[marskepler:09493] [10] /home/manasi/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/lib/libOpenFOAM.so(_ZNK4Foam13primitiveMesh23calcFaceCentresAndAreasEv+0x150)[0x7f504906f2c0]
[marskepler:09493] [11] /home/manasi/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/lib/libOpenFOAM.so(_ZNK4Foam13primitiveMesh11faceCentresEv+0x19)[0x7f504906f4d9]
[marskepler:09493] [12] /home/manasi/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/lib/libOpenFOAM.so(_ZNK4Foam9polyPatch11faceCentresEv+0x27)[0x7f5048fa8d77]
[marskepler:09493] [13] /home/manasi/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/lib/libOpenFOAM.so(_ZN4Foam18processorPolyPatch12initGeometryERNS_14PstreamBuffersE+0x44)[0x7f5048fd1884]
[marskepler:09493] [14] /home/manasi/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/lib/libOpenFOAM.so(_ZN4Foam16polyBoundaryMesh12calcGeometryEv+0x86)[0x7f5048fddd86]
[marskepler:09493] [15] /home/manasi/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/lib/libOpenFOAM.so(_ZN4Foam8polyMeshC1ERKNS_8IOobjectE+0x10e0)[0x7f5049036900]
[marskepler:09493] [16] /home/manasi/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/lib/libfiniteVolume.so(_ZN4Foam6fvMeshC2ERKNS_8IOobjectE+0x1c)[0x7f504b85d66c]
[marskepler:09493] [17] myicofoam[0x433078]
[marskepler:09493] [18] /lib64/libc.so.6(libc_start_main+0x100)[0x323fc1ed20]
[marskepler:09493] [19] myicofoam[0x429d69]
[marskepler:09493] End of error message
terminate called after throwing an instance of 'thrust::system::system_error'
what(): copy:: H->D: failed: invalid argument
[marskepler:09492] Process received signal
[marskepler:09492] Signal: Aborted (6)
[marskepler:09492] Signal code: (-6)
[marskepler:09492] [ 0] /lib64/libc.so.6[0x323fc32570]
[marskepler:09492] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x323fc324f5]
[marskepler:09492] [ 2] /lib64/libc.so.6(abort+0x175)[0x323fc33cd5]
[marskepler:09492] [ 3] /usr/local/lib64/libstdc++.so.6(_ZN9__gnu_cxx27verbose_terminate_handlerEv+0x125)[0x7f19c5c58425]
[marskepler:09492] [ 4] /usr/local/lib64/libstdc++.so.6(+0x8f1f6)[0x7f19c5c561f6]
[marskepler:09492] [ 5] /usr/local/lib64/libstdc++.so.6(+0x8f241)[0x7f19c5c56241]
[marskepler:09492] [ 6] /usr/local/lib64/libstdc++.so.6(+0x8f483)[0x7f19c5c56483]
[marskepler:09492] [ 7] /home/manasi/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/lib/libsampling.so(+0x182609)[0x7f19c8336609]
[marskepler:09492] [ 8] /home/manasi/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/lib/libsampling.so(_ZN6thrust8cuda_cub6copy19cross_system_copy_nINS_6system3cpp6detail3tagENS0_3tagEPKN4Foam6VectorIdEElNS_6detail15normal_iteratorINS_10device_ptrISA_EEEEEET3_RNS5_16execution_policyIT_EERNS0_16execution_policyIT0_EET1_T2_SI_NSD_17integral_constantIbLb0EEE+0xc7)[0x7f19c8344537]
[marskepler:09492] [ 9] /home/manasi/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/lib/libsampling.so(_ZN4Foam8gpuFieldINS_6VectorIdEEEC2ERKNS_5FieldIS2_EE+0xbe)[0x7f19c834468e]
[marskepler:09492] [10] /home/manasi/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/lib/libOpenFOAM.so(_ZNK4Foam13primitiveMesh23calcFaceCentresAndAreasEv+0x150)[0x7f19c68062c0]
[marskepler:09492] [11] /home/manasi/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/lib/libOpenFOAM.so(_ZNK4Foam13primitiveMesh11faceCentresEv+0x19)[0x7f19c68064d9]
[marskepler:09492] [12] /home/manasi/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/lib/libOpenFOAM.so(_ZNK4Foam9polyPatch11faceCentresEv+0x27)[0x7f19c673fd77]
[marskepler:09492] [13] /home/manasi/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/lib/libOpenFOAM.so(_ZN4Foam18processorPolyPatch12initGeometryERNS_14PstreamBuffersE+0x44)[0x7f19c6768884]
[marskepler:09492] [14] /home/manasi/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/lib/libOpenFOAM.so(_ZN4Foam16polyBoundaryMesh12calcGeometryEv+0x86)[0x7f19c6774d86]
[marskepler:09492] [15] /home/manasi/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/lib/libOpenFOAM.so(_ZN4Foam8polyMeshC1ERKNS_8IOobjectE+0x10e0)[0x7f19c67cd900]
[marskepler:09492] [16] /home/manasi/OpenFOAM/RapidCFD-dev/platforms/linux64NvccDPOpt/lib/libfiniteVolume.so(_ZN4Foam6fvMeshC2ERKNS_8IOobjectE+0x1c)[0x7f19c8ff466c]
[marskepler:09492] [17] myicofoam[0x433078]
[marskepler:09492] [18] /lib64/libc.so.6(libc_start_main+0x100)[0x323fc1ed20]
[marskepler:09492] [19] myicofoam[0x429d69]
[marskepler:09492] End of error message

mpirun noticed that process rank 0 with PID 9492 on node marskepler exited on signal 6 (Aborted).

The case only runs fine at 5M cells (which admittedly has many issues of its own). At 4.8M cells I also get this error. Do you know what might be the issue? I am using RapidCFD with CUDA 9.2.

Regards, Manasi

TonkomoLLC commented 5 years ago

Hi, Manasi,

I am not precisely sure of the reason for this error, especially since the case fails at 4.8 million cells but runs at 5 million cells (so not an out-of-memory problem, I guess). Therefore, I can only recommend that you start from a known working case (e.g., the damBreak tutorial in the Tonkomo LLC repository) or from your present 5 million cell test case (albeit with the aforementioned issues), and change one item at a time until the case breaks.

Sorry, I don't have a more precise answer for you. I hope you can narrow down the issue soon.

Best regards,

Eric

manasi-t24 commented 5 years ago

I installed RapidCFD with CUDA 8, and now I am not getting any of the previous errors. I think RapidCFD is not very compatible with CUDA 9.0 and above; I was previously using CUDA 9.2. Thank you, Eric, for your help.

Regards, Manasi

TonkomoLLC commented 5 years ago

Hi, Manasi,

That is absolutely great feedback!

Glad you got it working. While I have done some testing with CUDA 9.1 (and very limited testing with CUDA 10), most of my RapidCFD work is with CUDA 8. I would not have guessed that your problem was due to CUDA 9.2. Nice detective work.

Best regards,

Eric