ProjectPhysX / FluidX3D

The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs via OpenCL. Free for non-commercial use.
https://youtube.com/@ProjectPhysX

(Not an issue) - Porting to Perl and Python; first steps... making code a bit more standalone and general-purpose... #183

Closed by gitcnd 4 months ago

gitcnd commented 4 months ago

Just a heads-up: I'm working in my fork here: https://github.com/gitcnd/FluidX3D/tree/inline_things - it is still a work-in-progress and very raw.

I've made a bunch of changes in preparation for porting this to Perl's CPAN (which I know well) and Python's PyPI (where, in terms of compiled binary releases, I'm still a n00b). The general idea is to let folks "pip install" or "cpani install" it and then have native access to the features in FluidX3D. You could probably even use it to display the output inside an IPython notebook (if your notebook backend sends the frames back as an embedded video)...

Step 1 is simply to turn this into a standalone .exe program that can be set up entirely from the command line. I've done most of the "easy stuff" so far; the harder work (fpxx etc.) is still to come.

I'm just posting this here in case anyone has comments or wants to know...

Lattice Boltzmann CFD software by Dr. Moritz Lehmann
Usage:
  bin\FluidX3D.exe [OPTION...]

  -h, --help            Print help
  -f, --file arg        input .stl mesh Filename (default: input.stl)
      --rotx arg        X deg rotation of input mesh (default: 0.0)
      --roty arg        Y deg rotation of input mesh (default: 0.0)
      --rotz arg        Z deg rotation of input mesh (default: 0.0)
      --trx arg         X translate input mesh (default: 0.0)
      --try arg         Y translate input mesh (default: 0.0)
      --trz arg         Z translate input mesh (default: 0.0)
  -x arg                X width of sim box (default: 1.0)
  -y arg                Y length of sim box (default: 1.0)
  -z arg                Z height of sim box (default: 1.0)
  -r, --resolution arg  Resolution (default: 4096)
      --re arg          Reynolds number (default: 100000.0)
  -u arg                Velocity in m/s (default: 5.0)
  -c, --cord arg        Cord (length of STL) in meters (default: 1.0)
  -t, --time arg        Time (default: 10000)
      --scale arg       Scale (default: 0.9)
  -a, --aoa arg         Angle of attack (default: -5.0)
      --camx arg        Camera X (default: 19.0)
      --camy arg        Camera Y (default: 19.1)
      --camz arg        Camera Z (default: 19.2)
      --camzoom arg     Camera Zoom (default: 1.0)
      --camrx arg       Camera Rotation X (default: 33.0)
      --camry arg       Camera Rotation Y (default: 42.0)
      --camfov arg      Camera Field of View (default: 68.0)
  -s, --secs arg        Seconds (default: 10.0)
  -w, --window          Enable window instead of fullscreen mode
      --wait            Wait for keypress befor ending
      --pause           Do not auto-start the simulation
      --fps arg         Frames per Second for video output (default: 25.0)
      --SUBGRID         Use SUBGRID #define
      --VOLUME_FORCE    Use VOLUME_FORCE #define
      --floor           Insert a solid floor
      --allowsleep      Do not prevent PC from sleeping
  -d, --display arg     Display (default: 0,1)

Press any key to continue . . .
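As a rough illustration of how a Python wrapper could drive this standalone binary, here is a minimal sketch that turns a dictionary of settings into a command line. The flag names come from the help text above; the helper names `build_command` and `run_simulation` are hypothetical and not part of the fork, and the executable path is simply the one shown in the usage line.

```python
import subprocess

def build_command(options, exe="bin\\FluidX3D.exe"):
    """Build an argument list for the standalone FluidX3D binary.

    `options` maps option names from the help text (without dashes) to values.
    Single-letter names become short flags (-r), longer names long flags (--file).
    True adds a bare switch, False/None omits the option entirely.
    """
    args = [exe]
    for name, value in options.items():
        flag = ("-" if len(name) == 1 else "--") + name
        if value is True:
            args.append(flag)              # boolean switch such as --window
        elif value is False or value is None:
            continue                       # switch not requested
        else:
            args.extend([flag, str(value)])
    return args

def run_simulation(options):
    """Launch the binary and wait for it (requires the fork's .exe to exist)."""
    return subprocess.run(build_command(options), check=True)

# Example: only builds the command, does not launch anything.
cmd = build_command({"file": "wing.stl", "r": 2048, "aoa": -5.0, "window": True})
print(cmd)
```

A dictionary is used rather than keyword arguments because one of the option names in the help text, `try`, is a reserved word in Python.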

[Screenshot: pic_2024-05-23_17 34 08_593]

Besides the above, I've also used the last available key binding (key_O) as a toggle to start recording frames to disk.

gitcnd commented 4 months ago

Just sanity-checking here (I just finished porting the FP16* settings from compile-time to runtime): does it make sense that the original "concorde" demo uses nearly twice as much CPU RAM with "#define FP16S" and/or "#define FP16C" (643 MB vs. 380 MB), but about the same GPU RAM as it does with "FP32"?

[Below output is from original code, cloned and compiled from https://github.com/ProjectPhysX/FluidX3D.git just now]

e.g. This is with "#define FP16C"

| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                CPU 643 MB, GPU 1x 2142 MB |
| Max Alloc Size  |                                                   1438 MB |

e.g. This is with "#define FP16S"

| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                CPU 643 MB, GPU 1x 2142 MB |
| Max Alloc Size  |                                                   1438 MB |

e.g. This is without either of the above:-

| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 380 MB, GPU 1x 2141 MB |
| Max Alloc Size  |                                                   1700 MB |
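One way to read the three Concorde tables is to back out the grid size from the CPU figure. The sketch below assumes (this is not stated in the thread) that the host holds 17 bytes per cell (a 4-byte density, three 4-byte velocity components and a 1-byte flag), that the largest allocation is the buffer of 19 distribution values per cell at 2 bytes (FP16S/FP16C) or 4 bytes (FP32), and that "MB" in the output means 2^20 bytes.

```python
MIB = 1024 * 1024
HOST_BYTES_PER_CELL = 4 + 3 * 4 + 1   # assumed: rho + (ux, uy, uz) + flag
Q = 19                                # velocity set size of D3Q19

def predicted_mb(cpu_mb, ddf_bytes):
    """Infer the cell count from the reported CPU memory, then predict the rest."""
    cells = cpu_mb * MIB / HOST_BYTES_PER_CELL
    ddf_mb = cells * Q * ddf_bytes / MIB          # distribution-function buffer
    return {
        "cells_millions": cells / 1e6,
        "max_alloc": ddf_mb,                      # compare with "Max Alloc Size"
        "gpu_model": ddf_mb + cpu_mb,             # DDFs plus the 17 bytes per cell
    }

fp16 = predicted_mb(643, 2)   # CPU figure from the FP16S / FP16C runs
fp32 = predicted_mb(380, 4)   # CPU figure from the FP32 run
for name, result in (("FP16", fp16), ("FP32", fp32)):
    print(name, {key: round(value, 1) for key, value in result.items()})
```

Under these assumptions the FP16 runs hold roughly 1.7 times as many cells as the FP32 run, and the predicted largest allocation lands within about 1 MB of the reported 1438 MB and 1700 MB. That pattern would be consistent with the demo sizing its grid to fill a fixed amount of GPU memory, so that halving the storage per distribution value buys more cells rather than a smaller footprint; the thread does not confirm this. The model also leaves about 62 MB of the reported GPU memory unaccounted for in both cases.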

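Confusingly, the benchmark results differ from the Concorde example results. In the benchmark, CPU memory stays at 272 MB in all three modes, while GPU memory drops from 1488 MB (FP32) to 880 MB (FP16S/FP16C):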

e.g. This is with "#define FP16S"

| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |

e.g. This is with "#define FP16C"

| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |

This is with neither of the FP16* defines set:

| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 272 MB, GPU 1x 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
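The benchmark figures can be reproduced with simple per-cell arithmetic. The sketch below assumes a fixed grid of 256 x 256 x 256 cells (the thread does not state the grid size), 19 distribution values per cell stored in 2 bytes (FP16S/FP16C) or 4 bytes (FP32), plus 17 further bytes per cell for density, velocity and flags that are kept on both the CPU and the GPU. "MB" is taken to mean 2^20 bytes.

```python
MIB = 1024 * 1024
Q = 19                                 # velocity set size of D3Q19
OTHER_BYTES_PER_CELL = 4 + 3 * 4 + 1   # assumed: rho + (ux, uy, uz) + flag

def memory_mb(cells, ddf_bytes):
    """Return CPU, GPU and largest-buffer sizes in MB for a given grid."""
    ddf_per_cell = Q * ddf_bytes
    return {
        "cpu": cells * OTHER_BYTES_PER_CELL // MIB,
        "gpu": cells * (ddf_per_cell + OTHER_BYTES_PER_CELL) // MIB,
        "max_alloc": cells * ddf_per_cell // MIB,
    }

cells = 256 ** 3                       # assumed benchmark grid
fp16 = memory_mb(cells, 2)             # FP16S and FP16C both use 2 bytes per value
fp32 = memory_mb(cells, 4)
print("FP16:", fp16)
print("FP32:", fp32)
```

With a fixed grid, only the GPU side shrinks when the distribution values are stored in 16 bits, which matches the constant 272 MB CPU figure in the three benchmark tables.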