ProjectPhysX / FluidX3D

The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs via OpenCL. Free for non-commercial use.
https://youtube.com/@ProjectPhysX

Report your benchmark results here! #8

Open ProjectPhysX opened 1 year ago

ProjectPhysX commented 1 year ago

You are welcome to report your benchmark results for the FP32/FP16S/FP16C accuracy levels here. Numbers for AMD GPUs with GCN/RDNA/RDNA2 architectures are especially welcome. Thank you!

ConfusedWizard commented 1 year ago

@ProjectPhysX My first results were with a +1000 MHz memory overclock.

Here are the stock results:

FP32/FP32 FP32_FP32_stock

FP32/FP16S FP32_FP16S_stock

FP32/FP16C FP32_FP16C_stock

For calculating efficiency, would it not be better to also benchmark the true memory bandwidth instead of using the official numbers from NVIDIA?

ProjectPhysX commented 1 year ago

@ConfusedWizard gotcha, thanks for clarifying and providing stock benchmarks! So as expected the 4090 performs basically the same as a 3090 Ti. It's only fair to always use the peak data sheet bandwidth to compute efficiency, as that's really the upper limit. In a memory bandwidth benchmark you get different bandwidth numbers for coalesced/misaligned read/write access, and usually only coalesced read/write gets close to peak, see here figure 22.
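As a rough illustration of the efficiency figure discussed here, the sketch below converts a measured MLUPs/s value into a bandwidth and divides it by a peak bandwidth. The bytes-per-cell constants (153 for FP32/FP32, 77 for FP32/FP16S and FP32/FP16C) are an assumption inferred from the MLUPs/bandwidth pairs in the console outputs posted in this thread, and the function names are made up for this example. The peak bandwidth is a parameter: pass the data sheet value of the card in question.

```shell
#!/usr/bin/env bash
# Bytes moved per cell update.
# ASSUMPTION: inferred from the MLUPs/bandwidth pairs printed in this thread,
# not taken from the FluidX3D source.
bytes_per_cell() {
  case "$1" in
    FP32) echo 153 ;;
    FP16S|FP16C) echo 77 ;;
    *) echo "unknown precision: $1" >&2; return 1 ;;
  esac
}

# bandwidth_gbs MLUPS PRECISION -> bandwidth in GB/s, rounded to an integer
bandwidth_gbs() {
  local bytes
  bytes=$(bytes_per_cell "$2") || return 1
  awk -v m="$1" -v b="$bytes" 'BEGIN { printf "%.0f\n", m * b / 1000 }'
}

# efficiency_percent MLUPS PRECISION PEAK_GBS -> percent of the given peak bandwidth
efficiency_percent() {
  local bytes
  bytes=$(bytes_per_cell "$2") || return 1
  awk -v m="$1" -v b="$bytes" -v p="$3" \
    'BEGIN { printf "%.1f\n", 100 * m * b / 1000 / p }'
}
```

For example, `bandwidth_gbs 3883 FP32` reproduces the 594 GB/s of the RTX 4080 FP32/FP32 run reported in this thread, and `efficiency_percent 3883 FP32 1000` would give the efficiency against a hypothetical 1000 GB/s peak.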

mcelwee1 commented 1 year ago

Quadro RTX 6000 results

FP32-FP16S image

FP32-FP16C image

FP32-FP32 image

Note: This data was collected using the released Windows .exe files. When I compile and run the benchmarks on my own Windows machine, the results are ~5% slower.

mcelwee1 commented 1 year ago

Quadro RTX A5000 laptop GPU results

FP32/FP32 image

FP32/FP16S image

FP32/FP16C image

SLGY commented 1 year ago

Hi ibonito1,

OpenCL support on EPYC CPUs is a bit difficult as these are not officially supported by AMD. Being x86-64, they should work with the Intel OpenCL CPU Runtime though, or alternatively with POCL. Fingers crossed! To run on a specific device, in the console run ./FluidX3D.exe 2 (on Linux) or FluidX3D.exe 2 (on Windows), to select device with ID 2 for example. You can just copy the console output here.

Regards, Moritz

Would it be possible to run this on Google Colab? I have a lot of credit on there and it would be good to run this on their A100 GPUs... I've tried, at a basic level, to use the gcc compiler on there after mounting the FluidX3D files in my Google Drive. I have tried compiling a few of the .cpp files to no avail and get different types of errors. Or is there a way to compile FluidX3D in Visual Studio so it can be run on Colab directly? (As far as I know, I can't run .exe files on Colab.)

ProjectPhysX commented 1 year ago

@SirWixy I've run it in Colab already to benchmark the Tesla T4. Make sure to have WINDOWS_GRAPHICS disabled and compile with ./make.sh. You might also have to enable UTILITIES_NO_CPP17 in src/utilities.hpp line 10 in case gcc there does not support C++17; this will disable automatic folder creation for exported files, so make sure the bin/export/ folder exists before running the setup, or else it won't write any files.
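A minimal sketch of those preparation steps, assuming the switch appears in src/utilities.hpp as a commented-out line of the form `//#define UTILITIES_NO_CPP17`. The exact text of that line is not shown in this thread, so check the file first. The helper name `prepare_colab_build` is made up for illustration.

```shell
#!/usr/bin/env bash
# prepare_colab_build REPO_DIR
#  1) uncomment the UTILITIES_NO_CPP17 switch
#     (ASSUMPTION: the line looks like "//#define UTILITIES_NO_CPP17")
#  2) create bin/export/ by hand, since automatic folder creation is disabled
prepare_colab_build() {
  local repo="$1"
  local header="$repo/src/utilities.hpp"
  [ -f "$header" ] || { echo "not found: $header" >&2; return 1; }
  # Write to a temporary file instead of relying on sed -i, whose syntax differs between sed flavours
  sed -E 's|^[[:space:]]*//[[:space:]]*(#define[[:space:]]+UTILITIES_NO_CPP17)|\1|' \
    "$header" > "$header.tmp" && mv "$header.tmp" "$header"
  mkdir -p "$repo/bin/export"
}
```

From the repository root this could be used as `prepare_colab_build . && ./make.sh`.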

SLGY commented 1 year ago

@ProjectPhysX Thanks for pointing out the make.sh file to me, I realised it's also mentioned in the readme file - my apologies for that. I first read through the readme file a long time ago, before I knew what all that meant, but I'll be sure to refer back to it first in future! I can make this into a separate issue too if you'd like, to keep this benchmark issue cleaner.

For anyone else reading this later and using Google Colab, the UTILITIES_NO_CPP17 line is in src/utilities.hpp

oscarbg commented 1 year ago

Hope someone can post a 7900 XT or XTX result.

ProjectPhysX commented 1 year ago

@oscarbg someone reported 7900 XTX/XT benchmarks in Ubuntu over on openbenchmarking.org! I just added the values to the table in the Readme file.

Edit: Carsten Spille benchmarked the 7900 XTX/XT on Windows, getting slightly better numbers for the XTX which are also more consistent with the XT. There might have been some driver issues on Ubuntu initially. So I replaced the numbers in the Readme.

IvanBGR commented 1 year ago

|----------------.------------------------------------------------------------|
| Device Name    | NVIDIA GeForce RTX 4080                                    |
| Device Driver  | 528.24                                                     |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 76 at 2850 MHz (9728 cores, 55.449 TFLOPs/s)               |
| Memory, Cache  | 16375 MB, 2128 KB global / 48 KB local                     |
| Buffer Limits  | 4093 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
1 warning generated.
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 272 MB, GPU 1x 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    3883 |    594 GB/s |       231 |         9990   0% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 3914                                                   |

|----------------.------------------------------------------------------------|
| Device Name    | NVIDIA GeForce RTX 4080                                    |
| Device Driver  | 528.24                                                     |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 76 at 2850 MHz (9728 cores, 55.449 TFLOPs/s)               |
| Memory, Cache  | 16375 MB, 2128 KB global / 48 KB local                     |
| Buffer Limits  | 4093 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
1 warning generated.
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    7611 |    586 GB/s |       454 |         9991  10% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 7626                                                   |

|----------------.------------------------------------------------------------|
| Device Name    | NVIDIA GeForce RTX 4080                                    |
| Device Driver  | 528.24                                                     |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 76 at 2850 MHz (9728 cores, 55.449 TFLOPs/s)               |
| Memory, Cache  | 16375 MB, 2128 KB global / 48 KB local                     |
| Buffer Limits  | 4093 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
1 warning generated.
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    7914 |    609 GB/s |       472 |         9977  70% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 7933                                                   |

nulaft commented 1 year ago
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.3 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA GeForce GTX 970                                     |
| Device ID    1 | Intel(R) HD Graphics 4600                                  |
| Device ID    2 | Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz                   |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA GeForce GTX 970                                     |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 528.02                                                     |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 13 at 1253 MHz (1664 cores, 4.170 TFLOPs/s)                |
| Memory, Cache  | 4095 MB, 624 KB global / 48 KB local                       |
| Buffer Limits  | 1023 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
1 warning generated.
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 272 MB, GPU 1x 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|     979 |    150 GB/s |        58 |         9999  90% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 980                                                    |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    1618 |    125 GB/s |        96 |         9995  50% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 1623                                                   |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    1717 |    132 GB/s |       102 |         9999  90% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 1721                                                   |

PMunkes commented 1 year ago

Radeon RX 7900 XTX Red Devil, Silent BIOS, stock settings, newest Windows 10 driver (Radeon Software 23.2.1) image image image

ProjectPhysX commented 1 year ago

@PMunkes thanks a lot for reporting! There seem to be rather significant performance improvements with the newer driver, so I've updated the values in the Readme table. I have now also fixed the incorrect TFLOPs reporting for 7900 series GPUs (RDNA3 is 256 ALUs per dual-CU).
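To show why a wrong ALU count per compute unit skews the reported TFLOPs, here is a small sketch of the arithmetic that appears to sit behind the "Compute Units" line of the console output. It assumes 2 floating-point operations per core per clock (one fused multiply-add), which is consistent with the values printed in this thread but is not taken from the FluidX3D source. The function name `tflops` and its parameters are made up for this example.

```shell
#!/usr/bin/env bash
# tflops COMPUTE_UNITS CORES_PER_CU CLOCK_MHZ -> peak FP32 TFLOPs/s with 3 decimals
# ASSUMPTION: 2 floating-point operations per core per clock (one fused multiply-add).
tflops() {
  awk -v cu="$1" -v cores="$2" -v mhz="$3" \
    'BEGIN { printf "%.3f\n", cu * cores * 2 * mhz / 1e6 }'
}

# Cores per compute unit as implied by outputs in this thread:
#   GTX 970:  1664 cores / 13 CUs  = 128
#   A100:     6912 cores / 108 CUs = 64
tflops 13 128 1253   # GTX 970 values from this thread
tflops 108 64 1410   # A100 values from this thread
```

Since the result scales linearly with the cores-per-CU factor, an incorrect factor for a new architecture changes the reported TFLOPs by the same ratio.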

Shmarvadon commented 1 year ago

2X Intel ARC A770

image

sachithdickwella commented 1 year ago

FP32-FP16S on RTX 2070 (Mobile)

image

btrinos commented 1 year ago

CPU: Intel Core i9 13900K
OS: Microsoft Windows 11
GPU: nVidia Titan RTX
Drivers: 531.61

FP32 [TFlops/s]: 16.31
Mem [GB]: 24
BW [GB/s]: 527/571/577
FP32/FP32 [MLUPs/s]: 3471
FP32/FP16S [MLUPs/s]: 7456
FP32/FP16C [MLUPs/s]: 7554

titanrtx-fp32-fp32 titanrtx-fp32-fp16s titanrtx-fp32-fp16c

btrinos commented 1 year ago

CPU: Intel Core i9 13900K
OS: Microsoft Windows 11
GPU: nVidia Titan V
Drivers: 531.61

FP32 [TFlops/s]: 14.899
Mem [GB]: 12
BW [GB/s]: 549/558/534
FP32/FP32 [MLUPs/s]: 3601
FP32/FP16S [MLUPs/s]: 7253
FP32/FP16C [MLUPs/s]: 6957

titanv-fp32-fp32 titanv-fp32-fp16s titanv-fp32-fp16c

Micmac2 commented 1 year ago

CPU: Apple M1 Max (24GPU) with 32GB
OS: macOS Monterey 12.6.5

FP32/FP32

FluidX3D FP32 FP32

FP32/FP16S

FluidX3D FP32 FP16S

FP32/FP16C

FluidX3D FP32 FP16C

masazzz commented 1 year ago

System: Wisteria/BDEC-01 Aquarius, Supercomputing Division, Information Technology Center, The University of Tokyo (https://www.cc.u-tokyo.ac.jp/en/supercomputer/wisteria/system.php)

bench.sh

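# Load the compiler (site-specific environment module)
module load gcc/12.2.0
mkdir -p bin
# Start from the pristine defines.hpp and keep a copy of it
git checkout src/defines.hpp
mv src/defines.hpp src/defines.hpp.orig
for a in FP32 FP16S FP16C
do
  # Prepend the precision switch to the original header
  (
    echo "#define $a"
    cat src/defines.hpp.orig
  ) > src/defines.hpp
  rm -f ./bin/FluidX3D
  g++ ./src/*.cpp -o ./bin/FluidX3D -std=c++17 -pthread -I./src/OpenCL/include -L./src/OpenCL/lib -lOpenCL
  # Run the benchmark, strip ANSI escape sequences and control characters, and save one log per precision
  ./bin/FluidX3D | sed -E "s/\x1b\[([0-9]{1,3}((;[0-9]{1,3})*)?)?[mGK]//g" | col -bx | tee log.$a
done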
bash bench.sh
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 272 MB, GPU 1x 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    8542 |   1307 GB/s |       509 |         9979  90% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 8543                                                   |
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   15909 |   1225 GB/s |       948 |         9954  40% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 15917                                                  |
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    8748 |    674 GB/s |       521 |         9993  30% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 8748                                                   |
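
The log.FP32, log.FP16S and log.FP16C files that `tee` writes in bench.sh above can be reduced to one peak value per precision. The following is only a sketch: the helper names `peak_mlups` and `summarize_logs` are made up, and it assumes each log contains a line of the form `| Info: Peak MLUPs/s = <number> |` as in the outputs shown here.

```shell
#!/usr/bin/env bash
# peak_mlups LOGFILE -> the number after "Peak MLUPs/s =" (last occurrence in the file)
peak_mlups() {
  sed -n 's/.*Peak MLUPs\/s = \([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}

# summarize_logs [DIR] -> one "<precision> <peak>" line per log.<precision> file found in DIR
summarize_logs() {
  local dir="${1:-.}" p
  for p in FP32 FP16S FP16C; do
    [ -f "$dir/log.$p" ] && printf '%s %s\n' "$p" "$(peak_mlups "$dir/log.$p")"
  done
  return 0
}
```

Running `summarize_logs` in the directory where bench.sh was executed prints one line per available log, which is convenient for pasting results into a comment.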

masazzz commented 1 year ago

System: "Flow" Type II subsystem, Information Technology Center, Nagoya University (https://icts.nagoya-u.ac.jp/en/sc/)

bench.sh

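# Load the compiler (site-specific environment module)
module load gcc/10.3.0
mkdir -p bin
# Start from the pristine defines.hpp and keep a copy of it
git checkout src/defines.hpp
mv src/defines.hpp src/defines.hpp.orig
for a in FP32 FP16S FP16C
do
  # Prepend the precision switch to the original header
  (
    echo "#define $a"
    cat src/defines.hpp.orig
  ) > src/defines.hpp
  rm -f ./bin/FluidX3D
  g++ ./src/*.cpp -o ./bin/FluidX3D -std=c++17 -pthread -I./src/OpenCL/include -L./src/OpenCL/lib -lOpenCL
  # Run the benchmark, strip ANSI escape sequences and control characters, and save one log per precision
  ./bin/FluidX3D | sed -E "s/\x1b\[([0-9]{1,3}((;[0-9]{1,3})*)?)?[mGK]//g" | col -bx | tee log.$a
done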
bash bench.sh
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 272 MB, GPU 1x 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    4468 |    684 GB/s |       266 |         9987  70% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 4474                                                   |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    8934 |    688 GB/s |       533 |         9993  30% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 8947                                                   |
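A side note on the "Memory Usage" and "Max Alloc Size" rows: the FP32/FP32 and FP32/FP16S runs above are consistent with 19 populations per cell (4 bytes each for FP32, 2 bytes each for FP16) plus 17 bytes per cell for density, velocity and flags, with "MB" meaning 2^20 bytes. That breakdown is inferred from the printed numbers, not stated in the output, and the function name below is made up for illustration. A minimal Python sketch:

```python
def memory_mb(nx: int, ny: int, nz: int, bytes_per_population: int) -> dict:
    """Estimate single-domain D3Q19 memory figures, in units of 2**20 bytes.

    Assumptions (inferred from the logs, not taken from the FluidX3D source):
    - 19 populations per cell, stored with `bytes_per_population` bytes each
    - 4 bytes density + 3 * 4 bytes velocity + 1 byte flags per cell
    - the population buffer is the largest single allocation
    - the CPU side holds only density, velocity and flags
    """
    cells = nx * ny * nz
    mib = 1024 * 1024
    populations = 19 * bytes_per_population * cells
    fields = (4 + 3 * 4 + 1) * cells
    return {
        "gpu": (populations + fields) // mib,
        "max_alloc": populations // mib,
        "cpu": fields // mib,
    }


if __name__ == "__main__":
    print("FP32 :", memory_mb(256, 256, 256, 4))
    print("FP16 :", memory_mb(256, 256, 256, 2))
```

For multi-GPU runs the per-GPU figure comes out slightly higher than this estimate, presumably because of extra cell layers at the domain boundaries; the sketch does not model that.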
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    7205 |    555 GB/s |       429 |         9982  20% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 7217                                                   |
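For anyone cross-checking the tables: the "Bandwidth" column appears to follow directly from "MLUPs". The three V100 runs above are consistent with 153 bytes of memory traffic per cell update for FP32 storage and 77 bytes for FP16S/FP16C storage. These per-cell byte counts are inferred from the printed numbers (the output itself does not state them), and the function name is hypothetical. A minimal Python sketch:

```python
# Bytes moved per cell update for D3Q19.
# Assumption inferred from the logs: 19 populations read + 19 written,
# plus 1 byte of flags per cell.
BYTES_PER_CELL = {
    "FP32": 19 * 2 * 4 + 1,   # 153
    "FP16S": 19 * 2 * 2 + 1,  # 77
    "FP16C": 19 * 2 * 2 + 1,  # 77
}


def bandwidth_gbs(mlups: float, storage: str) -> float:
    """Convert MLUPs (10^6 cell updates per second) to GB/s (10^9 bytes per second)."""
    return mlups * BYTES_PER_CELL[storage] / 1000.0


if __name__ == "__main__":
    # Last-row MLUPs values from the three V100 runs above.
    for mlups, storage in [(4468, "FP32"), (8934, "FP16S"), (7205, "FP16C")]:
        print(f"{storage:6s} {mlups:5d} MLUPs -> {bandwidth_gbs(mlups, storage):.0f} GB/s")
```

Dividing the result by the data-sheet peak bandwidth of the card gives the efficiency figure discussed earlier in the thread.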
masazzz commented 1 year ago

System: Wisteria/BDEC-01 Aquarius, Supercomputing Division, Information Technology Center, The University of Tokyo (https://www.cc.u-tokyo.ac.jp/en/supercomputer/wisteria/system.php)

Built with gcc/12.2.0. Runs cover {FP32, FP16S, FP16C} on {2, 4, 8} GPUs.

.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                512 x 256 x 256 = 33554432 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 544 MB, GPU 2x 1500 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   10516 |   1609 GB/s |       313 |         9991  10% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 10728                                                  |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                512 x 512 x 256 = 67108864 |
| Grid Domains    |                                             2 x 2 x 1 = 4 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                               CPU 1088 MB, GPU 4x 1513 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   15355 |   2349 GB/s |       229 |         9991 110% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 16116                                                  |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 4                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 5                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 6                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 7                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                               512 x 512 x 512 = 134217728 |
| Grid Domains    |                                             2 x 2 x 2 = 8 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                               CPU 2176 MB, GPU 8x 1523 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 296 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   19286 |   2951 GB/s |       144 |         9999 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 21564                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                608 x 304 x 304 = 56188928 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                CPU 910 MB, GPU 2x 1482 MB |
| Max Alloc Size  |                                                   1018 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 176 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   18372 |   1415 GB/s |       327 |         9989 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 18810                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                               608 x 608 x 304 = 112377856 |
| Grid Domains    |                                             2 x 2 x 1 = 4 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                               CPU 1821 MB, GPU 4x 1493 MB |
| Max Alloc Size  |                                                   1018 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 176 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   20989 |   1616 GB/s |       187 |         9997 170% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 28334                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 4                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 5                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 6                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 7                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                               608 x 608 x 608 = 224755712 |
| Grid Domains    |                                             2 x 2 x 2 = 8 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                               CPU 3643 MB, GPU 8x 1503 MB |
| Max Alloc Size  |                                                   1018 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 351 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   39400 |   3034 GB/s |       175 |         9994 140% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 40628                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                608 x 304 x 304 = 56188928 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                CPU 910 MB, GPU 2x 1482 MB |
| Max Alloc Size  |                                                   1018 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 176 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   13071 |   1006 GB/s |       233 |         9989 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 13380                                                  |
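For anyone cross-checking the tables above: the Bandwidth and Steps/s columns appear to follow directly from the MLUPs column. The sketch below reproduces them for the five runs shown, assuming 153 bytes moved per cell update for D3Q19 with FP32 storage and 77 bytes with FP16S/FP16C storage (19 loads plus 19 stores of the density distribution functions, plus 1 flag byte). These byte counts are an assumption taken from the project's documentation rather than from this log, and the function names are made up for illustration.

```python
# Assumed bytes transferred per lattice-cell update for D3Q19:
#   FP32 storage:  19 loads + 19 stores of 4 bytes, plus 1 flag byte = 153
#   FP16 storage:  19 loads + 19 stores of 2 bytes, plus 1 flag byte = 77
BYTES_PER_CELL = {"FP32": 153, "FP16S": 77, "FP16C": 77}


def bandwidth_gbs(mlups, storage):
    """Effective memory bandwidth in GB/s implied by a MLUPs value."""
    return mlups * BYTES_PER_CELL[storage] / 1000.0


def steps_per_second(mlups, grid):
    """Time steps per second for a grid given as (nx, ny, nz)."""
    nx, ny, nz = grid
    return mlups * 1e6 / (nx * ny * nz)


# (storage mode, MLUPs, grid resolution) copied from the runs above.
runs = [
    ("FP32", 19286, (512, 512, 512)),   # 8 GPUs
    ("FP16S", 18372, (608, 304, 304)),  # 2 GPUs
    ("FP16S", 20989, (608, 608, 304)),  # 4 GPUs
    ("FP16S", 39400, (608, 608, 608)),  # 8 GPUs
    ("FP16C", 13071, (608, 304, 304)),  # 2 GPUs
]

for storage, mlups, grid in runs:
    gbs = round(bandwidth_gbs(mlups, storage))
    sps = round(steps_per_second(mlups, grid))
    print(f"{storage:6s} {mlups:6d} MLUPs -> {gbs:5d} GB/s, {sps:4d} steps/s")
```

Under these assumptions the computed values match the logged Bandwidth and Steps/s columns for all five runs, which suggests the reported bandwidth is derived from MLUPs rather than measured separately.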
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                               608 x 608 x 304 = 112377856 |
| Grid Domains    |                                             2 x 2 x 1 = 4 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                               CPU 1821 MB, GPU 4x 1493 MB |
| Max Alloc Size  |                                                   1018 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 176 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   21485 |   1654 GB/s |       191 |         9995 150% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 21584                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 4                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 5                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 6                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 7                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                               608 x 608 x 608 = 224755712 |
| Grid Domains    |                                             2 x 2 x 2 = 8 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                               CPU 3643 MB, GPU 8x 1503 MB |
| Max Alloc Size  |                                                   1018 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 351 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   33048 |   2545 GB/s |       147 |         9995 150% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 33416                                                  |
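
For anyone cross-checking the numbers in these logs, here is a minimal Python sketch of how the MLUPs and Bandwidth columns appear to relate to the grid size and Steps/s. The bytes-per-cell values (153 for FP32/FP32, 77 for FP32/FP16S and FP32/FP16C) are an assumption inferred from the logged ratios, e.g. 2545 GB/s / 33048 MLUPs ≈ 77 bytes. The function names are made up for illustration and are not part of FluidX3D.

```python
# Assumed bytes transferred per cell update, inferred from the logged
# Bandwidth / MLUPs ratios (not taken from the FluidX3D source).
BYTES_PER_CELL = {"FP32/FP32": 153, "FP32/FP16S": 77, "FP32/FP16C": 77}


def mlups(nx, ny, nz, steps_per_s):
    """Million lattice updates per second from grid size and step rate."""
    return nx * ny * nz * steps_per_s / 1e6


def bandwidth_gbs(mlups_value, lbm_type):
    """Effective memory bandwidth in GB/s implied by a MLUPs value."""
    return mlups_value * BYTES_PER_CELL[lbm_type] / 1e3


if __name__ == "__main__":
    # (grid, Steps/s, logged MLUPs, LBM type) copied from the two A100 runs above
    runs = [
        ((608, 608, 304), 191, 21485, "FP32/FP16C"),  # 4x A100
        ((608, 608, 608), 147, 33048, "FP32/FP16C"),  # 8x A100
    ]
    for (nx, ny, nz), steps, logged, lbm_type in runs:
        estimate = mlups(nx, ny, nz, steps)
        print(f"{nx}x{ny}x{nz}: estimated {estimate:.0f} MLUPs, logged {logged}, "
              f"implied bandwidth {bandwidth_gbs(logged, lbm_type):.0f} GB/s")
```

The estimate from Steps/s will differ slightly from the logged MLUPs because Steps/s is printed as a rounded integer.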

masazzz commented 1 year ago

System: "Flow" Type II subsystem, Information Technology Center, Nagoya University (https://icts.nagoya-u.ac.jp/en/sc/)

Built with gcc/10.3.0. Runs: {2 GPUs, 4 GPUs} x {FP32, FP16S, FP16C}.

.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                512 x 256 x 256 = 33554432 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 544 MB, GPU 2x 1500 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    5465 |    836 GB/s |       163 |         9994 140% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 5776                                                   |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                608 x 304 x 304 = 56188928 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                CPU 910 MB, GPU 2x 1482 MB |
| Max Alloc Size  |                                                   1018 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 176 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   10859 |    836 GB/s |       193 |         9995 150% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 11427                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                608 x 304 x 304 = 56188928 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                CPU 910 MB, GPU 2x 1482 MB |
| Max Alloc Size  |                                                   1018 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 176 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    9556 |    736 GB/s |       170 |         9993 130% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 10018                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                512 x 512 x 256 = 67108864 |
| Grid Domains    |                                             2 x 2 x 1 = 4 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                               CPU 1088 MB, GPU 4x 1513 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    7630 |   1167 GB/s |       114 |         9999 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 7792                                                   |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                               608 x 608 x 304 = 112377856 |
| Grid Domains    |                                             2 x 2 x 1 = 4 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                               CPU 1821 MB, GPU 4x 1493 MB |
| Max Alloc Size  |                                                   1018 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 176 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   15383 |   1185 GB/s |       137 |         9995 150% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 16682                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                               608 x 608 x 304 = 112377856 |
| Grid Domains    |                                             2 x 2 x 1 = 4 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                               CPU 1821 MB, GPU 4x 1493 MB |
| Max Alloc Size  |                                                   1018 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 176 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   14346 |   1105 GB/s |       128 |         9998 180% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 14567                                                  |
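The Bandwidth and Steps/s columns in these tables appear to follow directly from the MLUPs/s figure. Here is a minimal sketch of that arithmetic, assuming D3Q19 moves 19 values in and 19 values out per cell update plus one flag byte (153 bytes at FP32, 77 bytes at FP16S/FP16C). The function names are hypothetical and not part of FluidX3D. The 900 GB/s per-GPU peak in the usage note is the V100 SXM2 data sheet value, which is not stated in this thread.

```cpp
#include <cmath>
#include <cstdint>

// Bytes moved per cell update for D3Q19 (assumption): 19 values read + 19 written, plus 1 flag byte.
// ddf_bytes is 4 for FP32 storage and 2 for FP16S/FP16C storage.
inline unsigned bytes_per_cell_d3q19(unsigned ddf_bytes) {
    return 2u * 19u * ddf_bytes + 1u;
}

// Effective bandwidth in GB/s from MLUPs/s (1 MLUP/s times 1 byte per cell is 1 MB/s).
inline double bandwidth_gbs(double mlups, unsigned ddf_bytes) {
    return mlups * bytes_per_cell_d3q19(ddf_bytes) / 1000.0;
}

// Time steps per second for a grid with the given total number of cells.
inline double steps_per_second(double mlups, std::uint64_t cells) {
    return mlups * 1.0e6 / static_cast<double>(cells);
}

// Fraction of the combined data sheet bandwidth of all GPUs that is actually used.
inline double efficiency(double measured_gbs, double peak_gbs_per_gpu, unsigned gpus) {
    return measured_gbs / (peak_gbs_per_gpu * gpus);
}
```

For the 4x V100 FP32 run above, `bandwidth_gbs(7630.0, 4u)` gives about 1167 GB/s and `steps_per_second(7630.0, 67108864)` about 114, matching the table. With an assumed peak of 900 GB/s per GPU, `efficiency(1167.0, 900.0, 4u)` comes out at roughly 0.32.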
ProjectPhysX commented 1 year ago

@masazzz thank you very much, this is amazing hardware! On a single GPU the performance is mostly independent of resolution, but on multi-GPU the resolution makes a bigger difference, and the high resolutions are of particular interest. At low resolution, the communication overhead between domains is large compared to the computation of the domains themselves, so I'd expect quite a bit of improvement at larger resolution.

It feels like too much to ask, but would you mind benchmarking the 2x/4x/8x GPU configurations with "memory=39800u;" for the A100s and "memory=31800u;" for the V100s? At this higher resolution, you can reduce the loop iterations from 1000 to 80 so it doesn't take that long.

Thank you so much!!

masazzz commented 1 year ago

@ProjectPhysX Thank you for your suggestion. I'll try the A100 later as well.

System: "Flow" Type II subsystem, Information Technology Center, Nagoya University (https://icts.nagoya-u.ac.jp/en/sc/)
Compiler: gcc/10.3.0
Memory: 31800
Loop iterations: 80
GPUs: 2, 4
FP: FP32, FP16S, FP16C

bench.sh

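#!/bin/bash
# Benchmark FluidX3D on 2 and 4 GPUs at FP32, FP16S and FP16C, writing one log file per run.
module load gcc/10.3.0
mkdir -p bin
# Restore the pristine sources and keep them as templates.
git checkout src/defines.hpp
mv src/defines.hpp src/defines.hpp.orig
git checkout src/setup.cpp
mv src/setup.cpp src/setup.cpp.orig
GPUs=1
# One LBM constructor per configuration: 2x1x1 domains (2 GPUs), then 2x2x1 domains (4 GPUs).
array=("LBM lbm(2u*L, 1u*L, 1u*L, 2u, 1u, 1u, 1.0f);" "LBM lbm(2u*L, 2u*L, 1u*L, 2u, 2u, 1u, 1.0f);")
for i in "${!array[@]}"
do
  # Reduce the loop iterations from 1000 to 80 and size each domain from memory = 31800 (MB per GPU).
  sed -e "s|for(uint i=0u; i<1000u; i++) {|for(uint i=0u; i<80u; i++) {|" -e "s|LBM lbm(256u, 256u, 256u, 1.0f);|const uint memory = 31800u;const uint L = ((uint)cbrt(fmin((float)memory*1048576.0f/(19.0f*(float)sizeof(fpxx)+17.0f), (float)max_uint))/2u)*2u;${array[$i]}|" src/setup.cpp.orig > src/setup.cpp
  GPUs=$((GPUs * 2))
  for a in FP32 FP16S FP16C
  do
    # Prepend the precision define to the original defines.hpp.
    (
      echo "#define $a"
      cat src/defines.hpp.orig
    ) > src/defines.hpp
    rm -f ./bin/FluidX3D
    g++ ./src/*.cpp -o ./bin/FluidX3D -std=c++17 -pthread -I./src/OpenCL/include -L./src/OpenCL/lib -lOpenCL
    # Strip ANSI escape sequences and backspaces from the console output before logging.
    ./bin/FluidX3D | sed -E "s/\x1b\[([0-9]{1,3}((;[0-9]{1,3})*)?)?[mGK]//g" | col -bx | tee "log.memory_31800.${GPUs}GPUs.$a"
  done
done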
bash bench.sh
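The expression that the script injects into setup.cpp picks the largest even edge length L for which one cubic domain fits into the memory budget. A standalone sketch of that calculation follows. It uses double instead of the float arithmetic in the original expression, which gives the same result for the values used here. The function name is hypothetical, and reading the 17 extra bytes per cell as density, velocity and flags is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Edge length L of one cubic domain that fits into memory_mb megabytes.
// fpxx_bytes corresponds to sizeof(fpxx): 4 for FP32, 2 for FP16S/FP16C.
inline std::uint32_t domain_edge_length(std::uint32_t memory_mb, std::uint32_t fpxx_bytes) {
    const double max_uint = 4294967295.0;
    // Bytes per cell as in the sed expression: 19 values of fpxx_bytes each plus 17 further bytes.
    const double bytes_per_cell = 19.0 * fpxx_bytes + 17.0;
    // Number of cells that fit into the budget, capped at the largest 32-bit unsigned value.
    const double cells = std::min(memory_mb * 1048576.0 / bytes_per_cell, max_uint);
    // Truncate the cube root, then round down to an even number.
    const auto edge = static_cast<std::uint32_t>(std::cbrt(cells));
    return (edge / 2u) * 2u;
}
```

With memory = 31800 this yields L = 710 for FP32 and L = 846 for FP16S/FP16C, which matches the 1420 x 710 x 710 and 1692 x 846 x 846 grids in the 2-GPU logs.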
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                              1420 x 710 x 710 = 715822000 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                             CPU 11605 MB, GPU 2x 31850 MB |
| Max Alloc Size  |                                                  25941 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 410 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    7788 |   1192 GB/s |        11 |          800 100% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 7953                                                   |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                             1692 x 846 x 846 = 1210991472 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                             CPU 19633 MB, GPU 2x 31854 MB |
| Max Alloc Size  |                                                  21942 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 488 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   15377 |   1184 GB/s |        13 |          799 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 15469                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                             1692 x 846 x 846 = 1210991472 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                             CPU 19633 MB, GPU 2x 31854 MB |
| Max Alloc Size  |                                                  21942 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 488 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   12842 |   1989 GB/s |        11 |          799 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 12932                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                            1420 x 1420 x 710 = 1431644000 |
| Grid Domains    |                                             2 x 2 x 1 = 4 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                             CPU 23210 MB, GPU 4x 31940 MB |
| Max Alloc Size  |                                                  25941 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 410 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   13002 |   1989 GB/s |         9 |          800 100% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 13135                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                            1692 x 1692 x 846 = 2421982944 |
| Grid Domains    |                                             2 x 2 x 1 = 4 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                             CPU 39266 MB, GPU 4x 31930 MB |
| Max Alloc Size  |                                                  21942 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 488 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   26044 |   2005 GB/s |        11 |          799 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 26527                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                            1692 x 1692 x 846 = 2421982944 |
| Grid Domains    |                                             2 x 2 x 1 = 4 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                             CPU 39266 MB, GPU 4x 31930 MB |
| Max Alloc Size  |                                                  21942 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 488 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   22500 |   1733 GB/s |        19 |          799 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 22686                                                  |
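For anyone cross-checking these numbers: the Bandwidth column appears to follow directly from the MLUPs column. The sketch below assumes that one D3Q19 cell update reads and writes 19 density distribution functions plus one flag byte, which gives 153 bytes at FP32 and 77 bytes at FP16S/FP16C. That byte count is inferred from the values reported in this thread, and the function names are made up for illustration. It is not taken from the FluidX3D source.

```python
# Rough sketch: recompute the "Bandwidth" column from the "MLUPs" column.
# Assumption (not taken from the FluidX3D source): each D3Q19 cell update
# reads 19 DDFs, writes 19 DDFs, and touches 1 flag byte.

def bytes_per_cell(ddf_bytes: int, velocity_set: int = 19) -> int:
    """Bytes moved per cell per time step for a given DDF storage size."""
    return 2 * velocity_set * ddf_bytes + 1


def bandwidth_gb_s(mlups: float, ddf_bytes: int) -> float:
    """MLUPs is 1e6 cell updates/s, so MLUPs * bytes is MB/s; divide by 1000 for GB/s."""
    return mlups * bytes_per_cell(ddf_bytes) / 1000.0


def efficiency(mlups: float, ddf_bytes: int, n_gpus: int, peak_gb_s: float) -> float:
    """Fraction of the summed data sheet bandwidth that the run achieved."""
    return bandwidth_gb_s(mlups, ddf_bytes) / (n_gpus * peak_gb_s)


if __name__ == "__main__":
    # 4x V100, FP32/FP32 run: 13002 MLUPs, 4-byte DDFs
    print(round(bandwidth_gb_s(13002, 4)))
    # 4x V100, FP32/FP16S run: 26044 MLUPs, 2-byte DDFs
    print(round(bandwidth_gb_s(26044, 2)))
    # peak_gb_s=900.0 is the V100 SXM2 data sheet value, supplied here as an assumption
    print(f"{efficiency(13002, 4, 4, 900.0):.1%}")
```

For the two 4-GPU runs used in the example, this reproduces the reported 1989 GB/s and 2005 GB/s. The efficiency helper divides by the data sheet peak, as recommended earlier in the thread, not by a measured bandwidth.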
masazzz commented 1 year ago

Here are the A100 benchmark test results. Let me know if there is anything else I can do.

System: Wisteria/BDEC-01 Aquarius, Supercomputing Division, Information Technology Center, The University of Tokyo (https://www.cc.u-tokyo.ac.jp/en/supercomputer/wisteria/system.php)

Compiler: gcc/12.2.0
Memory: 39800 MB per GPU
Loop iterations: 80
GPUs: 2, 4, 8
FP: FP32, FP16S, FP16C

bench-Aquarius.sh

#!/bin/bash
# Benchmark FluidX3D on 2, 4 and 8 GPUs at FP32, FP16S and FP16C precision.
module load gcc/12.2.0
mkdir -p bin
# Restore the pristine sources and keep them as templates for the patched copies.
git checkout src/defines.hpp
mv src/defines.hpp src/defines.hpp.orig
git checkout src/setup.cpp
mv src/setup.cpp src/setup.cpp.orig
GPUs=1
# Domain layouts for 2, 4 and 8 GPUs; L is the edge length of one domain.
array=("LBM lbm(2u*L, 1u*L, 1u*L, 2u, 1u, 1u, 1.0f);" "LBM lbm(2u*L, 2u*L, 1u*L, 2u, 2u, 1u, 1.0f);" "LBM lbm(2u*L, 2u*L, 2u*L, 2u, 2u, 2u, 1.0f);")
for i in "${!array[@]}"
do
  # Shorten the benchmark loop to 80 iterations and size the grid to fill 39800 MB per GPU.
  sed -e "s|for(uint i=0u; i<1000u; i++) {|for(uint i=0u; i<80u; i++) {|" -e "s|LBM lbm(256u, 256u, 256u, 1.0f);|const uint memory = 39800u;const uint L = ((uint)cbrt(fmin((float)memory*1048576.0f/(19.0f*(float)sizeof(fpxx)+17.0f), (float)max_uint))/2u)*2u;${array[$i]}|" src/setup.cpp.orig > src/setup.cpp
  GPUs=$((GPUs * 2))
  for a in FP32 FP16S FP16C
  do
    log="log.memory_39800.${GPUs}GPUs.$a"
    # Skip configurations that already have a log.
    if [ ! -f "$log" ]
    then
        # Prepend the precision define to the original defines.hpp.
        (
            echo "#define $a"
            cat src/defines.hpp.orig
        ) > src/defines.hpp
        rm -f ./bin/FluidX3D
        g++ ./src/*.cpp -o ./bin/FluidX3D -std=c++17 -pthread -I./src/OpenCL/include -L./src/OpenCL/lib -lOpenCL
        # Strip ANSI escape sequences and backspaces so the log is plain text.
        ./bin/FluidX3D | sed -E "s/\x1b\[([0-9]{1,3}((;[0-9]{1,3})*)?)?[mGK]//g" | col -bx | tee "$log"
    fi
  done
done
bash bench-Aquarius.sh
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                              1528 x 764 x 764 = 891887488 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                             CPU 14459 MB, GPU 2x 39675 MB |
| Max Alloc Size  |                                                  32321 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 441 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   14183 |   2170 GB/s |        16 |          800 100% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 14311                                                  |
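The grid resolution in this run can be traced back to the sizing expression that bench-Aquarius.sh patches into src/setup.cpp. The Python sketch below mirrors that expression: 19 DDFs of `sizeof(fpxx)` bytes plus 17 further bytes per cell, a cube root, then rounding down to an even edge length. The function names are hypothetical, and treating `fpxx` as 4 bytes for FP32 and 2 bytes for FP16S/FP16C is an assumption.

```python
# Sketch of the grid sizing used in bench-Aquarius.sh, ported to Python.
# Assumption: sizeof(fpxx) is 4 for FP32 and 2 for FP16S/FP16C.

MAX_UINT = 4294967295  # upper clamp, mirrors max_uint in the C++ expression


def domain_edge(memory_mb: int, fpxx_bytes: int) -> int:
    """Largest even edge length L such that one L^3 domain fits in memory_mb."""
    bytes_per_cell = 19 * fpxx_bytes + 17
    cells = min(memory_mb * 1048576 / bytes_per_cell, float(MAX_UINT))
    edge = int(cells ** (1.0 / 3.0))  # truncate like the (uint) cast
    return edge // 2 * 2              # round down to an even number


def grid(memory_mb: int, fpxx_bytes: int, domains: tuple) -> tuple:
    """Full grid resolution for a (Dx, Dy, Dz) domain layout."""
    edge = domain_edge(memory_mb, fpxx_bytes)
    return tuple(d * edge for d in domains)


if __name__ == "__main__":
    print(grid(39800, 4, (2, 1, 1)))  # FP32 on 2 GPUs
    print(grid(39800, 2, (2, 1, 1)))  # FP16S/FP16C on 2 GPUs
```

With 39800 MB this gives L = 764 for FP32 and L = 912 for the FP16 variants, matching the 1528 x 764 x 764 and 1824 x 912 x 912 resolutions in the 2-GPU logs.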
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                             1824 x 912 x 912 = 1517101056 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                             CPU 24595 MB, GPU 2x 39897 MB |
| Max Alloc Size  |                                                  27489 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 527 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   23472 |   1807 GB/s |        15 |          799 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 23707                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                             1824 x 912 x 912 = 1517101056 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                             CPU 24595 MB, GPU 2x 39897 MB |
| Max Alloc Size  |                                                  27489 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 527 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   15518 |   1195 GB/s |        10 |          799 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 15512                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                            1528 x 1528 x 764 = 1783774976 |
| Grid Domains    |                                             2 x 2 x 1 = 4 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                             CPU 28919 MB, GPU 4x 39780 MB |
| Max Alloc Size  |                                                  32321 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 441 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   23706 |   3627 GB/s |        13 |          799 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 23411                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                            1824 x 1824 x 912 = 3034202112 |
| Grid Domains    |                                             2 x 2 x 1 = 4 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                             CPU 49191 MB, GPU 4x 39987 MB |
| Max Alloc Size  |                                                  27489 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 527 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   41877 |   3225 GB/s |        14 |          799 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 42400                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                            1824 x 1824 x 912 = 3034202112 |
| Grid Domains    |                                             2 x 2 x 1 = 4 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                             CPU 49191 MB, GPU 4x 39987 MB |
| Max Alloc Size  |                                                  27489 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 527 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   28813 |   2219 GB/s |        19 |          799 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 29017                                                  |
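
For anyone cross-checking these numbers: the Bandwidth column seems to be just MLUPs/s multiplied by the bytes moved per cell per time step. Here is a minimal sketch of that arithmetic, together with the efficiency against data-sheet bandwidth discussed earlier in the thread. The 153 and 77 bytes/cell values and the 1555 GB/s peak per A100-SXM4-40GB are assumptions on my side and are not printed in the logs. The helper names `bandwidth_gbs` and `efficiency` are made up for illustration.

```python
# Bytes transferred per cell per time step for D3Q19. These values are an
# assumption (not printed in the logs); they reproduce the Bandwidth column.
BYTES_PER_CELL = {"FP32/FP32": 153, "FP32/FP16S": 77, "FP32/FP16C": 77}

# Assumed data-sheet peak bandwidth of one A100-SXM4-40GB in GB/s.
PEAK_GBS_PER_GPU = 1555.0


def bandwidth_gbs(mlups, precision):
    """MLUPs/s times bytes per cell gives MB/s; divide by 1000 for GB/s."""
    return mlups * BYTES_PER_CELL[precision] / 1000.0


def efficiency(mlups, precision, gpus):
    """Fraction of the combined data-sheet bandwidth that is actually used."""
    return bandwidth_gbs(mlups, precision) / (gpus * PEAK_GBS_PER_GPU)


# (GPUs, precision, MLUPs/s, Bandwidth in GB/s) copied from the runs above.
RUNS = [
    (2, "FP32/FP16S", 23472, 1807),
    (2, "FP32/FP16C", 15518, 1195),
    (4, "FP32/FP32", 23706, 3627),
    (4, "FP32/FP16S", 41877, 3225),
    (4, "FP32/FP16C", 28813, 2219),
]

for gpus, precision, mlups, logged in RUNS:
    computed = bandwidth_gbs(mlups, precision)
    print(f"{gpus}x {precision:<10} {computed:7.0f} GB/s (log: {logged}), "
          f"efficiency {efficiency(mlups, precision, gpus):.0%}")
```

With these assumed constants, the computed bandwidth rounds to the logged value for all five runs, so the same helper can be reused for the remaining multi-GPU results by appending rows to `RUNS`.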
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 4                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 5                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 6                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 7                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                           1528 x 1528 x 1528 = 3567549952 |
| Grid Domains    |                                             2 x 2 x 2 = 8 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                             CPU 57838 MB, GPU 8x 39883 MB |
| Max Alloc Size  |                                                  32321 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 882 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   36708 |   5616 GB/s |        10 |          799 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 37619                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 4                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 5                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 6                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 7                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                           1824 x 1824 x 1824 = 6068404224 |
| Grid Domains    |                                             2 x 2 x 2 = 8 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                             CPU 98383 MB, GPU 8x 40074 MB |
| Max Alloc Size  |                                                  27489 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                 Re < 1053 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   72707 |   5598 GB/s |        12 |          799 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 72965                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 4                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 5                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 6                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 7                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                           1824 x 1824 x 1824 = 6068404224 |
| Grid Domains    |                                             2 x 2 x 2 = 8 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                             CPU 98383 MB, GPU 8x 40074 MB |
| Max Alloc Size  |                                                  27489 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                 Re < 1053 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   62451 |   4809 GB/s |        10 |          799 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 63009                                                  |
rodionstepanov commented 1 year ago

I'm surprised to get a considerably different result for the same Tesla V100-SXM2-32GB (2 GPUs):

|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 450.51.05                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32510 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8127 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 450.51.05                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32510 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8127 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                              1420 x 710 x 710 = 715822000 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                             CPU 11605 MB, GPU 2x 31850 MB |
| Max Alloc Size  |                                                  25941 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 410 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    8531 |   1305 GB/s |        12 |          999  90% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 8528                                                   |

ProjectPhysX commented 1 year ago

@rodionstepanov I've noticed this as well. It's quite surprising that there are sometimes considerable differences even between identical GPUs. Depending on the silicon lottery, some GPU/memory chips may boost higher than others. Different CPU/mainboard/PCIe-interconnect/cooling/drivers may also affect results.
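
When comparing runs like these, the Bandwidth column can be cross-checked against the MLUPs column. The sketch below is a minimal example, assuming the bytes-per-cell figures (153 for FP32 storage, 77 for FP16 storage) that reproduce the numbers printed in this thread; `BYTES_PER_CELL` and `bandwidth_gb_s` are illustrative names, not part of FluidX3D.

```python
# Bytes moved per lattice cell per time step. These values are an assumption
# inferred from the benchmark tables in this thread: D3Q19 reads and writes
# 19 density distribution functions per cell, plus one flag byte.
BYTES_PER_CELL = {
    "FP32": 19 * 2 * 4 + 1,  # 153 bytes
    "FP16": 19 * 2 * 2 + 1,  # 77 bytes, used for both FP16S and FP16C
}


def bandwidth_gb_s(mlups: float, storage: str) -> float:
    """Convert MLUPs (million lattice updates per second) to GB/s."""
    return mlups * 1e6 * BYTES_PER_CELL[storage] / 1e9


# MLUPs values copied from the console output posted in this thread.
runs = [
    ("8x A100 FP32/FP32", 36708, "FP32"),
    ("8x A100 FP32/FP16S", 72707, "FP16"),
    ("8x A100 FP32/FP16C", 62451, "FP16"),
    ("2x V100 FP32/FP32", 8531, "FP32"),
]

for label, mlups, storage in runs:
    print(f"{label}: {bandwidth_gb_s(mlups, storage):.0f} GB/s")
```

The printed values can be compared by eye with the Bandwidth column of each table; a mismatch would suggest the bytes-per-cell assumption does not hold for that configuration.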

lgmnrx commented 1 year ago

Hello, this benchmark was performed on an RX 6700M (130 W) with the card set to extreme performance. I still have doubts about the bandwidth used during the benchmark, as the card has 320 GB/s of bandwidth. Let me know if I should redo the testing. [Images: fp32, fp16c, fp16s]

ProjectPhysX commented 1 year ago

@lgmnrx thank you very much! ~60% efficiency is typical for RDNA 1/2/3 GPUs. All good!
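
Efficiency here means measured bandwidth divided by the data sheet peak. A minimal sketch of that ratio follows; the 320 GB/s figure is the one quoted above, while the 192 GB/s measured value is purely illustrative (it is not the posted result, which is only in the screenshots), and `efficiency` is a hypothetical helper name.

```python
def efficiency(measured_gb_s: float, datasheet_gb_s: float) -> float:
    """Fraction of the data sheet peak bandwidth that a benchmark reached."""
    if datasheet_gb_s <= 0:
        raise ValueError("data sheet bandwidth must be positive")
    return measured_gb_s / datasheet_gb_s


DATASHEET_GB_S = 320.0  # RX 6700M peak bandwidth as quoted in this thread
ILLUSTRATIVE_MEASURED_GB_S = 192.0  # made-up value for demonstration only

ratio = efficiency(ILLUSTRATIVE_MEASURED_GB_S, DATASHEET_GB_S)
print(f"efficiency: {ratio:.0%}")
```

Substituting the value from the Bandwidth column of an actual run gives the efficiency for that run.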

illwieckz commented 1 year ago

GPU: AMD Radeon R9 390X Grenada XT (GCN 2.0), here labelled as Hawaii (series, device). Driver: AMD Orca 21.20-1271047, APP 3224.4

.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Hawaii                                                     |
| Device ID    1 | Oland                                                      |
| Device ID    2 | pthread-AMD Ryzen Threadripper PRO 3955WX 16-Cores         |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Hawaii                                                     |
| Device Vendor  | Advanced Micro Devices, Inc.                               |
| Device Driver  | 3224.4                                                     |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 44 at 1080 MHz (2816 cores, 6.083 TFLOPs/s)                |
| Memory, Cache  | 7418 MB, 16 KB global / 32 KB local                        |
| Buffer Limits  | 4048 MB global, 4145152 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 272 MB, GPU 1x 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    1703 |    261 GB/s |       101 |         9998  80% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 1733                                                   |
ProjectPhysX commented 1 year ago

@illwieckz awesome, 512-bit memory bus FTW! Could you provide the FP16S/FP16C benchmarks as well please?

illwieckz commented 1 year ago

FP16S

.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Hawaii                                                     |
| Device ID    1 | Oland                                                      |
| Device ID    2 | pthread-AMD Ryzen Threadripper PRO 3955WX 16-Cores         |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Hawaii                                                     |
| Device Vendor  | Advanced Micro Devices, Inc.                               |
| Device Driver  | 3224.4                                                     |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 44 at 1080 MHz (2816 cores, 6.083 TFLOPs/s)                |
| Memory, Cache  | 7661 MB, 16 KB global / 32 KB local                        |
| Buffer Limits  | 4048 MB global, 4145152 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    2137 |    165 GB/s |       127 |         9998  80% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 2217                                                   |

FP16C

.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Hawaii                                                     |
| Device ID    1 | Oland                                                      |
| Device ID    2 | pthread-AMD Ryzen Threadripper PRO 3955WX 16-Cores         |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Hawaii                                                     |
| Device Vendor  | Advanced Micro Devices, Inc.                               |
| Device Driver  | 3224.4                                                     |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 44 at 1080 MHz (2816 cores, 6.083 TFLOPs/s)                |
| Memory, Cache  | 7656 MB, 16 KB global / 32 KB local                        |
| Buffer Limits  | 4048 MB global, 4145152 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    1668 |    128 GB/s |        99 |         9999  90% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 1722                                                   |
gryoung4727 commented 1 year ago

GTX Titan FP32/FP32 Titan-FP32-FP32

GTX Titan FP32/FP16S Titan-FP32-FP16S

GTX Titan FP32/FP16C Titan-FP32-FP16C

GTX 680 FP32/FP32 680-FP32-FP32

GTX 680 FP32/FP16S 680-FP32-FP16S

GTX 680 FP32/FP16C 680-FP32-FP16C

dextorious commented 1 year ago

Apple M2 Max (in the 16" chassis):

| Device ID    0 | Apple M2 Max                                               |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Apple M2 Max                                               |
| Device Vendor  | Apple                                                      |
| Device Driver  | 1.2 1.0                                                    |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 38 at 1000 MHz (4864 cores, 9.728 TFLOPs/s)                |
| Memory, Cache  | 21845 MB, 0 KB global / 32 KB local                        |
| Buffer Limits  | 4096 MB global, 1048576 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 272 MB, GPU 1x 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    2398 |    367 GB/s |       143 |         9995  50% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 2405                                                   |

| Device ID    0 | Apple M2 Max                                               |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Apple M2 Max                                               |
| Device Vendor  | Apple                                                      |
| Device Driver  | 1.2 1.0                                                    |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 38 at 1000 MHz (4864 cores, 9.728 TFLOPs/s)                |
| Memory, Cache  | 21845 MB, 0 KB global / 32 KB local                        |
| Buffer Limits  | 4096 MB global, 1048576 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    4613 |    355 GB/s |       275 |         9985  50% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 4641                                                   |

| Device ID    0 | Apple M2 Max                                               |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Apple M2 Max                                               |
| Device Vendor  | Apple                                                      |
| Device Driver  | 1.2 1.0                                                    |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 38 at 1000 MHz (4864 cores, 9.728 TFLOPs/s)                |
| Memory, Cache  | 21845 MB, 0 KB global / 32 KB local                        |
| Buffer Limits  | 4096 MB global, 1048576 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    2422 |    187 GB/s |       144 |         9994  40% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 2444                                                   |

Pretty good overall, but I did find the FP16C efficiency drop rather surprising. There was no throttling during the benchmark (in fact, the fan didn't even turn on; temperatures gradually reached 75 °C and stayed there), and I ran it twice in a different order, but this system just doesn't like FP16C. Otherwise I'm really happy to see over 90% efficiency on what is really a shared memory interface, on a system with plenty of background tasks running.
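For reference, the efficiency figure can be reproduced from the logs with a few lines of Python. This is only a sketch: the bytes-per-update values (153 for FP32, 77 for FP16) are inferred from the MLUPs and Bandwidth columns in the tables in this thread, and the 400 GB/s peak for the M2 Max is an assumed data sheet value, not something printed by the benchmark.

```python
# Bytes moved per cell update for D3Q19 (assumption inferred from the logs):
# 19 density distribution functions read + 19 written, plus 1 flag byte.
BYTES_PER_UPDATE = {
    "FP32": 19 * 4 * 2 + 1,  # 153 bytes
    "FP16": 19 * 2 * 2 + 1,  # 77 bytes, same for FP16S and FP16C
}

def bandwidth_gbs(mlups: float, storage: str) -> float:
    """Effective memory bandwidth in GB/s for a given MLUPs/s figure."""
    return mlups * 1e6 * BYTES_PER_UPDATE[storage] / 1e9

def efficiency(mlups: float, storage: str, peak_gbs: float) -> float:
    """Fraction of the peak (data sheet) bandwidth that is actually used."""
    return bandwidth_gbs(mlups, storage) / peak_gbs

M2_MAX_PEAK_GBS = 400.0  # assumed data sheet bandwidth, not from the log

# MLUPs values taken from the three M2 Max tables above.
runs = [
    ("FP32/FP32", 2398, "FP32"),
    ("FP32/FP16S", 4613, "FP16"),
    ("FP32/FP16C", 2422, "FP16"),
]

for label, mlups, storage in runs:
    bw = bandwidth_gbs(mlups, storage)
    eff = efficiency(mlups, storage, M2_MAX_PEAK_GBS)
    print(f"{label}: {bw:.0f} GB/s, {eff:.0%} of assumed peak")
```

With the assumed 400 GB/s peak this gives about 92% for FP32/FP32, 89% for FP32/FP16S and 47% for FP32/FP16C.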

JS-DevX commented 1 year ago

GeForce GTX 770: Benchmark.txt

PMunkes commented 1 year ago

RX 7600, Stock. Theoretical memory bandwidth is 288GB/s: FP32: grafik FP16S: grafik FP16C: grafik

jpecar commented 1 year ago

IMHO we should really be looking at perf/W. Here are some numbers I was able to generate, all with D3Q19 and FP16S:

2080Ti ... Peak MLUPs/s = 2291, power fluctuates 165W..246W
3090 ... Peak MLUPs/s = 10635, power around 345W ... 30.8 MLUPs/W
A40 ... Peak MLUPs/s = 6821, at 215W ... 31.7 MLUPs/W
MI210 ... Peak MLUPs/s = 7199, at 185W ... 38.9 MLUPs/W
A100 ... Peak MLUPs/s = 16203, at 217W ... 74.6 MLUPs/W
H100 ... Peak MLUPs/s = 20339, at 247W ... 82.3 MLUPs/W
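The perf/W figures above are simply peak MLUPs/s divided by the observed power draw. A minimal Python sketch of that arithmetic, using only the numbers quoted above (the dictionary layout and function name are just for illustration; the 2080 Ti is left out because its power draw was reported as a fluctuating range):

```python
# (peak MLUPs/s, power draw in W) as quoted above, D3Q19 with FP16S.
results = {
    "RTX 3090": (10635, 345),
    "A40": (6821, 215),
    "MI210": (7199, 185),
    "A100": (16203, 217),
    "H100": (20339, 247),
}

def mlups_per_watt(peak_mlups: float, watts: float) -> float:
    """Energy efficiency as MLUPs/s per watt of measured power draw."""
    return peak_mlups / watts

# Rank devices from most to least efficient.
ranking = sorted(results, key=lambda name: mlups_per_watt(*results[name]), reverse=True)

for name in ranking:
    peak, watts = results[name]
    print(f"{name:>8}: {mlups_per_watt(peak, watts):.1f} MLUPs/W")
```

Small differences in the last digit against the hand-computed values can come from rounding versus truncation.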

Does anyone have an L40 to test?

Perf/$ should also be interesting ... I expect Radeon VII & MI50 to be at the top.

ProjectPhysX commented 1 year ago

@jpecar indeed, that is a good metric. Here are some useful charts of all benchmarked hardware so far:

Performance [MLUPs/s] image

Memory efficiency (roofline model) [%] image

Performance per Watt [MLUPs/s / W] image

Performance per $ (launch price) [MLUPs/s / $] image

Value [MLUPs/s × memory capacity / (W × $)] image

starfire24680 commented 1 year ago

AMD Instinct MI100: image image image

AMD Radeon PRO W6800: image image image

skittles-fivem commented 1 year ago

https://www.techpowerup.com/gpu-specs/msi-rtx-4070-ventus-3x-oc.b11046

aaa
PMunkes commented 1 year ago

AMD Phoenix with DDR5-6400 memory in the ROG Ally: grafik grafik grafik

ProjectPhysX commented 1 year ago

@skittles-fivem thank you for the 4070 FP16C benchmark! Could you also post the FP32 and FP16S benchmarks please? Thanks!!

skittles-fivem commented 1 year ago

> @skittles-fivem thank you for the 4070 FP16C benchmark! Could you also post the FP32 and FP16S benchmarks please? Thanks!!

Capture22 22
marty1885 commented 1 year ago

On Orange Pi 5 Plus (RK3588/Mali G610 MP4) 16GB

❯ ./make.sh # F32
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.7 |
|                                      '     Copyright (c) Dr. Moritz Lehmann |
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '7'.
|----------------.------------------------------------------------------------|
| Device ID    0 | Mali-LODX r0p0                                             |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Mali-LODX r0p0                                             |
| Device Vendor  | ARM                                                        |
| Device Driver  | 2.1                                                        |
| OpenCL Version | OpenCL C 2.0 v1.g6p0-01eac0.2819f9d4dbe0b5a2f89c835d8484f9cd |
| Compute Units  | 4 at 1000 MHz (32 cores, 0.064 TFLOPs/s)                   |
| Memory, Cache  | 15708 MB, 1024 KB global / 32 KB local                     |
| Buffer Limits  | 15708 MB global, 16085876 KB constant                      |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 272 MB, GPU 1x 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|      43 |      7 GB/s |         3 |         9999  90% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 43                                                     |

❯ ./make.sh # F16S
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.7 |
|                                      '     Copyright (c) Dr. Moritz Lehmann |
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '7'.
|----------------.------------------------------------------------------------|
| Device ID    0 | Mali-LODX r0p0                                             |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Mali-LODX r0p0                                             |
| Device Vendor  | ARM                                                        |
| Device Driver  | 2.1                                                        |
| OpenCL Version | OpenCL C 2.0 v1.g6p0-01eac0.2819f9d4dbe0b5a2f89c835d8484f9cd |
| Compute Units  | 4 at 1000 MHz (32 cores, 0.064 TFLOPs/s)                   |
| Memory, Cache  | 15708 MB, 1024 KB global / 32 KB local                     |
| Buffer Limits  | 15708 MB global, 16085876 KB constant                      |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|      59 |      5 GB/s |         4 |         9999  90% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 59                                                     |

❯ ./make.sh # F16C
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.7 |
|                                      '     Copyright (c) Dr. Moritz Lehmann |
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '7'.
|----------------.------------------------------------------------------------|
| Device ID    0 | Mali-LODX r0p0                                             |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Mali-LODX r0p0                                             |
| Device Vendor  | ARM                                                        |
| Device Driver  | 2.1                                                        |
| OpenCL Version | OpenCL C 2.0 v1.g6p0-01eac0.2819f9d4dbe0b5a2f89c835d8484f9cd |
| Compute Units  | 4 at 1000 MHz (32 cores, 0.064 TFLOPs/s)                   |
| Memory, Cache  | 15708 MB, 1024 KB global / 32 KB local                     |
| Buffer Limits  | 15708 MB global, 16085876 KB constant                      |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|      19 |      1 GB/s |         1 |         9999  90% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 19                                                     |
bochen2027 commented 1 year ago

2023-06-25 14_43_02 2023-06-25 14_43_29 2023-06-25 14_44_13

Is there any way to use Resizable BAR so that system memory can be used as well?
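As context for how much device memory a given grid needs: the "Memory Usage" rows in the logs in this thread are consistent with a fixed number of bytes per cell. The sketch below assumes a breakdown of 19 DDFs plus FP32 density, three FP32 velocity components and one flag byte per cell; that breakdown is an assumption chosen because it reproduces the logged 1488 MB (FP32) and 880 MB (FP16) for 256³ cells, and the function names are hypothetical.

```python
def gpu_bytes(nx: int, ny: int, nz: int, ddf_bytes: int) -> int:
    """Estimated device memory in bytes for a D3Q19 grid.

    Assumed per-cell layout: 19 DDFs (ddf_bytes each), density (4 bytes),
    velocity (3 x 4 bytes) and one flag byte.
    """
    bytes_per_cell = 19 * ddf_bytes + 4 + 3 * 4 + 1
    return nx * ny * nz * bytes_per_cell

def gpu_mib(nx: int, ny: int, nz: int, ddf_bytes: int) -> float:
    """Same estimate in MiB, the unit the logs appear to use for 'MB'."""
    return gpu_bytes(nx, ny, nz, ddf_bytes) / 2**20

def max_cubic_side(vram_mib: int, ddf_bytes: int) -> int:
    """Largest n such that an n x n x n grid fits into vram_mib of memory."""
    budget = vram_mib * 2**20
    n = 1
    while gpu_bytes(n + 1, n + 1, n + 1, ddf_bytes) <= budget:
        n += 1
    return n

print(gpu_mib(256, 256, 256, ddf_bytes=4))  # FP32 storage
print(gpu_mib(256, 256, 256, ddf_bytes=2))  # FP16S / FP16C storage
print(max_cubic_side(8192, ddf_bytes=2))    # cubic grid for an 8 GiB budget
```

This only estimates the footprint; it says nothing about whether buffers can be placed in system memory.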

aschillingHWL commented 1 year ago

All three benchmarks with the AMD Radeon PRO W7900

RadeonProW7900-FluidX3D-FP32-FP16S RadeonProW7900-FluidX3D-FP32-FP16C RadeonProW7900-FluidX3D-FP32-FP32
aschillingHWL commented 1 year ago

AMD Radeon PRO W7800

RadeonProW7800-FluidX3D-FP32-FP16C RadeonProW7800-FluidX3D-FP32-FP16S RadeonProW7800-FluidX3D-FP32-FP32
aschillingHWL commented 1 year ago

NVIDIA RTX 6000 Ada Generation

RTX6000-Ada-FluidX3D-FP32-FP16C RRTX6000-Ada-FluidX3D-FP32-FP16S RTX6000-Ada-FluidX3D-FP32-FP32
Derakoptes commented 1 year ago

RTX 3050M, 60 W TDP

Screenshot 2023-07-07 093736

image

image

HapppyLance commented 1 year ago

AMD RX6800M Screenshot 2023-07-10 205458 Screenshot 2023-07-10 205652 Screenshot 2023-07-10 210005