ProjectPhysX / FluidX3D

The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs via OpenCL. Free for non-commercial use.

Misdetection of VRAM capacity on old AMD GPU, any way to override? #76

Closed illwieckz closed 10 months ago

illwieckz commented 1 year ago

I tried running the software on an AMD Radeon R9 390X (GCN 2.0, Grenada XT, listed here as Hawaii because Grenada XT is just a Hawaii variant) using the AMD Orca OpenCL driver on Linux.

That card has 8 GB of VRAM, so the “too large: 1x 1488 MB required, 1x 640 MB available” message looks wrong.

Either FluidX3D has a bug or the driver does. AMD will never update this driver again, but it is the latest official Linux AMD driver for the GCN 2.0 hardware generation, so if this is a driver bug, it would be nice to let FluidX3D ignore that error and continue anyway. Is there such an option?

+ ./bin/FluidX3D 0
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Hawaii                                                     |
| Device ID    1 | Oland                                                      |
| Device ID    2 | pthread-AMD Ryzen Threadripper PRO 3955WX 16-Cores         |
|----------------'------------------------------------------------------------|
| Error: Grid resolution (256, 256, 256) is too large: 1x 1488 MB required,   |
|        1x 640 MB available. Largest possible resolution is (193, 193, 193). |
|        Restart the simulation with lower resolution or on different         |
|        device(s) with more memory. Consider using FP16S/FP16C memory        |
|        compression to double maximum grid resolution to a maximum of (230,  |
|        230, 230); for this, uncomment "#define FP16S" or "#define FP16C" in |
|        defines.hpp.                                                         |
'-----------------------------------------------------------------------------'
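The figures in this error message can be checked by hand. The sketch below assumes the D3Q19 velocity set with 93 bytes per cell in FP32 mode (19 distribution values of 4 bytes each, plus 17 bytes for density, velocity and flags) and 55 bytes per cell with FP16 compression, and it assumes "MB" means 2^20 bytes. The function names are made up for illustration and are not taken from the FluidX3D source. Under these assumptions the arithmetic reproduces 1488 MB, (193, 193, 193) and (230, 230, 230), which suggests the message is internally consistent and the questionable input is the 640 MB capacity itself.

```cpp
#include <cstdint>

// Assumed bytes per lattice cell for D3Q19: 19 distribution values plus
// 17 bytes for density (4), velocity (3 x 4) and flags (1).
constexpr std::uint64_t bytes_per_cell_fp32 = 19 * 4 + 17; // 93
constexpr std::uint64_t bytes_per_cell_fp16 = 19 * 2 + 17; // 55

// Assumption: the "MB" in the error message is 2^20 bytes.
constexpr std::uint64_t MiB = 1024ull * 1024ull;

// Memory needed for an n x n x n grid, in whole MiB (rounded down).
constexpr std::uint64_t required_mib(std::uint64_t n, std::uint64_t bytes_per_cell) {
    return n * n * n * bytes_per_cell / MiB;
}

// Largest n such that an n x n x n grid fits into available_mib.
constexpr std::uint64_t max_cubic_resolution(std::uint64_t available_mib,
                                             std::uint64_t bytes_per_cell) {
    std::uint64_t n = 0;
    while ((n + 1) * (n + 1) * (n + 1) * bytes_per_cell <= available_mib * MiB) {
        ++n;
    }
    return n;
}

// Compile-time checks against the numbers printed in the error message.
static_assert(required_mib(256, bytes_per_cell_fp32) == 1488, "256^3 grid, FP32");
static_assert(max_cubic_resolution(640, bytes_per_cell_fp32) == 193, "640 MB, FP32");
static_assert(max_cubic_resolution(640, bytes_per_cell_fp16) == 230, "640 MB, FP16");
```

With the card's real capacity, `max_cubic_resolution(8192, bytes_per_cell_fp32)` would be far above 256, so the 256³ benchmark grid should fit comfortably once the driver reports the correct value.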
ProjectPhysX commented 1 year ago

Hi @illwieckz,

I think that the TFLOPs/s estimate for the 390X is incorrect, in which case the auto-selection of the fastest GPU will fail and it might execute FluidX3D on the integrated/chipset Oland GPU with 640 MB by default.

To manually select the 390X, over the command line run FluidX3D.exe 0 (Windows) or bin/FluidX3D 0 (Linux). The "0" indicates the device ID for the Hawaii GPU.

To help me fix the wrong TFLOPs/s estimate, could you please upload your hardware to the OpenCL device database? https://opencl.gpuinfo.org/download.php

Regards, Moritz

illwieckz commented 1 year ago

The test was done with ./bin/FluidX3D 0, in order to only use the Grenada XT/Hawaii card (R9 390X).

The listed Oland one isn't integrated; it is a very cheap discrete card (R7 240). I was not trying to run the benchmark on it but on the other one.

If I do ./bin/FluidX3D 2 to select the PoCL CPU device, it works.

What is good to know is that both the Grenada XT/Hawaii and the Oland are listed by the same Orca driver, so it seems to properly select the platform, but maybe not the device within the platform.

ProjectPhysX commented 1 year ago

Hi @illwieckz,

then the driver reports the wrong VRAM capacity.

You can disable the VRAM checks in the code by commenting out these two lines with //: https://github.com/ProjectPhysX/FluidX3D/blob/master/src/opencl.hpp#L223 https://github.com/ProjectPhysX/FluidX3D/blob/master/src/lbm.cpp#L619

Let me know if this works.

Regards, Moritz

illwieckz commented 1 year ago

It now works even though the code has not changed. But I rebooted in between, so this may have been a temporary driver bug.

I can't reproduce the bug anymore, so I can't investigate it further. I'm closing this.

illwieckz commented 1 year ago

I reproduced the bug!

It happens when I build without X11 (the default build):

g++ ./src/*.cpp -o ./bin/FluidX3D -std=c++17 -pthread -I./src/OpenCL/include -L./src/OpenCL/lib -lOpenCL

It doesn't happen when I build with X11:

g++ ./src/*.cpp -o ./bin/FluidX3D -std=c++17 -pthread -I./src/OpenCL/include -L./src/OpenCL/lib -lOpenCL \
 -I./src/X11/include -L./src/X11/lib -lX11

One thing I do know is that the Orca driver needs X11 for some reason. For example, when I run OpenCL tasks over SSH, I have to set the DISPLAY environment variable to an existing, working display.

illwieckz commented 1 year ago

Commenting out the error messages just led to a computer crash. 😅️

So those error messages are actually guarding against a real error.

I don't know why this only works when built with X11, but it would be worth mentioning that some OpenCL drivers on Linux may require the tool to be built with X11 to work. This is verified for AMD Orca, and I strongly suspect AMD PAL behaves the same (I have to check), as I know AMD PAL also requires a working X11 display, so it may hit the same bug.

ProjectPhysX commented 1 year ago

Thank you for the update! This is super weird. I'll report this bug to AMD, but since it's a legacy GPU I think they won't put any resources into it.

illwieckz commented 1 year ago

AMD Orca is now deprecated and they will not update it, so unfortunately I doubt any report will change anything… But it's still their latest driver for this range of hardware, so we have to live with it. 🙂️

So what can be done is to tell users (for example in the README and in make.sh) that some legacy AMD drivers may require building with X11 enabled. It might be possible to write driver detection code that prints a meaningful message when the driver is Orca and the tool is not built with X11, warning that bugs may be seen, but that may be too much. A mention in the documentation would probably be enough.
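The detection idea could look roughly like the sketch below. It is only an illustration: both function names are hypothetical, the thread does not say how to recognize the Orca driver from OpenCL strings, and nothing here is taken from the FluidX3D source. For that reason both facts (legacy AMD driver, built with X11) are passed in as flags that the caller would have to determine.

```cpp
#include <cstdio>

// Hypothetical check: warn only for the combination reported in this issue,
// a legacy AMD driver together with a build that lacks X11.
constexpr bool needs_x11_warning(bool is_legacy_amd_driver, bool built_with_x11) {
    return is_legacy_amd_driver && !built_with_x11;
}

// Hypothetical helper: prints the warning to stderr when the check applies.
// How is_legacy_amd_driver gets determined is left to the caller (assumption).
inline void warn_if_x11_needed(bool is_legacy_amd_driver, bool built_with_x11) {
    if (needs_x11_warning(is_legacy_amd_driver, built_with_x11)) {
        std::fputs("Warning: this legacy AMD OpenCL driver may misreport VRAM "
                   "capacity unless the tool is built with X11 enabled.\n",
                   stderr);
    }
}
```

Keeping the decision in a separate `constexpr` function makes the rule easy to test without a GPU or a driver present.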