ProjectPhysX / FluidX3D

The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs via OpenCL. Free for non-commercial use.
https://youtube.com/@ProjectPhysX

Linux gcc-8.5 compiler error and slurm memory problem #82

Closed: rodionstepanov closed this issue 9 months ago

rodionstepanov commented 1 year ago

When I try to compile on Linux (without X11), the errors below appear. What is wrong?

    /tmp/ccDlRpNU.o: In function `std::filesystem::exists(std::filesystem::__cxx11::path const&)':
    lbm.cpp:(.text._ZNSt10filesystem6existsERKNS_7__cxx114pathE[_ZNSt10filesystem6existsERKNS_7__cxx114pathE]+0x14): undefined reference to `std::filesystem::status(std::filesystem::__cxx11::path const&)'
    /tmp/ccDlRpNU.o: In function `std::filesystem::is_directory(std::filesystem::__cxx11::path const&)':
    lbm.cpp:(.text._ZNSt10filesystem12is_directoryERKNS_7__cxx114pathE[_ZNSt10filesystem12is_directoryERKNS_7__cxx114pathE]+0x14): undefined reference to `std::filesystem::status(std::filesystem::__cxx11::path const&)'
    /tmp/ccDlRpNU.o: In function `create_folder(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)':
    lbm.cpp:(.text._Z13create_folderRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE[_Z13create_folderRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE]+0x13e): undefined reference to `std::filesystem::create_directories(std::filesystem::__cxx11::path const&)'
    /tmp/ccDlRpNU.o: In function `std::filesystem::__cxx11::path::path<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::filesystem::__cxx11::path>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::filesystem::__cxx11::path::format)':
    lbm.cpp:(.text._ZNSt10filesystem7__cxx114pathC2INSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES1_EERKT_NS1_6formatE[_ZNSt10filesystem7__cxx114pathC5INSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES1_EERKT_NS1_6formatE]+0x64): undefined reference to `std::filesystem::__cxx11::path::_M_split_cmpts()'
    collect2: error: ld returned 1 exit status

ProjectPhysX commented 1 year ago

Hi @rodionstepanov,

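it seems you have an older version of the gcc compiler whose C++17 support is incomplete. std::filesystem, which causes these undefined references, only works out of the box from gcc 9 onward. You need at least gcc version 9; check with gcc --version.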

Possible solutions:

  1. Sometimes you have to select a gcc version manually: in make.sh, write g++-9 or g++-11 instead of g++.
  2. On some machines, the latest version of gcc has to be loaded with module load gcc.
  3. Disable C++17 functionality in FluidX3D: uncomment utilities.hpp line 10, #define UTILITIES_NO_CPP17. Then it should also compile with an older version of gcc. Otherwise, the only C++17 feature used is std::filesystem, which automatically creates missing directories for file export. If this is deactivated, all directories that files are exported to have to be created manually beforehand. Otherwise the files will not be written to the hard drive and the data is lost.

Please let me know if this works.

Kind regards, Moritz
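Below is a minimal sketch of how a compile-time switch such as UTILITIES_NO_CPP17 can guard the std::filesystem code. It is not the actual utilities.hpp code. The name create_folder comes from the linker log above, but its body and bool return value are assumptions for illustration. With gcc 8 specifically, the std::filesystem implementation lives in a separate library, libstdc++fs. Adding -lstdc++fs to the g++ command in make.sh may therefore also resolve the undefined references without disabling the feature.

```cpp
// Hypothetical sketch, not FluidX3D's utilities.hpp.
// Uncomment the next line to build without C++17 std::filesystem (e.g. gcc 8):
// #define UTILITIES_NO_CPP17
#include <string>
#ifndef UTILITIES_NO_CPP17
#include <filesystem>
#include <system_error>
#endif

// Makes sure the directory `path` exists before files are exported into it.
// Returns true only if the directory is known to exist afterwards.
inline bool create_folder(const std::string& path) {
#ifndef UTILITIES_NO_CPP17
    namespace fs = std::filesystem;
    std::error_code ec; // non-throwing overloads report failures through ec
    const fs::path dir(path);
    if (fs::is_directory(dir, ec)) return true;
    fs::create_directories(dir, ec); // also creates missing parent directories
    return !ec && fs::is_directory(dir, ec);
#else
    // Without std::filesystem this sketch can neither check nor create the
    // directory: it has to be created manually before the simulation runs.
    (void)path;
    return false;
#endif
}
```

For example, create_folder("export/") would be called once before files are written into export/. In the UTILITIES_NO_CPP17 build, it returns false as a reminder that the folder must already exist.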

rodionstepanov commented 1 year ago

Thank you @ProjectPhysX! Indeed, there is only gcc (GCC) 8.5.0. I uncommented #define UTILITIES_NO_CPP17, and the code then compiles successfully. However, I see strange behavior in memory allocation. Is it because of the old compiler?

Result for a run on a single GPU, with const uint memory = 2088u;

    (FluidX3D startup banner with ASCII logo: FluidX3D Version 2.6, Copyright (c) Moritz Lehmann)
    |----------------.------------------------------------------------------------|
    | Device ID 0    | Tesla M2090                                                |
    |----------------'------------------------------------------------------------|
    |----------------.------------------------------------------------------------|
    | Device ID      | 0                                                          |
    | Device Name    | Tesla M2090                                                |
    | Device Vendor  | NVIDIA Corporation                                         |
    | Device Driver  | 390.132                                                    |
    | OpenCL Version | OpenCL C 1.1                                               |
    | Compute Units  | 16 at 1301 MHz (2048 cores, 5.329 TFLOPs/s)                |
    | Memory, Cache  | 6067 MB, 256 KB global / 48 KB local                       |
    | Buffer Limits  | 1516 MB global, 64 KB constant                             |
    |----------------'------------------------------------------------------------|
    | Info: OpenCL C code successfully compiled.                                  |
    |-----------------.-----------------------------------------------------------|
    | Grid Resolution | 286 x 286 x 286 = 23393656                                |
    | Grid Domains    | 1 x 1 x 1 = 1                                             |
    | LBM Type        | D3Q19 SRT (FP32/FP32)                                     |
    | Memory Usage    | CPU 379 MB, GPU 1x 2073 MB                                |
    | Max Alloc Size  | 1695 MB                                                   |
    | Time Steps      | 10                                                        |
    | Kin. Viscosity  | 1.00000000                                                |
    | Relaxation Time | 3.50000000                                                |
    | Reynolds Number | Re < 165                                                  |
    |---------.-------'-----.-----------.-------------------.---------------------|
    | MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
    | 832     | 127 GB/s    | 36        | 998 80%           | 0s                  |
    |---------'-------------'-----------'-------------------'---------------------|
    | Info: Peak MLUPs/s = 832                                                    |

Result for two GPUs

    (FluidX3D startup banner with ASCII logo: FluidX3D Version 2.6, Copyright (c) Moritz Lehmann)
    |----------------.------------------------------------------------------------|
    | Device ID 0    | Tesla M2090                                                |
    | Device ID 1    | Tesla M2090                                                |
    |----------------'------------------------------------------------------------|
    |----------------.------------------------------------------------------------|
    | Device ID      | 0                                                          |
    | Device Name    | Tesla M2090                                                |
    | Device Vendor  | NVIDIA Corporation                                         |
    | Device Driver  | 390.132                                                    |
    | OpenCL Version | OpenCL C 1.1                                               |
    | Compute Units  | 16 at 1301 MHz (2048 cores, 5.329 TFLOPs/s)                |
    | Memory, Cache  | 6067 MB, 256 KB global / 48 KB local                       |
    | Buffer Limits  | 1516 MB global, 64 KB constant                             |
    |----------------'------------------------------------------------------------|
    | Info: OpenCL C code successfully compiled.                                  |
    |----------------.------------------------------------------------------------|
    | Device ID      | 1                                                          |
    | Device Name    | Tesla M2090                                                |
    | Device Vendor  | NVIDIA Corporation                                         |
    | Device Driver  | 390.132                                                    |
    | OpenCL Version | OpenCL C 1.1                                               |
    | Compute Units  | 16 at 1301 MHz (2048 cores, 5.329 TFLOPs/s)                |
    | Memory, Cache  | 6067 MB, 256 KB global / 48 KB local                       |
    | Buffer Limits  | 1516 MB global, 64 KB constant                             |
    |----------------'------------------------------------------------------------|
    | Info: OpenCL C code successfully compiled.                                  |
    |-----------------.-----------------------------------------------------------|
    | Grid Resolution | 572 x 286 x 286 = 46787312                                |
    | Grid Domains    | 2 x 1 x 1 = 2                                             |
    | LBM Type        | D3Q19 SRT (FP32/FP32)                                     |
    | Memory Usage    | CPU 758 MB, GPU 2x 2089 MB                                |
    | Max Alloc Size  | 1695 MB                                                   |
    | Time Steps      | 10                                                        |
    | Kin. Viscosity  | 1.00000000                                                |
    | Relaxation Time | 3.50000000                                                |
    | Reynolds Number | Re < 165                                                  |
    |---------.-------'-----.-----------.-------------------.---------------------|
    | MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
    | 1078    | 165 GB/s    | 23        | 1000 100%         | 0s                  |
    |---------'-------------'-----------'-------------------'---------------------|
    | Info: Peak MLUPs/s = 1104                                                   |

The run on 4 GPUs failed:

    (FluidX3D startup banner with ASCII logo: FluidX3D Version 2.6, Copyright (c) Moritz Lehmann)
    |----------------.------------------------------------------------------------|
    | Device ID 0    | Tesla M2090                                                |
    | Device ID 1    | Tesla M2090                                                |
    | Device ID 2    | Tesla M2090                                                |
    | Device ID 3    | Tesla M2090                                                |
    |----------------'------------------------------------------------------------|
    |----------------.------------------------------------------------------------|
    | Device ID      | 0                                                          |
    | Device Name    | Tesla M2090                                                |
    | Device Vendor  | NVIDIA Corporation                                         |
    | Device Driver  | 390.132                                                    |
    | OpenCL Version | OpenCL C 1.1                                               |
    | Compute Units  | 16 at 1301 MHz (2048 cores, 5.329 TFLOPs/s)                |
    | Memory, Cache  | 6067 MB, 256 KB global / 48 KB local                       |
    | Buffer Limits  | 1516 MB global, 64 KB constant                             |
    |----------------'------------------------------------------------------------|
    | Info: OpenCL C code successfully compiled.                                  |
    |----------------.------------------------------------------------------------|
    | Device ID      | 1                                                          |
    | Device Name    | Tesla M2090                                                |
    | Device Vendor  | NVIDIA Corporation                                         |
    | Device Driver  | 390.132                                                    |
    | OpenCL Version | OpenCL C 1.1                                               |
    | Compute Units  | 16 at 1301 MHz (2048 cores, 5.329 TFLOPs/s)                |
    | Memory, Cache  | 6067 MB, 256 KB global / 48 KB local                       |
    | Buffer Limits  | 1516 MB global, 64 KB constant                             |
    |----------------'------------------------------------------------------------|
    | Info: OpenCL C code successfully compiled.                                  |
    |----------------.------------------------------------------------------------|
    | Device ID      | 2                                                          |
    | Devicslurmstepd: error: Detected 1 oom-kill event(s) in step 16316445.0 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
    srun: error: tesla43: task 0: Out Of Memory

ProjectPhysX commented 1 year ago

Wow, you have some awesome old hardware! The oom-kill event means that Linux ran out of memory (OOM) and had to kill the process to keep the operating system running. Do you have enough CPU RAM installed?
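One way to check how much RAM a Linux node reports is to read /proc/meminfo. The sketch below uses a hypothetical helper name. Note that /proc/meminfo describes the whole node, not the memory limit of the cgroup that slurm places a job in. A job can therefore still be oom-killed on a node that has plenty of free RAM.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the value (in kB) of a field such as "MemTotal"
// or "MemAvailable" from /proc/meminfo, or -1 if it cannot be read.
// Linux only; the numbers describe the whole node, not a cgroup limit.
inline long long meminfo_kb(const std::string& field) {
    std::ifstream file("/proc/meminfo");
    const std::string key = field + ":";
    std::string line;
    while (std::getline(file, line)) {
        // Each line looks like "MemTotal:       98765432 kB".
        if (line.compare(0, key.size(), key) == 0) {
            std::istringstream values(line.substr(key.size()));
            long long kb = 0;
            if (values >> kb) return kb;
            return -1;
        }
    }
    return -1;
}
```

For example, meminfo_kb("MemTotal") / 1024 gives the node's total RAM in MB. It is useful for comparing against the hardware specification, but not against a slurm job limit.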

rodionstepanov commented 1 year ago

Yes, rather old, but why not have some fun :) This node has two 8-core Intel® Xeon® E5-2660 (2.2 GHz) processors, 96 GB of main memory, 2 x 20 MB Level 2 cache, and 8 Tesla M2090 GPUs (6 GB global memory each).

> Do you have enough CPU RAM installed?

@ProjectPhysX I think so, yes, since the run on 1 GPU didn't crash. But for 4 GPUs, the total memory is 4 x memory, which exceeds 6 GB. I don't understand this. Is the strange memory allocation caused by the old compiler?
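The logged numbers suggest why 4 x memory exceeding 6 GB is not in itself a problem. Each GPU allocates only its own domain, so VRAM per GPU stays at roughly 2 GB (2073 MB and 2089 MB in the logs). CPU (host) memory, by contrast, grows with the total number of cells. Below is a minimal sketch of that arithmetic. It rests on two assumptions rather than on values from FluidX3D's source. First, host memory is about 17 bytes per cell, the ratio implied by "CPU 379 MB" for 286 x 286 x 286 cells in the single-GPU log. Second, the 4-GPU run uses a 4 x 1 x 1 split with 1144 x 286 x 286 cells.

```cpp
#include <cstdint>

// Assumption: host (CPU) memory per lattice cell, inferred from the 1-GPU log
// (379 MB for 23393656 cells is about 17 bytes per cell).
constexpr std::uint64_t kHostBytesPerCell = 17u;

// Estimated host memory in MB (1 MB = 2^20 bytes, rounded down) for a grid of
// nx * ny * nz cells, independent of how many GPUs the grid is split across.
constexpr std::uint64_t host_memory_mb(std::uint64_t nx, std::uint64_t ny, std::uint64_t nz) {
    return nx * ny * nz * kHostBytesPerCell / (1024u * 1024u);
}

// The estimate reproduces the two logged runs.
static_assert(host_memory_mb(286, 286, 286) == 379, "1 GPU: CPU 379 MB");
static_assert(host_memory_mb(572, 286, 286) == 758, "2 GPUs: CPU 758 MB");
```

Under these assumptions, host_memory_mb(1144, 286, 286) gives 1517 MB for the 4-GPU run, about 1.5 GB and far below the node's 96 GB. Each GPU presumably still holds one domain of about 2 GB, well within its own 6 GB, because the 6 GB limit applies per GPU and not to the sum over all GPUs.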

ProjectPhysX commented 10 months ago

Hi @rodionstepanov,

I think this is an issue with slurm limiting the amount of CPU memory that the application can occupy. Add the option --mem-per-cpu, or --mem (per node), to make slurm request more memory for the job. Let me know if this works!

Kind regards, Moritz
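When a job is submitted with --mem, slurm exports the requested amount to the job's environment as SLURM_MEM_PER_NODE. With --mem-per-cpu, it sets SLURM_MEM_PER_CPU instead. Below is a minimal pre-flight sketch that compares an estimated host-memory need with that value before allocating. The function name is hypothetical, only the per-node variable is handled, and its value is assumed to be a plain number of megabytes.

```cpp
#include <cstdlib>
#include <optional>

// Hypothetical pre-flight check. `slurm_mem_mb` is the text of the
// SLURM_MEM_PER_NODE environment variable, or nullptr if it is unset.
// Returns std::nullopt if no usable per-node request is known; otherwise
// whether `needed_mb` fits into the requested memory.
inline std::optional<bool> fits_slurm_mem(unsigned long long needed_mb, const char* slurm_mem_mb) {
    if (slurm_mem_mb == nullptr || *slurm_mem_mb == '\0') return std::nullopt;
    char* end = nullptr;
    const unsigned long long requested_mb = std::strtoull(slurm_mem_mb, &end, 10);
    if (end == slurm_mem_mb) return std::nullopt; // value does not start with a number
    return needed_mb <= requested_mb;
}
```

A program could call fits_slurm_mem(needed_mb, std::getenv("SLURM_MEM_PER_NODE")) at startup and print a warning when the result holds false. In practice, some headroom above the estimate is probably wise when choosing the --mem value.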