Chia-Network / bladebit

A high-performance k32-only, Chia (XCH) plotter supporting in-RAM and disk-based plotting
Apache License 2.0

bladebit cudaplot refuses to write to a ramdisk #375

Closed hajes closed 10 months ago

hajes commented 10 months ago

I have a dual Xeon rig with 512 GB RAM, connected with a QPI link (8 GT/s / 16 GB/s). My old, officially worn Corsair MP600 2TB manages about 1.3 GB/s.

NUMA node 0 is free and unused. NUMA node 1 hosts the GPU plus the SSD/NVMe in an adapter connected via PCIe 3.0 x16.

The theory was to create a ramdisk on NUMA node 0 as temporary storage before the plot is moved to an LVM-striped JBOD that manages 500-1000 MB/s. RAM is faster, and moving between NUMA nodes will be faster than writing to the SSD/NVMe.

The last time I tried was on the bladebit alpha4 version. bladebit always crashes with the error: unable to write to destination.

Do you have some sort of ramdisk lock or something like that?

harold-b commented 10 months ago

You have to disable direct I/O on output. I am not sure if the value is currently exposed to CLI. In the meantime you can replace this line:

from:

cx.plotWriter = new PlotWriter( !cfg.gCfg->disableOutputDirectIO );

to:

cx.plotWriter = new PlotWriter( false );

And rebuild
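Before patching and rebuilding, it can help to confirm that the destination really refuses direct I/O. The following is a minimal sketch, assuming GNU dd (its oflag=direct opens the output with O_DIRECT) and assuming the ramdisk is a tmpfs-style mount; the thread does not say how the ramdisk was created. The function name supports_direct_io and the mount point in the usage comment are hypothetical.

```shell
#!/bin/sh
# Probe whether a directory accepts direct (O_DIRECT) writes.
# Returns 0 if a 4 KiB direct write succeeds, 1 otherwise.
# Assumes GNU dd; the probe file name is arbitrary.
supports_direct_io() {
    dir=$1
    probe="$dir/.direct_io_probe.$$"
    if dd if=/dev/zero of="$probe" bs=4096 count=1 oflag=direct 2>/dev/null; then
        rc=0
    else
        rc=1
    fi
    # The file can exist even when the direct open was refused, so always clean up.
    rm -f "$probe"
    return "$rc"
}

# Usage (the mount point is a placeholder):
# supports_direct_io /mnt/ramdisk && echo "direct I/O ok" || echo "direct I/O refused"
```

If the probe reports a refusal on the ramdisk but succeeds on the SSD, that is consistent with the advice above to disable direct I/O on output.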

hajes commented 10 months ago

Thanks for the info. Unfortunately, I am not able to build from source because your CMake config cannot recognize the Clear Linux version of CUDA.

So far I haven't figured out how to override it.

Found CUDA: true
NVCC      : /opt/cuda/bin/nvcc
CMake Error at /usr/share/cmake-3.27/Modules/CMakeDetermineCompilerId.cmake:756 (message):
  Compiling the CUDA compiler identification source file
  "CMakeCUDACompilerId.cu" failed.

  Compiler: /opt/cuda/bin/nvcc

  Build flags:

  Id flags: --keep;--keep-dir;tmp -v

  The output was:

  1

  #$ _NVVM_BRANCH_=nvvm

  #$ _SPACE_=

  #$ _CUDART_=cudart

  #$ _HERE_=/opt/cuda/bin

  #$ _THERE_=/opt/cuda/bin

  #$ _TARGET_SIZE_=

  #$ _TARGET_DIR_=

  #$ _TARGET_DIR_=targets/x86_64-linux

  #$ TOP=/opt/cuda/bin/..

  #$ NVVMIR_LIBRARY_DIR=/opt/cuda/bin/../nvvm/libdevice

  #$ LD_LIBRARY_PATH=/opt/cuda/bin/../lib:

  #$
  PATH=/opt/cuda/bin/../nvvm/bin:/opt/cuda/bin:/usr/local/bin:/usr/bin:/opt/3rd-party/bin:/opt/cuda/bin

  #$ INCLUDES="-I/opt/cuda/bin/../targets/x86_64-linux/include"

  #$ LIBRARIES= "-L/opt/cuda/bin/../targets/x86_64-linux/lib/stubs"
  "-L/opt/cuda/bin/../targets/x86_64-linux/lib"

  #$ CUDAFE_FLAGS=

  #$ PTXAS_FLAGS=

  #$ rm tmp/a_dlink.reg.c

  #$ gcc -D__CUDA_ARCH_LIST__=520 -E -x c++ -D__CUDACC__ -D__NVCC__
  "-I/opt/cuda/bin/../targets/x86_64-linux/include" -D__CUDACC_VER_MAJOR__=12
  -D__CUDACC_VER_MINOR__=2 -D__CUDACC_VER_BUILD__=128
  -D__CUDA_API_VER_MAJOR__=12 -D__CUDA_API_VER_MINOR__=2
  -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -include "cuda_runtime.h" -m64
  "CMakeCUDACompilerId.cu" -o "tmp/CMakeCUDACompilerId.cpp4.ii"

  In file included from
  /opt/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h:82,

                   from <command-line>:

  /opt/cuda/bin/../targets/x86_64-linux/include/crt/host_config.h:143:2:
  error: #error -- unsupported GNU version! gcc versions later than 12 are
  not supported! The nvcc flag '-allow-unsupported-compiler' can be used to
  override this version check; however, using an unsupported host compiler
  may cause compilation failure or incorrect run time execution.  Use at your
  own risk.

    143 | #error -- unsupported GNU version! gcc versions later than 12 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
        |  ^~~~~

  # --error 0x1 --

Call Stack (most recent call first):
  /usr/share/cmake-3.27/Modules/CMakeDetermineCompilerId.cmake:8 (CMAKE_DETERMINE_COMPILER_ID_BUILD)
  /usr/share/cmake-3.27/Modules/CMakeDetermineCompilerId.cmake:53 (__determine_compiler_id_test)
  /usr/share/cmake-3.27/Modules/CMakeDetermineCUDACompiler.cmake:307 (CMAKE_DETERMINE_COMPILER_ID)
  CMakeLists.txt:45 (enable_language)

-- Configuring incomplete, errors occurred!

harold-b commented 10 months ago

That's not my CMake config, it's a standard CMake module. The error seems to be given in the output:

error -- unsupported GNU version! gcc versions later than 12 are not supported! The nvcc

You need to use an earlier gcc version (12 or older).
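The version check can be done up front, before CMake runs. This is a small sketch: the helper names major_of and host_gcc_ok are hypothetical, and the limit of 12 is taken from the error message quoted above for this particular CUDA 12.2 install (other CUDA releases have other limits).

```shell
#!/bin/sh
# Extract the major version from a version string such as "13.2.1".
major_of() {
    printf '%s\n' "${1%%.*}"
}

# Succeed (return 0) when the gcc version in $1 has a major version
# at or below the maximum in $2.
host_gcc_ok() {
    [ "$(major_of "$1")" -le "$2" ]
}

# Usage: 12 is the limit quoted in the nvcc error above.
# if ! host_gcc_ok "$(gcc -dumpversion)" 12; then
#     echo "gcc is too new for this nvcc; use an older host compiler" >&2
# fi
```

When an older g++ is installed alongside the system one, CMake can also be pointed at it for CUDA only through the CUDAHOSTCXX environment variable or -DCMAKE_CUDA_HOST_COMPILER=/path/to/g++11 (the path is a placeholder).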

hajes commented 10 months ago

I managed to downgrade to gcc-11. Still no luck.

cmake --build . --target bladebit --config Release
[  0%] Building C object _deps/sodium-build/CMakeFiles/sodium.dir/cmake_pch.h.gch
gcc: error: unrecognized command-line option ‘-mrelax-cmpxchg-loop’
gmake[3]: *** [_deps/sodium-build/CMakeFiles/sodium.dir/build.make:77: _deps/sodium-build/CMakeFiles/sodium.dir/cmake_pch.h.gch] Error 1
gmake[2]: *** [CMakeFiles/Makefile2:247: _deps/sodium-build/CMakeFiles/sodium.dir/all] Error 2
gmake[1]: *** [CMakeFiles/Makefile2:174: CMakeFiles/bladebit.dir/rule] Error 2
gmake: *** [Makefile:182: bladebit] Error 2

harold-b commented 10 months ago

You may have to remove the CMake cache. It looks like you're trying to build with an old cache, in which libsodium has additional compiler options that are not supported by that version of gcc. Remove the build/ or build-release/ directory completely and then retry.
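A minimal sketch of that clean-up step follows. The helper name fresh_build_dir is hypothetical; the cmake commands in the usage comment are the ones already used in this thread.

```shell
#!/bin/sh
# Delete a build directory (including any stale CMakeCache.txt)
# and recreate it empty.
fresh_build_dir() {
    build_dir=$1
    [ -n "$build_dir" ] || return 1   # refuse an empty path
    rm -rf -- "$build_dir"
    mkdir -p -- "$build_dir"
}

# Usage, from the repository root:
# fresh_build_dir build && cd build && cmake .. \
#     && cmake --build . --target bladebit --config Release
```

Removing the whole directory rather than only CMakeCache.txt also discards the fetched dependency builds under _deps/, which is where the failing libsodium flags were cached.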

hajes commented 10 months ago

removed build, and it looks like Clear Linux doesn't like it when you screw around with different gcc :-D

cmake ..
-- The C compiler identification is GNU 11.4.1
-- The CXX compiler identification is GNU 13.2.1
-- The ASM compiler identification is GNU
-- Found assembler: /usr/bin/gcc
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - failed
-- Check for working C compiler: /usr/bin/gcc
-- Check for working C compiler: /usr/bin/gcc - broken
CMake Error at /usr/share/cmake-3.27/Modules/CMakeTestCCompiler.cmake:67 (message):
  The C compiler

    "/usr/bin/gcc"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: '/home/hajes/bladebit/build/CMakeFiles/CMakeScratch/TryCompile-CXKApI'

    Run Build Command(s): /usr/bin/cmake -E env VERBOSE=1 /usr/bin/gmake -f Makefile cmTC_912d6/fast
    /usr/bin/gmake  -f CMakeFiles/cmTC_912d6.dir/build.make CMakeFiles/cmTC_912d6.dir/build
    gmake[1]: Entering directory '/home/hajes/bladebit/build/CMakeFiles/CMakeScratch/TryCompile-CXKApI'
    Building C object CMakeFiles/cmTC_912d6.dir/testCCompiler.c.o
    /usr/bin/gcc   -g -O3 -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -Wformat -Wformat-security -m64 -fasynchronous-unwind-tables -Wp,-D_REENTRANT -ftree-loop-distribute-patterns -Wl,-z,now -Wl,-z,relro -fno-semantic-interposition -ffat-lto-objects -fno-trapping-math -Wl,-sort-common -Wl,--enable-new-dtags -mrelax-cmpxchg-loop  -o CMakeFiles/cmTC_912d6.dir/testCCompiler.c.o -c /home/hajes/bladebit/build/CMakeFiles/CMakeScratch/TryCompile-CXKApI/testCCompiler.c
    gcc: error: unrecognized command-line option ‘-mrelax-cmpxchg-loop’
    gmake[1]: *** [CMakeFiles/cmTC_912d6.dir/build.make:78: CMakeFiles/cmTC_912d6.dir/testCCompiler.c.o] Error 1
    gmake[1]: Leaving directory '/home/hajes/bladebit/build/CMakeFiles/CMakeScratch/TryCompile-CXKApI'
    gmake: *** [Makefile:127: cmTC_912d6/fast] Error 2

  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
  CMakeLists.txt:23 (project)

harold-b commented 10 months ago

It looks like there is still some confusion about compiler versions, since it lists two: one for C and another for C++:

-- The C compiler identification is GNU 11.4.1
-- The CXX compiler identification is GNU 13.2.1

Yet it's passing it the option -mrelax-cmpxchg-loop which is only for GCC 12+.

Maybe point to them explicitly when building:

CC="/path/to/gcc11"  CXX="/path/to/g++11" cmake ..

hajes commented 10 months ago

The issue was CFLAGS and CXXFLAGS: they still contained -mrelax-cmpxchg-loop.
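One way to drop the flag without retyping the rest of the list is sketched below. It assumes CFLAGS and CXXFLAGS are set as environment variables in the current shell, which the thread does not state explicitly; the helper name strip_flag is hypothetical.

```shell
#!/bin/sh
# Print the whitespace-separated flag list in $2 with every exact
# occurrence of the flag in $1 removed.
strip_flag() {
    unwanted=$1
    out=
    for f in $2; do              # intentional word splitting on whitespace
        [ "$f" = "$unwanted" ] && continue
        out="${out:+$out }$f"
    done
    printf '%s\n' "$out"
}

# Usage:
# CFLAGS=$(strip_flag -mrelax-cmpxchg-loop "$CFLAGS")
# CXXFLAGS=$(strip_flag -mrelax-cmpxchg-loop "$CXXFLAGS")
# export CFLAGS CXXFLAGS
```

After changing the variables, the build directory should be recreated so that CMake does not reuse the flags stored in its cache.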

bladebit compiled to version 0.0.0-dev

Thanks for the help, let's see if it works.