ZaidQureshi / bam

BSD 2-Clause "Simplified" License

Build error on Ubuntu 24.04 after following the build guide #38

Closed: gaowayne closed this issue 1 week ago

gaowayne commented 1 week ago

Describe the bug
It is a build error.

To Reproduce
Follow the guide to make libnvm on Ubuntu 24.04.

Expected behavior
It should build cleanly.

Build output:

[ 10%] Building CXX object CMakeFiles/libnvm.dir/src/admin.cpp.o
In file included from /root/wayne/bam/bam/include/freestanding/include/simt/type_traits:36,
                 from /root/wayne/bam/bam/include/freestanding/include/simt/atomic:57,
                 from /root/wayne/bam/bam/include/nvm_types.h:10,
                 from /root/wayne/bam/bam/src/admin.cpp:1:
/root/wayne/bam/bam/include/freestanding/include/simt/../../libcxx/include/type_traits:1461:8: error: expected identifier before ‘__is_convertible’
 1461 | struct __is_convertible
      |        ^~~~~~~~~~~~~~~~
/root/wayne/bam/bam/include/freestanding/include/simt/../../libcxx/include/type_traits:1461:8: error: expected unqualified-id before ‘__is_convertible’
/root/wayne/bam/bam/include/freestanding/include/simt/../../libcxx/include/type_traits:1467:40: error: expected identifier before ‘__is_convertible’
 1467 | template <class _T1, class _T2> struct __is_convertible<_T1, _T2, 0, 1> : public false_type {};
      |                                        ^~~~~~~~~~~~~~~~
/root/wayne/bam/bam/include/freestanding/include/simt/../../libcxx/include/type_traits:1467:40: error: expected unqualified-id before ‘__is_convertible’
/root/wayne/bam/bam/include/freestanding/include/simt/../../libcxx/include/type_traits:1468:40: error: expected identifier before ‘__is_convertible’
 1468 | template <class _T1, class _T2> struct __is_convertible<_T1, _T2, 1, 1> : public false_type {};
      |                                        ^~~~~~~~~~~~~~~~
/root/wayne/bam/bam/include/freestanding/include/simt/../../libcxx/include/type_traits:1468:40: error: expected unqualified-id before ‘__is_convertible’
/root/wayne/bam/bam/include/freestanding/include/simt/../../libcxx/include/type_traits:1469:40: error: expected identifier before ‘__is_convertible’
 1469 | template <class _T1, class _T2> struct __is_convertible<_T1, _T2, 2, 1> : public false_type {};
      |                                        ^~~~~~~~~~~~~~~~
/root/wayne/bam/bam/include/freestanding/include/simt/../../libcxx/include/type_traits:1469:40: error: expected unqualified-id before ‘__is_convertible’
/root/wayne/bam/bam/include/freestanding/include/simt/../../libcxx/include/type_traits:1470:40: error: expected identifier before ‘__is_convertible’
 1470 | template <class _T1, class _T2> struct __is_convertible<_T1, _T2, 3, 1> : public false_type {};
      |                                        ^~~~~~~~~~~~~~~~
/root/wayne/bam/bam/include/freestanding/include/simt/../../libcxx/include/type_traits:1470:40: error: expected unqualified-id before ‘__is_convertible’
/root/wayne/bam/bam/include/freestanding/include/simt/../../libcxx/include/type_traits:1472:40: error: expected identifier before ‘__is_convertible’
 1472 | template <class _T1, class _T2> struct __is_convertible<_T1, _T2, 0, 2> : public false_type {};
      |                                        ^~~~~~~~~~~~~~~~
/root/wayne/bam/bam/include/freestanding/include/simt/../../libcxx/include/type_traits:1472:40: error: expected unqualified-id before ‘__is_convertible’

Machine Setup:

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Additional context
My GCC version is 13.2:

Configured with: ../src/configure -v --with-pkgversion='Ubuntu 13.2.0-23ubuntu4' --with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-13 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-libstdcxx-backtrace --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.2.0 (Ubuntu 13.2.0-23ubuntu4) 

CMake log:

root@salab-hpedl380g11-01:~/wayne/bam/bam/build# cmake .. -Wno-dev
-- The CUDA compiler identification is NVIDIA 12.0.140
-- The C compiler identification is GNU 13.2.0
-- The CXX compiler identification is GNU 13.2.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Found CUDA: /usr (found suitable version "12.0", minimum required is "8.0") 
-- Using NVIDIA driver found in /usr/src/nvidia-535.183.01
-- Not building FIO
-- Configuring libnvm without SmartIO
-- Configuring kernel module without CUDA
-- Found CUDA: /usr (found suitable version "12.0", minimum required is "10.0") 
-- Found CUDA: /usr (found suitable version "12.0", minimum required is "8.0") 
-- Configuring done (1.7s)
-- Generating done (0.0s)
-- Build files have been written to: /root/wayne/bam/bam/build
msharmavikram commented 1 week ago

The issue is that you are on a much newer compiler and OS kernel than bam supports.

We don't intend to update the codebase to support newer Linux and compiler versions yet; we plan to do that in the future. We welcome PRs that fix these issues.

The fix for freestanding requires changes across the entire codebase and moving to cuda::atomic semantics. That would fix the compiler issue but not the OS kernel issue; supporting a newer kernel is a more tedious effort.
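One way to stage the move to cuda::atomic semantics described above is to route every system-scope atomic through a single alias, so the backend changes in one place instead of at every call site. The sketch below assumes that approach; `sys_atomic`, `claim_slot`, `BAM_USE_CUDA_ATOMIC` and `SKETCH_HD` are hypothetical names, not part of bam. With `BAM_USE_CUDA_ATOMIC` defined it uses libcu++'s `cuda::atomic` with `cuda::thread_scope_system`; otherwise it falls back to `std::atomic` so it also builds with a host-only compiler such as GCC 13.2.

```cpp
#include <cstdint>

// Hypothetical annotation macro: lets the same helper run on host and device under nvcc.
#if defined(__CUDACC__)
#define SKETCH_HD __host__ __device__
#else
#define SKETCH_HD
#endif

#if defined(BAM_USE_CUDA_ATOMIC)  // hypothetical switch, not an existing bam build option
#include <cuda/atomic>
// System scope: synchronizes with host threads and threads on all devices.
template <class T>
using sys_atomic = cuda::atomic<T, cuda::thread_scope_system>;
#else
#include <atomic>
// Host-only fallback so the sketch compiles with a plain C++ compiler.
template <class T>
using sys_atomic = std::atomic<T>;
#endif

// Hypothetical call site written only against the alias: reserve the next queue slot.
SKETCH_HD inline std::uint64_t claim_slot(sys_atomic<std::uint64_t>& tail) {
    // fetch_add returns the previous value, which is the slot this caller now owns.
    return tail.fetch_add(1);
}
```

The design choice is that call sites never name `simt::` or `cuda::` directly, so the remaining work becomes auditing memory orders and scopes rather than editing every file.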

gaowayne commented 1 week ago

Thank you so much. My plan is to run through it, understand the code better, and collect some benchmarks with our SSD. Could you please share one working configuration (OS distribution name and version)? I will install exactly the same setup.

msharmavikram commented 1 week ago

Many people have successfully reproduced the results by following the exact steps and versions described in the README. I encourage you to try that.

gaowayne commented 1 week ago

Thank you so much. I have now tried Ubuntu 20.04.3; GCC works fine there and the code builds. However, after I install the Ubuntu NVIDIA graphics driver 535 or 470 (the version this project mentions), my server locks up and I cannot SSH into it until I reboot. dmesg shows the error log below. Could you please shed some light?

[  422.450961] NVRM: The NVIDIA GPU 0000:8a:00.0 (PCI ID: 10de:26b9)
               NVRM: installed in this system is not supported by the
               NVRM: NVIDIA 470.256.02 driver release.
               NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
               NVRM: in this release's README, available on the operating system
               NVRM: specific graphics driver download page at www.nvidia.com.
[  422.451145] nvidia: probe of 0000:8a:00.0 failed with error -1
[  422.451178] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  422.451179] NVRM: None of the NVIDIA devices were initialized.
[  422.452194] nvidia-nvlink: Unregistered the Nvlink Core, major device number 508
[  422.736006] nvidia-nvlink: Nvlink Core is being initialized, major device number 508
[  422.737854] NVRM: The NVIDIA GPU 0000:8a:00.0 (PCI ID: 10de:26b9)
               NVRM: installed in this system is not supported by the
               NVRM: NVIDIA 470.256.02 driver release.
               NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
               NVRM: in this release's README, available on the operating system
               NVRM: specific graphics driver download page at www.nvidia.com.
[  422.738041] nvidia: probe of 0000:8a:00.0 failed with error -1
[  422.738072] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  422.738072] NVRM: None of the NVIDIA devices were initialized.
[  422.739680] nvidia-nvlink: Unregistered the Nvlink Core, major device number 508
[  423.015612] nvidia-nvlink: Nvlink Core is being initialized, major device number 508
[  423.017438] NVRM: The NVIDIA GPU 0000:8a:00.0 (PCI ID: 10de:26b9)
               NVRM: installed in this system is not supported by the
               NVRM: NVIDIA 470.256.02 driver release.
               NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
               NVRM: in this release's README, available on the operating system
               NVRM: specific graphics driver download page at www.nvidia.com.
[  423.017622] nvidia: probe of 0000:8a:00.0 failed with error -1
[  423.017650] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  423.017651] NVRM: None of the NVIDIA devices were initialized.
[  423.018898] nvidia-nvlink: Unregistered the Nvlink Core, major device number 508
[  423.285329] nvidia-nvlink: Nvlink Core is being initialized, major device number 508
[  423.287222] NVRM: The NVIDIA GPU 0000:8a:00.0 (PCI ID: 10de:26b9)
               NVRM: installed in this system is not supported by the
               NVRM: NVIDIA 470.256.02 driver release.
               NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
               NVRM: in this release's README, available on the operating system
               NVRM: specific graphics driver download page at www.nvidia.com.
[  423.287364] nvidia: probe of 0000:8a:00.0 failed with error -1
[  423.287397] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  423.287397] NVRM: None of the NVIDIA devices were initialized.
[  423.288350] nvidia-nvlink: Unregistered the Nvlink Core, major device number 508
[  423.579743] nvidia-nvlink: Nvlink Core is being initialized, major device number 508
[  423.581612] NVRM: The NVIDIA GPU 0000:8a:00.0 (PCI ID: 10de:26b9)
               NVRM: installed in this system is not supported by the
               NVRM: NVIDIA 470.256.02 driver release.
               NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
               NVRM: in this release's README, available on the operating system
               NVRM: specific graphics driver download page at www.nvidia.com.
[  423.581826] nvidia: probe of 0000:8a:00.0 failed with error -1
[  423.581865] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  423.581865] NVRM: None of the NVIDIA devices were initialized.
[  423.583506] nvidia-nvlink: Unregistered the Nvlink Core, major device number 508
[  423.899369] nvidia-nvlink: Nvlink Core is being initialized, major device number 508
[  423.901136] NVRM: The NVIDIA GPU 0000:8a:00.0 (PCI ID: 10de:26b9)
               NVRM: installed in this system is not supported by the
               NVRM: NVIDIA 470.256.02 driver release.
               NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
               NVRM: in this release's README, available on the operating system
               NVRM: specific graphics driver download page at www.nvidia.com.
[  423.901303] nvidia: probe of 0000:8a:00.0 failed with error -1
[  423.901336] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  423.901337] NVRM: None of the NVIDIA devices were initialized.
[  423.902882] nvidia-nvlink: Unregistered the Nvlink Core, major device number 508
[  424.183979] nvidia-nvlink: Nvlink Core is being initialized, major device number 508
[  424.185757] NVRM: The NVIDIA GPU 0000:8a:00.0 (PCI ID: 10de:26b9)
               NVRM: installed in this system is not supported by the
               NVRM: NVIDIA 470.256.02 driver release.
               NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
               NVRM: in this release's README, available on the operating system
               NVRM: specific graphics driver download page at www.nvidia.com.
[  424.185903] nvidia: probe of 0000:8a:00.0 failed with error -1
[  424.185937] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  424.185937] NVRM: None of the NVIDIA devices were initialized.
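Each repeated NVRM block above reports the same thing: the GPU at 0000:8a:00.0 (PCI ID 10de:26b9) is not supported by the 470.256.02 driver release, and the probe fails. When sifting a long dmesg capture for this, a small helper like the one below can extract the PCI address, PCI ID and driver version, and count the probe failures. It is a hypothetical diagnostic sketch (`UnsupportedGpu`, `find_unsupported_gpu` and `count_probe_failures` are not part of bam or the NVIDIA driver), and it assumes the message wording matches the 470.256.02 output shown here; other driver releases may phrase it differently.

```cpp
#include <cstddef>
#include <iterator>
#include <optional>
#include <regex>
#include <string>

// Fields reported by the NVRM "GPU not supported" message (hypothetical struct).
struct UnsupportedGpu {
    std::string pci_address;     // e.g. "0000:8a:00.0"
    std::string pci_id;          // vendor:device, e.g. "10de:26b9"
    std::string driver_version;  // e.g. "470.256.02"
};

// Returns the first "not supported" report found in a dmesg capture, if any.
std::optional<UnsupportedGpu> find_unsupported_gpu(const std::string& dmesg) {
    static const std::regex gpu_re(
        R"(The NVIDIA GPU ([0-9a-fA-F:.]+) \(PCI ID: ([0-9a-fA-F]{4}:[0-9a-fA-F]{4})\))");
    static const std::regex drv_re(R"(NVIDIA ([0-9]+(?:\.[0-9]+)+) driver release)");

    std::smatch gpu;
    if (!std::regex_search(dmesg, gpu, gpu_re)) {
        return std::nullopt;
    }
    // The driver version appears on a later "NVRM:" line, so search only after the GPU line.
    const std::string rest = gpu.suffix().str();
    std::smatch drv;
    if (!std::regex_search(rest, drv, drv_re)) {
        return std::nullopt;
    }
    return UnsupportedGpu{gpu[1].str(), gpu[2].str(), drv[1].str()};
}

// Counts "nvidia: probe of ... failed" lines in the capture.
std::size_t count_probe_failures(const std::string& dmesg) {
    static const std::regex fail_re(R"(nvidia: probe of [0-9a-fA-F:.]+ failed)");
    return static_cast<std::size_t>(std::distance(
        std::sregex_iterator(dmesg.begin(), dmesg.end(), fail_re), std::sregex_iterator()));
}
```

Fed the log above, `find_unsupported_gpu` reports PCI ID 10de:26b9 and driver 470.256.02. As the message itself suggests, that PCI ID can then be checked against "Appendix A - Supported NVIDIA GPU Products" in the README of the driver release you plan to install.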
msharmavikram commented 1 week ago

Closing this issue since we have started discussing it in another thread.