Closed gaowayne closed 1 month ago
The issue is that you are on a much newer compiler and OS kernel than what bam supports.
We don't intend to update the codebase to support newer versions of Linux and the compiler yet; we plan to do that later. We welcome PRs that fix these issues.
The fix for freestanding requires changes to the entire codebase and a move to cuda::atomics semantics. This would fix the compiler issue but not the OS kernel issue. The kernel upgrade is a more tedious effort.
Thank you so much. My plan is to run through it to understand the code better and collect some benchmarks with our SSD. Could you please share one working configuration (OS distribution name and version)? I will install exactly the same setup as yours.
Many people have successfully reproduced the results by following the exact steps and versions described in the README. I encourage you to try that.
Thank you so much. I have now tried Ubuntu 20.04.3; GCC works fine and the code builds. But after I install the Ubuntu NVIDIA graphics driver 535 or 470 (this project mentions this), my server locks up and I cannot connect over SSH until I reboot. The dmesg log shows the errors below. Could you please shed some light?
[ 422.450961] NVRM: The NVIDIA GPU 0000:8a:00.0 (PCI ID: 10de:26b9)
NVRM: installed in this system is not supported by the
NVRM: NVIDIA 470.256.02 driver release.
NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
NVRM: in this release's README, available on the operating system
NVRM: specific graphics driver download page at www.nvidia.com.
[ 422.451145] nvidia: probe of 0000:8a:00.0 failed with error -1
[ 422.451178] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 422.451179] NVRM: None of the NVIDIA devices were initialized.
[ 422.452194] nvidia-nvlink: Unregistered the Nvlink Core, major device number 508
[ 422.736006] nvidia-nvlink: Nvlink Core is being initialized, major device number 508
[ 422.737854] NVRM: The NVIDIA GPU 0000:8a:00.0 (PCI ID: 10de:26b9)
NVRM: installed in this system is not supported by the
NVRM: NVIDIA 470.256.02 driver release.
NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
NVRM: in this release's README, available on the operating system
NVRM: specific graphics driver download page at www.nvidia.com.
[ 422.738041] nvidia: probe of 0000:8a:00.0 failed with error -1
[ 422.738072] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 422.738072] NVRM: None of the NVIDIA devices were initialized.
[ 422.739680] nvidia-nvlink: Unregistered the Nvlink Core, major device number 508
[ 423.015612] nvidia-nvlink: Nvlink Core is being initialized, major device number 508
[ 423.017438] NVRM: The NVIDIA GPU 0000:8a:00.0 (PCI ID: 10de:26b9)
NVRM: installed in this system is not supported by the
NVRM: NVIDIA 470.256.02 driver release.
NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
NVRM: in this release's README, available on the operating system
NVRM: specific graphics driver download page at www.nvidia.com.
[ 423.017622] nvidia: probe of 0000:8a:00.0 failed with error -1
[ 423.017650] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 423.017651] NVRM: None of the NVIDIA devices were initialized.
[ 423.018898] nvidia-nvlink: Unregistered the Nvlink Core, major device number 508
[ 423.285329] nvidia-nvlink: Nvlink Core is being initialized, major device number 508
[ 423.287222] NVRM: The NVIDIA GPU 0000:8a:00.0 (PCI ID: 10de:26b9)
NVRM: installed in this system is not supported by the
NVRM: NVIDIA 470.256.02 driver release.
NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
NVRM: in this release's README, available on the operating system
NVRM: specific graphics driver download page at www.nvidia.com.
[ 423.287364] nvidia: probe of 0000:8a:00.0 failed with error -1
[ 423.287397] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 423.287397] NVRM: None of the NVIDIA devices were initialized.
[ 423.288350] nvidia-nvlink: Unregistered the Nvlink Core, major device number 508
[ 423.579743] nvidia-nvlink: Nvlink Core is being initialized, major device number 508
[ 423.581612] NVRM: The NVIDIA GPU 0000:8a:00.0 (PCI ID: 10de:26b9)
NVRM: installed in this system is not supported by the
NVRM: NVIDIA 470.256.02 driver release.
NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
NVRM: in this release's README, available on the operating system
NVRM: specific graphics driver download page at www.nvidia.com.
[ 423.581826] nvidia: probe of 0000:8a:00.0 failed with error -1
[ 423.581865] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 423.581865] NVRM: None of the NVIDIA devices were initialized.
[ 423.583506] nvidia-nvlink: Unregistered the Nvlink Core, major device number 508
[ 423.899369] nvidia-nvlink: Nvlink Core is being initialized, major device number 508
[ 423.901136] NVRM: The NVIDIA GPU 0000:8a:00.0 (PCI ID: 10de:26b9)
NVRM: installed in this system is not supported by the
NVRM: NVIDIA 470.256.02 driver release.
NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
NVRM: in this release's README, available on the operating system
NVRM: specific graphics driver download page at www.nvidia.com.
[ 423.901303] nvidia: probe of 0000:8a:00.0 failed with error -1
[ 423.901336] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 423.901337] NVRM: None of the NVIDIA devices were initialized.
[ 423.902882] nvidia-nvlink: Unregistered the Nvlink Core, major device number 508
[ 424.183979] nvidia-nvlink: Nvlink Core is being initialized, major device number 508
[ 424.185757] NVRM: The NVIDIA GPU 0000:8a:00.0 (PCI ID: 10de:26b9)
NVRM: installed in this system is not supported by the
NVRM: NVIDIA 470.256.02 driver release.
NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
NVRM: in this release's README, available on the operating system
NVRM: specific graphics driver download page at www.nvidia.com.
[ 424.185903] nvidia: probe of 0000:8a:00.0 failed with error -1
[ 424.185937] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 424.185937] NVRM: None of the NVIDIA devices were initialized.
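The log above repeats the same rejection on every probe attempt: the 470.256.02 driver reports that the GPU at 0000:8a:00.0 (PCI ID 10de:26b9) is not supported. A minimal Python sketch for condensing such a log into one line per device and driver pair follows; the function name `summarize_unsupported` and the regular expressions are illustrative assumptions based only on the message format shown here, not part of any NVIDIA tool.

```python
import re
from collections import Counter

# Patterns written against the NVRM message format seen in this thread (assumption).
GPU_RE = re.compile(
    r"NVRM: The NVIDIA GPU (\S+) \(PCI ID: ([0-9a-f]{4}):([0-9a-f]{4})\)"
)
DRIVER_RE = re.compile(r"NVRM: NVIDIA (\S+) driver release")


def summarize_unsupported(log_text):
    """Count (bus address, PCI ID, driver version) triples rejected by the driver."""
    counts = Counter()
    pending = None  # GPU seen, waiting for the matching driver-version line
    for line in log_text.splitlines():
        gpu = GPU_RE.search(line)
        if gpu:
            pending = (gpu.group(1), f"{gpu.group(2)}:{gpu.group(3)}")
            continue
        drv = DRIVER_RE.search(line)
        if drv and pending:
            counts[(pending[0], pending[1], drv.group(1))] += 1
            pending = None
    return counts


# Sample taken from the first probe attempt in the log above.
sample = """\
[ 422.450961] NVRM: The NVIDIA GPU 0000:8a:00.0 (PCI ID: 10de:26b9)
NVRM: installed in this system is not supported by the
NVRM: NVIDIA 470.256.02 driver release.
[ 422.451145] nvidia: probe of 0000:8a:00.0 failed with error -1
"""

for (bus, pci_id, driver), n in summarize_unsupported(sample).items():
    print(f"{bus} (PCI ID {pci_id}) rejected by driver {driver}: {n} time(s)")
```

Feeding it the saved output of `dmesg` instead of `sample` would show how many probe attempts failed for each GPU and driver combination.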
Closing this issue as we started discussing in another thread.
Describe the bug
It is a build error.

To Reproduce
Follow the guide to build libnvm on Ubuntu 24.04.

Expected behavior
It should build successfully.
Machine Setup (please complete the following information):
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L40S                    Off |   00000000:8A:00.0 Off |                    0 |
| N/A   36C    P8             34W /  350W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
Additional context
My GCC version is 13.2.
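Since the compiler, kernel, and driver versions turned out to matter here, the setup details for a report like this can be gathered in one pass. The following is a minimal Python sketch under the assumption that `gcc`, `cmake`, and `nvidia-smi` are on `PATH`; the helper names `first_line` and `collect_setup` are hypothetical, and any tool that is missing is reported as `None` rather than raising.

```python
import platform
import shutil
import subprocess


def first_line(cmd):
    """Return the first output line of cmd, or None if the tool is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        out = subprocess.run(
            cmd, capture_output=True, text=True, timeout=10, check=False
        ).stdout
    except (OSError, subprocess.SubprocessError):
        return None
    lines = out.splitlines()
    return lines[0].strip() if lines else None


def collect_setup():
    """Collect kernel, compiler, CMake, and GPU driver information for a bug report."""
    return {
        "kernel": platform.release(),
        "gcc": first_line(["gcc", "--version"]),
        "cmake": first_line(["cmake", "--version"]),
        "gpu_and_driver": first_line(
            ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]
        ),
    }


for key, value in collect_setup().items():
    print(f"{key}: {value}")
```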
cmake log below