Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
[ ] I confirm that this does not happen with the proprietary driver package.
Operating System and Version
Ubuntu 20.04.6 LTS
Kernel Release
5.10.14
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
[X] I am running on a stable kernel release.
Hardware: GPU
NVIDIA GeForce RTX 3080
NVIDIA Open GPU Kernel Modules Version
520.56.06
Describe the bug
We deployed an Ubuntu machine with the NVIDIA Open GPU Kernel Modules driver, version 520. The machine often fails during GPU initialization, and the kernel log shows the following errors:
NVRM s_executeBooterUcode_TU102: Booter failed with non-zero error code: 0xa
2024-07-02 18:46:08.681559 kernel:[ 21.766727] NVRM kgspExecuteBooterUnloadIfNeeded_TU102: failed to execute Booter Unload: 0xffff
2024-07-02 18:46:08.681562 kernel:[ 21.766734] NVRM nvAssertFailedNoLog: Assertion failed: rmStatus == NV_OK @ osinit.c:1982
ecuteFwsecFrts_HAL(pGpu, pKernelGsp, pKernelGsp->pFwsecUcode, pKernelGsp->pWprMeta->frtsOffset) @ kernel_gsp_ga102.c:164
[ 1731.589314] NVRM nvAssertFailedNoLog: Assertion failed: status == NV_OK @ kernel_gsp_ga102.c:235
[ 1731.589317] NVRM kgspInitRm_IMPL: cannot bootstrap riscv/gsp: 0xffff
[ 1731.589323] NVRM RmInitAdapter: Cannot initialize GSP firmware RM
[ 1731.591779] NVRM: GPU 0000:86:00.0: RmInitAdapter failed! (0x63:0xffff:1684)
[ 1731.593977] NVRM: GPU 0000:86:00.0: rm_init_adapter failed, device minor number 0
[ 1731.777872] NVRM s_executeBooterUcode_TU102: Booter failed with non-zero error code: 0xa
[ 1731.777876] NVRM kgspExecuteBooterUnloadIfNeeded_TU102: failed to execute Booter Unload: 0xffff
[ 1731.800951] NVRM s_executeFwsec_TU102: failed to execute FWSEC for FRTS: FRTS error code 0xbe
[ 1731.800957] NVRM nvAssertOkFailedNoLog: Assertion failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from kgspExecuteFwsecFrts_HAL(pGpu, pKernelGsp, pKernelGsp->pFwsecUcode, pKernelGsp->pWprMeta->frtsOffset) @ kernel_gsp_ga102.c:164
[ 1731.800963] NVRM nvAssertFailedNoLog: Assertion failed: status == NV_OK @ kernel_gsp_ga102.c:235
[ 1731.800965] NVRM kgspInitRm_IMPL: cannot bootstrap riscv/gsp: 0xffff
[ 1731.800970] NVRM RmInitAdapter: Cannot initialize GSP firmware RM
[ 1731.803388] NVRM: GPU 0000:af:00.0: RmInitAdapter failed! (0x63:0xffff:1684)
[ 1731.805517] NVRM: GPU 0000:af:00.0: rm_init_adapter failed, device minor number 1
[ 1731.989155] NVRM s_executeBooterUcode_TU102: Booter failed with non-zero error code: 0xa
[ 1731.989160] NVRM kgspExecuteBooterUnloadIfNeeded_TU102: failed to execute Booter Unload: 0xffff
[ 1732.012716] NVRM s_executeFwsec_TU102: failed to execute FWSEC for FRTS: FRTS error code 0xbe
[ 1732.012722] NVRM nvAssertOkFailedNoLog: Assertion failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from kgspExecuteFwsecFrts_HAL(pGpu, pKernelGsp, pKernelGsp->pFwsecUcode, pKernelGsp->pWprMeta->frtsOffset) @ kernel_gsp_ga102.c:164
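To make the excerpt above easier to triage, here is a minimal Python sketch that groups the `RmInitAdapter failed!` lines by PCI address and collects the distinct Booter and FRTS error codes. It is not part of any NVIDIA tooling: the function name `summarize_nvrm_failures` and the regular expressions are assumptions derived only from the log lines shown in this report, and they may need adjusting for other driver versions.

```python
import re
from collections import defaultdict

# Hypothetical patterns, written against the lines quoted in this report.
# Example: "NVRM: GPU 0000:86:00.0: RmInitAdapter failed! (0x63:0xffff:1684)"
ADAPTER_FAIL = re.compile(
    r"NVRM: GPU (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]): "
    r"RmInitAdapter failed!\s*\((?P<code>[0-9a-fx:]+)\)"
)
# Example: "FRTS error code 0xbe"
FRTS_CODE = re.compile(r"FRTS error code (0x[0-9a-f]+)")
# Example: "Booter failed with non-zero error code: 0xa"
BOOTER_CODE = re.compile(r"Booter failed with non-zero error code: (0x[0-9a-f]+)")


def summarize_nvrm_failures(log_text):
    """Return (failures per PCI address, distinct FRTS codes, distinct Booter codes)."""
    per_gpu = defaultdict(list)
    for match in ADAPTER_FAIL.finditer(log_text):
        per_gpu[match.group("bdf")].append(match.group("code"))
    frts = sorted(set(FRTS_CODE.findall(log_text)))
    booter = sorted(set(BOOTER_CODE.findall(log_text)))
    return dict(per_gpu), frts, booter


# A few lines copied from the excerpt above, used as sample input.
SAMPLE = """\
[ 1731.591779] NVRM: GPU 0000:86:00.0: RmInitAdapter failed! (0x63:0xffff:1684)
[ 1731.777872] NVRM s_executeBooterUcode_TU102: Booter failed with non-zero error code: 0xa
[ 1731.800951] NVRM s_executeFwsec_TU102: failed to execute FWSEC for FRTS: FRTS error code 0xbe
[ 1731.803388] NVRM: GPU 0000:af:00.0: RmInitAdapter failed! (0x63:0xffff:1684)
"""

if __name__ == "__main__":
    gpus, frts, booter = summarize_nvrm_failures(SAMPLE)
    for bdf, codes in gpus.items():
        print(bdf, codes)
    print("FRTS codes:", frts)
    print("Booter codes:", booter)
```

In practice the input would be the saved output of `dmesg` or the extracted `nvidia-bug-report.log` rather than the hard-coded sample.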
To Reproduce
Install the 520.56.06 open-source NVIDIA driver and boot the machine.
Bug Incidence
Sometimes
nvidia-bug-report.log.gz
More Info
No response