Closed riptl closed 3 months ago
Please note that the Makefiles for ppc64le are broken in various ways, so building nvidia-drm.ko required some changes. I can submit patches to fix them if there is community/maintainer interest.
Debug log:
[411161.437562] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[411161.438400] NVRM: GPU 0000:01:00.0: Opening GPU with minor number 0
[411161.443030] NVRM: GPU 0000:01:00.0: RmInitAdapter
[411161.443035] NVRM: GPU 0000:01:00.0: RmSetupRegisters for 0x10de:0x1f08
[411161.443040] NVRM: GPU 0000:01:00.0: pci config info:
[411161.443043] NVRM: GPU 0000:01:00.0: registers look like: 0x600c000000000 0x1000000
NVRM: GPU 0000:01:00.0: fb looks like: 0x6000000000000 0x10000000
NVRM: GPU 0000:01:00.0: Successfully mapped framebuffer and registers
[411161.443065] NVRM: GPU 0000:01:00.0: final mappings:
[411161.443068] NVRM: GPU 0000:01:00.0: regs: 0x600c000000000 0x1000000 0x00000000d5906fea
[411161.565307] NVRM: VM: nv_alloc_pages: 1 pages, nodeid -1
[411161.565927] NVRM: VM: contig 1 cache_type 1
[411161.571021] NVRM: VM: nv_alloc_contig_pages: 1 pages
[411161.572415] NVRM: VM: nv_alloc_pages:3790: 0x0000000058493b7b, 1 page(s), count = 1, page_table = 0x00000000e1bcb11b
[411161.572442] NVRM: memdescMapIommu: 0x800000d87910000-0x800000d87910fff is not addressable by GPU 0x100 [0x0-0x7fffffffffff]
[411161.572449] NVRM: VM: nv_free_pages: 0x1
[411161.572451] NVRM: VM: nv_free_pages:3813: 0x0000000058493b7b, 1 page(s), count = 1, page_table = 0x00000000e1bcb11b
[411161.572456] NVRM: VM: nv_free_contig_pages: 1 pages
[411161.572464] NVRM: nvCheckOkFailedNoLog: Check failed: Address not valid [NV_ERR_INVALID_ADDRESS] (0x0000001E) returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1348
[411161.572472] NVRM: VM: nv_alloc_pages: 1 pages, nodeid -1
[411161.572474] NVRM: VM: contig 1 cache_type 1
[411161.572501] NVRM: VM: nv_alloc_contig_pages: 1 pages
[411161.572508] NVRM: VM: nv_alloc_pages:3790: 0x000000002495e64f, 1 page(s), count = 1, page_table = 0x00000000b61a286d
[411161.572518] NVRM: memdescMapIommu: 0x800000d0c430000-0x800000d0c430fff is not addressable by GPU 0x100 [0x0-0x7fffffffffff]
[411161.572521] NVRM: VM: nv_free_pages: 0x1
[411161.572523] NVRM: VM: nv_free_pages:3813: 0x000000002495e64f, 1 page(s), count = 1, page_table = 0x00000000b61a286d
[411161.572527] NVRM: VM: nv_free_contig_pages: 1 pages
[411161.572532] NVRM: nvCheckOkFailedNoLog: Check failed: Address not valid [NV_ERR_INVALID_ADDRESS] (0x0000001E) returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1348
[411161.572537] NVRM: nvAssertFailedNoLog: Assertion failed: pKernelMemorySystem->sysmemFlushBuffer != 0 @ kern_mem_sys_gm107.c:382
[411161.572695] NVRM: VM: nv_alloc_pages: 9 pages, nodeid -1
[411161.572698] NVRM: VM: contig 0 cache_type 0
[411161.572701] NVRM: VM: nv_alloc_system_pages: 9 order0 pages, 0 order
[411161.572744] NVRM: VM: nv_alloc_pages:3790: 0x000000002495e64f, 9 page(s), count = 1, page_table = 0x00000000578cc9ba
[411161.572763] NVRM: memdescMapIommu: 0x800000d0c430000 is not addressable by GPU 0x100 [0x0-0x7fffffffffff]
[411161.572769] NVRM: VM: nv_free_pages: 0x9
[411161.572771] NVRM: VM: nv_free_pages:3813: 0x000000002495e64f, 9 page(s), count = 1, page_table = 0x00000000578cc9ba
[411161.572775] NVRM: VM: nv_free_system_pages: 9 pages
[411161.572781] NVRM: nvCheckOkFailedNoLog: Check failed: Address not valid [NV_ERR_INVALID_ADDRESS] (0x0000001E) returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1348
[411161.572785] NVRM: nvAssertOkFailedNoLog: Assertion failed: Address not valid [NV_ERR_INVALID_ADDRESS] (0x0000001E) returned from nvStatus @ message_queue_cpu.c:241
[411161.572792] NVRM: _kgspInitRpcInfrastructure: GspMsgQueueInit failed
[411161.572795] NVRM: kgspConstructEngine_IMPL: init RPC infrastructure failed
[411161.572928] NVRM: osInitNvMapping: *** Cannot attach gpu
[411161.572934] NVRM: RmInitAdapter: osInitNvMapping failed, bailing out of RmInitAdapter
[411161.572938] NVRM: GPU 0000:01:00.0: Tearing down registers
[411161.572946] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x1e:744)
[411161.573203] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[411161.574653] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
[411161.575401] [drm:nv_drm_register_drm_device [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device
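For anyone reading the log: the failing step appears to be the `memdescMapIommu` range check, which rejects DMA addresses outside the window printed in brackets (`[0x0-0x7fffffffffff]`). The sketch below just redoes that comparison on the values copied from the log. The function names are made up for illustration and are not the driver's; the actual check in the driver may differ.

```python
# Illustrative only: reproduces the arithmetic behind the
# "is not addressable by GPU" messages using values copied from the log.
# Names here (GPU_ADDR_MAX, is_gpu_addressable, ...) are hypothetical.

# Upper bound of the GPU-addressable window as printed in the log.
GPU_ADDR_MAX = 0x7FFFFFFFFFFF


def is_gpu_addressable(start, end, limit=GPU_ADDR_MAX):
    """Return True if the inclusive range [start, end] lies within [0, limit]."""
    return 0 <= start <= end <= limit


def highest_set_bit(addr):
    """Return the index of the most significant set bit (bit 0 = LSB)."""
    return addr.bit_length() - 1


# DMA ranges reported by memdescMapIommu in the log above.
rejected_ranges = [
    (0x800000D87910000, 0x800000D87910FFF),
    (0x800000D0C430000, 0x800000D0C430FFF),
]

for start, end in rejected_ranges:
    print(
        f"{start:#x}-{end:#x}: addressable={is_gpu_addressable(start, end)}, "
        f"highest bit={highest_set_bit(start)}, "
        f"window width={GPU_ADDR_MAX.bit_length()} bits"
    )
```

Both ranges have bit 59 set, while the window in the log is 47 bits wide, so every allocation fails the check regardless of which page the kernel hands back.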
Thank you for your report. Unfortunately, we don't currently plan to support ppc64le with the open-gpu-kernel-modules. Closing; sorry.
NVIDIA Open GPU Kernel Modules Version
448d5cc65624d3aa69015efa0d3fb50fd9729f41
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
I confirm that this does not happen with the proprietary driver package.
The proprietary driver does not support ppc64le.
Operating System and Version
Fedora Linux 40 (Server Edition)
Kernel Release
Linux p9l1 6.9.8-200.fc40.ppc64le #1 SMP Fri Jul 5 15:53:24 UTC 2024 ppc64le GNU/Linux
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
Hardware: GPU
RTX 2060 (can't run nvidia-smi)
Describe the bug
To Reproduce
Build on ppc64le with 64k pages and load nvidia-drm.ko
Bug Incidence
Always
nvidia-bug-report.log.gz
More Info
This is on ppc64le with 64KiB pages