Open hacc1225 opened 1 year ago
I tested another card on this machine and it works with the proprietary driver.
530.41.03
No
Gentoo Linux
Linux Hacc-AIO 6.1.28-gentoo-arm64 #1 SMP Sat Jun 3 20:33:06 CEST 2023 aarch64 GNU/Linux
NVIDIA T600
I can't use NVENC or NVDEC on the ARM64 platform with the open source kernel module, but they work with the proprietary driver.
Just run ffmpeg -i test.mkv -c:v h264_nvenc output.mp4
and you will get
[h264_nvenc @ 0xaaaaed50a530] InitializeEncoder failed: generic error (20): EncodeAPI Internal Error.
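To check for this failure automatically (for example, when comparing driver versions), the ffmpeg stderr can be scanned for the error line above. The following is a minimal sketch, not part of the original report: the function names `find_nvenc_init_failure` and `run_repro` are hypothetical, the regular expression is derived only from the single error line quoted here, and the `-y` overwrite flag is an addition to the reported command.

```python
import re
import subprocess

# Pattern derived from the one error line quoted in this report; other
# ffmpeg versions may word the message differently (assumption).
NVENC_INIT_FAILURE = re.compile(
    r"\[(?P<codec>\w+_nvenc) @ 0x[0-9a-f]+\] InitializeEncoder failed: "
    r"(?P<reason>.+?) \((?P<code>\d+)\)"
)


def find_nvenc_init_failure(stderr_text):
    """Return (codec, reason, code) for the first NVENC init failure, or None."""
    match = NVENC_INIT_FAILURE.search(stderr_text)
    if match is None:
        return None
    return match.group("codec"), match.group("reason"), int(match.group("code"))


def run_repro(input_path, output_path="output.mp4"):
    """Run the reported ffmpeg command and return any NVENC init failure found.

    Hypothetical helper: requires ffmpeg on PATH. "-y" (overwrite output)
    is added here so the run never waits for a prompt.
    """
    result = subprocess.run(
        ["ffmpeg", "-y", "-i", input_path, "-c:v", "h264_nvenc", output_path],
        capture_output=True,
        text=True,
    )
    return find_nvenc_init_failure(result.stderr)
```

Calling `run_repro("test.mkv")` would return `None` on a working setup and a tuple describing the failure otherwise.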
Always
This time I tested with a 64k page size; this is the nvidia-bug-report.log.gz file from when it works with the proprietary driver. I also tested on my other ARM64 machine, a Kunpeng 920 workstation, and everything works there, including the open source modules. 😂
This time my kernel is compiled using the following configuration file: kernel-config-6.1.28-gentoo-arm64.gz
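Because the page size matters for this report, it can help to confirm both what the running kernel uses and what the attached configuration selects. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the config check assumes the usual arm64 Kconfig options `CONFIG_ARM64_4K_PAGES`, `CONFIG_ARM64_16K_PAGES` and `CONFIG_ARM64_64K_PAGES`.

```python
import os
import re


def runtime_page_size():
    """Page size in bytes of the running kernel, via POSIX sysconf."""
    return os.sysconf("SC_PAGE_SIZE")


def configured_arm64_page_size(config_text):
    """Return the page size in KiB selected in an arm64 kernel config, or None.

    Looks for an enabled CONFIG_ARM64_<N>K_PAGES line (assumed option names).
    """
    match = re.search(
        r"^CONFIG_ARM64_(4|16|64)K_PAGES=y$", config_text, re.MULTILINE
    )
    return int(match.group(1)) if match else None
```

For a gzip-compressed config such as the attachment above, the text can be read with `gzip.open(path, "rt").read()` before passing it to `configured_arm64_page_size`. On a 64k kernel, `runtime_page_size()` should report 65536.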
I have filed bug 4161073 for tracking purposes. We will look for an ARM64-based system and try to reproduce the issue locally.
Hi @hacc1225, I checked internally with the latest released driver and cannot reproduce the issue locally, as shown below. Please test once on your end and share the results.
[root@fedora ~]# ffmpeg -i sample_960x400_ocean_with_audio.mkv -c:v h264_nvenc output.mp4
ffmpeg version 6.1.1 Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 13 (GCC)
  configuration: --prefix=/usr --bindir=/usr/bin --datadir=/usr/share/ffmpeg --docdir=/usr/share/doc/ffmpeg --incdir=/usr/include/ffmpeg --libdir=/usr/lib64 --mandir=/usr/share/man --arch=aarch64 --optflags='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer' --extra-ldflags='-Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 ' --extra-cflags=' -I/usr/include/rav1e' --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libvo-amrwbenc --enable-version3 --enable-bzlib --enable-chromaprint --disable-crystalhd --enable-fontconfig --enable-frei0r --enable-gcrypt --enable-gnutls --enable-ladspa --enable-libaom --enable-libdav1d --enable-libass --enable-libbluray --enable-libbs2b --enable-libcodec2 --enable-libcdio --enable-libdrm --enable-libjack --enable-libjxl --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libilbc --enable-libmp3lame --enable-libmysofa --enable-nvenc --enable-openal --enable-opencl --enable-opengl --enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-libplacebo --enable-librsvg --enable-librav1e --enable-librubberband --enable-libsmbclient --enable-version3 --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libv4l2 --enable-libvidstab --enable-libvpx --enable-vulkan --enable-libshaderc --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libxml2 --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-avfilter --enable-libmodplug --enable-postproc --enable-pthreads --disable-static --enable-shared --enable-gpl --disable-debug --disable-stripping --shlibdir=/usr/lib64 --enable-lto
  libavutil 58. 29.100 / 58. 29.100
  libavcodec 60. 31.102 / 60. 31.102
  libavformat 60. 16.100 / 60. 16.100
  libavdevice 60. 3.100 / 60. 3.100
  libavfilter 9. 12.100 / 9. 12.100
  libswscale 7. 5.100 / 7. 5.100
  libswresample 4. 12.100 / 4. 12.100
  libpostproc 57. 3.100 / 57. 3.100
Input #0, matroska,webm, from 'sample_960x400_ocean_with_audio.mkv':
  Metadata:
    COMPATIBLE_BRANDS: isomavc1
    MAJOR_BRAND : isom
    MINOR_VERSION : 1
    ENCODER : Lavf58.45.100
  Duration: 00:00:46.62, start: 0.000000, bitrate: 2976 kb/s
  Stream #0:0: Video: h264 (High), yuv420p(progressive), 960x400 [SAR 1:1 DAR 12:5], 23.98 fps, 23.98 tbr, 1k tbn (default)
    Metadata:
      HANDLER_NAME : GPAC ISO Video Handler
      ENCODER : Lavc58.91.100 libx264
      DURATION : 00:00:46.550000000
  Stream #0:1: Audio: vorbis, 48000 Hz, stereo, fltp (default)
    Metadata:
      HANDLER_NAME : GPAC ISO Audio Handler
      ENCODER : Lavc58.91.100 libvorbis
      DURATION : 00:00:46.616000000
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> h264 (h264_nvenc))
  Stream #0:1 -> #0:1 (vorbis (native) -> aac (native))
Press [q] to stop, [?] for help
Output #0, mp4, to 'output.mp4':
  Metadata:
    COMPATIBLE_BRANDS: isomavc1
    MAJOR_BRAND : isom
    MINOR_VERSION : 1
    encoder : Lavf60.16.100
  Stream #0:0: Video: h264 (Main) (avc1 / 0x31637661), yuv420p(progressive), 960x400 [SAR 1:1 DAR 12:5], q=2-31, 2000 kb/s, 23.98 fps, 24k tbn (default)
    Metadata:
      HANDLER_NAME : GPAC ISO Video Handler
      DURATION : 00:00:46.550000000
      encoder : Lavc60.31.102 h264_nvenc
    Side data:
      cpb: bitrate max/min/avg: 0/0/2000000 buffer size: 4000000 vbv_delay: N/A
  Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      HANDLER_NAME : GPAC ISO Audio Handler
      DURATION : 00:00:46.616000000
      encoder : Lavc60.31.102 aac
[out#0/mp4 @ 0xaaaaed97f410] video:11503kB audio:702kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.269885%
frame= 1116 fps=180 q=22.0 Lsize= 12238kB time=00:00:46.61 bitrate=2150.5kbits/s speed=7.53x
[aac @ 0xaaaaed9e3cb0] Qavg: 3576.604
Hi @amrit1711, thank you for your reply. I just tested the 550.78 driver on my Ampere eMAG 8180 machine with the T600 GPU and Gentoo kernel 6.9.0. Unfortunately, it no longer works with either the proprietary or the open source driver: I get the following errors in dmesg, and nvidia-smi can't even detect the card. Currently, only version 535 of the proprietary driver works on my machine.
nvidia-open dmesg:
[ 7.660469] ACPI: bus type drm_connector registered
[ 7.686977] ast 0007:02:00.0: [drm] Using analog VGA
[ 7.691945] ast 0007:02:00.0: [drm] dram MCLK=800 Mhz type=7 bus_width=16
[ 7.698964] [drm] Initialized ast 0.1.0 20120228 for 0007:02:00.0 on minor 0
[ 7.837016] ast 0007:02:00.0: [drm] fb0: astdrmfb frame buffer device
[ 46.480891] systemd[1]: Starting Load Kernel Module drm...
[ 46.709608] nvidia: loading out-of-tree module taints kernel.
[ 46.716312] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 46.783886] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
[ 46.801304] nvidia 0004:01:00.0: Adding to iommu group 14
[ 46.825187] nvidia 0004:01:00.0: enabling device (0000 -> 0003)
[ 46.825215] nvidia 0004:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[ 46.869956] systemd[1]: modprobe@drm.service: Deactivated successfully.
[ 46.877812] systemd[1]: Finished Load Kernel Module drm.
[ 46.886820] NVRM: loading NVIDIA UNIX Open Kernel Module for aarch64 550.78 Release Build (portage@localhost) Mon May 13 22:21:32 CEST 2024
[ 46.953482] nvidia-uvm: Loaded the UVM driver, major device number 510.
[ 48.722406] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for aarch64 550.78 Release Build (portage@localhost) Mon May 13 22:21:15 CEST 2024
[ 49.387186] [drm] [nvidia-drm] [GPU ID 0x00040100] Loading driver
[ 49.404727] Loading firmware: nvidia/550.78/gsp_tu10x.bin
[ 50.983631] NVRM: nvAssertOkFailedNoLog: Assertion failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from sysmemData == vidmemData @ mem_mgr.c:344
[ 51.001320] NVRM: nvAssertOkFailedNoLog: Assertion failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from status @ mem_mgr.c:3767
[ 51.022342] NVRM: nvAssertOkFailedNoLog: Assertion failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from memmgrInitCeUtils(pMemoryManager, NV_FALSE) @ mem_mgr.c:396
[ 51.052922] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_fifo.c:3068
[ 51.066872] NVRM: RmInitNvDevice: *** Cannot load state into the device
[ 51.080080] NVRM: RmInitAdapter: RmInitNvDevice failed, bailing out of RmInitAdapter
[ 51.240138] NVRM: nvAssertFailedNoLog: Assertion failed: listCount(&pKernelBus->virtualBar2[gfid].usedMapList) == 0 @ kern_bus_vbar2.c:345
[ 51.525673] NVRM: nvAssertFailedNoLog: Assertion failed: listCount(&pKernelBus->virtualBar2[gfid].usedMapList) == 0 @ kern_bus_vbar2.c:345
[ 51.526085] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x40100
[ 51.526154] NVRM: GPU 0004:01:00.0: RmInitAdapter failed! (0x25:0x40:1054)
[ 51.526730] NVRM: GPU 0004:01:00.0: rm_init_adapter failed, device minor number 0
[ 51.527401] [drm:nv_drm_exit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00040100] Failed to allocate NvKmsKapiDevice
[ 51.528499] [drm:nv_drm_register_drm_device [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00040100] Failed to register device
[ 52.315062] Loading firmware: nvidia/550.78/gsp_tu10x.bin
[ 54.020228] NVRM: nvAssertOkFailedNoLog: Assertion failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from sysmemData == vidmemData @ mem_mgr.c:344
[ 54.037594] NVRM: nvAssertOkFailedNoLog: Assertion failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from status @ mem_mgr.c:3767
[ 54.061470] NVRM: nvAssertOkFailedNoLog: Assertion failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from memmgrInitCeUtils(pMemoryManager, NV_FALSE) @ mem_mgr.c:396
[ 54.081840] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_fifo.c:3068
[ 54.089778] NVRM: RmInitNvDevice: *** Cannot load state into the device
[ 54.097026] NVRM: RmInitAdapter: RmInitNvDevice failed, bailing out of RmInitAdapter
[ 54.149372] NVRM: nvAssertFailedNoLog: Assertion failed: listCount(&pKernelBus->virtualBar2[gfid].usedMapList) == 0 @ kern_bus_vbar2.c:345
[ 54.442216] NVRM: nvAssertFailedNoLog: Assertion failed: listCount(&pKernelBus->virtualBar2[gfid].usedMapList) == 0 @ kern_bus_vbar2.c:345
[ 54.456068] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x40100
[ 54.456084] NVRM: GPU 0004:01:00.0: RmInitAdapter failed! (0x25:0x40:1054)
[ 54.456286] NVRM: GPU 0004:01:00.0: rm_init_adapter failed, device minor number 0
[ 54.457813] Loading firmware: nvidia/550.78/gsp_tu10x.bin
[ 55.857543] NVRM: nvAssertOkFailedNoLog: Assertion failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from sysmemData == vidmemData @ mem_mgr.c:344
[ 55.875288] NVRM: nvAssertOkFailedNoLog: Assertion failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from status @ mem_mgr.c:3767
[ 55.910808] NVRM: nvAssertOkFailedNoLog: Assertion failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from memmgrInitCeUtils(pMemoryManager, NV_FALSE) @ mem_mgr.c:396
[ 55.932212] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_fifo.c:3068
[ 55.940410] NVRM: RmInitNvDevice: *** Cannot load state into the device
[ 55.947939] NVRM: RmInitAdapter: RmInitNvDevice failed, bailing out of RmInitAdapter
[ 56.000320] NVRM: nvAssertFailedNoLog: Assertion failed: listCount(&pKernelBus->virtualBar2[gfid].usedMapList) == 0 @ kern_bus_vbar2.c:345
[ 56.287156] NVRM: nvAssertFailedNoLog: Assertion failed: listCount(&pKernelBus->virtualBar2[gfid].usedMapList) == 0 @ kern_bus_vbar2.c:345
[ 56.287258] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x40100
[ 56.322941] NVRM: GPU 0004:01:00.0: RmInitAdapter failed! (0x25:0x40:1054)
[ 56.323255] NVRM: GPU 0004:01:00.0: rm_init_adapter failed, device minor number 0
[ 145.074218] Loading firmware: nvidia/550.78/gsp_tu10x.bin
[ 146.801346] NVRM: nvAssertOkFailedNoLog: Assertion failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from sysmemData == vidmemData @ mem_mgr.c:344
[ 146.817901] NVRM: nvAssertOkFailedNoLog: Assertion failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from status @ mem_mgr.c:3767
[ 146.836213] NVRM: nvAssertOkFailedNoLog: Assertion failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from memmgrInitCeUtils(pMemoryManager, NV_FALSE) @ mem_mgr.c:396
[ 146.853946] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_fifo.c:3068
[ 146.861528] NVRM: RmInitNvDevice: *** Cannot load state into the device
[ 146.868302] NVRM: RmInitAdapter: RmInitNvDevice failed, bailing out of RmInitAdapter
[ 146.918417] NVRM: nvAssertFailedNoLog: Assertion failed: listCount(&pKernelBus->virtualBar2[gfid].usedMapList) == 0 @ kern_bus_vbar2.c:345
[ 147.208821] NVRM: nvAssertFailedNoLog: Assertion failed: listCount(&pKernelBus->virtualBar2[gfid].usedMapList) == 0 @ kern_bus_vbar2.c:345
[ 147.222512] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x40100
[ 147.230033] NVRM: GPU 0004:01:00.0: RmInitAdapter failed! (0x25:0x40:1054)
[ 147.238649] NVRM: GPU 0004:01:00.0: rm_init_adapter failed, device minor number 0
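The open-module log above repeats the same sequence of assertions on each of the four initialization attempts. To reduce such a log to the distinct failing source locations, something like the following could be used. It is a minimal sketch: `unique_assert_sites` is a hypothetical helper, and the pattern is based only on the NVRM assertion lines shown in this log.

```python
import re

# Matches NVRM assertion lines as they appear in the log above and captures
# the trailing "file.c:line" location. Other driver versions may format
# these lines differently (assumption).
ASSERT_SITE = re.compile(
    r"NVRM: nvAssert\w*: Assertion failed: .* @ (\S+\.c:\d+)\s*$"
)


def unique_assert_sites(dmesg_text):
    """Return distinct assertion locations in first-seen order."""
    sites = []
    for line in dmesg_text.splitlines():
        match = ASSERT_SITE.search(line)
        if match and match.group(1) not in sites:
            sites.append(match.group(1))
    return sites
```

Feeding it the output of `dmesg` collapses the repeated attempts into a short list of locations in `mem_mgr.c`, `kernel_fifo.c` and `kern_bus_vbar2.c`.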
nvidia-proprietary dmesg
[ 7.723576] ACPI: bus type drm_connector registered
[ 7.749981] ast 0007:02:00.0: [drm] Using analog VGA
[ 7.754949] ast 0007:02:00.0: [drm] dram MCLK=800 Mhz type=7 bus_width=16
[ 7.762006] [drm] Initialized ast 0.1.0 20120228 for 0007:02:00.0 on minor 0
[ 7.903993] ast 0007:02:00.0: [drm] fb0: astdrmfb frame buffer device
[ 46.066422] systemd[1]: Starting Load Kernel Module drm...
[ 46.323983] nvidia: loading out-of-tree module taints kernel.
[ 46.344786] nvidia: module license 'NVIDIA' taints kernel.
[ 46.344794] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 46.344796] nvidia: module license taints kernel.
[ 46.421039] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
[ 46.442640] nvidia 0004:01:00.0: Adding to iommu group 14
[ 46.485784] nvidia 0004:01:00.0: enabling device (0000 -> 0003)
[ 46.492576] nvidia 0004:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[ 46.538582] NVRM: loading NVIDIA UNIX aarch64 Kernel Module 550.78 Sun Apr 14 07:05:12 UTC 2024
[ 46.560975] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[ 46.625518] nvidia-uvm: Loaded the UVM driver, major device number 510.
[ 49.750550] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 550.78 Sun Apr 14 06:15:07 UTC 2024
[ 49.781441] [drm] [nvidia-drm] [GPU ID 0x00040100] Loading driver
[ 54.182512] NVRM: GPU 0004:01:00.0: RmInitAdapter failed! (0x25:0x65:1589)
[ 54.190457] NVRM: GPU 0004:01:00.0: rm_init_adapter failed, device minor number 0
[ 54.201497] [drm:nv_drm_exit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00040100] Failed to allocate NvKmsKapiDevice
[ 54.216210] [drm:nv_drm_register_drm_device [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00040100] Failed to register device
[ 59.162406] NVRM: GPU 0004:01:00.0: RmInitAdapter failed! (0x25:0x65:1589)
[ 59.170019] NVRM: GPU 0004:01:00.0: rm_init_adapter failed, device minor number 0
[ 63.362522] NVRM: GPU 0004:01:00.0: RmInitAdapter failed! (0x25:0x65:1589)
[ 63.370520] NVRM: GPU 0004:01:00.0: rm_init_adapter failed, device minor number 0
[ 173.832844] NVRM: GPU 0004:01:00.0: RmInitAdapter failed! (0x25:0x65:1589)
[ 173.840045] NVRM: GPU 0004:01:00.0: rm_init_adapter failed, device minor number 0
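Both logs end each attempt with an `RmInitAdapter failed!` line, but with different codes: `(0x25:0x40:1054)` for the open modules and `(0x25:0x65:1589)` for the proprietary driver. The sketch below tallies these triples so the two logs can be compared quickly. `rm_init_failures` is a hypothetical helper; the thread does not say what the three fields mean, although the `0x40` in the open-module log coincides with the `NV_ERR_INVALID_STATE (0x00000040)` value printed in its assertion lines.

```python
import re
from collections import Counter

# Captures the three colon-separated fields printed after
# "RmInitAdapter failed!", kept as strings exactly as logged.
RM_INIT_FAILED = re.compile(
    r"RmInitAdapter failed! \((0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(\d+)\)"
)


def rm_init_failures(dmesg_text):
    """Count each distinct RmInitAdapter failure triple found in the text."""
    return Counter(m.groups() for m in RM_INIT_FAILED.finditer(dmesg_text))
```

Running it on each saved log separately gives one triple per driver flavor, which makes it easy to see that the two drivers fail at different points.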
Hi @hacc1225, this seems to be a different issue altogether. Could you please collect an nvidia-bug-report log while the system is in the repro state?
Hello @amrit1711, these are the outputs of nvidia-bug-report.sh with 550.78: nvidia-open.log.gz nvidia-proprietary.log.gz
My machine uses a 64k page size (I also tried a 4k page size and it didn't work either), and this is the kernel build configuration: kernel-config-6.9.0-gentoo-arm64.gz
I install the kernel source with the following Gentoo USE flags:
[ebuild R ~] sys-kernel/gentoo-sources-6.9.0:6.9.0::gentoo USE="experimental -build -symlink" 0 KiB
I usually build the driver with the following CFLAGS (the ones below are from 535.179):
hacc@Hacc-AIO ~ $ cat /var/db/pkg/x11-drivers/nvidia-drivers-535.179/CFLAGS
-march=armv8-a+crypto+crc -mcpu=emag -O3 -pipe -fipa-pta -fgraphite-identity -floop-nest-optimize -fuse-linker-plugin -flto=32
hacc@Hacc-AIO ~ $ cat /var/db/pkg/x11-drivers/nvidia-drivers-535.179/CXXFLAGS
-march=armv8-a+crypto+crc -mcpu=emag -O3 -pipe -fipa-pta -fgraphite-identity -floop-nest-optimize -fuse-linker-plugin -flto=32
I also patch the kernel myself to modify the build CFLAGS:
diff -ur linux-6.9.0-gentoo.bak/arch/arm64/Makefile linux-6.9.0-gentoo-mod/arch/arm64/Makefile
--- linux-6.9.0-gentoo.bak/arch/arm64/Makefile 2024-05-14 16:31:28.424412235 +0200
+++ linux-6.9.0-gentoo-mod/arch/arm64/Makefile 2024-05-14 16:36:24.318255664 +0200
@@ -40,6 +40,7 @@
$(compat_vdso) $(cc_has_k_constraint)
KBUILD_CFLAGS += $(call cc-disable-warning, psabi)
KBUILD_AFLAGS += $(compat_vdso)
+KBUILD_CFLAGS += -mcpu=emag
KBUILD_RUSTFLAGS += --target=aarch64-unknown-none -Ctarget-feature="-neon"
diff -ur linux-6.9.0-gentoo.bak/Makefile linux-6.9.0-gentoo-mod/Makefile
--- linux-6.9.0-gentoo.bak/Makefile 2024-05-14 16:31:57.225029199 +0200
+++ linux-6.9.0-gentoo-mod/Makefile 2024-05-14 16:36:08.918130680 +0200
@@ -808,8 +808,8 @@
KBUILD_CFLAGS += -fno-delete-null-pointer-checks
ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE
-KBUILD_CFLAGS += -O2
-KBUILD_RUSTFLAGS += -Copt-level=2
+KBUILD_CFLAGS += -O3
+KBUILD_RUSTFLAGS += -Copt-level=3
else ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
KBUILD_CFLAGS += -Os
KBUILD_RUSTFLAGS += -Copt-level=s
NVIDIA Open GPU Kernel Modules Version
530.41.03
Does this happen with the proprietary driver (of the same version) as well?
I cannot test this
Operating System and Version
Gentoo Linux
Kernel Release
Linux Hacc-AIO 6.1.28-gentoo-arm64 #1 SMP Sat Jun 3 20:33:06 CEST 2023 aarch64 GNU/Linux
Hardware: GPU
NVIDIA GeForce GTX 1660 SUPER
Describe the bug
I can't use NVENC or NVDEC on the ARM64 platform, even with the sample code in the NVIDIA Video Codec SDK. By the way, CUDA works great with the open source kernel modules.
To Reproduce
Install Nvidia driver and CUDA Toolkit, compile sample code in Video Codec SDK, and run
Bug Incidence
Always
nvidia-bug-report.log.gz
nvidia-bug-report.log.gz
More Info
I'm using an ARM motherboard with an Ampere eMAG 8180 SoC, exactly the same board as in this article. This card does not work on this machine with the proprietary driver. I will get
My kernel is compiled using the following configuration file: kernel-config-6.3.3-gentoo-arm64.gz