Keylost / jetson-ffmpeg

ffmpeg support on nvidia jetson

ffmpeg 6.0 on Jetson TX2 NX #4

Closed tmaoz closed 1 year ago

tmaoz commented 1 year ago

Hey,

I previously used jocover's original work with ffmpeg 4 on the TX2 NX and it worked just fine. However, I want to move to ffmpeg 6.0 and am trying your updated code, but I'm getting some errors (it DOES work like a charm on my Orin Dev Kit).

When I run ffmpeg with nvmpi as the input codec, ffmpeg simply hangs and there's no debug output to indicate why. I can't even kill it (neither kill -9 nor kill -15 works). I do see an endless stream of arm-smmu context fault errors in dmesg, and the only thing I can do is reboot the TX2 NX:

[25739.698249] __arm_smmu_context_fault: 193298 callbacks suppressed
[25739.698258] arm-smmu 12000000.iommu: Unhandled context fault: iova=0xb34ca600, fsynr=0x200001, cb=13, sid=6(0x6 - NVDEC), pgd=0, pud=0, pmd=0, pte=0

Any idea what is going on? Can your code actually work on the TX2 NX or does it use Orin features that are not available or have been changed from the TX2 NX?

Thanks!
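
A note on the hang described above: a process that does not exit even after SIGKILL is usually blocked inside a kernel call in uninterruptible sleep (state D), which would fit a driver-level fault such as the NVDEC one in dmesg. Below is a minimal sketch for checking this from a second shell, assuming a Linux /proc filesystem. `proc_state` is a hypothetical helper name, not part of ffmpeg or nvmpi.

```shell
# Print the scheduler state letter (R, S, D, Z, ...) of a process.
# proc_state is a hypothetical helper, not an ffmpeg or nvmpi tool.
proc_state() {
    # The "State:" line of /proc/<pid>/status looks like "State:  S (sleeping)".
    awk '/^State:/ {print $2}' "/proc/$1/status"
}

# Example: state of the current shell. For the hung process, pass
# the ffmpeg PID instead, e.g. proc_state "$(pidof ffmpeg)".
proc_state "$$"
```

If this prints D for the ffmpeg PID, /proc/<pid>/wchan (or /proc/<pid>/stack as root, where the kernel provides it) may show which kernel function the process is waiting in.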

Keylost commented 1 year ago

Hi! The library in this repository should be backward compatible with the old API. Unfortunately I don't have a TX2 NX module to try and reproduce the problem. Can you use a debugger and provide details of which particular call inside nvmpi is hanging, as well as the complete call stack? What ffmpeg command are you using?

tmaoz commented 1 year ago

Thanks for the reply!

I compiled libnvmpi with debugging symbols and ran ffmpeg under gdb:

(gdb) file /home/root/ffmpeg_tx2nx/dist/bin/ffmpeg 
Reading symbols from /home/root/ffmpeg_tx2nx/dist/bin/ffmpeg...(no debugging symbols found)...done.
(gdb) set LD_LIBRARY_PATH=/home/root/ffmpeg_tx2nx/dist/lib:/home/root/ffmpeg_tx2nx/dist/lib/tegra:/home/root/ffmpeg_tx2nx/dist/lib/tegra-egl:/home/root/ffmpeg_tx2nx/dist/usr/lib:/home/root/ffmpeg_tx2nx/dist/usr/local/cuda-10.2/lib:/home/root/ffmpeg_tx2nx/dist/usr/local/cuda-10.2/lib/stubs:/home/root/ffmpeg_tx2nx/dist/usr/local/cuda-10.2/nvvm/lib64:/home/root/ffmpeg_tx2nx/dist/usr/local/lib
No symbol table is loaded.  Use the "file" command.
(gdb) run -c:v h264_nvmpi -i ../content/bbb_sunflower_1080p_30fps_normal.ts -c:v libx264 /tmp/bbb.mp4
Starting program: /home/root/ffmpeg_tx2nx/dist/bin/ffmpeg -c:v h264_nvmpi -i ../content/bbb_sunflower_1080p_30fps_normal.ts -c:v libx264 /tmp/bbb.mp4
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
nvbuf_utils: Could not get EGL display connection
nvbuf_utils: ERROR getting proc addr of eglCreateImageKHR
nvbuf_utils: ERROR getting proc addr of eglDestroyImageKHR
ffmpeg version 27205c0 Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 7 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04)
  configuration: --enable-nonfree --enable-nvmpi --enable-cuda-nvcc --enable-libnpp --enable-gpl --enable-libx264 --enable-libx265 --enable-librabbitmq --extra-cflags='-I/home/root/ffmpeg_tx2nx/dist/usr/local/cuda-10.2/include -I/home/root/ffmpeg_tx2nx/dist/usr/local/include' --extra-ldflags='-L/home/root/ffmpeg_tx2nx/dist/usr/local/cuda-10.2/lib64 -L/home/root/ffmpeg_tx2nx/dist/lib -L/home/root/ffmpeg_tx2nx/dist/lib/tegra -L/home/root/ffmpeg_tx2nx/dist/lib/tegra-egl -L/home/root/ffmpeg_tx2nx/dist/usr/lib -L/home/root/ffmpeg_tx2nx/dist/usr/local/lib' --disable-static --enable-shared --nvcc=/home/root/ffmpeg_tx2nx/dist/usr/local/cuda-10.2/bin/nvcc --nvccflags='-gencode arch=compute_52,code=sm_52 -O2' --prefix=/home/root/ffmpeg_tx2nx/dist --enable-encoder=png --enable-zlib --pkgconfigdir=/home/root/ffmpeg_tx2nx/dist/usr/local/lib/pkgconfig --enable-debug
  libavutil      58.  2.100 / 58.  2.100
  libavcodec     60.  3.100 / 60.  3.100
  libavformat    60.  3.100 / 60.  3.100
  libavdevice    60.  1.100 / 60.  1.100
  libavfilter     9.  3.100 /  9.  3.100
  libswscale      7.  1.100 /  7.  1.100
  libswresample   4. 10.100 /  4. 10.100
  libpostproc    57.  1.100 / 57.  1.100
Input #0, mpegts, from '../content/bbb_sunflower_1080p_30fps_normal.ts':
  Duration: 00:10:34.53, start: 1.466667, bitrate: 3126 kb/s
  Program 1 
    Metadata:
      service_name    : Big Buck Bunny, Sunflower version
      service_provider: FFmpeg
  Stream #0:0[0x100]: Video: h264 (High) ([27][0][0][0] / 0x001B), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 90k tbn
File '/tmp/bbb.mp4' already exists. Overwrite? [y/N] y
Failed to query video capabilities: Inappropriate ioctl for device
libv4l2: error getting capabilities: Inappropriate ioctl for device

Program received signal SIGSEGV, Segmentation fault.
0x0000007fb63cd690 in NvV4l2Element::subscribeEvent (this=0x0, type=5, id=0, flags=0) at /home/root/ffmpeg_tx2nx/dist/usr/src/jetson_multimedia_api/samples/common/classes/NvV4l2Element.cpp:225
225         ret = v4l2_ioctl(fd, VIDIOC_SUBSCRIBE_EVENT, &sub);
(gdb) bt
#0  0x0000007fb63cd690 in NvV4l2Element::subscribeEvent (this=0x0, type=5, id=0, flags=0) at /home/root/ffmpeg_tx2nx/dist/usr/src/jetson_multimedia_api/samples/common/classes/NvV4l2Element.cpp:225
#1  0x0000007fb63c05ac in nvmpi_create_decoder (codingType=NV_VIDEO_CodingH264, pixFormat=NV_PIX_YUV420) at /home/root/ffmpeg_tx2nx/jetson-ffmpeg/src/nvmpi_dec.cpp:514
#2  0x0000007fb6eb0e48 in ?? () from /home/root/ffmpeg_tx2nx/dist/lib/libavcodec.so.60

Basically, all I'm trying to do is decode the input file (Big Buck Bunny) and re-encode it with the standard CPU encoder, libx264. For some reason, ffmpeg itself was not built with debugging symbols even though I added '--enable-debug' to the configure command line.

My TX2 NX box is a bit restricted because of some issues so I'm actually doing the entire build and testing in a docker container running Ubuntu 18.04 with the nvidia runtime enabled. However, I am able to run on the native host as well after building and I get the same output.
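
On the missing ffmpeg symbols mentioned above: as far as I know, FFmpeg's build strips the installed binaries and shared libraries by default regardless of --enable-debug. The configure option --disable-stripping turns that off, and the unstripped ffmpeg_g left in the build tree can be loaded in gdb instead. Here is a quick sketch for checking whether a binary carries DWARF debug info, assuming binutils' readelf is installed. `has_debug_info` is a hypothetical helper, and the two paths are placeholders for files in the build directory.

```shell
# Succeed if the ELF file has a .debug_info section (DWARF debug data).
# has_debug_info is a hypothetical helper; it needs readelf from binutils.
has_debug_info() {
    readelf -S "$1" 2>/dev/null | grep -q '\.debug_info'
}

# Placeholder paths: the stripped program and its unstripped twin.
for f in ./ffmpeg ./ffmpeg_g; do
    if has_debug_info "$f"; then
        echo "$f: has debug info"
    else
        echo "$f: stripped or not found"
    fi
done
```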

tmaoz commented 1 year ago

I actually recompiled the older ffmpeg 4 and ran it under strace. When ffmpeg gets stuck and dmesg shows the arm-smmu errors, the last thing strace shows is this:

98970 write(2, "Input #0, mpegts, from '/home/root/content/bbb_sunflower_1080p_30fps_normal.ts':\n", 81) = 81
98970 write(2, "  Duration: ", 12)      = 12
98970 write(2, "00:10:34.53", 11)       = 11
98970 write(2, ", start: ", 9)          = 9
98970 write(2, "1.466667", 8)           = 8
98970 write(2, ", bitrate: ", 11)       = 11
98970 write(2, "3126 kb/s", 9)          = 9
98970 write(2, "\n", 1)                 = 1
98970 write(2, "  Program 1 \n", 13)    = 13
98970 write(2, "    Metadata:\n", 14)   = 14
98970 write(2, "      service_name    : ", 24) = 24
98970 write(2, "Big Buck Bunny, Sunflower version", 33) = 33
98970 write(2, "\n", 1)                 = 1
98970 write(2, "      service_provider: ", 24) = 24
98970 write(2, "FFmpeg", 6)             = 6
98970 write(2, "\n", 1)                 = 1
98970 write(2, "    Stream #0:0", 15)   = 15
98970 write(2, "[0x100]", 7)            = 7
98970 write(2, ": Video: h264 (High) ([27][0][0][0] / 0x001B), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9]", 97) = 97
98970 write(2, ", ", 2)                 = 2
98970 write(2, "30 fps, ", 8)           = 8
98970 write(2, "30 tbr, ", 8)           = 8
98970 write(2, "90k tbn, ", 9)          = 9
98970 write(2, "60 tbc", 6)             = 6
98970 write(2, "\n", 1)                 = 1
98970 openat(AT_FDCWD, "/tmp/bbb.mp4", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 15
98970 fcntl(15, F_SETFD, FD_CLOEXEC)    = 0
98970 fstat(15, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
98970 lseek(15, 0, SEEK_SET)            = 0
98970 getrusage(RUSAGE_SELF, {ru_utime={tv_sec=0, tv_usec=60000}, ru_stime={tv_sec=0, tv_usec=60000}, ...}) = 0
98970 openat(AT_FDCWD, "/dev/nvhost-nvdec", O_RDWR <unfinished ...>
98987 <... futex resumed> )             = ? <unavailable>
98990 <... futex resumed>)              = ?
98989 <... futex resumed>)              = ?
98988 <... futex resumed>)              = ?
98990 +++ killed by SIGKILL +++
98989 +++ killed by SIGKILL +++
98988 +++ killed by SIGKILL +++
98987 +++ killed by SIGKILL +++
98970 <... openat resumed>)             = ?
98970 +++ killed by SIGKILL +++

When it gets stuck I kill it, but it takes a while for the signal to be handled...

So it looks like the failure happens when opening the /dev/nvhost-nvdec device, and this is when I'm running natively on the host.

Any ideas?

Thanks!
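
Since the trace above stalls in the open of /dev/nvhost-nvdec, one cheap thing to rule out (especially inside a container) is a missing or inaccessible device node. The sketch below only inspects the node and never opens it, so it should not hang the way the decoder does. It cannot explain the SMMU fault itself. `check_dev` is a hypothetical helper name.

```shell
# Report whether a device node exists, is a character device, and is
# readable and writable by the current user. It does not open the node.
# check_dev is a hypothetical helper, not part of any NVIDIA tooling.
check_dev() {
    dev="$1"
    if [ ! -e "$dev" ]; then
        echo "$dev: missing" >&2; return 1
    fi
    if [ ! -c "$dev" ]; then
        echo "$dev: not a character device" >&2; return 1
    fi
    if [ ! -r "$dev" ] || [ ! -w "$dev" ]; then
        echo "$dev: no read/write permission" >&2; return 1
    fi
    echo "$dev: present and accessible"
}

# The node name is taken from the strace output above.
check_dev /dev/nvhost-nvdec || echo "check failed for /dev/nvhost-nvdec"
```

Running the same check on the host and inside the container shows whether the container runtime actually passed the device through.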

Keylost commented 1 year ago

What version of L4T are you using? Is it the latest version available?

tmaoz commented 1 year ago

Hey, so because of some constraints, I'm not using the nvidia Ubuntu but rather a community-maintained OS for the TX2 NX boxes. After digging deeper, it turns out that OS is based on Ubuntu 22.04 with components from JetPack 4.6.1, while the nvidia Ubuntu for the TX2 NX is based on 18.04. The root cause turned out to be a glibc version mismatch, which also breaks compatibility with various other libraries. After much digging I was able to get something running in a container based on the nvidia nvcr.io/nvidia/l4t-ml:r32.7.1-py3 image, with manually installed newer versions of libc6, libc-bin, libcrypt, libstdc++ and locales-all...

The manually installed libs break the image's APT system and the entire dependency tree but ffmpeg finally works 🤷‍♂️

In any case, thanks for following up!
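
For anyone hitting a similar host/container mismatch, here is a minimal sketch for comparing glibc versions on both sides. It assumes a glibc-based system where `getconf GNU_LIBC_VERSION` is available and a `sort` that supports -V (GNU coreutils). `glibc_version` and `version_ge` are hypothetical helper names, and the 2.35 threshold is only an example value (the glibc shipped with Ubuntu 22.04; Ubuntu 18.04 ships 2.27).

```shell
# Print the running glibc version, e.g. "2.27".
# getconf prints a line such as "glibc 2.27"; keep the second field.
glibc_version() {
    getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}'
}

# Succeed when version $1 >= version $2 (version-aware comparison).
version_ge() {
    # After a version sort, the smaller value comes first; $1 >= $2
    # holds exactly when that first value equals $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="2.35"   # example threshold only, not taken from the thread
if version_ge "$(glibc_version)" "$required"; then
    echo "glibc $(glibc_version) is at least $required"
else
    echo "glibc '$(glibc_version)' is older than $required (or not glibc)"
fi
```

Run it once on the host and once inside the container. Differing results are a hint that host-provided libraries may not load in the container.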