elFarto / nvidia-vaapi-driver

A VA-API implemention using NVIDIA's NVDEC
Other
1.15k stars 53 forks source link

Fix memory issues #303

Open akorb opened 1 month ago

akorb commented 1 month ago

By manually analysing the code and using Address Sanitizer, I found three potential memory issues and fixed them. I also applied some minor refactoring on some occassions.

I found another memory leak, which I could not fix so far.

Direct leak of 288 byte(s) in 1 object(s) allocated from:
    #0 0x73ed926fb6e2 in realloc /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:85
    #1 0x73ed6392d3f0 in ensure_capacity ../src/list.c:24
    #2 0x73ed6392d629 in add_element ../src/list.c:31
    #3 0x73ed6390b3d8 in allocateObject ../src/vabackend.c:256
    #4 0x73ed639187ed in nvCreateBuffer ../src/vabackend.c:1175
    #5 0x73ed8e95f354 in vaCreateBuffer (/usr/lib/libva.so.2+0xb354) (BuildId: 906c589d5f481ebec3a8d425e30f5c117b668a02)
    #6 0x1000000000007  (<unknown module>)

Note that the line references are according to the code of my branch.

Since you have a much broader and deeper understanding of the code, maybe it's obvious for you to fix this leak?

akorb commented 1 month ago

I just added another commit fixing that a specific file descriptor is closed twice. I verified that the removed close never succeeded, i.e., never returned 0.

The rationale behind this removal is from the documentation of the cuImportExternalMemory function:

If CUDA_EXTERNAL_MEMORY_HANDLE_DESC::type is CU_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD, then CUDA_EXTERNAL_MEMORY_HANDLE_DESC::handle::fd must be a valid file descriptor referencing a memory object. Ownership of the file descriptor is transferred to the CUDA driver when the handle is imported successfully. Performing any operations on the file descriptor after it is imported results in undefined behavior.

In other words, CUDA already closes the file descriptor. Closing it again yields undefined behavior, even though in this case it just resulted in an untreated error return code. No harm done, but it's nicer to remove this always failing syscall.

akorb commented 6 days ago

For future reference on how I tested nvidia-vaapi-driver with valgrind and ASan.

Address Sanitizer

Add -Db_sanitize=address to meson for building vaapi driver with ASan:

meson setup -Db_sanitize=address build

Patch test.sh with:

# Allow an ASan enabled library to be loaded by a non-Asan-enabled executable, i.e., Firefox
# https://github.com/google/sanitizers/issues/796#issuecomment-294388904
export LD_PRELOAD=/usr/lib/libasan.so
# Ensure that CUDA calls work with ASan enabled
# https://github.com/google/sanitizers/issues/629#issuecomment-161357276
export ASAN_OPTIONS=verify_asan_link_order=0,protect_shadow_gap=0

valgrind

Call mpv in test.sh with:

valgrind -q --tool=none --trace-children=yes --track-fds=yes --fullpath-after= mpv --msg-color=no --hwdec=vaapi --gpu-debug --hwdec-codecs=all --vd-lavc-check-hw-profile=no "$@"