PyAV-Org / PyAV

Pythonic bindings for FFmpeg's libraries.
https://pyav.basswood-io.com/
BSD 3-Clause "New" or "Revised" License
2.53k stars 366 forks source link

logging framework can lead to deadlocks #751

Closed jlaine closed 6 months ago

jlaine commented 3 years ago

Currently PyAV wires up FFmpeg's logging so that messages are sent to Python's traditional loggers instead of going straight to stderr.

One such example is simply trying to open a CodecContext for the h264_omx (hardware accelerated H264) codec on a RaspberryPi.

codec = av.CodecContext.create("h264_omx", "w")
codec.width = 640
codec.height = 480
codec.pix_fmt = "yuv420p"
codec.framerate = fractions.Fraction(30, 1)
codec.time_base = fractions.Fraction(1, 30)
codec.open()

This code will hang forever on the codec.open() call. Setting the environment variable PYAV_LOGGING=off completely solves the issue.

@mikeboers I am starting to wonder whether the logging support should be opt-in instead of opt-out, any thoughts on this?

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

jlaine commented 2 years ago

This is still an issue.

Breakthrough commented 2 years ago

+1 that this should be opt-in instead of opt-out by default. It would also be very useful to provide some functionality to suppress all log messages by passing a no-op callback rather than relying on log levels.

I'm not very familiar with Cython, however I see the logger callback is marked as nogil and uses with gil where required. It's not really clear how this can deadlock though, unless something is already holding the GIL when this function is called and never releases it, or the thread is destroyed/cancelled while holding the GIL (or another lock, which the code does hold - skip_lock - without any timeout).

Would it make sense to call PyGILState_Check here and manually call PyGILState_Ensure rather than using a with gil block? It's not clear to me from reading the Cython docs at least if using with gil is compatible with threads started outside of cython.parallel, nor does it explain if invoking the same callback multiple times from non-Python created threads is supported.

Another potential point of concern is if the thread is destroyed mid-way through callback execution - this could potentially poison the GIL state, or even the skip_lock used to prevent multiple log messages from being emitted. I'm not sure if that can be resolved without using the underlying threading library directly though, e.g. using pthread_cleanup_push. That seems like a bit of a rabbit hole though, so understandable if that's not feasible from Cython. And again, am not that familiar with the internals of this stuff, so please feel free to correct anything I said that might be incorrect.

jlaine commented 2 years ago

You have some excellent question, but I'll be honest we're reaching the limits of my Cython knowledge. Below are some thoughts:

Log verbosity

I don't understand the value of filtering messages inside PyAV at all, it would seem a lot more efficient for av.logging.set_level to directly call av_log_set_level and let FFmpeg do the filtering itself so unwanted messages don't even reach PyAV. @mikeboers ?

EDIT: I was wrong. av_log_set_level only has an effect on FFmpeg's default logging callback, so once we install our own callback we need to filter ourselves.

We could also most likely default to AV_LOG_WARNING since that is in effect the only thing we send through to the Python logger by default (unless I'm mistaken..). Judging by our users' feedback if anything they want less logging not more.

Opt-in vs opt-out

I'd also be in favor of making the logging framework opt-in, but there is one very useful usecase: it's what enables us to raise sensible exceptions. If you have a close look the logging framework captures error messages, which can then be used by err_check.

Based on this my impression is that we probably do need a logging callback to capture errors, but for the rest we could probably stay out of the way and just hand off to FFmpeg's default callback.

jlaine commented 2 years ago

Here's another finding. I tried to cut down the log_callback function in av/logging.pyx as much as possible to see what's causing the hang. The answer is that even this will cause a hang:

cdef void log_callback(void *ptr, int level, const char *format, lib.va_list args) nogil:
    with gil:
        pass

So in short there doesn't seem to be any safe way of running Python in the log callback.

My gut feeling at this stage would be:

Breakthrough commented 2 years ago

Is there any way to repro this on x86 or does it have to be on an RPi? I had a look at the generated Cython code and it seems to generate an Ensure/Release pair as expected, and you can call it multiple times from the same thread, which makes this hang even more puzzling to be honest. I don't know if the driver source code is public or not, so there might be some threads created there, but interestingly, I don't see the codec itself creating any threads: https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/omx.c

Are you able to see any log output from that when this happens? What happens if you check lib.Py_IsInitialized() first?

For the related issue I had where the interpreter hung on exit, Py_IsInitialized was sometimes returning 0 after the first time it was called indicating another thread had destroyed the interpreter at that point (or was in the process thereof).

jlaine commented 2 years ago

Is there any way to repro this on x86 or does it have to be on an RPi?

I am reproducing this using the same code as in #909, on Debian, x86_64.

Breakthrough commented 2 years ago

Oh sorry I meant specifically in the case where it hangs on open rather than #909 which hangs only when things are being destroyed.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

jlaine commented 2 years ago

Still very much an issue

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

maiermic commented 1 year ago

This is still an issue. It even causes a deadlock if logging is disabled, see https://github.com/aiortc/aiortc/discussions/734#discussioncomment-6069673

hmaarrfk commented 10 months ago

shall this issue be reopened?

jlaine commented 10 months ago

Yes absolutely!

WyattBlue commented 6 months ago

I've fixed this in master now. The logging is now disabled by default (the callback is a no-op).

import av

# No logs whatso ever (no GIL calls)
av.logging.set_level(None)

# You can change the log level on the fly.
# Set to the old default (detailed error messages).
av.logging.set_level(av.logging.VERBOSE)
WyattBlue commented 6 months ago

Here's a basic test

import av
print('Starting test.')
container = av.open('example.mp4')
container.streams.video[0].thread_type = 'AUTO'
# Have to decode at least one frame to trigger the bug.
next(container.decode(video=0))
print('Test complete.')

av 12.0.0 locks up 1/2 of the time av 13.0.0rc1 always works