Gnurou / v4l2r

Rust bindings for V4L2
MIT License

Raspberry Pi never fires an EPOLLPRI event #7

Open FallingSnow opened 3 years ago

FallingSnow commented 3 years ago

It seems that while using V4L2 on the RPI4 no EPOLLPRI event is ever fired. The stateful decoder in v4l2r requires an EPOLLPRI to read a v4l2_event, which will trigger a resolution change that in turn enables the capture queue.

Is there a way to use the stateful decoder even if no v4l2_event is triggered? Of course you would need to manually set the output format then.
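For context, a V4L2 mem2mem decoder reports readiness through poll/epoll flags on its file descriptor: EPOLLIN when a decoded CAPTURE buffer can be dequeued, EPOLLOUT when a consumed OUTPUT buffer can be dequeued, and EPOLLPRI when a subscribed event such as V4L2_EVENT_SOURCE_CHANGE is pending. A minimal sketch of splitting an epoll revents mask into those conditions follows; the flag values are the Linux ones, while the Readiness type and decode_revents are hypothetical and not part of v4l2r.

```rust
// Linux epoll flag values, as defined in <sys/epoll.h>.
const EPOLLIN: u32 = 0x001;
const EPOLLPRI: u32 = 0x002;
const EPOLLOUT: u32 = 0x004;
const EPOLLERR: u32 = 0x008;

/// Hypothetical summary of what a mem2mem decoder fd reported as ready.
#[derive(Debug, Default, PartialEq, Eq)]
struct Readiness {
    /// EPOLLIN: a decoded CAPTURE buffer can be dequeued.
    capture_ready: bool,
    /// EPOLLOUT: a consumed OUTPUT buffer can be dequeued and refilled.
    output_ready: bool,
    /// EPOLLPRI: a V4L2 event (e.g. V4L2_EVENT_SOURCE_CHANGE) is pending.
    event_pending: bool,
    /// EPOLLERR: the device reported an error condition.
    error: bool,
}

/// Splits an epoll `revents` mask into the conditions a decoder loop acts on.
fn decode_revents(revents: u32) -> Readiness {
    Readiness {
        capture_ready: (revents & EPOLLIN) != 0,
        output_ready: (revents & EPOLLOUT) != 0,
        event_pending: (revents & EPOLLPRI) != 0,
        error: (revents & EPOLLERR) != 0,
    }
}
```

With the behaviour reported here, only output_ready would ever be set while the CAPTURE queue is idle, so a loop that waits for event_pending before setting up CAPTURE never makes progress.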

Gnurou commented 3 years ago

What is your kernel version? Maybe your kernel is missing this fix that has landed not so long ago:

https://github.com/torvalds/linux/commit/726daf6bafe9d1d9e5c36e1e2a4008941fbc28bd

IIRC it fixes exactly the problem you are mentioning, which I have also met while testing with vicodec.

Gnurou commented 3 years ago

It is also possible that the RPI4 driver is not compliant with the stateful decoder specification. Is the source for it available?

FallingSnow commented 3 years ago

Does this issue exist on versions before 5.12? I'm using 5.10.25.

Linux alarm 5.10.25-1-ARCH #1 SMP PREEMPT Mon Mar 22 09:22:11 MDT 2021 aarch64 GNU/Linux

It is also possible that the RPI4 driver is not compliant with the stateful decoder specification. Is the source for it available?

That is possible. Let me see.

FallingSnow commented 3 years ago

Seems RPI does support the stateful decode API. https://github.com/mpv-player/mpv/issues/7492#issuecomment-591897864

And there is this V4L2_EVENT_SOURCE_CHANGE line in the driver source (I think).

https://github.com/raspberrypi/linux/blob/7ca8526c5ad5d3467e5e6799787fb3329ceba192/drivers/staging/vc04_services/bcm2835-codec/bcm2835-v4l2-codec.c#L913-L919

FallingSnow commented 3 years ago

I'm not sure if that's the right source code but it does seem to line up with the modules in use on my RPI4 board.

$ lsmod | grep v4l2
bcm2835_v4l2           45056  0
v4l2_mem2mem           40960  1 bcm2835_codec
bcm2835_mmal_vchiq     32768  3 bcm2835_codec,bcm2835_v4l2,bcm2835_isp
videobuf2_vmalloc      20480  1 bcm2835_v4l2
videobuf2_v4l2         32768  4 bcm2835_codec,bcm2835_v4l2,v4l2_mem2mem,bcm2835_isp
videobuf2_common       61440  5 bcm2835_codec,videobuf2_v4l2,bcm2835_v4l2,v4l2_mem2mem,bcm2835_isp
videodev              303104  6 bcm2835_codec,videobuf2_v4l2,bcm2835_v4l2,videobuf2_common,v4l2_mem2mem,bcm2835_isp
mc                     57344  6 videodev,bcm2835_codec,videobuf2_v4l2,videobuf2_common,v4l2_mem2mem,bcm2835_isp
Gnurou commented 3 years ago

Mmm actually it seems like this patch I mentioned fixes a different kind of issue, which is EPOLLIN/EPOLLOUT not firing after the initial EPOLLPRI (so different from your problem). Furthermore there is a workaround for this issue in v4l2r:

https://github.com/Gnurou/v4l2r/blob/master/lib/src/device/poller.rs#L169

So maybe that's a different problem. But if in doubt maybe try to make sure your kernel includes these patches from mainline:

726daf6bafe9 media: v4l2-mem2mem: always call poll_wait() on queues
575c52cc4cae media: videobuf2: always call poll_wait() on queues
1698a7f15112 media: v4l2-mem2mem: simplify poll logic
566463afdbc4 media: v4l2-mem2mem: always consider OUTPUT queue during poll

Which input codec are you trying to decode btw?

It could also be useful if you could provide the output of strace -f on your program, so I can make sure the driver is set up properly. Maybe the source of your program itself as well, especially if you are using the lower-level APIs of this crate.

FallingSnow commented 3 years ago

Here are the strace logs: https://github.com/FallingSnow/v4l2-pi/tree/master/logs. The v4l2-pi-stateful-strace.log log is from the program in the /stateful directory. The other is from the program in the /src directory.

The code can be found in that repository as well.

So maybe that's a different problem. But if in doubt maybe try to make sure your kernel includes these patches from mainline:

726daf6bafe9 media: v4l2-mem2mem: always call poll_wait() on queues
575c52cc4cae media: videobuf2: always call poll_wait() on queues

Both of those are actually only in the 5.12.y branch, so I don't think I'd have those patches. I do have the other 2 though.
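A quick way to see whether a kernel is at least the release these two patches landed in is to compare the major.minor part of uname -r against 5.12. The sketch below is only a heuristic under that assumption: distribution and vendor kernels may backport patches, so a lower version does not prove they are missing. Both function names are hypothetical.

```rust
/// Parses the leading "major.minor" of a kernel release string such as
/// "5.10.25-1-ARCH" (the format printed by `uname -r`).
fn kernel_major_minor(release: &str) -> Option<(u32, u32)> {
    let mut parts = release.split(|c: char| c == '.' || c == '-');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Heuristic only: true if the release is 5.12 or newer, the branch this
/// thread reports as the first to carry the poll_wait() fixes.
/// A false result does not rule out backported patches.
fn likely_has_poll_fixes(release: &str) -> bool {
    matches!(kernel_major_minor(release), Some(v) if v >= (5, 12))
}
```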

I actually saw your workaround while trying to debug this issue. I have a C version of this program that works and found that the RPI doesn't trigger any v4l2_events in the C version either. Instead that version just listens to EPOLLOUT from the start.

Which input codec are you trying to decode btw?

H264

FallingSnow commented 3 years ago

I don't know the V4L2 stateful decoder spec but if it requires you to fire a V4L2_EVENT_SOURCE_CHANGE then I think RPI4 would be out of spec... Unless I'm doing something wrong :disappointed_relieved:

FallingSnow commented 3 years ago

@6by9 Do you know if the RPI4 is supposed to submit a source change event when using the V4L2_mem2mem H264 decoder?

6by9 commented 3 years ago

Yes, it creates source-changed events if the resolution on the CAPTURE queue changes. If you program the CAPTURE queue in accordance with the stream, then there is no change, and no event will be generated.

However currently we don't support enabling only the OUTPUT queue and expecting SOURCE_CHANGED events. The underlying code has an input and output port, corresponding to OUTPUT and CAPTURE queues. At present the ports are enabled with the queue STREAMON state. The SOURCE_CHANGED event comes from the output port, so if it hasn't had STREAMON called then it's disabled.

I had patches largely working to enable both ports based on the OUTPUT queue state, but seeing as neither FFmpeg nor GStreamer as the main users of the API enable only one stream at a time, the priority on polishing them up was low.

edit: From discussion with others in the mainline Linux kernel community, I believe other platforms also have a similar restriction.

Gnurou commented 3 years ago

Thanks for the logs! So one thing first, your program in src must subscribe to the SourceChange event, otherwise it won't receive the resolution change event even if the driver sends it.

Another thing that surprises me is that with both programs the driver returns the OUTPUT buffers that you are queuing, even though it doesn't produce any frame (as it cannot, since the CAPTURE queue is not set up). I would expect the OUTPUT queue to starve in that case, as the driver waits for an available CAPTURE buffer to decode into, and only advances to the next OUTPUT buffer when this is done.

I wonder if the problem doesn't come from the way the H.264 stream is segmented. Most (all?) drivers should expect 1 full NAL unit per buffer, I need to check if your program does that...

Gnurou commented 3 years ago

However currently we don't support enabling only the OUTPUT queue and expecting SOURCE_CHANGED events.

A stateful decoder should be able to generate that event per the specification: https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/dev-decoder.html#initialization

Setting up the CAPTURE queue before starting to send the stream is kind of supported as an alternative behavior, but the expectation is that for streams that contain resolution information the driver will send an initial source change event.

If the RPI4 driver cannot be updated to comply, maybe we need to add an extra flag to the queue so user-space knows it is expected to parse the stream and setup the CAPTURE queue itself... but that would complicate things.
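The mismatch can be reduced to a two-state model of the start of initialization, with the OUTPUT queue already streaming. This is a hypothetical illustration of the behaviour described in this thread, not v4l2r's decoder logic: with a spec-compliant driver the event arrives on its own, while with a driver that only emits it once CAPTURE is streaming, a client that waits for the event before touching CAPTURE never progresses.

```rust
/// Hypothetical, heavily reduced model of the start of stateful decoder
/// initialization, with the OUTPUT queue already streaming.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InitState {
    /// Bitstream is being queued; waiting for the initial source change event.
    WaitingForSourceChange,
    /// Event received: the client reads the CAPTURE format and allocates buffers.
    CaptureSetup,
}

/// Advances the model by one step.
/// `event_needs_capture_streaming` models the bcm2835-codec behaviour
/// described in this thread (event only generated once CAPTURE has been
/// STREAMON'ed); a spec-compliant driver corresponds to `false`.
fn step(state: InitState, capture_streaming: bool, event_needs_capture_streaming: bool) -> InitState {
    match state {
        InitState::WaitingForSourceChange => {
            let event_sent = capture_streaming || !event_needs_capture_streaming;
            if event_sent {
                InitState::CaptureSetup
            } else {
                InitState::WaitingForSourceChange
            }
        }
        InitState::CaptureSetup => InitState::CaptureSetup,
    }
}
```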

6by9 commented 3 years ago

The Pi has a bitstream parsing FIFO which IIRC is 2MB in size. It'll swallow OUTPUT buffers until that FIFO is full.

V4L2 mandates that there is one NAL per buffer https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/pixfmt-compressed.html#compressed-formats

V4L2_PIX_FMT_H264 ‘H264’ H264 Access Unit. The decoder expects one Access Unit per buffer. The encoder generates one Access Unit per buffer. If ioctl VIDIOC_ENUM_FMT reports V4L2_FMT_FLAG_CONTINUOUS_BYTESTREAM then the decoder has no requirements since it can parse all the information from the raw bytestream.

Our decoder could set V4L2_FMT_FLAG_CONTINUOUS_BYTESTREAM, but there is a performance gain and latency reduction if framed data is passed in. AFAIK there is no way for the client to tell the decoder whether the stream is framed or not, so it can't be the client's decision.
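Whether the data is framed or continuous, a framer has to look at NAL unit types. In Annex B streams each NAL unit follows a 00 00 01 start code, and its first byte is the NAL header defined in ITU-T H.264 section 7.3.1. A minimal sketch of decoding that byte follows; the type names cover only the types relevant to this discussion (the SPS carries the coded resolution, type 9 is the access unit delimiter), and the helper names are hypothetical.

```rust
/// Fields of the one-byte H.264 NAL unit header.
#[derive(Debug, PartialEq, Eq)]
struct NalHeader {
    forbidden_zero_bit: u8,
    nal_ref_idc: u8,
    nal_unit_type: u8,
}

/// Decodes the byte that immediately follows a start code.
fn parse_nal_header(byte: u8) -> NalHeader {
    NalHeader {
        forbidden_zero_bit: byte >> 7,
        nal_ref_idc: (byte >> 5) & 0x03,
        nal_unit_type: byte & 0x1F,
    }
}

/// Short names for the NAL unit types that matter for framing.
fn nal_type_name(nal_unit_type: u8) -> &'static str {
    match nal_unit_type {
        1 => "non-IDR slice",
        5 => "IDR slice",
        6 => "SEI",
        7 => "SPS",
        8 => "PPS",
        9 => "access unit delimiter",
        _ => "other",
    }
}
```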

Certainly for encoders, the expectation is that any header bytes are delivered in the same buffer as the I/IDR frame that follows. To my mind this contradicts the spec, but it does mean that simple implementations can work on a 1-in, 1-out basis.

6by9 commented 3 years ago

A stateful decoder should be able to generate that event per the specification: https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/dev-decoder.html#initialization

Setting up the CAPTURE queue before starting to send the stream is kind of supported as an alternative behavior, but the expectation is that for streams that contain resolution information the driver will send an initial source change event.

If the RPI4 driver cannot be updated to comply, maybe we need to add an extra flag to the queue so user-space knows it is expected to parse the stream and setup the CAPTURE queue itself... but that would complicate things.

Chromium was the driver for starting on the patches, but I believe they have changed approach in the latest releases and hence the change of priority.

The CAPTURE queue doesn't need to be configured correctly based on the bitstream headers, but it does need to have STREAMON called.

Gnurou commented 3 years ago

Chromium was the driver for starting on the patches, but I believe they have changed approach in the latest releases and hence the change of priority.

Chromium definitely relies on the initial resolution change event, so I believe that will need to be fixed at some point.

The CAPTURE queue doesn't need to be configured correctly based on the bitstream headers, but it does need to have STREAMON called.

Oh, so just calling STREAMON on the CAPTURE queue in its initial state should be enough, right? Maybe @FallingSnow can try to insert a call right after the OUTPUT queue starts streaming in v4l2r and see if this unblocks the situation, as a workaround until the driver is fixed.

FallingSnow commented 3 years ago

@Gnurou

Another thing that surprises me is that with both programs the driver returns the OUTPUT buffers that you are queuing, even though it doesn't produce any frame (as it cannot, since the CAPTURE queue is not set up). I would expect the OUTPUT queue to starve in that case, as the driver waits for an available CAPTURE buffer to decode into, and only advances to the next OUTPUT buffer when this is done.

I set the capture queue to non-blocking mode and, as @6by9 noted, the output queue now keeps accepting buffers even though no capture frames are dequeued, until the RPI4 chip's buffer has been filled. Then it just hangs: the RPI4 doesn't want more output buffers, and no capture buffers have been dequeued. At this point the output queue blocks and waits, so the output queue is working correctly.

I wonder if the problem doesn't come from the way the H.264 stream is segmented. Most (all?) drivers should expect 1 full NAL unit per buffer, I need to check if your program does that...

The programs create "frames" by segmenting based on H264 access unit delimiters. So the driver should be receiving full NAL units.
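Segmenting on access unit delimiters can be sketched as below, assuming the stream actually contains AUDs (not every encoder emits them). Emulation prevention guarantees that 00 00 01 never appears inside a NAL payload, so scanning for start codes is safe. Each returned chunk is one access unit (possibly several NAL units) that could be copied into a single OUTPUT buffer, which is what the V4L2_PIX_FMT_H264 description quoted above asks for. This is not the read_next_aud function from the linked repository; nal_starts and split_at_auds are hypothetical names.

```rust
/// Returns the offsets just past every 00 00 01 start code in an Annex B
/// stream. A 4-byte start code (00 00 00 01) is found through its last
/// three bytes.
fn nal_starts(stream: &[u8]) -> Vec<usize> {
    let mut starts = Vec::new();
    let mut i = 0;
    while i + 3 <= stream.len() {
        if stream[i] == 0 && stream[i + 1] == 0 && stream[i + 2] == 1 {
            starts.push(i + 3);
            i += 3;
        } else {
            i += 1;
        }
    }
    starts
}

/// Cuts the stream so that every chunk begins at an access unit delimiter
/// (NAL type 9), start code included. Bytes preceding the first AUD, if
/// any, form their own leading chunk.
fn split_at_auds(stream: &[u8]) -> Vec<&[u8]> {
    let mut cuts: Vec<usize> = nal_starts(stream)
        .into_iter()
        .filter(|&p| p < stream.len() && (stream[p] & 0x1F) == 9)
        // Step back over the start code, including a 4th leading zero if present.
        .map(|p| if p >= 4 && stream[p - 4] == 0 { p - 4 } else { p - 3 })
        .collect();
    if cuts.first() != Some(&0) {
        cuts.insert(0, 0);
    }
    let mut chunks = Vec::new();
    for (k, &start) in cuts.iter().enumerate() {
        let end = cuts.get(k + 1).copied().unwrap_or(stream.len());
        if end > start {
            chunks.push(&stream[start..end]);
        }
    }
    chunks
}
```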

So the situation seems to be this...

I will try this. @Gnurou will this mess with the internal state of the v4l2r::stateful::Decoder?

@6by9 just so I'm clear, based on the statement below from the spec (linked by @Gnurou above)

However, for coded formats that include stream resolution information, after the decoder is done parsing the information from the stream, it will update the CAPTURE format with new values and signal a source change event, regardless of whether they match the values set by the client or not.

RPI4 must trigger a V4L2_EVENT_SOURCE_CHANGE event no matter what. And patches were started to fix this but priorities changed?

FallingSnow commented 3 years ago

Actually both queues are already STREAMONed, as this is how my C program works.

https://github.com/FallingSnow/v4l2-pi/blob/8ee9ab4863918db325f792c4d01f1e98e817f83a/src/main.rs#L208-L213

:confused:

Btw, if you're interested, this is the horrible piece of code that creates the "framed" data.

https://github.com/FallingSnow/v4l2-pi/blob/8ee9ab4863918db325f792c4d01f1e98e817f83a/src/main.rs#L249-L269

6by9 commented 3 years ago

Kernel module bcm2835-codec has a module parameter "debug". Increase that to 5 to get all the logging out of the module. You should get any format changed callbacks logged via https://github.com/raspberrypi/linux/blob/rpi-5.10.y/drivers/staging/vc04_services/bcm2835-codec/bcm2835-v4l2-codec.c#L958

The event is V4L2_EVENT_SOURCE_CHANGE with V4L2_EVENT_SRC_CH_RESOLUTION. If there are no changes then there should be no event; otherwise there is no way for a client that has parsed the headers and can set the CAPTURE queue to avoid going through the dynamic allocation change process. I'd want to check again that FFmpeg and GStreamer don't rely on being able to set the CAPTURE queue format; they certainly used not to support V4L2_EVENT_SOURCE_CHANGE.

https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/dev-decoder.html#initialization

If the client has not set the coded resolution of the stream on its own, calling VIDIOC_G_FMT(), VIDIOC_S_FMT(), VIDIOC_TRY_FMT() or VIDIOC_REQBUFS() on the CAPTURE queue will not return the real values for the stream until a V4L2_EVENT_SOURCE_CHANGE event with changes set to V4L2_EVENT_SRC_CH_RESOLUTION is signaled.

So the client is entitled to set the coded resolution on the stream via the CAPTURE queue.

Also the note

If the values configured by the client do not match those parsed by the decoder, a Dynamic Resolution Change will be triggered to reconfigure them.

So the client isn't going to set that format after an initial V4L2_EVENT_SOURCE_CHANGE event, therefore this can only reflect the initialisation V4L2_EVENT_SOURCE_CHANGE event.

FallingSnow commented 3 years ago

I had only been using debug=1, I'll use 5 now.

Or otherwise there is no way for a client that has parsed the headers and can set the CAPTURE queue to avoid going through dynamic allocation change process.

I don't understand this part. Can't the client just choose to ignore the event?

I understand there is some disagreement about whether a V4L2 decoder has to trigger a V4L2_EVENT_SOURCE_CHANGE upon parsing that data. But given the name, it makes sense that a V4L2_EVENT_SOURCE_CHANGE would only happen if the format set by the client does not match that of the decoder.
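One way to read "ignore the event" is: still dequeue it, but skip buffer reallocation when what the decoder reports already matches what the client configured. The sketch below is a simplified, hypothetical decision helper; the kernel's dev-decoder documentation describes the exact steps required after a source change, which differ between initialization and a mid-stream resolution change. The minimum buffer count is assumed to come from the V4L2_CID_MIN_BUFFERS_FOR_CAPTURE control.

```rust
/// Hypothetical CAPTURE format as tracked by the client.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CaptureFormat {
    width: u32,
    height: u32,
}

#[derive(Debug, PartialEq, Eq)]
enum Action {
    /// Buffers already match the stream: dequeue the event and carry on.
    KeepBuffers,
    /// Go through the full CAPTURE reconfiguration (reallocate buffers).
    Reallocate,
}

/// Decides how to react to a V4L2_EVENT_SOURCE_CHANGE, given the format the
/// client configured, how many CAPTURE buffers it allocated, and what the
/// decoder now reports (format and minimum buffer count).
fn on_source_change(
    current: CaptureFormat,
    allocated: u32,
    reported: CaptureFormat,
    min_buffers: u32,
) -> Action {
    if current == reported && allocated >= min_buffers {
        Action::KeepBuffers
    } else {
        Action::Reallocate
    }
}
```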

Then I guess that, since the RPI4 does not trigger the event if the capture queue is "off", v4l2r needs to STREAMON the capture queue as well.

right @Gnurou?

FallingSnow commented 3 years ago

@Gnurou

Thanks for the logs! So one thing first, your program in src must subscribe to the SourceChange event, otherwise it won't receive the resolution change event even if the driver sends it.

How would I even listen for the event using v4l2r? I see subscribe_event but not a way to check for events.

FallingSnow commented 3 years ago

@6by9 I'm decoding a 1920x1080 video and the format change never appears in the logs. Now I'm wondering if I'm even passing the data to the decoder correctly.

The dmesg: https://pastebin.com/PTXr1fQM

Mediainfo of FPS_test_1080p60_L4.2.h264 file:

Video
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : High@L4.2
Format settings                          : CABAC / 4 Ref Frames
Format settings, CABAC                   : Yes
Format settings, Reference frames        : 4 frames
Width                                    : 1 920 pixels
Height                                   : 1 080 pixels
Display aspect ratio                     : 16:9
Frame rate                               : 60.000 FPS
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Writing library                          : x264 core 142 r2431 ac76440
Encoding settings                        : cabac=1 / ref=4 / deblock=1:-1:-1 / analyse=0x3:0x133 / me=umh / subme=10 / psy=1 / psy_rd=1.00:0.15 / mixed_ref=1 / me_range=24 / chroma_me=1 / trellis=2 / 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=1 / chroma_qp_offset=-3 / threads=6 / lookahead_threads=1 / sliced_threads=0 / slices=4 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / constrained_intra=0 / bframes=8 / b_pyramid=2 / b_adapt=2 / b_bias=0 / direct=3 / weightb=1 / open_gop=1 / weightp=2 / keyint=60 / keyint_min=6 / scenecut=40 / intra_refresh=0 / rc_lookahead=60 / rc=crf / mbtree=1 / crf=18.0 / qcomp=0.60 / qpmin=0 / qpmax=69 / qpstep=4 / vbv_maxrate=40000 / vbv_bufsize=30000 / crf_max=0.0 / nal_hrd=none / filler=0 / ip_ratio=1.40 / aq=1:1.00
Color range                              : Limited
Color primaries                          : BT.709
Transfer characteristics                 : BT.709
Matrix coefficients                      : BT.709
Gnurou commented 3 years ago

How would I even listen for the event using v4l2r? I see subscribe_event but not a way to check for events.

You can use ioctl::dqevent() to check if an event was pending and dequeue it. This can be adequate if you only use a single thread, but a better design is probably to use the Poller struct to listen to events and pending buffers.
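In a single-threaded loop, EPOLLPRI only says that at least one event is pending, so a common pattern is to keep dequeuing until nothing is left; on a file descriptor opened with O_NONBLOCK, VIDIOC_DQEVENT fails with ENOENT at that point. The sketch below models this with a closure standing in for the ioctl. It does not call v4l2r's ioctl::dqevent(), whose exact signature is not shown in this thread, and the Event type and drain_events are hypothetical.

```rust
/// Hypothetical stand-in for a dequeued `struct v4l2_event`.
#[derive(Debug, PartialEq, Eq)]
enum Event {
    /// V4L2_EVENT_SOURCE_CHANGE; `resolution` mirrors V4L2_EVENT_SRC_CH_RESOLUTION.
    SourceChange { resolution: bool },
    /// V4L2_EVENT_EOS.
    Eos,
}

/// Dequeues every pending event and reports whether any of them signalled a
/// resolution change. `dequeue` stands in for VIDIOC_DQEVENT on a
/// non-blocking fd and returns None once nothing is left (ENOENT).
fn drain_events<F>(mut dequeue: F) -> bool
where
    F: FnMut() -> Option<Event>,
{
    let mut resolution_changed = false;
    while let Some(event) = dequeue() {
        match event {
            Event::SourceChange { resolution: true } => resolution_changed = true,
            // Other events would be handled here (e.g. EOS ends the stream).
            Event::SourceChange { .. } | Event::Eos => {}
        }
    }
    resolution_changed
}
```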

Gnurou commented 3 years ago

I will try this. @Gnurou will this mess with the internal state of the v4l2r::stateful::Decoder?

The decoder will stream the queue on itself after it receives a resolution change event, but the fact that it was already streaming should not be an issue. What is more worrying is that you will have to allocate buffers in order to be able to stream the queue in the first place, so you'll probably want to call the reqbufs (with MMAP buffers) and streamon ioctls directly.
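The ordering constraint described here (buffers must exist before STREAMON, which is also what produces the vb2_core_streamon "no buffers have been allocated" message reported later in this thread) can be illustrated with a mock queue. MockQueue, QueueError and prime_capture_queue are hypothetical stand-ins for VIDIOC_REQBUFS with MMAP memory and VIDIOC_STREAMON, not v4l2r APIs; a real driver may also adjust the requested buffer count.

```rust
/// Hypothetical in-memory stand-in for a V4L2 queue that enforces the vb2
/// rule: STREAMON fails until REQBUFS has allocated at least one buffer.
#[derive(Debug, Default)]
struct MockQueue {
    allocated: u32,
    streaming: bool,
}

#[derive(Debug, PartialEq, Eq)]
enum QueueError {
    NoBuffersAllocated,
}

impl MockQueue {
    /// Stands in for VIDIOC_REQBUFS with V4L2_MEMORY_MMAP.
    fn reqbufs(&mut self, count: u32) {
        self.allocated = count;
    }

    /// Stands in for VIDIOC_STREAMON.
    fn streamon(&mut self) -> Result<(), QueueError> {
        if self.allocated == 0 {
            return Err(QueueError::NoBuffersAllocated);
        }
        self.streaming = true;
        Ok(())
    }
}

/// Workaround order discussed in this thread: allocate CAPTURE buffers with
/// whatever format the driver currently reports, then STREAMON, so that the
/// driver can emit the source change event.
fn prime_capture_queue(capture: &mut MockQueue, count: u32) -> Result<(), QueueError> {
    capture.reqbufs(count);
    capture.streamon()
}
```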

Then I guess since RPI4 does not trigger the event if the capture queue is "off", then v4l2r needs to STREAMON the capture queue as well. right @Gnurou?

The fact that RPI4 does not trigger the event if the capture queue is off is not compliant with the spec. It should trigger that event, and I don't see anything that prevents it from doing so. So preferably the issue would be fixed in the kernel driver itself.

Gnurou commented 3 years ago

Actually both queues are already STREAMONed, as this is how my C program works.

Do you have a C program that works with this driver? It would be interesting to see its strace output and see if we cannot tune the Rust code to do the same thing.

FallingSnow commented 3 years ago

The fact that RPI4 does not trigger the event if the capture queue is off is not compliant with the spec. It should trigger that event, and I don't see anything that prevents it from doing so. So preferably the issue would be fixed in the kernel driver itself.

Yeah, that should be fixed...

C program's strace, strace -f ./decode 2> /tmp/c-v4l2-strace.log: https://pastebin.com/eRYiA4ZA

You can ignore the "udp event, queue_size = 10" lines: my program takes UDP packets as input, and I have to resend them a couple of times before it gets all the data (until I implement FEC).

What is more worrying is that you will have to allocate buffers in order to be able to stream the queue in the first place, so you'll probably want to call the reqbufs (with MMAP buffers) and streamon ioctls directly.

Ok, well in this case I'll stick with getting the low level functions working before moving to the stateful decoder.

FallingSnow commented 3 years ago

I've been looking at the kernel log, and with the C program the driver outputs the following:

[  +0.009879] bcm2835-codec bcm2835-codec: bcm2835_codec_start_streaming: type: 10 count 0
[  +0.004389] bcm2835-codec bcm2835-codec: bcm2835_codec_buf_queue: type: 9 ptr 00000000d59e583b vbuf->flags 0, seq 0, bytesused 4177920
[  +0.000009] bcm2835-codec bcm2835-codec: bcm2835_codec_start_streaming: type: 9 count 1
[  +0.000155] bcm2835-codec bcm2835-codec: device_run: off we go
[  +0.000139] bcm2835-codec bcm2835-codec: device_run: Submitted src 0000000000000000, dst 00000000d59e583b
[  +2.388681] bcm2835-codec bcm2835-codec: bcm2835_codec_buf_prepare: type: 10 ptr 00000000c2db1bb7
[  +0.000054] bcm2835-codec bcm2835-codec: bcm2835_codec_buf_queue: type: 10 ptr 00000000c2db1bb7 vbuf->flags 0, seq 0, bytesused 786432
[  +0.000018] bcm2835-codec bcm2835-codec: device_run: off we go
[  +0.000258] bcm2835-codec bcm2835-codec: device_run: Submitted ip buffer len 786432, pts 0, flags 0004
[  +0.000057] bcm2835-codec bcm2835-codec: device_run: Submitted src 00000000c2db1bb7, dst 0000000000000000
[  +0.024081] bcm2835-codec bcm2835-codec: ip_buffer_cb: port 0000000047d75b3f buf 000000006eee54f1 length 0, flags 0
[  +0.000016] bcm2835-codec bcm2835-codec: ip_buffer_cb: no error. Return buffer 00000000c2db1bb7
[  +0.000017] bcm2835-codec bcm2835-codec: ip_buffer_cb: done 1 input buffers
[  +1.942382] bcm2835-codec bcm2835-codec: bcm2835_codec_buf_prepare: type: 10 ptr 00000000ba7b6a02
[  +0.000058] bcm2835-codec bcm2835-codec: bcm2835_codec_buf_queue: type: 10 ptr 00000000ba7b6a02 vbuf->flags 0, seq 0, bytesused 786432
[  +0.000017] bcm2835-codec bcm2835-codec: device_run: off we go
[  +0.000375] bcm2835-codec bcm2835-codec: device_run: Submitted ip buffer len 786432, pts 0, flags 0004
[  +0.000015] bcm2835-codec bcm2835-codec: device_run: Submitted src 00000000ba7b6a02, dst 0000000000000000
[  +0.006511] bcm2835-codec bcm2835-codec: ip_buffer_cb: port 0000000047d75b3f buf 00000000851dd6cb length 0, flags 0
[  +0.000008] bcm2835-codec bcm2835-codec: ip_buffer_cb: no error. Return buffer 00000000ba7b6a02
[  +0.000011] bcm2835-codec bcm2835-codec: ip_buffer_cb: done 2 input buffers
[  +0.034613] bcm2835-codec bcm2835-codec: op_buffer_cb: status:0, buf:0000000034ed926f, length:96, flags 0, pts -9223372036854775808
[  +0.000022] bcm2835-codec bcm2835-codec: handle_fmt_changed: Format changed: buff size min 4177920, rec 4177920, buff num min 1, rec 1
[  +0.000013] bcm2835-codec bcm2835-codec: handle_fmt_changed: Format changed to 1920x1088, crop 1920x1080, colourspace 00000000
[  +0.000008] bcm2835-codec bcm2835-codec: handle_fmt_changed: Format was 3840x1088, crop 1920x1080
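Some of this log is easier to read with a little arithmetic. type: 9 and type: 10 are the multiplanar CAPTURE and OUTPUT buffer types, and the change to 1920x1088 with a 1920x1080 crop is the clip's 1080-line height rounded up to whole 16x16 macroblocks (for progressive content). The helper names below are hypothetical; the buffer type values come from enum v4l2_buf_type.

```rust
/// Names for the buffer type numbers in bcm2835-codec debug logs
/// ("type: 9" / "type: 10"), from enum v4l2_buf_type in <linux/videodev2.h>.
fn buf_type_name(buf_type: u32) -> &'static str {
    match buf_type {
        9 => "VIDEO_CAPTURE_MPLANE (decoded frames)",
        10 => "VIDEO_OUTPUT_MPLANE (H.264 bitstream)",
        _ => "other",
    }
}

/// H.264 codes pictures in 16x16 macroblocks, so the coded size is the
/// display size rounded up to a multiple of 16; the crop rectangle hides
/// the extra lines.
fn align_to_macroblock(pixels: u32) -> u32 {
    (pixels + 15) / 16 * 16
}

/// The pts printed by op_buffer_cb, -9223372036854775808, is i64::MIN,
/// which presumably marks a buffer carrying no timestamp.
fn is_unknown_pts(pts: i64) -> bool {
    pts == i64::MIN
}
```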

However, when doing the same with the Rust program, the handle_fmt_changed lines never show up. Does this mean I'm not giving the data to the driver correctly?

For reference, running the Rust program:

[  +0.000229] bcm2835-codec bcm2835-codec: bcm2835_codec_start_streaming: type: 10 count 0
[  +0.002316] bcm2835-codec bcm2835-codec: bcm2835_codec_start_streaming: type: 9 count 0
[  +0.000263] bcm2835-codec bcm2835-codec: bcm2835_codec_buf_init: ctx:000000004ad25ac9, vb 00000000dac9fbdf
[  +0.000007] bcm2835-codec bcm2835-codec: bcm2835_codec_buf_prepare: type: 9 ptr 00000000dac9fbdf
[  +0.000007] bcm2835-codec bcm2835-codec: bcm2835_codec_buf_queue: type: 9 ptr 00000000dac9fbdf vbuf->flags 0, seq 0, bytesused 614400
[  +0.000007] bcm2835-codec bcm2835-codec: device_run: off we go
[  +0.000100] bcm2835-codec bcm2835-codec: device_run: Submitted src 0000000000000000, dst 00000000dac9fbdf
[  +0.105459] bcm2835-codec bcm2835-codec: bcm2835_codec_buf_prepare: type: 10 ptr 0000000000ddbc9b
[  +0.000048] bcm2835-codec bcm2835-codec: bcm2835_codec_buf_queue: type: 10 ptr 0000000000ddbc9b vbuf->flags 0, seq 0, bytesused 30
[  +0.000016] bcm2835-codec bcm2835-codec: device_run: off we go
[  +0.000213] bcm2835-codec bcm2835-codec: device_run: Submitted ip buffer len 30, pts 0, flags 0004
[  +0.000012] bcm2835-codec bcm2835-codec: device_run: Submitted src 0000000000ddbc9b, dst 0000000000000000
[  +0.000466] bcm2835-codec bcm2835-codec: bcm2835_codec_buf_init: ctx:000000004ad25ac9, vb 00000000aac09dc7
[  +0.000019] bcm2835-codec bcm2835-codec: bcm2835_codec_buf_prepare: type: 9 ptr 00000000aac09dc7
[  +0.000013] bcm2835-codec bcm2835-codec: bcm2835_codec_buf_queue: type: 9 ptr 00000000aac09dc7 vbuf->flags 0, seq 0, bytesused 614400
[  +0.000013] bcm2835-codec bcm2835-codec: device_run: off we go
[  +0.000198] bcm2835-codec bcm2835-codec: device_run: Submitted src 0000000000000000, dst 00000000aac09dc7
[  +0.001880] bcm2835-codec bcm2835-codec: ip_buffer_cb: port 0000000047d75b3f buf 000000005afc3164 length 0, flags 0
[  +0.000014] bcm2835-codec bcm2835-codec: ip_buffer_cb: no error. Return buffer 0000000000ddbc9b
[  +0.000012] bcm2835-codec bcm2835-codec: ip_buffer_cb: done 1 input buffers
[  +0.106086] bcm2835-codec bcm2835-codec: bcm2835_codec_buf_prepare: type: 10 ptr 000000000f165738
[  +0.000049] bcm2835-codec bcm2835-codec: bcm2835_codec_buf_queue: type: 10 ptr 000000000f165738 vbuf->flags 0, seq 0, bytesused 4671
[  +0.000016] bcm2835-codec bcm2835-codec: device_run: off we go
[  +0.000223] bcm2835-codec bcm2835-codec: device_run: Submitted ip buffer len 4671, pts 0, flags 0004
[  +0.000013] bcm2835-codec bcm2835-codec: device_run: Submitted src 000000000f165738, dst 0000000000000000
[  +0.000202] bcm2835-codec bcm2835-codec: ip_buffer_cb: port 0000000047d75b3f buf 000000008bdf9dd3 length 0, flags 0
[  +0.000010] bcm2835-codec bcm2835-codec: ip_buffer_cb: no error. Return buffer 000000000f165738
[  +0.000011] bcm2835-codec bcm2835-codec: ip_buffer_cb: done 2 input buffers
[  +0.107126] bcm2835-codec bcm2835-codec: bcm2835_codec_buf_prepare: type: 10 ptr 0000000000ddbc9b
[  +0.000024] bcm2835-codec bcm2835-codec: bcm2835_codec_buf_queue: type: 10 ptr 0000000000ddbc9b vbuf->flags 0, seq 0, bytesused 1014
[  +0.000016] bcm2835-codec bcm2835-codec: device_run: off we go
[  +0.000033] bcm2835-codec bcm2835-codec: device_run: Submitted ip buffer len 1014, pts 0, flags 0004
[  +0.000089] bcm2835-codec bcm2835-codec: device_run: Submitted src 0000000000ddbc9b, dst 0000000000000000
[  +0.000099] bcm2835-codec bcm2835-codec: ip_buffer_cb: port 0000000047d75b3f buf 000000005afc3164 length 0, flags 0
[  +0.000010] bcm2835-codec bcm2835-codec: ip_buffer_cb: no error. Return buffer 0000000000ddbc9b
[  +0.000010] bcm2835-codec bcm2835-codec: ip_buffer_cb: done 3 input buffers
[  +0.107430] bcm2835-codec bcm2835-codec: bcm2835_codec_buf_prepare: type: 10 ptr 000000000f165738
FallingSnow commented 3 years ago

I think I've got it working. Will update you soon.

Gnurou commented 3 years ago

Oho, nice! Any hint on what the issue was?

I'm not against the idea of adding per-device workarounds to the decoder if they can be properly separated into their own source file.

FallingSnow commented 3 years ago

Yeah, the issue was exactly what we thought it was. The Raspberry Pi doesn't signal a V4L2_EVENT_SOURCE_CHANGE when the CAPTURE queue has not been STREAMON'ed. This effectively deadlocks the stateful decoder.

It seems a "temporary" (praying this gets fixed upstream someday) workaround would be necessary.

We could do something like

const DECODE_DEVICE_PATH: &str = "/dev/video10";
let decode_path = Path::new(DECODE_DEVICE_PATH);
let device = Device::open(decode_path, DeviceConfig::new())
    .expect("Failed to open device");

let caps = &device.capability;
if caps.driver == "bcm2835-codec-decode" {
    // Raspberry Pi workaround: the CAPTURE queue must be streaming for the
    // driver to emit V4L2_EVENT_SOURCE_CHANGE.
    // decoder.capture_queue.stream_on().expect("Failed to start capture_queue");
}
6by9 commented 3 years ago

I'll dig out my patches again.

Is there a quick and easy test (for someone who has never used rust at all) that I can run?

Gnurou commented 3 years ago

I just happen to have acquired a Raspberry Pi 4, for unrelated reasons (fantastic little device btw). Maybe I can take a look at how to fix the kernel driver.

Is there a quick and easy test (for someone who has never used rust at all) that I can run?

I am not familiar with cross-compiling Rust yet, but @FallingSnow can probably share some instructions. Thanks to Cargo it should not take much more than 3 or 4 commands to get the Rust toolchain installed and build an ARM binary. If that's a hassle, I'll be happy to build a custom kernel and test.

FallingSnow commented 3 years ago

@6by9 If you can install rust I'll setup a minimal repo tomorrow (it's 3am here) where all you should need to do is run cargo test and it'll either pass or fail. If you can't wait,

  1. Install rust
  2. git clone https://github.com/Gnurou/v4l2r.git
  3. cargo run --example vicodec_test -- /dev/video10

You might need to remove this too, if I remember correctly: https://github.com/Gnurou/v4l2r/blob/1fbc84fbe2ae9db970c41fbdb4537b3753d146e1/lib/examples/vicodec_test/ioctl_api.rs#L41-L46. Also, there could be more you need to do to get it to run on the RPI.

@Gnurou You don't need to cross compile, you can compile right on the RPI (cargo run). Otherwise if you want to cross compile this makes it super duper easy: https://github.com/rust-embedded/cross.

cargo install cross
cross build --target aarch64-unknown-linux-gnu
# Then copy target/aarch64-unknown-linux-gnu/debug/executable to your raspberry pi

Note: If you're using an ARMv7 (not aarch64) Raspberry Pi distro, the command is cross build --target arm-unknown-linux-gnueabihf

Lemme know if you have any trouble.

6by9 commented 3 years ago

@Gnurou It's a little ugly as the codec is remoted to the VPU (VideoCore's processor) over an RPC with the MMAL API. It's MMAL that has the issue of wanting the logical output (V4L2 CAPTURE) stream active to create the event. Working around that restriction is a bit of a pain, particularly if you don't know the MMAL API.

I've pushed the patch I did have to https://github.com/6by9/linux/tree/rpi-5.10.y-codecs - compile testing it now. Ignore the couple of patches for adding interlace support - they should have no real effect.

@FallingSnow I'll give it a 5 minute attempt, but otherwise happy to wait.

FallingSnow commented 3 years ago

@Gnurou I'm trying to set up a test repo for 6by9, but I'm running into an error here and I can't figure out why. Do you know what I'm doing wrong?

https://github.com/FallingSnow/rpi-v4l2r-test/blob/13751f69e9c65b7c52feadff59293fc1016c63e2/src/main.rs#L131

@6by9 Maybe you can comment out that line and try running it on a RPi4 with cargo run. I've gotten it to deadlock like it should but I don't know if it would work correctly if the RPi kernel was patched and didn't deadlock because of the above issue.

6by9 commented 3 years ago

I'm a little confused as to a useful test here.

cargo run --example vicodec_test -- /dev/video10 appears to be trying to run an ENCODE test (to FWHT), but /dev/video10 is the decoder.

Remove the checks for vicodec, replace the references to FWHT with H264, and run against /dev/video11, and it gets so far but panics on

thread 'main' panicked at 'Failed to allocate output buffers: ReqbufsError(InvalidBufferType(VideoOutputMplane, UserPtr))', lib/examples/vicodec_test/device_api.rs:133:10

Understandable as neither encoder nor decoder support userptr as they require contiguous buffers.

Is there any benefit in taking that test further?

FallingSnow commented 3 years ago

Sorry, I only meant for vicodec_test to be a starter if you were interested. You'd have to change a few things to get it to work for our specific use case, a few as you noted would be

Either way I'll try to get you a test you can just run and shouldn't have to modify.

FallingSnow commented 3 years ago

Hmm, seems the issue I was running into (Unable to start capture queue: InvalidQueue(VideoCaptureMplane)) was this:

videobuf2_common: [cap-00000000e70ccd0d] vb2_core_streamon: no buffers have been allocated

So this leads us to:

So a bit of a cyclical dependency.

6by9 commented 3 years ago

https://github.com/FallingSnow/rpi-v4l2r-test runs for me with the normal snippet from Big Buck Bunny that we use as a test (https://github.com/6by9/userland/blob/hello_mmal/host_applications/linux/apps/hello_pi/hello_video/test.h264) and my patch, but it's also just blown up on me on the second run, so something isn't quite right (looks like there's a completion initialisation missing).

With your clip it fails to get going which is weird. The codec keeps on accepting the data but never producing any output.

Does this framework frame the data correctly? V4L2 requires one NAL per buffer, but you're feeding in an elementary stream so something will need to parse it.

FallingSnow commented 3 years ago

Good to hear! We're part way there.

...but it's also just blown up on me on the second run, so something isn't quite right (looks like there's a completion initialisation missing).

That's likely so. In Gnurou's code samples he usually handles shutdowns gracefully with streamoff calls; I don't. However, that has never been an issue before: when I run the code again, it just runs like the first time.

What error are you getting when you try to run it a second time?

With your clip it fails to get going which is weird.

That's strange. Although I doubt the read_next_aud function is properly framing the data, it does work for me. read_next_aud does provide 1 nal per buffer and is checked by an assert statement, see https://github.com/FallingSnow/rpi-v4l2r-test/blob/13751f69e9c65b7c52feadff59293fc1016c63e2/src/main.rs#L205

6by9 commented 3 years ago

I've fixed the obvious issue with my tree, but need to have a further think as to whether we're doing things correctly here or not. I'm hitting a WARN_ON that I've added, though (lines 1512 and 2652).

There's also an issue that trips videobuf2 into complaining about buffers being left in the active state on stop_streaming, which has been outstanding for a while. Never enough time to investigate everything.

SylvainGarrigues commented 1 year ago

Hi, what's the status on this? It seems decoding is running perfectly well: cargo run --example simple_decoder -- test.h264 /dev/video10 --save test.yuv --input_format h264

However, the resulting decoded file doesn't display well: ffplay -f rawvideo -pixel_format yuv420p -video_size 1920x1080 -i test.yuv

Am I missing anything? Did you get to decode an Annex B h264 bitstream with this library?
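One plausible cause, not confirmed in this thread, is a geometry mismatch: the decoder logs above report a coded size of 1920x1088 for 1080p content, so a file written from CAPTURE buffers may hold 1088-line frames, and reading it as 1920x1080 makes every frame drift. A quick check is whether the file length divides evenly by the candidate frame sizes. The sketch assumes planar yuv420p output without row padding; if the CAPTURE format is something else (for example NV12) or its bytesperline is larger than the width, the ffplay pixel format or the arithmetic must change accordingly. The function names are hypothetical.

```rust
/// Size in bytes of one planar 4:2:0 frame (yuv420p / I420) with no row
/// padding: a full-size luma plane plus two quarter-size chroma planes.
fn yuv420p_frame_size(width: u32, height: u32) -> u64 {
    let (w, h) = (width as u64, height as u64);
    let chroma = ((w + 1) / 2) * ((h + 1) / 2);
    w * h + 2 * chroma
}

/// Returns the candidate (width, height) geometries whose frame size
/// divides `file_len` exactly.
fn matching_geometries(file_len: u64, candidates: &[(u32, u32)]) -> Vec<(u32, u32)> {
    candidates
        .iter()
        .copied()
        .filter(|&(w, h)| file_len % yuv420p_frame_size(w, h) == 0)
        .collect()
}
```

If only (1920, 1088) matches the file length, passing -video_size 1920x1088 to ffplay is the first thing to try.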