egnor / pivid

Experimental video code for Linux / Raspberry Pi
MIT License

Hardware bandwidth limits are subtle and tricky to understand/predict #3

Open egnor opened 2 years ago

egnor commented 2 years ago

(splitting off from @pinballpower's comments in https://github.com/egnor/pivid/issues/1#issuecomment-1133759211)

The Pi hardware (as driven by the Linux kernel) has limits on its ability to do hardware composition. These limits are not simple: it's nothing as basic as a count of layers or pixels. They have to do with memory bandwidth and certain intermediate buffers (such as LBM, the Line Buffer Memory). See some discussion here: https://forums.raspberrypi.com/viewtopic.php?p=1996346#p1996346

This of course makes it tricky to use pivid, since there's a certain amount of guesswork in figuring out what you can do. (This applies to anything else using the hardware compositor; it's not pivid-specific.) There is an attempt to predict these load factors in the code here: https://github.com/egnor/pivid/blob/40d179ce80aca2ef35d2a9365be80ddb26960993/display_output.cpp#L787

However, it's still difficult to work with. If nothing else, we should try to document our best understanding of the limits, and encourage the Pi folks to give some sort of semi-official spec. @pinballpower, was the code successful in issuing warnings coincident with the times you were seeing blank screens? If not, it would be great to figure out what limits are being exceeded that the model is missing.

pinballpower commented 2 years ago

I'm not sure about the warnings, but I'll have a look and report back.

egnor commented 2 years ago

By the way, as a hot tip, if you run at 30fps instead of 60fps (e.g. "mode": [1920, 1080, 30] in the play script), you'll cut your bandwidth usage in half and thus double the available scene complexity (LBM aside). However, I find a lot of HDMI screens don't like 30fps output, which is technically nonstandard. YMMV!
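A rough back-of-the-envelope sketch of why the frame rate matters. This is a deliberately simplified model that only counts pixels scanned out per second; the function name is made up for illustration, it is not pivid's actual cost estimate, and it ignores LBM entirely.

```python
def scanout_pixels_per_second(width: int, height: int, fps: int, layers: int = 1) -> int:
    """Pixels fetched per second, assuming every layer covers the full screen.

    Simplified illustration only: the real hardware limits also depend on
    scaling, pixel formats, and intermediate buffers such as LBM.
    """
    return width * height * fps * layers


at_60 = scanout_pixels_per_second(1920, 1080, 60)
at_30 = scanout_pixels_per_second(1920, 1080, 30)

print(at_60)          # 124416000
print(at_30)          # 62208000
print(at_60 / at_30)  # 2.0

# Under this model, two full-screen layers at 30fps cost the same as one at 60fps.
print(scanout_pixels_per_second(1920, 1080, 30, layers=2) == at_60)  # True
```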

pinballpower commented 2 years ago

Looks like I get either

OVERLOAD HDMI-1 1

or

HDMI-1 outran buffer

To better integrate this, it would be good to have an API call that could be used to access the "predict_cost" data. This would allow external processes to get feedback about potential issues.

Next question: It seems the source resolution of the videos matters. In this case, a way to improve performance would be to resize videos to the correct size before playing them rather than resizing them on-the-fly in Pivid. Is this correct? Do you see any advantages/disadvantages of H.264 vs. H.265/HEVC?

egnor commented 2 years ago

OVERLOAD: That means it tripped the estimate checker. (It should also print some info on the estimated costs -- mbw, cbw, lbm?) Unfortunately, if you're using video (H.264 or H.265/HEVC), the scaler is always in the picture, because it's used to manage the lower-resolution planes of YUV data (it took me a while to figure this out!), so no need to match the size exactly. But, if you're shrinking it a lot, then you're wasting a lot of memory bandwidth, so it would be good to scale down the original to at least approximately match the playback size.
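To make the two points above concrete, here is a small illustrative sketch. It assumes plain 8-bit YUV 4:2:0, where each chroma plane has half the width and half the height of the luma plane; the actual buffer layout the Pi decoders produce may differ. It also assumes the whole source frame is read regardless of how small it is shown. The helper names are hypothetical and are not part of pivid.

```python
def yuv420_plane_sizes(width: int, height: int) -> dict:
    """Plane dimensions for 8-bit YUV 4:2:0 (chroma subsampled 2x each way)."""
    chroma_w = (width + 1) // 2
    chroma_h = (height + 1) // 2
    return {
        "Y": (width, height),
        "U": (chroma_w, chroma_h),
        "V": (chroma_w, chroma_h),
    }


def yuv420_bytes_per_frame(width: int, height: int) -> int:
    """Total bytes in one frame at one byte per sample."""
    return sum(w * h for w, h in yuv420_plane_sizes(width, height).values())


def fetch_ratio(source: tuple, shown: tuple) -> float:
    """How much more data the source frame holds than a frame at the shown size."""
    return yuv420_bytes_per_frame(*source) / yuv420_bytes_per_frame(*shown)


# The chroma planes are lower resolution than the output, so they always need scaling.
print(yuv420_plane_sizes(1920, 1080))

# A 4K source shown in a 960x540 window carries 16x the data of a matched-size source.
print(fetch_ratio((3840, 2160), (960, 540)))    # 16.0
print(fetch_ratio((3840, 2160), (1920, 1080)))  # 4.0
```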

And yes, agreed, an HTTP request to get the cost estimates for a script is a good idea. Maybe /play could return something with the cost of every layer, and also we could have a "dry run" option that doesn't actually update the script?

outran buffer: That means the decoder (or file I/O) isn't keeping up; it isn't related to DRM. That's different and might be possible to address with more buffering or tuning various parameters. To figure out what's going on, we'll need to turn on some debugging flags and see what is actually happening... making that more friendly is challenging but probably necessary. In the meantime you can send me logs of what you're doing (in as minimal an example as you can contrive) with --log=loader=debug (on the server).

Codec choice won't affect display hardware (OVERLOAD) but will affect decoding time and thus outran buffer errors. H.265/HEVC is quite a bit more efficient than H.264 (for a given resolution and quality level), I recommend using it if you can. If you're seeking around, having key frames in the source near where you'll seek to is very helpful (otherwise it has to go to the previous key frame and decode forward, which takes a while and risks an outrun).
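One possible way to prepare source files along these lines is to re-encode them with ffmpeg: scale down to roughly the playback size, encode as H.265/HEVC, and set a short key frame interval. The sketch below only builds the command line. It assumes an ffmpeg build that includes libx265; the file names, target width, and the interval of 30 frames are placeholder values chosen for illustration, not recommendations from pivid.

```python
import shlex


def build_ffmpeg_args(src: str, dst: str, width: int, keyframe_interval: int) -> list:
    """Build an ffmpeg command that rescales and re-encodes a video as HEVC.

    Illustrative helper only (not part of pivid).
    """
    return [
        "ffmpeg",
        "-i", src,
        "-vf", f"scale={width}:-2",     # keep aspect ratio, force an even height
        "-c:v", "libx265",              # H.265/HEVC software encoder
        "-pix_fmt", "yuv420p",          # 8-bit 4:2:0
        "-g", str(keyframe_interval),   # maximum frames between key frames
        dst,
    ]


args = build_ffmpeg_args("input.mp4", "output.mp4", width=1920, keyframe_interval=30)
print(shlex.join(args))

# To actually run it (requires ffmpeg to be installed):
#   import subprocess
#   subprocess.run(args, check=True)
```

A shorter key frame interval makes files larger but reduces how far the decoder has to run forward after a seek, so the right value depends on how often and where you seek.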

pinballpower commented 2 years ago

Great. This already helps a lot. I first need to do some further tests here. This is probably going to take a while.

Meanwhile, I created some more feature requests to discuss further ideas. That doesn't mean you should start working on them ;-) but they can serve as a discussion of which features can be implemented and which might not be possible at all (for various reasons, as I simply haven't had a really close look at the internals).

egnor commented 2 years ago

I'm taking it all with an open mind! And take your time, I'm not going anywhere :-)