awslabs / amazon-kinesis-video-streams-producer-sdk-cpp

Amazon Kinesis Video Streams Producer SDK for C++ is for developers to install and customize for their connected camera and other devices to securely stream video, audio, and time-encoded data to Kinesis Video Streams.
Apache License 2.0

Weird FPS when audio is present #562

Closed rgroleau closed 3 years ago

rgroleau commented 4 years ago

Hi,

When I use the kinesis gstreamer plugin with the following video-only command, it works well:

gst-launch-1.0 -e quectelmipisrc device=/dev/video0 do-timestamp=true \
! video/x-raw,format=NV12,width=1920,height=1080,framerate=30/1 \
! queue ! omxh264enc control-rate=3 target-bitrate=3000000 \
quant-b-frames=0 ! video/x-h264,profile=main \
! queue ! h264parse pts-interpolation=true \
! queue ! kvssink name=k aws-region=us-east-1 \
access-key=MYSECRETACCESSID \
secret-key=MYSECRETACCESSKEY \
stream-name=my-specific-stream-name

Notice there is no audio in that pipeline. The debug output reports stats like:

    >> Current frame rate (fps): 24.5007

First off, the reported FPS never quite matches the 30 fps capture rate that is actually happening at the sensor driver level, which is confusing, especially since the stat varies anywhere between 18 and 26 fps over a few seconds. But at least this pipeline works (I can watch the HLS stream properly in any browser frontend).

Then, and here is my real question: when I add audio to the pipeline, it does not work well. Here is my command line:

gst-launch-1.0 -e quectelmipisrc device=/dev/video0 do-timestamp=true \
! video/x-raw,format=NV12,width=1920,height=1080,framerate=30/1 \
! queue ! omxh264enc control-rate=3 target-bitrate=3000000 \
quant-b-frames=0 ! video/x-h264,profile=main \
! queue ! h264parse pts-interpolation=true \
! queue ! kvssink name=k aws-region=us-east-1 \
access-key=MYSECRETACCESSID \
secret-key=MYSECRETACCESSKEY \
stream-name=my-specific-stream-name \
qahwsrc ! audio/x-raw,format=S16LE,channels=1,rate=44100 \
! queue ! audioconvert \
! audio/x-raw,format=F32LE,channels=1,rate=44100 \
! queue ! avenc_aac bitrate=44100 \
! queue ! k.

And the resulting debug output stats:

    >> Current frame rate (fps): 18954.9

Notice how the FPS went through the roof? 18K FPS! Surprisingly this still works on some HLS frontends, although the sound is messed up.

I am sure I am doing something very simple, very wrong here, if anybody can point me in the right direction, it would be greatly appreciated!

Note that this is on an embedded ARM-based device with a Qualcomm DSP, so the v4l2src video source is actually quectelmipisrc, and the alsasrc audio source is actually qahwsrc.

Also note that this pipeline works fine when using an mp4mux sink instead of a kvssink, and the resulting MP4 video container and audio channels are fine (mp4mux FPS is also fine). I am just wondering how to adapt it properly to the kvssink for live streaming.

Thanks, Rejean

disa6302 commented 3 years ago

@rgroleau ,

Can you check if your pipeline works fine with fakesink?

rgroleau commented 3 years ago

Hi @disa6302,

Yes, fakesink works perfectly well. Even the splitmuxsink and mp4mux sinks work perfectly well. Could it just be related to this: https://github.com/awslabs/amazon-kinesis-video-streams-producer-sdk-cpp/issues/326

Apparently the FPS reported by kvssink is just "normally that bad".

Here is an example pipeline that works on a normal Linux laptop, and that also reports insanely high FPS:

gst-launch-1.0 v4l2src device=/dev/video0 ! videoconvert \
! x264enc bframes=0 key-int-max=45 bitrate=512 tune=zerolatency ! video/x-h264,profile=main \
! h264parse ! video/x-h264,stream-format=avc,alignment=au,profile=main \
! kvssink name=sink storage-size=512 aws-region=us-east-1 \
access-key=MYACCESSID secret-key=MYACCESSKEY stream-name=my-stream-name \
alsasrc device=hw:0,0 ! audioconvert ! audio/x-raw,format=S16LE,channels=1,rate=44100 \
! voaacenc ! sink.

and the result from the stdout of this command is:

    >> Total view allocation byte size: 576080
    >> Total streams frame rate (fps): 100
    >> Total streams transfer rate (bps): 33554432 (32768 Kbps)
[...]
    >> Total view allocation byte size: 576080
    >> Total streams frame rate (fps): 741886
    >> Total streams transfer rate (bps): 27407104 (26764 Kbps)
[...]
    >> Total view allocation byte size: 576080
    >> Total streams frame rate (fps): 899963
    >> Total streams transfer rate (bps): 20392656 (19914 Kbps)

So yeah, even on a normal PC, when I add the audio elements the reported FPS is completely, insanely high. So I will ignore the erroneous FPS reporting in the kvssink Total stats stdout printouts.

MushMal commented 3 years ago

The frame rate is the first derivative of the number of produced frames with respect to time, and it is recalculated whenever a frame is produced. So if your pipeline produces, say, 10 frames per second, but all of those frames arrive in a short burst of 10 milliseconds, then querying the frame rate right after the burst will yield 100x the actual frame rate you would see if the frames were spread evenly over the full second. There is also a curve-flattening step we apply, an EMA (exponential moving average), which adds inertia to jumps in the value and tries to smooth the curve, but it cannot cope with extreme bursts of produced frames.

At any rate, check your pipeline for bursts. In your case, the pipeline has queue elements that act as buffers, collecting frames and releasing them in a burst.

This is by design in that case. Resolving.
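The explanation above can be sketched in a few lines. This is a hypothetical toy model, not the actual PIC code, and the alpha value is an assumption; it only illustrates why a per-frame EMA update explodes when frames arrive in a burst:

```python
# Toy model (NOT the actual PIC code) of an EMA frame-rate estimate
# that is updated every time a frame is produced.
ALPHA = 0.2  # assumed smoothing factor; the real PIC value may differ


def ema_fps(timestamps, alpha=ALPHA):
    """Fold a list of frame timestamps (seconds) into an EMA fps estimate."""
    fps = 0.0
    prev = None
    for t in timestamps:
        if prev is not None:
            instantaneous = 1.0 / (t - prev)  # fps implied by this frame delta
            fps = alpha * instantaneous + (1.0 - alpha) * fps
        prev = t
    return fps


# 31 timestamps spaced evenly at 30 fps over one second.
even = [i / 30.0 for i in range(31)]
# The same 30 frames flushed from a queue within 10 milliseconds.
burst = [i * (0.010 / 30.0) for i in range(31)]

print(ema_fps(even))   # close to the true 30 fps
print(ema_fps(burst))  # thousands of fps, like the numbers in the logs above
```

The frame deltas in the burst case are ~0.33 ms each, so every per-frame update sees an instantaneous rate of ~3000 fps, and the EMA converges toward that, not toward the media rate.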

rgroleau commented 3 years ago

Although this is by design, could you at least mark it as a duplicate of the eventual ticket/feature/enhancement that will use a periodic (relative) timestamp (or some other mechanism) to display valid stats when queues are used?

Queues are an integral part of GStreamer after all, and are often required, even recommended by your team, as part of the solutions to specific pipelines (including ones with audio).

MushMal commented 3 years ago

@rgroleau we need to think a little deeper about this. The metric simply reports the observed putframe frequency over a given interval. As such, it is a first-order derivative with a time interval factored in. There are two cases to consider.

1) If the stream is uploaded in offline mode, putframe is called at the CPU/IO pipeline speed. While the FPS of the media is normal (e.g. 30 fps), the observed frame rate from the perspective of KVS will be many thousands of frames per second. 2) If the media pipeline is "bursty", i.e. it has elements that queue frames and then push them out at a higher rate, then again the metric reflects the observed rate at which frames are produced into the KVS SDK itself, which can differ from your media pipeline's frame rate.

Regarding the second case: the PIC does not have its own source of liveness. It measures the frame rate whenever you produce a frame. If frames are produced in a burst, the deltas between them are tiny, which heavily skews the numbers.

The metric is calculated with an EMA (exponential moving average), which adds some "inertia" to the curve to dampen spikes. However, even with the EMA, you won't get the numbers you are looking for (the media pipeline's numbers) for a bursty stream. If the burstiness is minor, tuning the EMA alpha parameter might provide enough smoothing, but that will depend heavily on your scenario.
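As a rough illustration of the alpha trade-off (again a toy sketch with made-up alpha values and a synthetic mildly bursty 30 fps stream, not PIC's implementation): a small alpha damps the per-frame oscillation, but the steady-state EMA of instantaneous rates still lands well above the true media frame rate, which is why tuning alpha alone cannot fully fix a bursty pipeline.

```python
# Toy sketch of the EMA alpha trade-off (illustrative values, not PIC's).
def ema_fps_history(deltas, alpha):
    """EMA of instantaneous fps, updated once per frame interval (seconds)."""
    fps = 0.0
    history = []
    for d in deltas:
        fps = alpha * (1.0 / d) + (1.0 - alpha) * fps
        history.append(fps)
    return history


# Mildly bursty 30 fps stream: frames arrive in pairs 10 ms apart,
# followed by a ~56.7 ms gap, so the average interval is still 1/30 s.
mild = [0.010, 1.0 / 15.0 - 0.010] * 500

smooth = ema_fps_history(mild, alpha=0.01)  # heavy smoothing
jumpy = ema_fps_history(mild, alpha=0.5)    # light smoothing

# A small alpha shrinks the per-frame swing, but the estimate converges to
# the mean of the instantaneous rates (~59 fps here), not the true 30 fps.
print(abs(smooth[-1] - smooth[-2]), abs(jumpy[-1] - jumpy[-2]))
print(smooth[-1])
```

The key point: an EMA of instantaneous rates averages 1/interval values, so alternating 100 fps and ~17.6 fps intervals settle near their arithmetic mean (~59 fps) regardless of how small alpha gets.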

I really don't think it is a good idea to build your application on this metric, given that your media pipeline is bursty.

rgroleau commented 3 years ago

Agreed, but in any case, seeing 8000+ fps in the statements looks like a problem to anybody reading them.

So, I understand that live streaming on IoT devices using a DSP-based encoder with audio (and of course queues, which are "by design" necessary or recommended in many GStreamer pipelines) through the kvssink plugin is not a common use case for you. But saying this behavior is "by design" sounds like "DSP-based IoT live streaming with kvssink is not very well supported", so anything that could alleviate this would help. Maybe a property to choose between "report stats using the dynamic EMA" and "report stats over fixed periods", with the two caveats (about offline streaming and so on) documented in its description, would be enough? This is clearly not a priority, I know. But it would be a welcome enhancement for DSP-based IoT platforms using kvssink with audio.

MushMal commented 3 years ago

In the offline case there isn't much we could do; it really is the observed rate at which frames are produced into the KVS SDK.

In the case of a bursty pipeline like you've mentioned, we could change the parameters of the EMA calculation. Do you want to take the lead and provide a PR for it? The EMA parameters are in PIC. Let us know if you want to work on it.

disa6302 commented 3 years ago

@rgroleau ,

Thanks for your patience. I have added a new metric called elementaryFrameRate, and the CPP SDK has been updated with the change as well. Also, the frame rate is now calculated for the video track only.

Feel free to reopen this issue if you have any questions / face any issues.

rejeangroleau commented 3 years ago

I tried it this morning (after your merges): it works great. The results look like:

2021-03-26 11:44:40 [2803668064] DEBUG - Kinesis Video client and stream metrics
    >> Overall storage byte size: 134217728
    >> Available storage byte size: 134191328
    >> Allocated storage byte size: 26400
    >> Total view allocation byte size: 576080
    >> Total streams elementary frame rate (fps): 1374
    >> Total streams transfer rate (bps): 306312 (299 Kbps)
    >> Current view duration (ms): 852
    >> Overall view duration (ms): 1052
    >> Current view byte size: 10274
    >> Overall view byte size: 23693
    >> Current elementary frame rate (fps): 15
    >> Current transfer rate (bps): 306312 (299 Kbps)

So yes, the Current elementary frame rate (fps): 15 line matches what I was streaming at that point. Thanks!

On a side note, and I guess this is by design, the first time the debug statement is printed (at the very start of the stream) it shows 100 fps for both values (total streams elementary and current elementary frame rates):

2021-03-26 11:47:20 [2934498400] DEBUG - Kinesis Video client and stream metrics
    >> Overall storage byte size: 134217728
    >> Available storage byte size: 134217728
    >> Allocated storage byte size: 0
    >> Total view allocation byte size: 576080
    >> Total streams elementary frame rate (fps): 100
    >> Total streams transfer rate (bps): 33554432 (32768 Kbps)
    >> Current view duration (ms): 0
    >> Overall view duration (ms): 0
    >> Current view byte size: 0
    >> Overall view byte size: 0
    >> Current elementary frame rate (fps): 100
    >> Current transfer rate (bps): 33554432 (32768 Kbps)

But all the other values are at zero anyway, and even the current transfer rate (in bps) is completely off (our device is clearly not able to upload at 32 Mbps; believe me, our modem is not that fast, real-world physics apply). So right now we just ignore the first debug statement. Everything afterwards works great!

Thanks again!