@jlong29 I took the liberty of editing the title and removing the "bug" tag.
This is an interesting case. Generally speaking, the initial call to GetMedia will result in some delay due to the backend mapping the selector and some other processing that happens. As soon as the egress starts, the stream can burst to catch up to the current head. This is expected behavior. However, what you are reporting is a continuous burst-and-pause pattern.
I would suggest including some latency information from the producer. Enable the debug logs and capture a few minutes of logs while you observe the "burstiness" on the consumer (GetMedia) side. Having the metrics printed out will give us some understanding of whether the burstiness is related to the producer.
@jlong29 do you have any update on this issue?
@MushMal I intend to follow up on this issue in the next few days. I am just rather busy at the moment. Please keep this ticket open. If the issue is not due to the Producer and has to do with GetMedia, how should we proceed?
Sounds good, we will keep the issue open. Though, by all means, even if the issue gets closed as resolved, we can re-open it without any problems.
If the issue proves to be on the "consumer" side, then we will have to analyze where the latency is and what's causing it. It's hard to tell without any data. Generally speaking, I doubt that this could be related to the backend (though never say never) unless there is a bug, as there is plenty of capacity and bandwidth overhead on the backend side.
It is possible that we are seeing the connection drop on the producer side with subsequent re-streaming from the rollback window, which would increase the density of the stream and the latency.
We really need more data to understand your scenario.
That said, from your description, where you mentioned that the stream's server-side timestamps are also "bursty", it's a telltale sign of a producer-side issue.
@MushMal I was able to collect the logs you asked me to collect. I'll try to provide the relevant details:
Endpoints
Gstreamer Producer:
```
gst-launch-1.0 nvcamerasrc fpsRange="30.0 30.0" ! 'video/x-raw(memory:NVMM), width=(int)320, height=(int)240, format=(string)I420, framerate=(fraction)30/1' ! omxh264enc control-rate=1 bitrate=250000 peak-bitrate=500000 iframeinterval=7 profile=1 ! tee name=t t. ! h264parse ! video/x-h264,stream-format=avc,alignment=au,profile=baseline,framerate=30/1 ! kvssink stream-name="MY-STREAM" buffer-duration=5 storage-size=512 avg-bandwidth-bps=31250 access-key="MY-KEY" secret-key="MY-SECRET" aws-region="us-east-1" t. ! h264parse ! omxh264dec ! nveglglessink -e > AWS-KVS-Test.log
```
aws cli with mpv and a custom C parser:
```
aws kinesis-video-media get-media --start-selector StartSelectorType="NOW" --stream-name "MY-STREAM" --endpoint-url MY-URL /dev/stdout | ./build/kvsMkvStreamParserCtrl > >(mpv -v --profile=MY-PROFILE -) 2> >(tee stderr.log >&2)
```
A custom App I can't share, but that uses the GetMedia API directly. I see the same behavior on the App.
Log Files
Hmm, interesting... Thanks for the details. I just started looking at your producer-side logs. I do see that, in general, the network throughput is good. However, I see an interesting pattern here, somewhere from the middle of the logs (the same throughout).
I see a burst of key-frames:
```
Overall storage byte size: 536870912
Available storage byte size: 536858356
Allocated storage byte size: 12556
Total view allocation byte size: 6080
Total streams frame rate (fps): 34
Total streams transfer rate (bps): 257328 (251 Kbps)
Current view duration (ms): 67
Overall view duration (ms): 237
Current view byte size: 4579
Overall view byte size: 12172
Current frame rate (fps): 34.2933
Current transfer rate (bps): 257328 (251 Kbps)
2020-07-02 05:28:53 [547767710208] DEBUG - Kinesis Video client and stream metrics
Overall storage byte size: 536870912
Available storage byte size: 536852264
Allocated storage byte size: 18648
Total view allocation byte size: 6080
Total streams frame rate (fps): 33
Total streams transfer rate (bps): 257328 (251 Kbps)
Current view duration (ms): 297
Overall view duration (ms): 467
Current view byte size: 10335
Overall view byte size: 17928
Current frame rate (fps): 33.2116
Current transfer rate (bps): 257328 (251 Kbps)
2020-07-02 05:28:54 [547767710208] DEBUG - Kinesis Video client and stream metrics
Overall storage byte size: 536870912
Available storage byte size: 536844499
Allocated storage byte size: 26413
Total view allocation byte size: 6080
Total streams frame rate (fps): 32
Total streams transfer rate (bps): 257328 (251 Kbps)
Current view duration (ms): 531
Overall view duration (ms): 701
Current view byte size: 17764
Overall view byte size: 25357
Current frame rate (fps): 32.3713
Current transfer rate (bps): 257328 (251 Kbps)
2020-07-02 05:28:54 [547767710208] DEBUG - Kinesis Video client and stream metrics
Overall storage byte size: 536870912
Available storage byte size: 536837659
Allocated storage byte size: 33253
Total view allocation byte size: 6080
Total streams frame rate (fps): 31
Total streams transfer rate (bps): 257328 (251 Kbps)
Current view duration (ms): 768
Overall view duration (ms): 938
Current view byte size: 24268
Overall view byte size: 31861
Current frame rate (fps): 31.4173
Current transfer rate (bps): 257328 (251 Kbps)
2020-07-02 05:28:54 [547767710208] ERROR - getKinesisVideoStreamData(): operation returned status code: 0x5200002f
2020-07-02 05:28:54 [547466240512] DEBUG - postReadCallback(): Wrote 26598 bytes to Kinesis Video. Upload stream handle: 0
2020-07-02 05:28:54 [547466240512] ERROR - getKinesisVideoStreamData(): operation returned status code: 0x5200002f
2020-07-02 05:28:54 [547466240512] DEBUG - postReadCallback(): Pausing CURL read for upload handle: 0
2020-07-02 05:28:54 [547466240512] DEBUG - Kinesis Video client and stream metrics
Overall storage byte size: 536870912
Available storage byte size: 536831340
Allocated storage byte size: 39572
Total view allocation byte size: 6080
Total streams frame rate (fps): 30
Total streams transfer rate (bps): 256248 (250 Kbps)
Current view duration (ms): 46
Overall view duration (ms): 1183
Current view byte size: 3653
Overall view byte size: 37844
Current frame rate (fps): 30.6614
Current transfer rate (bps): 256248 (250 Kbps)
```
As you can see, due to the burst, the networking can't immediately keep up, so it takes some time to clear the buffer. For example, you initially have 67 ms of duration in this sequence. By the time the 4th key-frame is produced, you have 768 ms.
I am not familiar with this media pipeline/platform, so you might want to experiment a little to ensure the encoder doesn't produce a "corrugated pipe" effect. Some encoders accumulate a few frame buffers, which allows them to do a better job encoding the video and produce better motion compensation vectors. This does result in "burstiness".
@MushMal Thank you for helping me understand the producer logs better. I had been a little unclear on how the "Current view duration (ms)" parameter related to fragments. That being said, I don't agree with your analysis, for a few reasons. Let me explain my Gstreamer media pipeline a bit. With the command:
```
gst-launch-1.0 nvcamerasrc fpsRange="30.0 30.0" ! 'video/x-raw(memory:NVMM), width=(int)320, height=(int)240, format=(string)I420, framerate=(fraction)30/1' ! omxh264enc control-rate=1 bitrate=250000 peak-bitrate=500000 iframeinterval=7 profile=1 ! tee name=t t. ! h264parse ! video/x-h264,stream-format=avc,alignment=au,profile=baseline,framerate=30/1 ! kvssink stream-name="MY-STREAM" buffer-duration=5 storage-size=512 avg-bandwidth-bps=31250 access-key="MY-KEY" secret-key="MY-SECRET" aws-region="us-east-1" t. ! h264parse ! omxh264dec ! nveglglessink -e
```
The first part is the camera:
```
gst-launch-1.0 nvcamerasrc fpsRange="30.0 30.0" ! 'video/x-raw(memory:NVMM), width=(int)320, height=(int)240, format=(string)I420, framerate=(fraction)30/1'
```
For testing, I used the camera provided by Nvidia in the TX1 dev kit and set the output parameters to match a camera I am using on my hardware platform. I then use the GPU-accelerated encoder, omxh264enc:
```
omxh264enc control-rate=1 bitrate=250000 peak-bitrate=500000 iframeinterval=7 profile=1
```
The omxh264enc element is very well documented and optimized for low latency; I have worked with it extensively, and I have never heard anyone express concern about this burst behavior. The "bitrate" and "peak-bitrate" parameters set the average and peak bitrate. The "control-rate" parameter enables adaptive encoding, and "iframeinterval" inserts an I-frame every 7th frame (so for 30 fps, once every ~233 ms here... a bit above the 200 ms implied by the KVS threshold of 5 calls per second). For a fairly rigorous characterization of this h264 encoder, check out this work by Ridgerun.
Notably, I then tee the encoded video stream to the kvssink element provided by AWS as well as to a display branch, which decodes the video using the GPU-accelerated h264 decoder, omxh264dec, and renders to a display attached to the TX1 dev kit. It is worth noting that I see display playback in under 200 ms.
I do not believe the problem is with the h264 encoder I am using. This points to one of the following:
1. Burstiness in the download from the GPU to the CPU as the encoded buffers are transferred from the encoder to the kvssink element. (I think this is highly unlikely, since I previously used a very similar gstreamer pipeline with omxh264enc to stream video over 802.11ac wifi over rtp, and there I saw less than 500 ms of latency. Still, I will check this possibility by replacing the GPU-accelerated encoder with a CPU-side encoder; if this is the problem, it should result in uniform, though increased, latency.)
2. Something in the interaction between the kvssink element and the encoded buffers
3. Something in the interaction between the kvssink element and the KVS server
You should be able to recreate my observations using a Raspberry Pi optimized for low latency, which is a use case detailed extensively in the AWS KVS Producer SDK examples. Is it possible for you to recreate my observations on your side in parallel? Does my logic seem sound?
@jlong29 the "burstiness" might also be an artifact of thread starvation in the GStreamer pipeline itself. Here are a few steps I would recommend:
1. Ensure you have GoP sizes of no less than 1 second - this is important due to potential throttling from the backend.
2. Use a GStreamer queue element between the encoder and the downstream elements like kvssink.
3. To eliminate the KVS sink, replace kvssink with autovideosink.
4. Use a videotestsrc with kvssink to eliminate the upstream dependency.
We have not observed this behavior on the KVS sink side. The only possibility that I am thinking of is the pause/un-pause of libCurl that we do in order to not spend CPU cycles busy-idling when there is nothing in the buffer to send. The un-pause is controlled by libCurl, which can at times take a few hundred millis to complete.
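For context, the pause/un-pause pattern being described works roughly like the sketch below. This is a minimal illustration and not the SDK's actual code; the `pending_bytes` stand-in for the producer's frame buffer is hypothetical.
```c
#include <curl/curl.h>
#include <string.h>

static size_t pending_bytes = 0; /* hypothetical stand-in for the frame buffer */

/* CURLOPT_READFUNCTION: feeds the PutMedia upload body. */
static size_t read_callback(char *buffer, size_t size, size_t nitems, void *userdata)
{
    (void) userdata;
    size_t max = size * nitems;

    if (pending_bytes == 0) {
        /* Nothing to send: pause the transfer instead of busy-idling.
         * libCurl stops invoking this callback until the handle is un-paused. */
        return CURL_READFUNC_PAUSE;
    }

    size_t n = pending_bytes < max ? pending_bytes : max;
    memset(buffer, 0, n); /* placeholder payload for the sketch */
    pending_bytes -= n;
    return n;
}

/* Called when a new frame lands in the buffer. With the EASY interface the
 * resume only takes effect on libCurl's next internal event, which is where
 * the few-hundred-millisecond stalls come from. */
static void on_frame_available(CURL *easy_handle)
{
    curl_easy_pause(easy_handle, CURLPAUSE_CONT);
}
```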
@jlong29 we can confirm that the burst is coming from the producer: this graph is generated by measuring the e2e latency of each frame when running KvsVideoOnlyStreamingSample in the C producer repo. We will look into fixing this issue. Let's keep this issue open to track it.
@chehefen Thank you very much for following up with me on this issue. Our observations are consistent. Standing by.
@chehefen @MushMal Any updates on the status of this issue? I ask because improving this behavior will greatly improve the service we are building upon KVS.
@chehefen @MushMal Any updates on this issue?
We are still evaluating and prioritizing. Could you please clarify your usage scenario?
@MushMal Thanks for the update. We are building a low-latency streaming video service that natively stores all video streams in the cloud. Our streams are video only, no sound, and are for real-time observation and offline review. Kinesis Video Streams is a good candidate according to these points in the specs:
KVS seems like a good fit. We had assumed that the 5 fragments per second would be distributed uniformly in time, which allows for latencies in the area of 250-450 ms. This is very good for our application. Does this answer your question? I'm happy to follow up with more details via email.
TPS for PutMedia doesn't affect the frame e2e latency.
KVS has a WebRTC solution which is specifically designed to handle P2P, low-latency, real-time scenarios. KVS itself is low-latency, but due to the underlying libCurl EASY interface limitations it is not exactly a "real-time" solution. It can be very fast indeed, but most real-time solutions have direct peer feedback on congestion, etc., whereas in KVS streams, the producer is entirely isolated from the consumer. Basing the Producer library on curl's MULTI interface (or another client that handles pause/unpause properly) would make it very fast, but the producer and the consumer sides would still be independent.
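To illustrate the difference: with the MULTI interface, the application owns the event loop, so it can wake the transfer the moment a new frame arrives instead of waiting on libCurl's internal timing. The sketch below is an assumption about how such a rewrite could look, not the SDK's plan; it relies on curl_multi_poll (libCurl >= 7.66) and curl_multi_wakeup (>= 7.68).
```c
#include <curl/curl.h>

/* Drive an already-configured easy handle through the MULTI interface.
 * A producer thread can call curl_multi_wakeup(multi) as soon as a new
 * frame is buffered, so the upload resumes immediately rather than after
 * the EASY interface's delayed un-pause. */
void run_upload(CURL *easy_handle)
{
    CURLM *multi = curl_multi_init();
    curl_multi_add_handle(multi, easy_handle);

    int still_running = 1;
    while (still_running) {
        curl_multi_perform(multi, &still_running);
        /* Sleep until socket activity, the timeout, or curl_multi_wakeup(). */
        curl_multi_poll(multi, NULL, 0, 100 /* ms */, NULL);
    }

    curl_multi_remove_handle(multi, easy_handle);
    curl_multi_cleanup(multi);
}
```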
The thing that is limiting the e2e latency is that long, > 1000 ms, periodic latency from the producer to KVS. Is that a problem with libCurl? We went with KVS instead of WebRTC because KVS stores the data natively, and WebRTC would have required more engineering to support this function (stream duplication, using the TURN server to send data for storage, etc.). The current KVS behavior is really close to what we want. If there is an opportunity to shave 300-400 ms off, we would be very happy.
@MushMal I see that @sharadrj has been assigned to this issue. Any updates on the status of this issue?
@jlong29 Unfortunately, as I mentioned in my earlier correspondence, libCurl does not expose any means of resuming the event loop. We need to modify the SDK code to use curl's MULTI interface instead. This is a non-trivial task that will take a while to implement, and most importantly, we need to prioritize and schedule this work properly. I really don't have an ETA for now.
I understand your scenario of needing low latency and persistence. Unfortunately, TURN is designed such that it can't "crack" the DTLS, so persistence will not be possible with TURN.
There are two short-term solutions:
1. A dual-stream mode where the low-latency, on-demand streaming is implemented via WebRTC and the persistence is done via KVS streaming.
2. A WebRTC peer in the cloud that can provide a kind of SFU functionality and do the dual streaming.
@MushMal Thanks for the suggestions. Though, unfortunately, dual streaming is not an option for us at this time. In fact, avoiding dual streaming is the reason we went with KVS in the first place. We will make the best of the longer than expected latency for now. Please keep me posted if there are any updates.
Setting "enhancement" label. The only way to fix this moving forward is to move to MULTI interface for the curl solution. This needs to be properly evaluated.
This "enhancement" would be greatly appreciated for our own products too (qualcomm-based platform, omxh264enc, same behavior).
@rgroleau could I ask you to email kinesis-video-support@amazon.com with your scenario/use case so we can have a better understanding of your needs and be able to follow-up with you directly?
FYI https://github.com/awslabs/amazon-kinesis-video-streams-producer-c/pull/191
Notes are there, but I decided against CURL MULTI for now; a more trivial change yields the same results as MULTI. When we change the threading model in the future to not create a new thread for each of the callback handlers, we may switch to MULTI.
@jlong29 Can you please try this: https://github.com/awslabs/amazon-kinesis-video-streams-producer-c/pull/204
@hassanctech Thanks for following up on this issue. I'm happy to evaluate this commit, though I will need a few days; I am currently working toward an important deadline. I am using the C++ SDK gstreamer plugin and would like to test against that, so the quickest test of the commit seems to be the following (lmk if anything more expedient comes to mind):
Thanks @jlong29 it would be helpful for a few people to verify before I merge the changes.
If you need to use kvssink, then yeah, that's probably the quickest way to test.
FYI, for my tests I modify the video-only streaming sample from the Producer C SDK to overwrite the pts/dts with the current time immediately before calling putFrame. Then, on the receive side, I diff the frame pts/dts against the current time.
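For anyone reproducing this measurement, the producer-side change looks roughly like the sketch below. Field and helper names (Frame, FRAME_CURRENT_VERSION, DEFAULT_VIDEO_TRACK_ID, defaultGetTime, putKinesisVideoFrame) follow my reading of the C Producer SDK samples; double-check them against your SDK version.
```c
#include <com/amazonaws/kinesis/video/cproducer/Include.h>

/* Submit a frame whose pts/dts are stamped with "now" so the consumer can
 * compute e2e latency as (receive time - pts). Based on the video-only
 * streaming sample; treat exact names as assumptions. */
STATUS putFrameWithWallclockTs(STREAM_HANDLE streamHandle, PBYTE data,
                               UINT32 size, FRAME_FLAGS flags, UINT32 index)
{
    Frame frame;
    frame.version = FRAME_CURRENT_VERSION;
    frame.trackId = DEFAULT_VIDEO_TRACK_ID;
    frame.index = index;
    frame.flags = flags;      /* e.g. FRAME_FLAG_KEY_FRAME on I-frames */
    frame.duration = 0;
    frame.size = size;
    frame.frameData = data;

    /* Overwrite the timestamps (100 ns units) immediately before putFrame. */
    frame.presentationTs = defaultGetTime();
    frame.decodingTs = frame.presentationTs;

    return putKinesisVideoFrame(streamHandle, &frame);
}
```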
@jlong29,
FYI, our top of master today uses a CMake dependencies file to pin the commit and does not use open-source anymore. You can modify the commit in the producer C file under CMake/Dependencies.
@hassanctech Thanks for following up. Perhaps I'll stay with your suggestion to test just the new commit prior to validating interoperability.
@disa6302 Thanks for the heads up regarding the SDK changes
I tried pull request 204 on my side, on my relatively slow (CPU-wise) arm7ahf-neon device with a Qualcomm-based DSP (omx) and gst queues (threads) for the video and audio. The end result is acceptable: the CPU usage is not drastically increased, and the visible live-streaming burstiness is slightly reduced from hard-to-notice to almost-unnoticeable on a good connection.
I know this is all very subjective (and specific to our slow-clock, DSP-based, LTE-cellular-modem IoT device), so I guess this is not a generic comment about the other platforms/devices mentioned in this thread. I guess the 30 ms value right now (and the max of 150 ms) is highly dependent on various factors (connection speed/latency? and/or FPS? I really don't know, just speculating here). In our case this seems to match quite well with our already-very-bursty cellular-modem connectivity. So it seems good to me (for our very specific platform use-case).
@rejeangroleau Thank you for trying it out! Yes, it will be impacted by those factors; right now it's not adaptive, but it could become so in the future.
Would you mind sharing your fps range? From what I saw in my tests with ~20 fps, I don't see any bursty behavior (i.e. curl pause is not invoked). We can make those sleeps or iterations longer to better suit your use case if you expect lower than 8 fps, for example.
A 30 ms sleep is a small price to pay vs. a 1 s added delay due to curl pause, so I'd rather be overly optimistic and unnecessarily sleep a tad longer than sleep and have to wake up in a few milliseconds. Ideally we could keep iterating up to 500 ms, which would mean ~17 iterations; I'm not sure that's necessary at the moment.
@hassanctech I think you are not seeing any pause because you have too much bandwidth available in your test scenario. You could use command-line tools such as throttle (and if you want to limit CPU usage too, you can use cpulimit). We typically stream anywhere between 15 fps and 30 fps right now. But we do have a slow device, and a slow connection.
I am experiencing the same issue, but I have both a powerful machine and a good network. For me, the rendered frames on the parser side get a periodic delay every 1-2 seconds. I am using http://wiki.ros.org/kinesis_video_streamer to put media on the Kinesis video stream.
@rejeangroleau @utsavrai Can you please try the latest here: https://github.com/awslabs/amazon-kinesis-video-streams-producer-c/pull/204 I added some exponential back-off with the latest commit, and I think there's enough room in there now to keep checking for data for up to ~600 ms in case of a slow network.
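The back-off idea amounts to polling the frame buffer with growing sleeps and only falling back to the expensive curl pause once the wait budget is exhausted. A rough sketch follows; the constants and the `frameBufferBytes()` probe are illustrative assumptions, not the PR's actual code.
```c
#include <curl/curl.h>
#include <string.h>
#include <unistd.h>

#define INITIAL_SLEEP_US (10 * 1000)  /* 10 ms first wait (illustrative) */
#define MAX_WAIT_US      (600 * 1000) /* ~600 ms total budget, per the comment above */

static size_t buffered = 0; /* hypothetical stand-in for the frame buffer */
static size_t frameBufferBytes(void) { return buffered; }

static size_t read_callback(char *buffer, size_t size, size_t nitems, void *userdata)
{
    (void) userdata;
    useconds_t sleepUs = INITIAL_SLEEP_US, waitedUs = 0;

    /* Poll with exponential back-off before resorting to CURL_READFUNC_PAUSE,
     * since un-pausing the EASY interface can add up to ~1 s of delay. */
    while (frameBufferBytes() == 0) {
        if (waitedUs >= MAX_WAIT_US) {
            return CURL_READFUNC_PAUSE; /* give up and pause the transfer */
        }
        usleep(sleepUs);
        waitedUs += sleepUs;
        sleepUs *= 2; /* exponential back-off */
    }

    size_t max = size * nitems;
    size_t n = buffered < max ? buffered : max;
    memset(buffer, 0, n); /* sketch: real code copies frame bytes here */
    buffered -= n;
    return n;
}
```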
@hassanctech It still works well (as in, better than the head of master) on my target platform, although I don't think I am hitting the exponential back-off. Looks good to me!
@rejeangroleau I can walk you through adding some debug logging to see if you're entering curl pause (and how often); then perhaps we just need to increase the number of iterations. Let me get this change pushed into master, hopefully sometime today, and then we can go from there.
The change is in master so I'm closing the issue. @jlong29 @rejeangroleau @utsavrai please feel free to open a new issue if this problem persists.
@hassanctech Thanks for all the solid work on this issue. I'm excited to try it out. Unfortunately, after filing the issue, I've been pretty slammed. Recently, I've been the lead engineer on a federal contract that keeps me very busy. This is all to say that I appreciate your work on this issue, and I hope you don't find my inability to participate at this stage discouraging.
@jlong29 No worries! I just wanted to make sure you were aware that there is an improvement available! Feel free to update whenever you do get a chance to try it out.
@hassanctech Hello from the future! I have been able to test out the improvement, and it is great. Buffers are getting pushed to the cloud much more quickly; I see an average end-to-end latency of ~400-500 ms now. I need to handle buffer overflow a little more rigorously now, but that is OK.
Describe the bug
KVS exhibits an odd delay pattern that results in a lot of latency.
To Reproduce
Run the gstreamer plug-in with the following gst-launch-1.0 command (I am using an Nvidia TX1 dev kit with GPU-accelerated h264 encoding):
```
gst-launch-1.0 nvcamerasrc fpsRange="30.0 30.0" ! 'video/x-raw(memory:NVMM), width=(int)320, height=(int)240, format=(string)I420, framerate=(fraction)30/1' ! omxh264enc control-rate=1 bitrate=250000 peak-bitrate=500000 iframeinterval=7 profile=1 ! tee name=t t. ! h264parse ! video/x-h264,stream-format=avc,alignment=au,profile=baseline,framerate=30/1 ! kvssink stream-name="qwake-test" buffer-duration=5 storage-size=512 access-key="YOUR_PUBLIC" secret-key="YOUR_SECRET" aws-region="YOUR_REGION" t. ! h264parse ! omxh264dec ! nveglglessink -e
```
On the client side, I call KVS using the aws cli. I wrote a custom parser to make the sequence of mkv fragments play nice with my video player, mpv. This parser also pulls out the tags: Producer Timestamps, Server Timestamp, and Millis From NOW.
Expected behavior
I expect to receive each fragment at some average delay that varies as a function of network conditions. Instead, I am seeing a single long "From Now" delay, followed by a burst of very short delays. I'm also seeing a single long Server Timestamp - Producer Timestamp difference followed by a burst of short differences. These long delays are setting the lower bound on latency.
SDK version number
3.0.0
Open source building
Screenshots
Here's a spreadsheet I made based upon my observations: Delay-Analysis-15June2020-iframeInt7.pdf
Here's a sketch I made to reflect my understanding of what is going on:
Platform (please complete the following information):