awslabs / amazon-kinesis-video-streams-producer-c

https://awslabs.github.io/amazon-kinesis-video-streams-producer-c/group__PublicMemberFunctions.html
Apache License 2.0

[Question] putKinesisVideoFrame API blocking the thread infinitely #179

Closed sairoopsomaraju closed 3 years ago

sairoopsomaraju commented 3 years ago

Media pipeline:

Logging Attached KVS logs below along with the app's write log.

kinesis.infinite.fragment.error.logs.txt

Describe the bug When writing at a high frame rate relative to the KVS storage size, the putKinesisVideoFrame() API blocks indefinitely.

SDK version number Latest commit in master branch ( cd05802945f62a8546c962cc59920e96c34663b5 )

To Reproduce Steps to reproduce the behavior:

  1. Open a KVS client instance with 1.2 MB of storage space and 5 KVS streams.
  2. Simultaneously write frames on all 5 streams at a rate of 135 kBps each. While writing big I-frames, a few dropped-frame report callbacks are invoked.
  3. Later, while writing one such big I-frame of ~333 KB on one stream, the SDK invokes 12 dropped-frame report callbacks and then starts logging "ContentView is not big enough to contain a single fragment" endlessly. putKinesisVideoFrame() never returns, blocking the thread.
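To put the numbers in these steps side by side, here is a minimal arithmetic sketch in C. It is not SDK code: the helper names are hypothetical, "135 kBps" is assumed to mean 135000 bytes per second, and any per-frame bookkeeping overhead inside the SDK's store is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Combined ingest rate in bytes per second across all streams. */
uint64_t aggregate_ingest_bps(uint32_t stream_count, uint64_t per_stream_bps) {
    return (uint64_t) stream_count * per_stream_bps;
}

/* Seconds of media the store can hold if nothing is drained (overhead ignored). */
double buffer_seconds(uint64_t storage_bytes, uint64_t ingest_bps) {
    assert(ingest_bps > 0);
    return (double) storage_bytes / (double) ingest_bps;
}

/* Share of the store, in percent, that a single frame occupies. */
double frame_share_percent(uint64_t frame_bytes, uint64_t storage_bytes) {
    assert(storage_bytes > 0);
    return 100.0 * (double) frame_bytes / (double) storage_bytes;
}
```

With the figures from the report (5 streams at 135000 bytes per second, a 1200000-byte store, a 333000-byte I-frame), the aggregate rate is 675000 bytes per second, the store holds roughly 1.78 seconds of media, and the single I-frame takes 27.75% of the store.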

Expected behavior Despite any congestion it may experience, the SDK should fail with an API error or a callback instead of looping and blocking the app thread.

Desktop (please complete the following information):

Additional context When I increase the client storage size, I don't get this error. In some very high frame rate cases, however, I get this error even with a single KVS stream handle. Am I handling the dropped-frame or other stream callbacks incorrectly?

MushMal commented 3 years ago

It's entirely unclear what your scenario is or why you claim the thread is blocked. From the logs I can see you are creating a few streams in realtime mode. Soon after you start streaming you put pressure on the buffer (seemingly on the physical store size) and you start dropping the tail frames. The logs are not verbose so I can't tell what's going on and whether you are using the continuous retry policy, which would re-create the connection in hopes of speeding up the process. The observed frame rate at times is 460 fps. Are you loading the frames from a disk or something?

You might want to debug your scenario further. Try to get some insight into whether your thread is blocked or not, which thread, and what it's blocked on.

Try to provide a lot more details on your scenario, settings and assets you are using. Include verbose logs. As you are using STREAMING_TYPE_REALTIME mode, the SDK will not be blocking on anything.

Removing "bug" tag.

sairoopsomaraju commented 3 years ago

Hi,

It's entirely unclear what your scenario is or why you claim the thread is blocked. From the logs I can see you are creating a few streams in realtime mode. Soon after you start streaming you put pressure on the buffer (seemingly on the physical store size) and you start dropping the tail frames. The logs are not verbose so I can't tell what's going on and whether you are using the continuous retry policy, which would re-create the connection in hopes of speeding up the process. The observed frame rate at times is 460 fps. Are you loading the frames from a disk or something?

Can you give a bit more information about where I should be using the continuous retry policy? Let me explain my scenario. Remote cameras forward frames via our channels to one of our nodes running in the cloud (where we run KVS) for on-demand playback. There we open one stream per camera, and later, on demand, we use the REST API to get a DASH URL for the uploaded videos. In the logs I provided, the frames come from a camera simulator that reads frames from disk and sends them to the KVS node running in the cloud, which exhibits the same behavior.

You might want to debug your scenario further. Try to get some insight into whether your thread is blocked or not, which thread, and what it's blocked on.

The thread calling putKinesisVideoFrame() stops returning at some point while writing a frame and stays blocked indefinitely.

Try to provide a lot more details on your scenario, settings and assets you are using. Include verbose logs. As you are using STREAMING_TYPE_REALTIME mode, the SDK will not be blocking on anything.

The settings I used for creating the client and streams are the default values that the SDK functions provide.

For creating the client:

  1. Called createDefaultCallbacksProviderWithAwsCredentials() with my credentials.
  2. Called createStreamCallbacks() and added the obtained handle to the client callbacks handle.
  3. Created the client handle using createKinesisVideoClient() with storageSize set to 1200000 bytes.

For creating the stream handle:

  1. Called createRealtimeVideoStreamInfoProvider() with 120 secs as the buffer duration and a 7-day retention period.
  2. Called createKinesisVideoStreamSync() with the stream info and client handle.

Did I configure anything wrong, or is there something else I need to configure for my situation? I've included verbose logs below: kinesis.verbose.logs.txt

Thanks.

MushMal commented 3 years ago

The logs don't have any indication that a thread is blocked. As I mentioned, try to debug this as we can't.

If you are loading and submitting frames to KVS from a fast source (say a disk) then you might want to use createOfflineVideoStreamInfoProvider which would set the OFFLINE streaming mode - in this mode the thread that produces frames WILL block for the availability of the storage/temporal duration.
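The difference between the two modes can be pictured with a toy model. The sketch below is not the SDK's content store or its actual algorithm; every name in it is hypothetical. It only illustrates the behavior described in this thread: in realtime mode the oldest buffered frames are dropped to make room, in offline mode the producer has to wait, and a frame larger than the whole store is rejected outright.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAMES 64

typedef enum { PUT_OK, PUT_WOULD_BLOCK, PUT_TOO_LARGE } put_result;
typedef enum { MODE_REALTIME, MODE_OFFLINE } stream_mode;

/* Toy byte-bounded FIFO of frame sizes (hypothetical, not an SDK type). */
typedef struct {
    size_t capacity;            /* total bytes available */
    size_t used;                /* bytes currently buffered */
    size_t sizes[MAX_FRAMES];   /* ring buffer of frame sizes */
    size_t oldest;              /* index of the oldest buffered frame */
    size_t count;               /* number of buffered frames */
    size_t dropped;             /* frames evicted so far */
    stream_mode mode;
} toy_store;

void toy_store_init(toy_store *s, size_t capacity, stream_mode mode) {
    assert(s != NULL && capacity > 0);
    s->capacity = capacity;
    s->used = 0;
    s->oldest = 0;
    s->count = 0;
    s->dropped = 0;
    s->mode = mode;
}

/* Evict the oldest buffered frame and count it as dropped. */
static void drop_oldest(toy_store *s) {
    s->used -= s->sizes[s->oldest];
    s->oldest = (s->oldest + 1) % MAX_FRAMES;
    s->count--;
    s->dropped++;
}

put_result toy_store_put(toy_store *s, size_t frame_size) {
    /* A frame bigger than the whole store can never fit: fail, do not loop. */
    if (frame_size > s->capacity) {
        return PUT_TOO_LARGE;
    }
    while (s->used + frame_size > s->capacity || s->count == MAX_FRAMES) {
        if (s->mode == MODE_OFFLINE) {
            return PUT_WOULD_BLOCK;   /* a real producer would wait here */
        }
        drop_oldest(s);               /* realtime: make room by dropping */
    }
    s->sizes[(s->oldest + s->count) % MAX_FRAMES] = frame_size;
    s->count++;
    s->used += frame_size;
    return PUT_OK;
}
```

In this model a realtime store of 1200000 bytes that already holds four 300000-byte frames accepts a 333000-byte frame by dropping the two oldest ones, while the same sequence in offline mode reports that the producer would have to wait.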

I would strongly recommend running and debugging your application with a single stream. Run the provided samples and make sure they work fine which would eliminate any network/auth issues.

If the thread truly is blocked then attach a debugger, pause and dump the backtraces of all of the threads, which would give you some idea of what it's blocked on.

NOTE: There are SYNC APIs such as stopStreamSync. The sync part will block the execution thread until the data in the buffer is submitted correctly (with retries on failures) and the last fragment ACK is received. The default timeout (STREAM_CLOSED_TIMEOUT_DURATION_IN_SECONDS) is 120 seconds so if you have a network issue or something similar, the stopStreamSync will block for 120 seconds.

sairoopsomaraju commented 3 years ago

Hi @MushMal, I'm still looking into the issue. Meanwhile, can I get some questions answered regarding KVS?

  1. Is it okay to open both the client and the stream for each upload I do and close both of them once I'm done with the uploading? Are there any performance overheads?
  2. What is the correct usage of EoFr frame? (I'm presuming that it should be called at the end of each upload session after the last frame)
  3. If I get any of the stream callbacks (stream error callback, stream staleness callback, stream latency pressure callback, etc.) and still attempt to put a frame, does the API return an error?
  4. This documentation https://docs.aws.amazon.com/kinesisvideostreams/latest/dg/producer-reference-callbacks.html says that it's advisable to close/reset the connection on receiving these callbacks: stream error callback, stream staleness callback and stream latency pressure callback. What is the correct way to handle the frame drop callback? Is there a way to dynamically increase the storage size of the client, or is it okay to close the connection?

MushMal commented 3 years ago

1) You can do that, but there is indeed performance overhead. Each time you re-create a stream, the state machine has to go through a set of states, including calling APIs. The better approach would be to rely on the automatic intermittent producer case: for example, let the frames go in whenever they are available. The latest commits of the repositories have the automatic intermittent producer policy implemented.

2) With the automatic intermittent producer you don't need to deal with EoFR. This is a really advanced usage and we discourage customers from using EoFR directly.

3) Depends on the error and what causes it. For example, if you are using the ASYNC API and you get errors while the state machine tries to get the stream to the READY state, then pushing frames to the SDK will result in an error. Our samples use the SYNC APIs so it's easier for customer applications to get started instead of working with the ASYNC API. Staleness and pressures are not errors. Those are conditions. Check the documentation for more info: https://github.com/awslabs/amazon-kinesis-video-streams-producer-sdk-cpp/blob/master/docs/buffering.md

4) This heavily depends on your application usage. Some applications will simply do nothing. It's like fire and forget, and if the frames get dropped under pressure, so be it. We have a continuous retry callback provider which would attempt to do "everything possible" to keep streaming, but applications which are more sensitive could choose to do other things, for example raise an alarm or lower the frame rate at the source.
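As an illustration of the "lower the framerate at the source" option from point 4, here is a minimal application-side sketch. It is not part of the SDK and all names are hypothetical; the assumption is that the application calls rate_policy_on_frame_dropped() from whatever dropped-frame callback it has registered, and calls rate_policy_end_window() periodically (for example once per second) to pick the frame rate for the next interval.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical application-side policy: halve the source frame rate
 * whenever too many frames were dropped during one observation window. */
typedef struct {
    unsigned fps;              /* current target frame rate */
    unsigned min_fps;          /* never go below this rate */
    unsigned drop_threshold;   /* drops per window that trigger a reduction */
    unsigned drops_in_window;  /* drops seen in the current window */
} rate_policy;

void rate_policy_init(rate_policy *p, unsigned fps, unsigned min_fps,
                      unsigned drop_threshold) {
    assert(p != NULL && min_fps > 0 && fps >= min_fps && drop_threshold > 0);
    p->fps = fps;
    p->min_fps = min_fps;
    p->drop_threshold = drop_threshold;
    p->drops_in_window = 0;
}

/* Record one dropped frame; intended to be called from the app's drop handler. */
void rate_policy_on_frame_dropped(rate_policy *p) {
    p->drops_in_window++;
}

/* Close the current window and return the frame rate to use next. */
unsigned rate_policy_end_window(rate_policy *p) {
    if (p->drops_in_window >= p->drop_threshold) {
        unsigned halved = p->fps / 2;
        p->fps = halved < p->min_fps ? p->min_fps : halved;
    }
    p->drops_in_window = 0;
    return p->fps;
}
```

Halving is just one possible choice; an application could equally raise an alarm or skip non-key frames. The point is only that the reaction lives in the application, driven by the callbacks, rather than inside the SDK.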

Please resolve this issue as we are diverging from the original question

disa6302 commented 3 years ago

@sairoop-elear ,

Closing assuming question answered. Feel free to open a new issue if you have any more questions.