awslabs / amazon-kinesis-video-streams-producer-sdk-java

Allows developers to install and customize their connected camera and other devices to securely stream video, audio, and time-encoded data to Kinesis Video Streams
Apache License 2.0

Fragments not sent after connection goes stale #128

Closed: abdulsiddiqi closed this issue 4 years ago

abdulsiddiqi commented 4 years ago

We are running into situations where the connection goes stale for 30 seconds or so. After we call resetConnection and the stream recovers, the KVS client doesn't publish the frames that should have been sent during those 30 seconds. No frame drops are reported either; instead there is a gap in the data of about 15 fragments, which is what you'd expect with 2-second fragments. We are publishing with the REALTIME streaming configuration.

Does the KVS client guarantee delivery of all the data when using NON_REALTIME or OFFLINE mode?

MushMal commented 4 years ago

This is an interesting case.

Basically, in your Java code you reset the connection from the stream staleness callback, and you end up with some data missing.

This can indeed happen. The PIC (the core business logic, in native code) streams the bits out fine, but apparently, due to your network topology, the TCP packets are not being delivered to the Inlet, so no ACKs are generated. Once a certain threshold passes without ACKs, the PIC calls the stream stale callback.
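To make the staleness idea concrete, here is a minimal sketch, assuming staleness is judged purely by the time elapsed since the last ACK. The class `StalenessModel`, its methods, and the 20-second threshold are hypothetical and only illustrate the concept; this is not the PIC implementation or a Java SDK API.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical model of ACK-based staleness detection (not the PIC code).
 * Bytes leaving the socket do not count; only ACKs from the backend reset
 * the timer, which is why undelivered TCP packets eventually look "stale".
 */
final class StalenessModel {
    private final Duration stalenessThreshold;
    private Instant lastAckTime;

    StalenessModel(Duration stalenessThreshold, Instant start) {
        this.stalenessThreshold = stalenessThreshold;
        this.lastAckTime = start;
    }

    /** Record a buffering ACK received from the backend. */
    void onAck(Instant ackTime) {
        lastAckTime = ackTime;
    }

    /** True once no ACK has arrived for longer than the threshold. */
    boolean isStale(Instant now) {
        return Duration.between(lastAckTime, now).compareTo(stalenessThreshold) > 0;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.EPOCH;
        // Assumed threshold of 20 s, chosen only for illustration.
        StalenessModel model = new StalenessModel(Duration.ofSeconds(20), t0);
        model.onAck(t0.plusSeconds(2));
        System.out.println(model.isStale(t0.plusSeconds(10))); // prints false (8 s since ACK)
        System.out.println(model.isStale(t0.plusSeconds(30))); // prints true (28 s since ACK)
    }
}
```

In this model, the stale callback would fire on the first check where `isStale` returns true; the real client's threshold and checking cadence may differ.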

When the reset comes along, the PIC has to decide where to restart streaming from on the new session. That decision is made in part by the macro defined here: https://github.com/awslabs/amazon-kinesis-video-streams-pic/blob/master/src/client/src/Include_i.h#L152

In effect, the stream is rolled back either to the last ACK or by the rollback duration, whichever point is later.
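The rollback rule can be expressed as picking the later of two timestamps. This is a minimal sketch of the rule as stated here, not a port of the linked macro (which, per the comment, only makes the decision "in part"); `RollbackModel` and `rollbackPoint` are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical model of the rollback rule: later of last ACK and (now - rollbackDuration). */
final class RollbackModel {
    static Instant rollbackPoint(Instant lastAck, Instant now, Duration rollbackDuration) {
        Instant byDuration = now.minus(rollbackDuration);
        // "Whichever is later": unacknowledged data older than the
        // rollback window is never resent on the new session.
        return lastAck.isAfter(byDuration) ? lastAck : byDuration;
    }

    public static void main(String[] args) {
        Instant lastAck = Instant.EPOCH.plusSeconds(10);
        Instant now = Instant.EPOCH.plusSeconds(40); // 30 s without ACKs
        // Rollback window shorter than the stale period: restart at 20 s,
        // so the unacknowledged data between 10 s and 20 s is not replayed.
        System.out.println(rollbackPoint(lastAck, now, Duration.ofSeconds(20))); // prints 1970-01-01T00:00:20Z
        // Rollback window longer than the stale period: restart at the last ACK.
        System.out.println(rollbackPoint(lastAck, now, Duration.ofSeconds(40))); // prints 1970-01-01T00:00:10Z
    }
}
```

This is why the rollback duration matters: if it is longer than the stale period, the model replays everything since the last ACK, and a gap would then point to something else.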

Can you please check the rollback duration in your StreamInfo structure? If it's more than 30 seconds (the length of the gap you are seeing), then I can't really be sure what's going on, as it would seem that you did receive ACKs during that period.

How often does this happen in your scenario? Can you gather the debug logs when it happens so we can see whether something else is going on?

If we really did get the ACKs (these are buffering ACKs), then we will roll back to that ACK's time, assuming the actual host is still alive. Can you also describe how you are consuming the stream? Are you using GetMedia? Can you run ListFragments and check whether the fragments are actually missing? Run it a little while after the gap so the fragments in the memory buffer have time to be indexed and persisted; this should only take a few seconds.
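Once you have the ListFragments results, a hole in coverage can be found by comparing each fragment's end with the next fragment's start. The sketch below assumes you have already copied each fragment's producer timestamp and length out of the response; `FragmentGapFinder`, `FragmentInfo`, and `Gap` are hypothetical types, not SDK classes, and the code needs Java 16+ for records.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper that reports holes between consecutive fragments. */
final class FragmentGapFinder {
    record FragmentInfo(Instant producerTimestamp, Duration length) {}

    record Gap(Instant start, Instant end) {
        Duration duration() { return Duration.between(start, end); }
    }

    static List<Gap> findGaps(List<FragmentInfo> fragments, Duration tolerance) {
        // Sort defensively by producer time rather than assuming response order.
        List<FragmentInfo> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparing(FragmentInfo::producerTimestamp));
        List<Gap> gaps = new ArrayList<>();
        for (int i = 1; i < sorted.size(); i++) {
            FragmentInfo prev = sorted.get(i - 1);
            Instant expectedNext = prev.producerTimestamp().plus(prev.length());
            Instant actualNext = sorted.get(i).producerTimestamp();
            // Ignore small jitter; report anything larger than the tolerance.
            if (Duration.between(expectedNext, actualNext).compareTo(tolerance) > 0) {
                gaps.add(new Gap(expectedNext, actualNext));
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        Duration fragmentLength = Duration.ofSeconds(2);
        List<FragmentInfo> frags = new ArrayList<>();
        // 2-second fragments covering 0..10 s, then nothing until 40 s.
        for (int s = 0; s <= 8; s += 2) frags.add(new FragmentInfo(Instant.EPOCH.plusSeconds(s), fragmentLength));
        for (int s = 40; s <= 46; s += 2) frags.add(new FragmentInfo(Instant.EPOCH.plusSeconds(s), fragmentLength));
        Gap gap = findGaps(frags, Duration.ofMillis(500)).get(0);
        long missing = gap.duration().toMillis() / fragmentLength.toMillis();
        System.out.println(gap.duration().getSeconds() + " s gap, ~" + missing + " fragments"); // prints 30 s gap, ~15 fragments
    }
}
```

The synthetic data mirrors the report above: a 30-second hole with 2-second fragments works out to about 15 missing fragments.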

abdulsiddiqi commented 4 years ago

It seems I've solved my issue. The camera streams that we listen to and replay data from weren't providing us with any data for 30 seconds. I'm guessing that caused the connection staleness, and hence the gap in the data.

I'm sorry for making you write such a long answer. I wasn't aware that connection staleness can happen when I don't send any data; otherwise I would have checked for missing incoming data first.