Closed coldheartday closed 3 years ago
In the offline mode the putFrame API will block for some time (long timeout) until there is availability. The locks are re-entrant so it's OK for a thread to re-acquire the lock.
In this case, I believe the right approach is to disallow auto-intermittent producer in case of an offline mode. @hassanctech this is something you can look at?
@coldheartday, you should disable auto-intermittent producer in your application manually until this is fixed.
@MushMal Sorry, what does auto-intermittent producer means ? In my case, there is some errors occured and the data buffer is full. The putFrame API should return after 15s timeout(default timeout) , but it never timeout when "Thread 2" happened. It's ok for thread 2 to re-acquire the "B" lock, but it will never ulock "B" due to blocked by "A" lock while "A" lock will never unlocked by thread 1.
I tried to removed stream lock in "checkIntermittentProducerCallback" , then no deadlock, The putFrame API always return after 15s timeout.
Auto-intermittent producer is a capability of the producer SDK to emit a signal for the backend to persist the accumulated fragment in the absence of any frame bits available to be sent. This is needed in order to ensure the accumulated frame is persisted before the timeout would disrupt the connection and the accumulated buffer will be dumped.
In your case, I am not yet entirely sure what is going on. The auto-intermittent producer should only attempt to produce the special EoFR frame in case the buffer is empty.
I would let @hassanctech to comment on it.
Thanks for that explanation @MushMal.
We will only ever proceed to the wait condition in handleAvailability
if the allocation handle is invalid which would mean the buffer is full.
Yes I see the issue thank you for reporting this @coldheartday.
There are a few ways I see to resolve this. We could disable auto intermittent producer for the offline case however if your network is bad (which seems to be the case right now since your buffer is full AND there hasn't been any successful putFrame call in > 20s (which is what triggers the checkIntermittentProducerCallback
).
Another solution is to just add the frame order coordinator lock right outside here:
https://github.com/awslabs/amazon-kinesis-video-streams-pic/blob/master/src/client/src/Client.c#L59
This would preserve the lock order to match when other threads call the putKinesisVideoFrame
API which also first acquires the frame order coordinator lock and THEN acquires the stream lock.
If frameOrderingMode == FRAME_ORDER_MODE_PASS_THROUGH
we actually do NOT acquire the frame order coordinator lock first when the putKinesisVideoFrame API is called so in this case when we could hit handleAvailability
and eventually the the wait condition and release the streams lock. Then checkIntermittentProducerCallback
would be free to grab the streams lock it would also end up in handleAvailability
and enter the wait condition, eventually the first thread would be signaled or time out so it would re-grab the lock (which is available because both threads took turns releasing the lock while in the wait condition) and it'll fail out, and then it will continue this way nothing will be blocked. Adding the frame order coordinator lock inside of the checkIntermittentProducerCallback
does not change any of this behavior because the media thread calling putKinesisVideoFrame in the pass through mode will never care about the frame order coordinator lock. In any case we could always add an explicit check in the callback for the frame order mode and only acquire the frame order lock if the mode is not pass through.
I will work on these changes and provide an update when I have a PR. I'm not sure how much value this will provide since in the offline use case this the auto intermittent producer callback can only be triggered if no frames are actually being sent over the network and that has 2 sub cases: a. buffer is not full, in which case there's no deadlock but we will be able to properly persist the fragment in the back-end before it closes the connection, b. buffer is full, in which case we currently end up in deadlock but with the proposed change we will no longer be in deadlock, but if buffer is full and network is bad, the auto intermittent producer signal will may also not go out so client side errors will be reported due to incomplete fragment ingested and we have to remove from the buffer because we ran out of memory in the buffer to store it.
@coldheartday the fix has been merged, can you please try it?
@hassanctech Thanks, it's fixed.
I use kinesis-video-streams-producer c++ sdk to upload clips. I find putFrameMultiTrack() will blocked and never return in some case. I checked and find a deadlock.
Besides, I find checkIntermittentProducerCallback will lock stream before call putKinesisVideoFrame(), but putframe() called by putKinesisVideoFrame() also lock stream.