Closed kputyra closed 8 months ago
Thank you @kputyra for the issue. Moving to the backlog
Hi @kputyra, thank you for the issue. I have been reading it and trying to understand it, and I have a few questions. You say that the issue does not occur when using the sample pages to publish and play, and that it occurred when using the WebRTC WebSocket Messaging Reference. Is that the case? Also, as mentioned, there is no reproduction scenario, so is this issue recurring or did it happen just once?
Thanks
Hi @Mohit-3196
I don't claim the issue cannot arise when using the sample page, only that we're using a customized adapter. The messages are sent in the same order and we tried to follow both the WebSocket messaging reference and the way the adaptor handles the connection. It might be, though, that we have missed some notifications or error messages that occur only rarely.
The issue has happened only once so far, and we have been using the server for several months already, quite intensely since September. It completely desynchronized the recordings and we saw no notification about it. There was no action from the publishing user at that point (or none that we are aware of).
What would definitely help us is to understand in which situations frames are dropped and how to prevent it. As I wrote, I would not expect frames to be dropped when a stream is recorded. If you need more details, I'll be happy to discuss the issue, for instance over Zoom.
Best, Kris
Hi @kputyra, thank you again for writing back. So the stream recording is turned on through the REST API. If the publisher's ICE connection state becomes failed and the publisher reconnects, the REST API should be called again if the stream is a zombi stream. Zombi streams are streams that are not in the database and are created on the fly. Also, we can schedule a call for this Wednesday, December 1st at 18:00 (GMT+3) to discuss it further, if you are available. Thanks
Hi @kputyra, looking forward to hearing from you. Please let me know your availability and we can proceed accordingly. Thanks
Hi @Mohit-3196
I'm sorry for my late response, I've been overwhelmed with other projects last month. If you have time, we can have a call this Wednesday (I'm available for most of the day) or during the first two weeks of January.
Yes, the recording is turned on through the REST API, but the streams are also created using the REST API before publishing starts (we have turned off accepting unknown stream IDs). If I understand you correctly, this means that they are not zombi streams. Anyway, the vodReady notification was fired only after publishing had stopped completely, which suggests that the recording was still going on.
Thanks
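The order of REST calls described above (create the stream first, switch recording on once publishing has started) could be sketched roughly as follows. This is an illustration only: the endpoint paths reflect one reading of the Ant Media v2 REST layout and should be verified against the REST documentation of the server version in use. The base URL, the helper names and the `RestRequest` shape are assumptions introduced for this example.

```typescript
// Builders for the REST requests mentioned in this thread.
// ASSUMPTION: paths follow the layout <base>/<app>/rest/v2/broadcasts/...;
// verify them against your Ant Media Server version before relying on them.

interface RestRequest {
  method: "POST" | "PUT" | "DELETE";
  url: string;
  body?: string;
}

function broadcastsUrl(baseUrl: string, app: string): string {
  // Strip trailing slashes so the parts join cleanly.
  const base = baseUrl.replace(/\/+$/, "");
  return `${base}/${encodeURIComponent(app)}/rest/v2/broadcasts`;
}

// Create the broadcast before publishing, so the stream ID is known to the server.
function createBroadcast(baseUrl: string, app: string, streamId: string): RestRequest {
  return {
    method: "POST",
    url: `${broadcastsUrl(baseUrl, app)}/create`,
    body: JSON.stringify({ streamId }),
  };
}

// Switch recording on or off for an existing stream.
function setRecording(
  baseUrl: string,
  app: string,
  streamId: string,
  enabled: boolean
): RestRequest {
  return {
    method: "PUT",
    url: `${broadcastsUrl(baseUrl, app)}/${encodeURIComponent(streamId)}/recording/${enabled}`,
  };
}
```

Each returned object only describes a request; it can be sent with any HTTP client, for example `fetch(req.url, { method: req.method, body: req.body })` with a JSON content type.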
Hi @kputyra, no worries. Thank you for your response. Yes, we can schedule a call in the first week of January. We can do it on Wednesday, 5th January at either 11:00 or 16:00 (GMT+3). Let me know what time suits you.
Thanks
Hi @Mohit-3196 Do I understand correctly that 11:00 for you means 8:00 London time? Then both time slots are fine for me.
Thanks
Hi @kputyra, Sure. Can you please share your mail address... Thanks
Hi @Mohit-3196 Sure, it is visible now in my profile.
Hi @kputyra, I have scheduled a meeting for Wednesday, 5th January. See you there.
Thanks
Thank you @kputyra for the update below. We'll schedule another call this week to study the logs and the scenario.
I'm sending all the log lines with the ID of the problematic stream. This was actually a video stream (screen share) not an audio stream.
The setup
We use two streaming apps: SemLive as SFU (webcams, mic, screen) and SemLiveHQ with adaptive bitrate (4k ceiling cameras)
The speaker uses room equipment, both streamed to Ant Media via RTMP:
- ceiling camera (ApLPZjENVzpFyRHzTBhvWzhndWkAwwnO @ SemLiveHQ)
- ceiling microphone (kxIAzxTbkNOYjcRuWHhPJDkqSmIZMOoj @ SemLive)
In addition he shared slides from his laptop via WebRTC. This creates two streams:
- the original stream (vlRumxugjFKSbonTPctrhDxQUJexJvzZ)
- a preview (160x120, FPS 5) generated in a browser with JS by drawing the video scaled on a canvas and capturing the stream (JRzTFKRNmxnZDfLsjyZqYwkQnyUtGYBA)
Here are direct links to the recordings:
The preview stream is published immediately after the speaker connects to the meeting, whereas the full quality screen stream is published only after requested by a viewer.
What has happened
- At 8:01 the speaker initiated the room equipment (ceiling camera and ceiling microphone) from a dedicated Pi interface in the room.
- At 8:03 the speaker connected to the meeting from a laptop and shared his screen, which published the screen preview.
- At 8:04 the screen stream was requested for the first time, which triggered publishing of the full stream.
- After 18 min the preview stream (JRzTFKRNmxnZDfLsjyZqYwkQnyUtGYBA) stalled and was no longer recorded.
- The full-quality screen stream (vlRumxugjFKSbonTPctrhDxQUJexJvzZ) stalled and was no longer recorded after 38 min.
- All four streams were stopped around 10:01.
Notes
According to the log, the bitrate of the preview stream (JRzTFKRNmxnZDfLsjyZqYwkQnyUtGYBA, Publish Stats line) varies between 0 and 2. I'm a bit surprised by such a small number. The video bitrate in the client stats is much higher (~4000).
Attachments
Access logs:
- access-vlRumxugjFKSbonTPctrhDxQUJexJvzZ.log
- access-JRzTFKRNmxnZDfLsjyZqYwkQnyUtGYBA.log
Ant Media server logs:
- ant-media-server-vlRumxugjFKSbonTPctrhDxQUJexJvzZ.log
- ant-media-server-JRzTFKRNmxnZDfLsjyZqYwkQnyUtGYBA.log
Let me know if you need more data, like more extracts from the log files. The range of timestamps is enough.
The error message at 2021-11-09 10:01:04,058
At that time we were deleting a broadcast after receiving the liveStreamEnded notification. We have since stopped doing this, because in some cases it prevented Ant Media from sending vodReady:
- everything was fine during tests
- vodReady was not sent for recorded streams with adaptive bitrate on the production server
The only difference is that our production server is on the same machine as the Ant Media server, while the development server is on another one. I suppose the small extra latency was enough for vodReady to be triggered in the test environment, but on the production server the DELETE request arrives earlier and AMS no longer sends vodReady. The stream is recorded in both cases.
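One way to avoid this race is to postpone the DELETE request until both liveStreamEnded and vodReady have been received for a stream. The sketch below only illustrates that idea: `createDeferredDeleter` and the `deleteBroadcast` callback are hypothetical names, and the handler makes no assumption about the order in which the server sends the two notifications.

```typescript
interface StreamProgress {
  ended: boolean;
  vodReady: boolean;
}

// Returns a webhook handler that calls `deleteBroadcast` (a hypothetical callback
// issuing the DELETE request) only after BOTH notifications have arrived for a
// stream, in whichever order they come.
function createDeferredDeleter(
  deleteBroadcast: (streamId: string) => void
): (action: string, streamId: string) => void {
  const progress: { [streamId: string]: StreamProgress } = {};

  return (action: string, streamId: string): void => {
    if (action !== "liveStreamEnded" && action !== "vodReady") {
      return; // other webhook actions do not affect the deletion decision
    }
    const state = progress[streamId] ?? { ended: false, vodReady: false };
    progress[streamId] = state;
    if (action === "liveStreamEnded") {
      state.ended = true;
    } else {
      state.vodReady = true;
    }
    if (state.ended && state.vodReady) {
      delete progress[streamId];
      deleteBroadcast(streamId);
    }
  };
}
```

A stream that is not recorded never produces vodReady, so a real implementation would track only recorded streams or add a timeout after which the broadcast is deleted anyway.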
Hi guys!
This issue is assigned to me. I tried to reproduce it. It seems that there was a network fluctuation on the publishing side in your first issue. You can see it in the log below:
2021-11-09 08:40:26,691 [vert.x-eventloop-thread-26] INFO i.a.enterprise.webrtc.WebRTCAdaptor - Client:1382213513 for stream vlRumxugjFKSbonTPctrhDxQUJexJvzZ current video bitrate: 8184 audio bitrate: 96000 webrtc client target bitrate: 30000
2021-11-09 08:41:36,691 [vert.x-eventloop-thread-26] INFO i.a.enterprise.webrtc.WebRTCAdaptor - Client:1382213513 for stream vlRumxugjFKSbonTPctrhDxQUJexJvzZ current video bitrate: 660488 audio bitrate: 96000 webrtc client target bitrate: 30000
As I understand it, this network fluctuation caused the IceConnectionState: FAILED error. But I couldn't figure out @mekya logs yet. I'm still trying to understand why it's happening.
Hi @kputyra,
I'm investigating this issue in detail. The logs were filtered by stream ID. We need to check the full Ant Media Server logs. Could you please share them?
Hi @SelimEmre
Thank you for taking a look at this issue. I'm not sure what exactly you mean by network fluctuation. The affected streams were generated from a screen share of slides, a still image most of the time.
I'm attaching complete logs from 08:00 till 10:04 on that day, which covers the entire lifetime of the stream.
Hi @kputyra,
Thanks for the details. I investigated this issue in depth and have some ideas. Let me explain: I saw that the WebSocket connections of your two streams (canvas and original) were disconnecting somehow. I suspect the client's resources (not enough RAM or CPU), because there are a lot of dropped-frame log entries on the client side. But there was no low-bitrate issue.
What I recommend:
- Use the session_restore callback for minor WebSocket disconnections. It is supported by the latest version (v2.4.2.1), so please upgrade to it.
- The default WebRTC publisher page uses an auto-republish mechanism. As far as I know, you are using a custom page for publishing, so please integrate an auto-republish mechanism into it. Then, if your streams disconnect for any reason, they will continue with the same stream.
I hope it helps.
Hi @SelimEmre
I don't think it's because of the client machine, it was one of the state-of-the-art laptops. I agree that both streams must have been disconnected as from that time all frames were dropped. What I don't understand is why Ant Media server did not notice that, but kept the stream marked as live. Note that all frames were dropped from the moment of disconnection.
The disconnection could have happened because we have two WiFi routers, and operating systems (iOS as well as Windows 10+) sometimes decide to switch from one to the other. We have already detected this behavior: when it happens, all RTC connections are closed, while WebSocket connections are kept (the IP of the client does not change). Currently, we listen to the native event connectionstatechange on an instance of RTCPeerConnection and try to republish the stream when the state is disconnected. It seems that this was not quite the case here.
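The listener described above might look roughly like the following sketch. It is not the code used in our publisher: `watchConnection`, `shouldRepublish` and the `republish` callback are hypothetical names, and also reacting to the `failed` state is an assumption added here. The small `PeerConnectionLike` interface stands in for `RTCPeerConnection` so the example does not depend on a browser.

```typescript
// Minimal structural type covering the part of RTCPeerConnection used here.
interface PeerConnectionLike {
  connectionState: string;
  addEventListener(type: "connectionstatechange", listener: () => void): void;
}

// States after which the publisher tries to publish again.
// ASSUMPTION: "failed" is included in addition to "disconnected".
function shouldRepublish(state: string): boolean {
  return state === "disconnected" || state === "failed";
}

// `republish` is a hypothetical callback supplied by the application.
function watchConnection(pc: PeerConnectionLike, republish: () => void): void {
  pc.addEventListener("connectionstatechange", () => {
    if (shouldRepublish(pc.connectionState)) {
      republish();
    }
  });
}
```

Note that a listener like this only fires when the browser itself reports a state change, so it cannot catch a stall during which the connection state stays `connected`.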
The session_restore feature looks interesting and I will definitely integrate it into our publisher. I will consult your publisher on how to use it. Thank you for pointing it out!
Hi @kputyra,
> I don't think it's because of the client machine, it was one of the state-of-the-art laptops
Thanks for the details. Please also consider the canvas draw interval. It should be 1000/FPS, for example 1000/15.
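As a small worked example of the 1000/FPS rule: a 15 FPS canvas should be redrawn about every 66.7 ms, and the 5 FPS preview mentioned earlier every 200 ms. The helper names below are hypothetical, and the scheduler is passed in as a parameter (in a browser it could be `setInterval`) so that the sketch stays independent of the environment.

```typescript
// Delay between two canvas draws for a given target frame rate, in milliseconds.
function drawIntervalMs(fps: number): number {
  if (!(fps > 0) || !isFinite(fps)) {
    throw new RangeError("fps must be a positive, finite number");
  }
  return 1000 / fps;
}

// Starts a periodic draw loop. `schedule` is injected (for example the browser's
// setInterval); `draw` is a hypothetical callback that paints the scaled frame.
function startDrawLoop(
  draw: () => void,
  fps: number,
  schedule: (callback: () => void, delayMs: number) => unknown
): unknown {
  return schedule(draw, drawIntervalMs(fps));
}
```

The frame rate requested when capturing the canvas stream would normally be kept equal to the draw rate, so that neither side produces frames the other cannot use.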
Please let me know how your session_restore test goes.
Best Regards, Selim
Closing this issue due to inactivity. Please feel free to re-open if there is still a problem.
Cheers
Short description
A list broadcast is created with the REST API and used to publish via WebRTC a screen (1920x1080) captured by a browser. After the stream is successfully published, recording is turned on using the REST API. During the first ~30 min viewers come and go; at some moments there is no viewer. Then all viewers are notified of target bitrate 30000, while the stream bitrate varies between ~8000 (still image) and ~1M (new slide). Refetching the stream does not improve the target bitrate. A few minutes later the last viewer stops playing the stream and the server logs IceConnectionState: FAILED. From that moment all frames of the stream are dropped and new viewers cannot fetch the stream (the frames are still dropped). All of them receive the same bitrate measurements:
After 2h the stream is stopped, but only the first 37 min are saved in the recording (until the frames started being dropped).
Environment
Expected behavior
The stream should be recorded in whole and the frames should not be dropped. If there is a problem on the publishing side that prevents recording, then either the stream should stop or vodReady should be sent to the webhook.
Further notes
startDate that holds the timestamp of the first recorded frame. In a situation like the above we cannot synchronize the recording with other streams, because creationDate is the timestamp when the stream has stopped, which does not match the timestamp of the last recorded frame (because all frames were dropped from some point).
Logs
An excerpt from the log file related to the stream (I can send more when needed): https://www.dropbox.com/s/s645wuzgiggoy0a/screen-share-problem.log?dl=0