ant-media / Ant-Media-Server

Ant Media Server is a live streaming engine software that provides adaptive, ultra low latency streaming by using WebRTC technology with ~0.5 seconds latency. Ant Media Server is auto-scalable and it can run on-premise or on-cloud.
https://antmedia.io

Multiple streams stopped generating new .ts files on the server #6486

Closed amarantmeida closed 6 days ago

amarantmeida commented 1 month ago

Short description

Observed that several streams stopped generating new .ts files on the server. The .m3u8 files have remained unchanged for over 16 hours.

In the antmedia-error.log file, we found the following error repeated approximately 62,137,440 times, causing the log file to grow to around 8GB:

Error: [hls @ 0x7f4470079f40] Application provided invalid, non monotonically increasing dts to muxer in stream 1: 193273527510 >= 12606721920

Temporary fix: restarting the stream.

Environment

Steps to reproduce

  1. NA

Expected behavior

Stream should not stop generating .ts files

Actual behavior

Several streams stopped generating new .ts files on the server. The .m3u8 files have remained unchanged for over 16 hours.

Logs

(https://drive.google.com/drive/folders/1TbL80vblMyNZXy02MCD1cgdT6Lzpzk5R)

oleul05 commented 1 month ago

@amarantmeida, @burak-58 We have 10 camera streams running on the same server. We noticed that when the .ts file number for any camera stream reaches 1073740, it stops generating new .ts files. Camera streams that had not reached 1073740 kept generating new .ts files.

Ant Media Server version: Enterprise Edition 2.8.2 20240201_1142
Running on Vultr Kubernetes Engine with clustering
Database: MongoDB
Server configuration: 8 vCPU and 16 GB of system memory

Please let me know if you need any more information.

mekya commented 1 month ago

Hi @oleul05,

Thank you for the insight. It helped a lot in getting a basic understanding. My guess is that the timestamps are overflowing.

1073740 * 2000 (assuming a key frame interval of 2000 milliseconds) = 2147480000. That is close to half of the 32-bit integer range (4294967296 / 2 = 2147483648). A signed 32-bit value overflows once it exceeds 2147483647.
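To make the arithmetic above easy to check, here is a minimal Python sketch. It assumes the same 2000 ms segment duration guessed above and a timestamp held in a signed 32-bit millisecond field; both are assumptions, not confirmed facts about the Ant Media Server code. The helper `to_int32` is hypothetical and only imitates 32-bit wraparound.

```python
INT32_MAX = 2**31 - 1   # 2147483647, largest signed 32-bit value
SEGMENT_MS = 2000       # assumed segment / key frame interval in milliseconds


def to_int32(value: int) -> int:
    """Wrap an integer the way a signed 32-bit field would (illustrative helper)."""
    return (value + 2**31) % 2**32 - 2**31


last_segment = 1073740                       # last segment number reported in this issue
elapsed_ms = last_segment * SEGMENT_MS       # timestamp reached at that segment
headroom_ms = INT32_MAX - elapsed_ms         # milliseconds left before overflow
segments_until_overflow = INT32_MAX // SEGMENT_MS
days_until_overflow = INT32_MAX / 1000 / 86400
wrapped = to_int32(elapsed_ms + 2 * SEGMENT_MS)  # two segments later, past the limit

print(elapsed_ms)                     # 2147480000
print(headroom_ms)                    # 3647
print(segments_until_overflow)        # 1073741
print(round(days_until_overflow, 2))  # 24.86
print(wrapped)                        # -2147483296
```

Under these assumptions the counter would run out after about 24.9 days of continuous streaming, within two segments of the reported file number. A different segment duration would move both figures.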

We can check whether this problem is caused by Ant Media Server.

Cheers Oguz

oleul05 commented 1 month ago

@mekya Good afternoon. Thank you for your investigation. Please confirm whether this is a problem inside Ant Media Server. If so, please let us know the exact root cause and a possible fix.

lastpeony commented 1 month ago

@oleul05 Is this happening with all cameras or some particular cameras?

oleul05 commented 1 month ago

@lastpeony We had 10 cameras running on one server. The issue occurred only when a camera's .ts file count exceeded 1,073,740. In our case, this happened with 6 cameras, which had been running continuously for around 26 days. The other 4 cameras were restarted during this period.

lastpeony commented 1 month ago

Are they all the same camera models? I checked the server and found that we do have some protection for overflow cases. I suspect that the camera might be sending faulty values, which we have observed in some other instances.

oleul05 commented 1 month ago

@lastpeony Yes, all the encoders are the same model.

oleul05 commented 1 month ago

@burak-58, @lastpeony Are there any updates on this bug? @burak-58, were you able to schedule the meeting you mentioned yesterday in the AMS tech talk meeting?

lastpeony commented 1 month ago

@oleul05 Hi Ole, I sent you a test server IP by email on Wednesday so that we can debug this issue further. Could you please check your emails/tickets?

oleul05 commented 1 month ago

@lastpeony The issue occurred in the production environment, where the client was streaming from an encoder, so it is really difficult for us to stream from those encoders. Could you please investigate from the log files or try to reproduce the same error on your side? We will also try to reproduce the error on our side with another camera/encoder. If we manage to reproduce the issue in our dev or QA environment, we will let you know. I have already replied to that email.

oleul05 commented 1 month ago

@lastpeony, @burak-58 We have observed the same error in the QA environment: Application provided invalid, non-monotonically increasing DTS to muxer in stream 1: 4374767250 >= 4374757710. However, after disabling the camera audio, this error no longer appears.
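For anyone comparing the two log lines quoted in this issue, the following Python sketch pulls the numbers out of the error message and converts them to milliseconds. It assumes that the value before `>=` is the previously written DTS and the value after it is the rejected one, and that the timestamps use a 90 kHz clock as is usual for MPEG-TS. Neither is confirmed in this thread, and `parse_dts_error` is a hypothetical helper, not part of Ant Media Server or FFmpeg.

```python
import re

# Matches both spellings seen in this issue ("non monotonically" / "non-monotonically").
DTS_ERROR = re.compile(
    r"non[- ]monotonically increasing dts to muxer in stream (\d+): (\d+) >= (\d+)",
    re.IGNORECASE,
)
TICKS_PER_MS = 90  # assumption: 90 kHz MPEG-TS clock, i.e. 90 ticks per millisecond


def parse_dts_error(line):
    """Return the stream index and DTS values from one error line, or None."""
    match = DTS_ERROR.search(line)
    if match is None:
        return None
    stream, previous_dts, rejected_dts = (int(g) for g in match.groups())
    return {
        "stream": stream,
        "previous_dts": previous_dts,
        "rejected_dts": rejected_dts,
        "previous_dts_ms": previous_dts / TICKS_PER_MS,
        "backward_jump_ms": (previous_dts - rejected_dts) / TICKS_PER_MS,
    }


production = parse_dts_error(
    "[hls @ 0x7f4470079f40] Application provided invalid, non monotonically "
    "increasing dts to muxer in stream 1: 193273527510 >= 12606721920"
)
qa = parse_dts_error(
    "Application provided invalid, non-monotonically increasing DTS to muxer "
    "in stream 1: 4374767250 >= 4374757710"
)

print(production["previous_dts_ms"])  # 2147483639.0
print(qa["backward_jump_ms"])         # 106.0
```

If the 90 kHz assumption holds, the last accepted DTS in the production log corresponds to 2147483639 ms, which is 8 ms below the signed 32-bit maximum of 2147483647. The QA line shows only a 106 ms backward step, so it may not be the same failure.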