RTMP endpoint does not work after a failed connection

franeksaww commented 3 years ago

Environment

Operating system and version: Ubuntu 20.04
Java version: openjdk 11.0.11 2021-04-20
Ant Media Server version: Enterprise Edition 2.3.3.1 20210609_2033
Browser name and version: Google Chrome 92.0.4515.107 (Official Build) (64-bit)

Steps to reproduce

1. Add an RTMP endpoint to an existing broadcast, pointing to the correct receiver.
2. Data is received correctly.
3. Restart the instance or service that is the RTMP endpoint target.
4. AMS stops pushing to the given RTMP endpoint. To make it work again, the RTMP endpoint has to be removed from the broadcast and added again.

Expected behavior

AMS automatically resumes pushing to the correct RTMP endpoint.

Actual behavior

AMS does not push to the RTMP endpoint, and to make it work again the RTMP endpoint has to be removed and added again.

mekya commented 3 years ago

Hi @franeksaww ,

Thank you for reporting the issue. Yes, you're right, it should work as you describe.

Just a question: for step 3 (Restart instance or service that is RTMP endpoint target), do you mean restarting that stream with the same ID? Is that correct?

I think this issue may already be resolved in the upcoming release (2.4.0). We need to confirm that.

Regards, A. Oguz

franeksaww commented 3 years ago

Hi @mekya, in point 3 I mean restarting the receiver instance. For example, if we are pushing from AMS to instance B and I restart instance B, AMS no longer pushes to it after instance B is back online.

mekya commented 3 years ago

Hi @franeksaww ,

Thank you. Yes, it makes sense now.

This is the expected behavior. AMS does not retry pushing to the RTMP endpoint if the push fails. Ant Media Server simply pushes to the RTMP endpoint and does not care whether the endpoint is AMS, YouTube, or any other service.

Let me think about whether this should be a built-in feature, and about its technical feasibility.

Fortunately, I can suggest some other technical solutions.

Solution 1:

You can do everything programmatically through the REST API:

You can monitor the RTMP endpoint status of the Broadcast object.
If the RTMP endpoint is in a failed or error state, you can remove it.
You can re-add the RTMP endpoint on the fly according to the status of the receiving instance.
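The REST-based workaround above can be sketched as a periodic watchdog. This is a minimal sketch under stated assumptions: `BroadcastEndpointApi`, its methods, and the status strings `failed`/`error` are hypothetical stand-ins for whatever REST client you use against AMS, not the actual AMS API. One watchdog instance is meant to track a single stream; failed endpoints are remembered so they can be re-added once the receiver is reachable again.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// HYPOTHETICAL wrapper around the AMS REST API; method names are illustrative only.
interface BroadcastEndpointApi {
    // Endpoint URL -> status string for the given stream (status values are assumed).
    Map<String, String> endpointStatuses(String streamId);
    void removeEndpoint(String streamId, String endpointUrl);
    void addEndpoint(String streamId, String endpointUrl);
}

// One watchdog per stream: removes failed endpoints and re-adds them when possible.
class EndpointWatchdog {
    private final BroadcastEndpointApi api;
    // Endpoints removed because they failed, waiting for their receiver to come back.
    private final Set<String> pending = new LinkedHashSet<>();

    EndpointWatchdog(BroadcastEndpointApi api) {
        this.api = api;
    }

    // One monitoring pass; call it periodically (e.g. from a scheduler).
    // Returns the endpoint URLs that were re-added during this pass.
    List<String> checkOnce(String streamId, Predicate<String> receiverOnline) {
        // Copy so that removing endpoints below cannot disturb the iteration.
        Map<String, String> statuses = new LinkedHashMap<>(api.endpointStatuses(streamId));
        for (Map.Entry<String, String> e : statuses.entrySet()) {
            String status = e.getValue();
            if ("failed".equals(status) || "error".equals(status)) {
                api.removeEndpoint(streamId, e.getKey());
                pending.add(e.getKey());
            }
        }
        List<String> readded = new ArrayList<>();
        for (String url : new ArrayList<>(pending)) {
            if (receiverOnline.test(url)) {
                api.addEndpoint(streamId, url);
                pending.remove(url);
                readded.add(url);
            }
        }
        return readded;
    }
}
```

How `receiverOnline` is decided (a TCP probe, a health URL on instance B, etc.) is left open here, since the thread does not specify it.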

Solution 2:

You can run instance A and instance B in the same cluster and everything will be handled automatically. You don't need to push the stream from instance A to instance B via RTMP. Instances in different locations can be run in the same cluster.

Please let me know if you have any questions or thoughts.

medzin commented 3 years ago

@mekya could you consider adding optional retry logic with exponential backoff (to avoid flooding external services)? We use AMS to manage streams from surveillance cameras that monitor our systems 24/7, and we often pass the stream on to other cloud services for further analysis. Because we stream data continuously throughout the year, temporary unavailability is expected (nobody gives 100% availability :)). Detecting such situations via the API feels like a roundabout way of doing it. Such a feature would allow AMS to be plugged more reliably into a larger video processing pipeline.
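Capped exponential backoff is one common way to implement the retry behaviour requested here: the delay doubles after every failed attempt until it reaches a ceiling. The sketch below is illustrative only; the class name and the base and cap values are assumptions, not AMS settings.

```java
// HYPOTHETICAL helper: capped exponential backoff, delay(n) = min(base * 2^n, max).
final class ExponentialBackoff {
    private final long baseMs;
    private final long maxMs;

    ExponentialBackoff(long baseMs, long maxMs) {
        if (baseMs < 1 || maxMs < baseMs) {
            throw new IllegalArgumentException("require 1 <= baseMs <= maxMs");
        }
        this.baseMs = baseMs;
        this.maxMs = maxMs;
    }

    // Delay in milliseconds before retry number `attempt` (0-based).
    long delayMs(int attempt) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        long delay = baseMs;
        // Stop doubling as soon as the cap is reached, so large attempt numbers stay cheap.
        for (int i = 0; i < attempt && delay < maxMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMs);
    }
}
```

With a 1 s base and a 60 s cap, successive delays are 1, 2, 4, 8, 16, 32 seconds and then stay at 60 s. Adding random jitter to each delay is a common refinement when many clients retry against the same service.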

mekya commented 3 years ago

For instance, retrying to publish every 5 seconds and stopping after 3 attempts?
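The proposal of retrying every 5 seconds and giving up after 3 attempts could look like the following sketch. `FixedIntervalRetry` and the injected sleeper are hypothetical; the sleeper is a parameter so the policy can be exercised without real waiting, and both the interval and the limit are constructor arguments so they can come from configuration.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

// HYPOTHETICAL policy: try to publish, wait `intervalMs` between attempts,
// and stop after `maxAttempts` attempts in total.
final class FixedIntervalRetry {
    private final long intervalMs;
    private final int maxAttempts;
    private final LongConsumer sleeper; // injected so tests need not really wait

    FixedIntervalRetry(long intervalMs, int maxAttempts, LongConsumer sleeper) {
        if (intervalMs < 1 || maxAttempts < 1) {
            throw new IllegalArgumentException("intervalMs and maxAttempts must be >= 1");
        }
        this.intervalMs = intervalMs;
        this.maxAttempts = maxAttempts;
        this.sleeper = sleeper;
    }

    // Returns the 1-based number of the attempt that succeeded, or -1 if all failed.
    int run(BooleanSupplier tryPublish) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryPublish.getAsBoolean()) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                sleeper.accept(intervalMs); // no pointless wait after the last failure
            }
        }
        return -1;
    }
}
```

Usage might be `new FixedIntervalRetry(5000, 3, ms -> { try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); } })`. The stack trace later in this thread shows AMS scheduling work with Vert.x `setPeriodic`, so a built-in implementation would more likely schedule retries on a timer than block a thread.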

If that does not make sense, let me know your recommendation.

medzin commented 3 years ago

If these 5 seconds are parameterized and configurable, it's OK for me :)

mekya commented 3 years ago

I see. Thank you :)

medzin commented 3 years ago

@golgetahir @mekya how should I configure these properties for a clustered AMS deployment?

settings.endpoint.healthCheckPeriodMs
settings.endpoint.republishLimit

medzin commented 3 years ago

Also, after upgrading to the newest AMS (2.4.1-SNAPSHOT 20210930_0752), these settings are set to 0 by default, which causes an exception.

/usr/local/antmedia$ grep -R settings.endpoint.republishLimit *
webapps/WebRTCAppEE/WEB-INF/red5-web.properties:settings.endpoint.healthCheckPeriodMs=0
webapps/WebRTCAppEE/WEB-INF/red5-web.properties:settings.endpoint.republishLimit=0

Exception:

2021-10-04 10:32:49,077 [vert.x-worker-thread-18] ERROR io.antmedia.muxer.MuxAdaptor - java.lang.IllegalArgumentException: Cannot schedule a timer with delay < 1 ms
    at io.vertx.core.impl.VertxImpl.scheduleTimeout(VertxImpl.java:494)
    at io.vertx.core.impl.VertxImpl.setPeriodic(VertxImpl.java:332)
    at io.antmedia.muxer.MuxAdaptor.endpointStatusHealthCheck(MuxAdaptor.java:1632)
    at io.antmedia.muxer.MuxAdaptor.endpointStatusUpdated(MuxAdaptor.java:1693)
    at io.antmedia.muxer.RtmpMuxer.setStatus(RtmpMuxer.java:161)
    at io.antmedia.muxer.RtmpMuxer.writeFrameInternal(RtmpMuxer.java:570)
    at io.antmedia.muxer.RtmpMuxer.writePacket(RtmpMuxer.java:491)
    at io.antmedia.muxer.RtmpMuxer.writeVideoBuffer(RtmpMuxer.java:646)
    at io.antmedia.muxer.MuxAdaptor.writeStreamPacket(MuxAdaptor.java:799)
    at io.antmedia.enterprise.adaptive.EncoderAdaptor.writeStreamPacket(EncoderAdaptor.java:1087)
    at io.antmedia.muxer.MuxAdaptor.execute(MuxAdaptor.java:939)
    at io.antmedia.muxer.MuxAdaptor.lambda$start$0(MuxAdaptor.java:1179)
    at io.vertx.core.impl.ContextImpl.lambda$executeBlocking$2(ContextImpl.java:313)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829)
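The exception comes from Vert.x refusing to schedule a periodic timer with a delay below 1 ms, and the trace suggests that `settings.endpoint.healthCheckPeriodMs=0` ends up being used as that delay. A defensive way to read these settings is sketched below. The class, the fallback values (5000 ms and 3, taken from the retry proposal earlier in the thread), and the choice to accept `republishLimit=0` as a valid value are assumptions, not AMS's actual behaviour.

```java
import java.util.Properties;

// HYPOTHETICAL helper, not part of AMS: reads the endpoint settings defensively.
final class EndpointSettings {
    // Assumed fallbacks (5 s period, 3 attempts), not AMS's shipped defaults.
    static final long DEFAULT_HEALTH_CHECK_PERIOD_MS = 5000;
    static final int DEFAULT_REPUBLISH_LIMIT = 3;

    final long healthCheckPeriodMs;
    final int republishLimit;

    EndpointSettings(Properties props) {
        // Vert.x timers need a delay of at least 1 ms, so 0 or negative falls back.
        healthCheckPeriodMs = parseAtLeast(
                props.getProperty("settings.endpoint.healthCheckPeriodMs"), 1, DEFAULT_HEALTH_CHECK_PERIOD_MS);
        // Assumption: 0 is a legitimate "never republish" value, negatives are not.
        long limit = parseAtLeast(
                props.getProperty("settings.endpoint.republishLimit"), 0, DEFAULT_REPUBLISH_LIMIT);
        republishLimit = (int) Math.min(limit, Integer.MAX_VALUE);
    }

    // Parses `raw` as a long; returns `fallback` if it is missing, malformed, or below `min`.
    private static long parseAtLeast(String raw, long min, long fallback) {
        if (raw == null) {
            return fallback;
        }
        try {
            long value = Long.parseLong(raw.trim());
            return value >= min ? value : fallback;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```

Validating at load time keeps a bad value in red5-web.properties or the database from surfacing later as a runtime exception inside the muxer.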

golgetahir commented 3 years ago

Hi @medzin ,

Thank you for your feedback. I am checking the update issue. In a cluster, you can use REST calls to change the settings (https://antmedia.io/rest/#/ManagementRestService/changeSettings), or you can set them in /usr/local/antmedia/webapps/your_app_name/WEB-INF/red5-web.properties and restart the server.

Thank you for the feedback on the SNAPSHOT.

Cheers

medzin commented 3 years ago

@golgetahir when I set them in the properties file, they were reset to 0 after restarting the service. I had to both change the file and update the database to make it work. Is this expected behaviour?

golgetahir commented 3 years ago

@medzin No, it is not. Since it is a snapshot, it may not be very stable yet; I am sorry about that. Thanks a lot for your feedback, I will stabilize it.

golgetahir commented 2 years ago

Hi all,

It turns out that when you update a cluster, you need to update the database if the appSettings in the database were created before the update; this is expected. No further issues were found with the implementation according to the tests and customer feedback.

Cheers
