ant-media / Ant-Media-Server

Ant Media Server is a live streaming engine software that provides adaptive, ultra low latency streaming by using WebRTC technology with ~0.5 seconds latency. Ant Media Server is auto-scalable and it can run on-premise or on-cloud.
https://antmedia.io
Other

Wrong installing plugin package in the cluster environment #6777

Open dmtan90 opened 6 days ago

dmtan90 commented 6 days ago

Short description

In the cluster environment, the other nodes install the old version of the plugin, so the plugin does not work as expected.

Environment

Steps to reproduce

  1. Remove the plugin on the master node. For example, my app is CamOS
  2. Install a new version of the plugin (CamOS build 2.11.3-241110_0438)
  3. The main node is updated to the new version, 2.11.3-241110_0438, but the other nodes install the old version (2.11.3-20241110_0301), as shown in the images below.

Main node version: [image: captures_chrome-capture-2024-10-10]

Other nodes version: [image: captures_chrome-capture-2024-10-10 (1)]

Expected behavior

The other nodes must install the new version, 2.11.3-241110_0438.

Actual behavior

The other nodes install the old version, 2.11.3-241110_0301.

Suggestion

When we remove any plugin, the AMS nodes should remove the installation package from the temp folder, for example, /tmp/CamOS.war.
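The suggested cleanup could be sketched roughly as follows. This is a minimal illustration in Python, not Ant Media Server code: the function name `remove_cached_war` is hypothetical, and it assumes the cached package is stored as `<temp folder>/<app name>.war`, as in the `/tmp/CamOS.war` example above.

```python
import tempfile
from pathlib import Path
from typing import Optional


def remove_cached_war(app_name: str, tmp_dir: Optional[str] = None) -> bool:
    """Delete <tmp_dir>/<app_name>.war if it exists.

    Hypothetical helper: assumes the cached package is named after the app.
    Returns True if a file was removed, False if nothing was there.
    """
    # Fall back to the system temp folder (usually /tmp on Linux).
    base = Path(tmp_dir) if tmp_dir is not None else Path(tempfile.gettempdir())
    war_path = base / f"{app_name}.war"
    if war_path.is_file():
        war_path.unlink()
        return True
    return False
```

Calling `remove_cached_war("CamOS")` on each node after removing the app would clear a leftover package before the new version is installed.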

Logs


rahul78275 commented 6 days ago

Hello @dmtan90

I am not fully up to date on the recent changes to clustering, but as far as I know, in cluster mode an image of the original instance is created, and when auto-scaling happens that image is used to create a new node.

I am assuming that you created the cluster with the older version of the plugin. If so, updating the plugin on the original node will not affect the auto-scaled nodes. The image should be updated with the latest version of the plugin.

dmtan90 commented 3 days ago

Hi @rahul78275, I don't know what the workflow is between the origin node and the scaled nodes when a new plugin version is re-installed, but you can try it with the default StreamApp.war. After the app is removed, StreamApp.war is still present in the temp folder at /tmp/StreamApp.war on both the origin and scaled nodes, and that is why the issue happens.
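One way to check whether a leftover package in the temp folder differs from the one being installed is to compare checksums. The sketch below is only an illustration: the function names are hypothetical, and the paths passed in (for example /tmp/StreamApp.war) are whatever applies on the node being checked.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_war_is_stale(new_war: str, cached_war: str) -> Optional[bool]:
    """Compare the package to install with the copy left in the temp folder.

    Returns None if there is no cached copy, True if the cached copy
    differs from the new package, and False if they are identical.
    """
    cached = Path(cached_war)
    if not cached.is_file():
        return None
    return sha256_of(Path(new_war)) != sha256_of(cached)
```

A result of `True` on a scaled node would indicate that the temp folder still holds a different (presumably older) package than the one uploaded to the origin node.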

rahul78275 commented 2 days ago

Hello @dmtan90

I hope you’re doing well.

Please note that the following is my assumption and should be considered as a suggestion. Someone else may be able to provide a more accurate or thorough response.

Based on my understanding, I do not believe that the StreamApp.war file is the root cause of the issue. StreamApp is the default application that comes with the Ant Media installation. When the cluster was created, it was initially present on the origin node, and it will automatically be recreated on any new nodes during a scale-up event.

If I were in your position, I would approach the issue as follows:

Case 1: On the auto-scaled node (using the node IP), I would log in, delete the older version of CamOS, and then update it to the latest version. After the update, I would verify whether the issue persists and ensure that everything is functioning as expected.

Case 2: In a staging environment, I would create an origin node with the updated version of CamOS, set up the cluster, and then increase the load. I would then check the auto-scaled nodes to verify whether the issue has been resolved.

If the issue persists after these steps, I would recommend raising a support ticket for further assistance.

dmtan90 commented 2 days ago

StreamApp.war is not the root cause. I mean that you can use StreamApp.war to verify the issue. You can add a version file, for example version.txt, to the StreamApp.war package, so that one file is StreamAppOld.war (version 1.0) and the other is StreamAppNew.war (version 2.0). This lets us check that the installed version is consistent between the master and scaled nodes. The steps are as follows.

  1. First, install the older version on the master and scaled nodes.
  2. Next, remove the old version on the master and wait for the scaled nodes to remove it automatically.
  3. Next, install the new version on the master node and wait for the scaled nodes to install it automatically.
  4. Check the version file (version.txt) in the app folder (for example /usr/local/antmedia/webapps/StreamApp) on the master and scaled nodes.

I think the root cause is that when the scaled nodes pull the new WAR file from the master node, they do not delete or replace the old WAR file in the temp folder, so they install the old version instead of the new one. There is no issue if you manually delete the old WAR file from the temp directory on the master and scaled nodes before updating. My suggestion is that when the user removes the app from the AMS dashboard, the AMS admin app should delete the WAR file from the system temp folder, for example /tmp/StreamApp.war.
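The test packages and the version check described above could be prepared with a small script like the one below. This is a sketch under assumptions: the function names are hypothetical, it treats the WAR as a plain ZIP archive, and it assumes the original StreamApp.war does not already contain a version.txt entry.

```python
import shutil
import zipfile
from pathlib import Path
from typing import Optional


def stamp_war(src_war: str, dst_war: str, version: str) -> None:
    """Copy src_war to dst_war and add a version.txt entry to the copy."""
    shutil.copyfile(src_war, dst_war)
    # Mode "a" appends a new entry to the existing archive.
    with zipfile.ZipFile(dst_war, "a") as war:
        war.writestr("version.txt", version + "\n")


def read_war_version(war_path: str) -> Optional[str]:
    """Return the contents of version.txt inside a WAR, or None if absent."""
    with zipfile.ZipFile(war_path) as war:
        if "version.txt" not in war.namelist():
            return None
        return war.read("version.txt").decode("utf-8").strip()


def read_deployed_version(app_dir: str) -> Optional[str]:
    """Return the contents of version.txt in a deployed app folder, or None."""
    version_file = Path(app_dir) / "version.txt"
    if not version_file.is_file():
        return None
    return version_file.read_text(encoding="utf-8").strip()
```

For example, `stamp_war("StreamApp.war", "StreamAppOld.war", "1.0")` and `stamp_war("StreamApp.war", "StreamAppNew.war", "2.0")` would produce the two packages. After step 3, running `read_deployed_version("/usr/local/antmedia/webapps/StreamApp")` on each node and `read_war_version("/tmp/StreamApp.war")` on the scaled nodes should show whether an old package was reused.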

rahul78275 commented 2 days ago

Hello @dmtan90

Let me share my understanding of how auto-scaling works.

I am not currently familiar with the latest changes to clustering, so someone from the team (@Mohit-3196, @yashtandon113) can confirm or correct this.

WhatsApp Image 2024-11-14 at 2 53 23 PM

@dmtan90 you are assuming that the cluster pulls the latest CamOS plugin from the master node to the auto-scaled nodes. However, it is always the replica (image) that gets auto-scaled, not the master node. So any changes made on the master node after the cluster was created will not be reflected on your auto-scaled nodes.

The replica (image) is created during cluster setup. There may be a way to create the image manually.

Question: How did you create the setup? Using the AWS YAML?