ant-media / Ant-Media-Server

Ant Media Server is a live streaming engine software that provides adaptive, ultra low latency streaming by using WebRTC technology with ~0.5 seconds latency. Ant Media Server is auto-scalable and it can run on-premise or on-cloud.
https://antmedia.io

Wrong System Memory running in LXC #6626

Closed oe73773 closed 1 week ago

oe73773 commented 2 months ago

Short description

An RTMP stream from OBS is rejected after updating Ant Media Server from 2.6.4 to 2.11.1 running in LXC.

Error in the log:

2024-09-01 09:45:23,662 [RTMPConnectionExecutor-4] INFO  i.a.AntMediaApplicationAdapter - W3C x-category:session x-event:connect c-ip:172.26.64.56 c-client-id:1
2024-09-01 09:45:23,706 [RTMPConnectionExecutor-4] ERROR i.antmedia.statistic.StatsCollector - Not enough resource. Due to memory limit. Memory usage should be less than %75 but it is %88
2024-09-01 09:45:23,706 [RTMPConnectionExecutor-4] INFO  o.red5.server.net.rtmp.RTMPHandler - There is not enough resource to rtmp ingest stream: obslive
2024-09-01 09:45:23,720 [RTMPConnectionExecutor-1] INFO  o.red5.server.stream.StreamService - deleteStream with internal id:1.0 is null so it's not closed

After some debugging, this appears to be caused by Ant Media Server detecting 27.6 GB of 31.0 GB as used. 31 GB is the real memory of the Proxmox host, but 27.6 GB is not the real usage (maybe used and free are swapped). The system shows:

root@ns31xxxxxxxx:~# vmstat -S M
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  2      0  27901     42    668    0    0    58    77  123  170  1  1 96  2  0  
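
The reporter's guess (used and free swapped) can be checked with simple arithmetic. The sketch below is illustrative only: `MemoryLimitCheck` and its methods are hypothetical names, not Ant Media Server code, and it assumes the 31.0 GB host total corresponds to 31744 MiB. Under those assumptions, treating the 27901 MiB that vmstat reports as free as if it were used gives roughly the percentage seen in the log, while the real usage is far below the 75% limit.

```java
// Hypothetical helper for illustration; not part of Ant Media Server.
final class MemoryLimitCheck {

    // Percentage of total memory in use, rounded to the nearest integer.
    static long usedPercent(long usedMiB, long totalMiB) {
        return Math.round(100.0 * usedMiB / totalMiB);
    }

    // A stream is accepted only while usage stays below the configured limit.
    static boolean accepts(long usedMiB, long totalMiB, int limitPercent) {
        return usedPercent(usedMiB, totalMiB) < limitPercent;
    }

    public static void main(String[] args) {
        long totalMiB = 31744; // assumption: 31.0 GB host memory expressed in MiB
        long freeMiB = 27901;  // "free" column from the vmstat output above

        // Free memory mistaken for used memory.
        System.out.println("swapped: " + usedPercent(freeMiB, totalMiB) + "%");
        // Real usage: total minus free (buffers and cache are ignored here).
        long usedMiB = totalMiB - freeMiB;
        System.out.println("actual:  " + usedPercent(usedMiB, totalMiB) + "%");
        System.out.println("accepted at 75% limit: " + accepts(usedMiB, totalMiB, 75));
    }
}
```

This only shows that the numbers are consistent with the guess; it does not establish the actual cause.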

Environment

Steps to reproduce

  1. Update Ant Media Server from 2.6.4 to 2.11.1
  2. Start a stream in OBS

Expected behavior

Stream should work.

Actual behavior

OBS shows an error.

Workaround

Set <property name="memoryLimit" value="${server.memory_limit_percentage:95}" /> in /usr/local/antmedia/conf/red5-common.xml to raise the memory limit from 75% to 95%.

Logs

2024-09-01 09:45:23,662 [RTMPConnectionExecutor-4] INFO  i.a.AntMediaApplicationAdapter - W3C x-category:session x-event:connect c-ip:172.26.64.56 c-client-id:1
2024-09-01 09:45:23,706 [RTMPConnectionExecutor-4] ERROR i.antmedia.statistic.StatsCollector - Not enough resource. Due to memory limit. Memory usage should be less than %75 but it is %88
2024-09-01 09:45:23,706 [RTMPConnectionExecutor-4] INFO  o.red5.server.net.rtmp.RTMPHandler - There is not enough resource to rtmp ingest stream: obslive
2024-09-01 09:45:23,720 [RTMPConnectionExecutor-1] INFO  o.red5.server.stream.StreamService - deleteStream with internal id:1.0 is null so it's not closed
muratugureminoglu commented 2 months ago

Hi @oe73773

Thanks for reporting this issue. Let me reproduce it on my end. Then I will update this thread.

Thank you.

muratugureminoglu commented 2 months ago

Hi @oe73773

I've just confirmed this on my side and, as you mentioned, the container only shows the memory of the host. I will raise this in the next technical meeting. Thank you again for reporting this.

Regards.

lastpeony commented 1 month ago

Hello @oe73773 @muratugureminoglu, I investigated this issue and found a solution. A minor change is required in the server-side code to make it work.

First, you need to set a cgroup memory limit for the LXC container by adding the lines below to the container config file, /var/lib/lxc/{container_name}/config. For example, open the file with:

sudo nano /var/lib/lxc/mycontainer/config

To limit the container's memory usage to 5 GB, add the lines below:

lxc.cgroup.memory.limit_in_bytes = 5147483648
lxc.cgroup2.memory.max = 5G

Save and quit.

Stop the container:

sudo lxc-stop -n mycontainer

Start the container:

sudo lxc-start -n mycontainer

Now, when you attach to the container and restart Ant Media Server, you should see the total system memory displayed as 5 GB on the web panel. The memory usage might still be displayed incorrectly (it was negative on my end). This is because Pointer.availablePhysicalBytes(); in the SystemUtils class does not return the available memory correctly in LXC: https://github.com/ant-media/Ant-Media-Server/blob/2ea3b1fd0e99cc535605687b8da891f1438279b4/src/main/java/io/antmedia/SystemUtils.java#L276

I fixed this problem by changing it to osTotalPhysicalMemory() - osInUsePhysicalMemory();, which works fine on my end, and sent a PR: https://github.com/ant-media/Ant-Media-Server/pull/6655

Is there a specific reason to use Pointer.availablePhysicalBytes(); instead of osTotalPhysicalMemory() - osInUsePhysicalMemory(); to get the available memory? @mekya
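
For readers following along, the "total minus in use" idea can be sketched with the JDK's own management bean. This is an illustration under stated assumptions, not the code from the PR: `AvailableMemorySketch` is a hypothetical class, and `com.sun.management.OperatingSystemMXBean` is used here as a stand-in for the `osTotalPhysicalMemory()` and `osInUsePhysicalMemory()` helpers, whose implementation is not shown in this thread.

```java
import java.lang.management.ManagementFactory;

// Hypothetical sketch; not the SystemUtils code from Ant Media Server.
final class AvailableMemorySketch {

    // Available = total - in use; kept as a pure function so it is easy to check.
    static long available(long totalBytes, long inUseBytes) {
        return totalBytes - inUseBytes;
    }

    public static void main(String[] args) {
        // Assumption: the JVM is HotSpot/OpenJDK, where this extended bean is available.
        com.sun.management.OperatingSystemMXBean os =
            (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();

        // Both getters are deprecated since JDK 14 but still present.
        long total = os.getTotalPhysicalMemorySize();
        long free = os.getFreePhysicalMemorySize();
        long inUse = total - free;

        System.out.println("total bytes:     " + total);
        System.out.println("in use bytes:    " + inUse);
        System.out.println("available bytes: " + available(total, inUse));
    }
}
```

Whether these values respect a container's cgroup limit depends on the JDK version and the container runtime, so the output is worth comparing against the limit set in the container config.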

After the above fix, memory is now displayed correctly in the LXC container.

Also, you shouldn't have any problem publishing streams.

lastpeony commented 1 month ago

I delved deeper into this issue and concluded that, for some reason, Pointer.availablePhysicalBytes() doesn't adhere to the cgroup limit. I set a memory limit of 2 GB for testing, but it returned 9.2 GB, which I couldn’t interpret. I saw the same behavior in AMS and in a small Java program I wrote, both using the same Bytedeco version.

I asked about it here: https://github.com/bytedeco/javacpp/discussions/780

lastpeony commented 1 month ago

The maintainer of bytedeco.javacpp has indicated that accurately obtaining memory information through Pointer.availablePhysicalBytes() is a challenging task that requires further development. For more details, you can refer to the discussion here: https://github.com/bytedeco/javacpp/discussions/780

In the meantime, AMS can determine if it’s operating within a container using Java methods, though these methods may not be completely reliable and could change with future versions of LXC or Docker. If AMS detects that it’s running in a container, it will use the formula osTotalPhysicalMemory() - osInUsePhysicalMemory() instead of Pointer.availablePhysicalBytes(). This method has its own drawbacks, which @mekya can elaborate on further.
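
As a rough idea of what such detection can look like, here is a minimal sketch based on commonly used heuristics. It is not the detection logic in AMS: `ContainerDetectSketch` and its methods are hypothetical, and the checks (a `/.dockerenv` file, a `container=` variable in the environment of PID 1, runtime names in `/proc/1/cgroup`) are assumptions that can fail on some hosts, for example when `/proc/1/environ` is not readable by the current user.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical heuristic; not the detection logic used by Ant Media Server.
final class ContainerDetectSketch {

    // Pure decision function so the heuristic can be checked without touching /proc.
    static boolean looksLikeContainer(boolean dockerEnvExists, String pid1Environ, String pid1Cgroup) {
        if (dockerEnvExists) {
            return true; // Docker creates /.dockerenv in the container root
        }
        // /proc/1/environ is NUL-separated; LXC sets "container=lxc" for the init process.
        for (String entry : pid1Environ.split("\0")) {
            if (entry.startsWith("container=")) {
                return true;
            }
        }
        // On cgroup v1 hosts the cgroup path of PID 1 often names the runtime.
        return pid1Cgroup.contains("/docker") || pid1Cgroup.contains("/lxc");
    }

    // Reads a file as text, returning "" if it is missing or unreadable.
    static String readOrEmpty(Path p) {
        try {
            return new String(Files.readAllBytes(p));
        } catch (Exception e) {
            return "";
        }
    }

    public static void main(String[] args) {
        boolean inContainer = looksLikeContainer(
            Files.exists(Paths.get("/.dockerenv")),
            readOrEmpty(Paths.get("/proc/1/environ")),
            readOrEmpty(Paths.get("/proc/1/cgroup")));
        System.out.println("container: " + inContainer);
    }
}
```

Keeping the decision separate from the file reads makes it easier to adjust the heuristic if a future LXC or Docker version changes these markers.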

lastpeony commented 2 weeks ago

We had a discussion with @mekya and decided to change the memory calculation when the environment is a container. With the mentioned PR, AMS detects whether it is running inside a container environment and, if so, calculates the available memory from the cgroup memory files. In my tests this works fine. Keep in mind that when running in an LXC container you should not run AMS as a systemd service, otherwise the memory calculation might not work correctly.
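
A minimal sketch of reading memory figures from cgroup files is shown below. It is not the code from the PR: `CgroupMemorySketch` is a hypothetical class, and it assumes the standard kernel file locations (`/sys/fs/cgroup/memory.max` and `memory.current` on cgroup v2, `/sys/fs/cgroup/memory/memory.limit_in_bytes` and `memory.usage_in_bytes` on cgroup v1) are visible at those paths inside the container.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch; not the implementation merged into Ant Media Server.
final class CgroupMemorySketch {

    // cgroup v2 writes the literal "max" when no limit is set.
    static long parse(String raw) {
        String value = raw.trim();
        return value.equals("max") ? Long.MAX_VALUE : Long.parseLong(value);
    }

    // Available = limit - usage, or -1 when the cgroup has no v2 limit.
    // (cgroup v1 reports "no limit" as a very large number instead of "max".)
    static long available(long limitBytes, long usageBytes) {
        return limitBytes == Long.MAX_VALUE ? -1 : limitBytes - usageBytes;
    }

    static long read(Path file) throws IOException {
        return parse(new String(Files.readAllBytes(file)));
    }

    public static void main(String[] args) throws IOException {
        Path v2Limit = Paths.get("/sys/fs/cgroup/memory.max");
        Path v2Usage = Paths.get("/sys/fs/cgroup/memory.current");
        Path v1Limit = Paths.get("/sys/fs/cgroup/memory/memory.limit_in_bytes");
        Path v1Usage = Paths.get("/sys/fs/cgroup/memory/memory.usage_in_bytes");

        if (Files.isReadable(v2Limit) && Files.isReadable(v2Usage)) {
            System.out.println("cgroup v2 available: " + available(read(v2Limit), read(v2Usage)));
        } else if (Files.isReadable(v1Limit) && Files.isReadable(v1Usage)) {
            System.out.println("cgroup v1 available: " + available(read(v1Limit), read(v1Usage)));
        } else {
            System.out.println("no cgroup memory files found at the assumed paths");
        }
    }
}
```

With the 5G limit from the earlier config example, `memory.max` would be expected to hold 5368709120, and the available figure is that limit minus whatever `memory.current` reports.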