Ylianst / MeshCentral

A complete web-based remote monitoring and management web site. Once set up, you can install agents and perform remote desktop sessions to devices on the local network or over the Internet.
https://meshcentral.com
Apache License 2.0

RAM leak to the MeshCentral server #6179

Open · sheshko-as opened this issue 2 weeks ago

sheshko-as commented 2 weeks ago

Describe the bug

During operation, RAM consumption climbs sharply until memory runs out. I increased the amount of memory: it was 4 GB, then 8 GB, now 16 GB; increasing it does not help. I checked on a dedicated server: when the memory runs out, the service keeps running, but memory stays at the limit. I checked on a VPS: there the service restarts when memory runs out. The problem may occur once a day, or perhaps once every three days, but no pattern has been found.

Information from journalctl:

Jun 14 18:56:41 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/meshcentral.service,task=node,pid=1>
Jun 14 18:56:41 kernel: Out of memory: Killed process 15892 (node) total-vm:28494740kB, anon-rss:15090984kB, file-rss:1304kB, shmem-rss:0kB, UID:0 pgtables:49396kB oom_sco>
Jun 14 18:56:41 systemd[1]: meshcentral.service: A process of this unit has been killed by the OOM killer.
Jun 14 18:56:42 systemd[1]: meshcentral.service: Failed with result 'oom-kill'.
Jun 14 18:56:42 systemd[1]: meshcentral.service: Consumed 4h 57min 33.871s CPU time.
Jun 14 18:56:52 systemd[1]: meshcentral.service: Scheduled restart job, restart counter is at 1.
Jun 14 18:56:52 systemd[1]: Stopped MeshCentral Server.
Jun 14 18:56:52 systemd[1]: meshcentral.service: Consumed 4h 57min 33.871s CPU time.
Jun 14 18:56:52 systemd[1]: Started MeshCentral Server.
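
A rough way to narrow down when the growth starts is to log the node process RSS over time and line it up with the MeshCentral event log. A minimal sketch (the interval, grep pattern, and log path are only placeholders):

# Log the MeshCentral node process memory (RSS in KB) once a minute; adjust paths to your install.
while true; do
  echo "=== $(date '+%F %T') ==="
  ps -o pid,rss,etime,args -C node | grep -i meshcentral
  sleep 60
done >> /var/log/meshcentral-mem.log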

Server Software (please complete the following information):

Client Device (please complete the following information):

Remote Device (please complete the following information):

Your config.json file

{
  "settings": {
    "cert": "XXXXXX",
    "MongoDb": "mongodb://127.0.0.1:27017/meshcentral",
    "WANonly": true,
    "autoBackup": {
      "backupIntervalHours": 24,
      "keepLastDaysBackup": 30,
      "zipPassword": "XXXXXX",
      "webdav": {
        "url": "XXXXXX",
        "username": "XXXXXX",
        "password": "XXXXXX",
        "folderName": "XXXXXX",
        "maxFiles": 30
      }
    }
  },
  "domains": {
    "": {
      "title": "XXXXXX",
      "title2": "XXXXXX",
      "hide": 5
    }
  },
  "letsencrypt": {
    "email": "XXXXXX@XXXXXX",
    "names": "XXXXXX",
    "production": true
  }
}
si458 commented 2 weeks ago

Something happened between 11 and 12, from the looks of the graph.

Going to sound like a daft one, but can you disable/remove the autoBackup, then restart and monitor?

And the fact it looks like it's loading itself over and over again in the pic doesn't look good either?

sheshko-as commented 2 weeks ago

Going to sound like a daft one, but can you disable/remove the autoBackup, then restart and monitor?

I disabled autoBackup, rebooted the server, and am watching how the server behaves.

sheshko-as commented 2 weeks ago

Sometimes this error appears in the logs:

-------- 6/16/2024, 9:33:59 PM ---- 1.1.24 --------

(node:55552) Warning: An error event has already been emitted on the socket. Please use the destroy method on the socket while handling a 'clientError' event. (Use node --trace-warnings ... to show where the warning was created)

but I don't think it's related to the problem

si458 commented 2 weeks ago

@sheshko-as That issue has been around for about a year. It first popped up when we had to move to Node 14 and upgraded Express.js. I haven't been able to track down what line is causing it yet, but I don't think it's affecting you UNLESS the timestamp of the event is WHEN you notice memory being increased?

sheshko-as commented 1 week ago

Going to sound like a daft one, but can you disable/remove the autoBackup, then restart and monitor?

It didn't help

sheshko-as commented 1 week ago

Server Error Log:

-------- 6/19/2024, 9:02:28 PM ---- 1.1.24 --------

<--- Last few GCs --->

[89911:0x5ff48b0] 17700692 ms: Mark-sweep 4047.0 (4138.1) -> 4034.2 (4141.1) MB, 2816.1 / 0.0 ms (average mu = 0.346, current mu = 0.030) allocation failure; scavenge might not succeed
[89911:0x5ff48b0] 17705461 ms: Mark-sweep 4050.1 (4141.1) -> 4037.5 (4144.4) MB, 4691.7 / 0.0 ms (average mu = 0.167, current mu = 0.016) allocation failure; scavenge might not succeed

<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory

-------- 6/19/2024, 9:02:28 PM ---- 1.1.24 --------

1: 0xb9c310 node::Abort() [/usr/bin/node]
2: 0xaa27ee [/usr/bin/node]
3: 0xd73eb0 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/usr/bin/node]
4: 0xd74257 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/bin/node]
5: 0xf515d5 [/usr/bin/node]
6: 0xf63aad v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/bin/node]
7: 0xf3e19e v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/bin/node]
8: 0xf3f567 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/bin/node]
9: 0xf2076a v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/usr/bin/node]
10: 0x12e599f v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long, v8::internal::Isolate*) [/usr/bin/node]
11: 0x17125f9 [/usr/bin/node]
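
A side note on this trace: the Mark-sweep lines show the heap dying at roughly 4 GB, i.e. at V8's heap limit, not at the machine's 16 GB. Raising the limit won't fix a leak, but as a temporary diagnostic it can show whether the growth is truly unbounded. A sketch, assuming the service runs under systemd (the drop-in path is hypothetical):

# /etc/systemd/system/meshcentral.service.d/heap.conf  (hypothetical drop-in)
[Service]
Environment=NODE_OPTIONS=--max-old-space-size=8192

# Apply with: systemctl daemon-reload && systemctl restart meshcentral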

sheshko-as commented 1 week ago

The "Useful config.json settings" from the debugging guide also didn't help:

  "AgentsInRAM": false,
  "AgentUpdateBlockSize": 2048,
  "agentUpdateSystem": 1,
  "noAgentUpdate": 1,
  "WsCompression": false,
  "AgentWsCompression": false

https://ylianst.github.io/MeshCentral/meshcentral/debugging/
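
For reference, these flags sit in the top-level "settings" section of config.json; a minimal sketch with the same values (the cert and MongoDb lines are just the placeholders from the config posted above):

{
  "settings": {
    "cert": "XXXXXX",
    "MongoDb": "mongodb://127.0.0.1:27017/meshcentral",
    "WANonly": true,
    "AgentsInRAM": false,
    "AgentUpdateBlockSize": 2048,
    "agentUpdateSystem": 1,
    "noAgentUpdate": 1,
    "WsCompression": false,
    "AgentWsCompression": false
  }
}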

silversword411 commented 1 week ago

Are you using some kind of VPN/proxy between agent and server?

Can you monitor the ws connections between agent and server? By default they will stay up once established for 24hrs...but I've seen VPN and other networking software/proxies prematurely close ws connections.

When mesh realizes its connection is dead, it stands up a new connection, but I think the old one isn't cleaned up... memory leak.
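
A quick way to sanity-check that, assuming agents reach the server on port 443: count established connections to that port over time and see whether the number keeps climbing while the online device count stays flat (that would suggest stale connections are not being cleaned up).

# Count established TCP connections to the server port every 60 s (port 443 is an assumption).
watch -n 60 "ss -Htn state established '( sport = :443 )' | wc -l"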

si458 commented 1 week ago

@silversword411 I'm not sure if that's the case, because my other post here about duplicate agents covers this exact issue: https://github.com/Ylianst/MeshCentral/discussions/6205#discussioncomment-9880303 If it notices duplicate connections/agents, it disconnects the first one and lets the second one through. Although I suppose it could be not closing the connection properly?

si458 commented 1 week ago

We need to find out what you did / what happens when the memory starts to climb. Did you connect to a certain device? Did loads of devices connect? Did someone stop a recording session, so it was saving from memory to file? Were there loads of login attempts? Have you tried switching from MongoDB to just NeDB, or say MySQL, in case it's a database issue?

sheshko-as commented 1 week ago

Are you using some kind of VPN/proxy between agent and server?

We don't use one; agents connect directly by domain name.

sheshko-as commented 1 week ago

We need to find out what you did / what happens when the memory starts to climb. Did you connect to a certain device? Did loads of devices connect? Did someone stop a recording session, so it was saving from memory to file? Were there loads of login attempts? Have you tried switching from MongoDB to just NeDB, or say MySQL, in case it's a database issue?

I'm trying to monitor all this, but I don't see any pattern yet. The database is the only thing I have not changed yet; I will try changing it to PostgreSQL, for example. There are some specifics: we have a lot of client computers with the C: drive frozen by the Shadow Defender program, and also a lot of computers that run diskless from a single VHD image (one VHD image can be used by 30-40 PCs at once). For those computers, the device group is set to remove devices when they go offline.

si458 commented 1 week ago

@sheshko-as Wow, that does sound mad/complex! I mean, you could be duplicating the meshids if you are using shared VHD images!? And that could be causing an issue/confusion on the server! So maybe, yeah, set a few groups to delete devices when they disconnect and see if it helps?

sheshko-as commented 1 week ago

@sheshko-as Wow, that does sound mad/complex! I mean, you could be duplicating the meshids if you are using shared VHD images!? And that could be causing an issue/confusion on the server! So maybe, yeah, set a few groups to delete devices when they disconnect and see if it helps?

All groups that use one VHD image per group are already configured to delete devices after they go offline.

si458 commented 1 week ago

@sheshko-as Hmmm, and they vanish OK? This is getting very strange now, as I literally have no idea what's causing the memory leak/increase. You COULD try running node --trace-warnings node_modules/meshcentral and see if that ever displays any output before it crashes; it could very well be the --trace-warnings issue. Also, you could try using the latest Node 20.15.0, as there was a push for a fix to do with misleading messages: https://github.com/nodejs/node/pull/51204
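
If it runs under systemd rather than being started by hand, one way to pass --trace-warnings is an ExecStart override; the paths below assume a typical /opt/meshcentral install, so adjust them to match your actual unit file:

# /etc/systemd/system/meshcentral.service.d/trace.conf  (hypothetical drop-in)
[Service]
ExecStart=
ExecStart=/usr/bin/node --trace-warnings /opt/meshcentral/node_modules/meshcentral

# Apply with: systemctl daemon-reload && systemctl restart meshcentral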

sheshko-as commented 1 week ago

Hmmm, and they vanish OK?

yes

sheshko-as commented 1 week ago

You COULD try running node --trace-warnings node_modules/meshcentral and see if that ever displays any output before it crashes; it could very well be the --trace-warnings issue. Also, you could try using the latest Node 20.15.0, as there was a push for a fix to do with misleading messages: nodejs/node#51204

Okay, I'll do all of that and write back with the results.

si458 commented 1 week ago

@sheshko-as No worries! In theory, Node 20.13+ will fix the Warning: An error event has already been emitted on the socket. Please use the destroy method on the socket while handling a 'clientError' event. messages (which explains why I'm not seeing them anymore, as I moved to Node 20), but I will downgrade my setup to Node 18 and monitor to see if it increases RAM-wise too.

sheshko-as commented 1 week ago

@sheshko-as No worries! In theory, Node 20.13+ will fix the Warning: An error event has already been emitted on the socket. Please use the destroy method on the socket while handling a 'clientError' event. messages (which explains why I'm not seeing them anymore, as I moved to Node 20), but I will downgrade my setup to Node 18 and monitor to see if it increases RAM-wise too.

Updated to 20.15; it did not help, the problem happened again today.

si458 commented 1 week ago

@sheshko-as Which issue, sorry? You mean the memory increase/crash? Did you see when it started increasing, and did you do anything like connect to a computer? In theory, the An error event has already been emitted warning should vanish with the latest LTS of Node 20.

sheshko-as commented 1 week ago

Did you see when it started increasing, and did you do anything like connect to a computer?

I'm trying to figure out what action causes uncontrolled RAM growth to begin, but it's not working yet.

sheshko-as commented 1 week ago

You mean the memory increase/crash?

yes

sheshko-as commented 2 days ago

I think I found it: the problem is due to relay sessions. For many users, when opening a lot of RDP sessions, several MeshCentral Router windows open instead of just one, as expected. I think the problem occurs when MeshCentral users run multiple copies of the MeshCentral Router application because of an error when clicking the RDP button in the browser. If the user has only one instance of MeshCentral Router running, RAM does not grow. I'll try to find the reason why this is happening.

si458 commented 2 days ago

OK that's an interesting theory!

So it's opening multiple MeshCentral Router sessions that seems to increase the RAM on the server side!

I'll have to test it myself; sadly I don't have 32 computers that have RDP. But I suppose I could open 32 remote desktops and see if the memory starts increasing!