Ylianst / MeshCentral

A complete web-based remote monitoring and management web site. Once setup you can install agents and perform remote desktop session to devices on the local network or over the Internet.
https://meshcentral.com
Apache License 2.0

Error in the MeshServer operation: meshuser.js:792 #6127

Closed sheshko-as closed 3 weeks ago

sheshko-as commented 1 month ago

Describe the bug

The problem occurs randomly: sometimes once a day, sometimes once every three days. When the error occurs, the MeshCentral service restarts. Output from journalctl:

node[722]: ERR: /root/meshcentral/node_modules/meshcentral/meshuser.js:792
node[722]: if ((docs[i].rdp != null) && (docs[i].rdp[obj.user._id] != null)) { docs[i].rdp = 1; } else { delete docs[i].rdp; }
node[722]: ^
node[722]: TypeError: Cannot read properties of undefined (reading '_id')
node[722]: at /root/meshcentral/node_modules/meshcentral/meshuser.js:792:80
node[722]: at /root/meshcentral/node_modules/meshcentral/db.js:2571:54
node[722]: at /root/meshcentral/node_modules/mongodb/lib/utils.js:349:28
node[722]: at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
node[722]: Node.js v18.20.2
node[722]: Error: Command failed: /usr/bin/node /root/meshcentral/node_modules/meshcentral --launch 722
node[722]: (node:2384) Warning: An error event has already been emitted on the socket. Please use the destroy method on the socket while handling a 'clientError' event.
node[722]: (Use node --trace-warnings ... to show where the warning was created)
node[722]: /root/meshcentral/node_modules/meshcentral/meshuser.js:792
node[722]: if ((docs[i].rdp != null) && (docs[i].rdp[obj.user._id] != null)) { docs[i].rdp = 1; } else { delete docs[i].rdp; }
node[722]: ^
node[722]: TypeError: Cannot read properties of undefined (reading '_id')
node[722]: at /root/meshcentral/node_modules/meshcentral/meshuser.js:792:80
node[722]: at /root/meshcentral/node_modules/meshcentral/db.js:2571:54
node[722]: at /root/meshcentral/node_modules/mongodb/lib/utils.js:349:28
node[722]: at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
node[722]: Node.js v18.20.2
node[722]: at ChildProcess.exithandler (node:child_process:422:12)
node[722]: at ChildProcess.emit (node:events:529:35)
node[722]: at maybeClose (node:internal/child_process:1098:16)
node[722]: at ChildProcess._handle.onexit (node:internal/child_process:303:5) {
node[722]: code: 1,
node[722]: killed: false,
node[722]: signal: null,
node[722]: cmd: '/usr/bin/node /root/meshcentral/node_modules/meshcentral --launch 722'
node[722]: }
node[722]: ERROR: MeshCentral failed with critical error, check mesherrors.txt. Restarting in 5 seconds...
node[722]: MeshCentral HTTP redirection server running on port 80.
node[722]: MeshCentral v1.1.23, WAN mode, Production mode.
node[722]: MeshCentral Intel(R) AMT server running on DOMAIN:4433.
node[722]: MeshCentral HTTPS server running on DOMAIN:443.

Your config.json file

{
  "settings": {
    "cert": "XXXXXX",
    "MongoDb": "mongodb://127.0.0.1:27017/meshcentral",
    "WANonly": true,
    "autoBackup": {
      "backupIntervalHours": 24,
      "keepLastDaysBackup": 30,
      "zipPassword": "XXXXXX",
      "webdav": {
        "url": "XXXXXX",
        "username": "XXXXXX",
        "password": "XXXXXX",
        "folderName": "XXXXXX",
        "maxFiles": 30
      }
    }
  },
  "domains": {
    "": {
      "title": "XXXXXX",
      "title2": "XXXXXX",
      "hide": 5
    }
  },
  "letsencrypt": {
    "email": "XXXXXX@XXXXXX",
    "names": "XXXXXX",
    "production": true
  }
}

si458 commented 1 month ago

Hmm, very strange. It's crashing when you ask MeshCentral for a list of nodes and a device has RDP credentials, BUT it's crashing because obj.user is undefined, so the user's _id can't be read? Can you replicate the issue? For example, a certain admin logging in with a certain browser, or using meshctrl?
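To make the failure mode concrete, here is a minimal sketch of the check from the log line, run against made-up data. The function names, the sample documents and the `user//admin` id are hypothetical and are not taken from MeshCentral; only the shape of the `if` statement comes from the trace above. It suggests why the crash needs both conditions at once: a device with an `rdp` entry and an undefined `obj.user`.

```javascript
// Hypothetical stand-ins for the session object (obj) and the database
// result (docs). This is an illustration, not MeshCentral's code.
function markRdpUnsafe(obj, docs) {
  for (const doc of docs) {
    // Same shape as the line in the log: throws if obj.user is undefined
    // and the document has an rdp entry.
    if ((doc.rdp != null) && (doc.rdp[obj.user._id] != null)) { doc.rdp = 1; } else { delete doc.rdp; }
  }
  return docs;
}

// Guarded variant: read the id once and treat a missing user as "no credentials".
function markRdpGuarded(obj, docs) {
  const userId = obj.user ? obj.user._id : null;
  for (const doc of docs) {
    if ((userId != null) && (doc.rdp != null) && (doc.rdp[userId] != null)) { doc.rdp = 1; } else { delete doc.rdp; }
  }
  return docs;
}

// Made-up sample data: one device with stored RDP credentials, one without.
const makeDocs = () => [
  { _id: 'node//a', rdp: { 'user//admin': { note: 'placeholder' } } },
  { _id: 'node//b' }
];

let threw = null;
try { markRdpUnsafe({}, makeDocs()); } catch (e) { threw = e; }

const guarded = markRdpGuarded({}, makeDocs());
const normal = markRdpGuarded({ user: { _id: 'user//admin' } }, makeDocs());

console.log(threw instanceof TypeError, guarded[0].rdp, normal[0].rdp);
```

If no device in the list has an `rdp` entry, the unsafe version never evaluates `obj.user._id`, which may be one reason the crash looks random.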

sheshko-as commented 1 month ago

I will try to reproduce the problem and write to you in detail

sheshko-as commented 1 month ago

I can't reproduce the problem in any way; there is no pattern. It can work two or three times without errors, but for the last two days there has been an error every day, restarting the MeshCentral service:

-------- 6/4/2024, 2:46:08 PM ---- 1.1.24 --------

/root/meshcentral/node_modules/meshcentral/meshuser.js:792
if ((docs[i].rdp != null) && (docs[i].rdp[obj.user._id] != null)) { docs[i].rdp = 1; } else { delete docs[i].rdp; }
^

TypeError: Cannot read properties of undefined (reading '_id')
at /root/meshcentral/node_modules/meshcentral/meshuser.js:792:80
at /root/meshcentral/node_modules/meshcentral/db.js:2571:54
at /root/meshcentral/node_modules/mongodb/lib/utils.js:349:28
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

Node.js v18.20.2

I have tried changing the VPS, restoring from backup onto a completely new installation, and increasing the resources of the VPS.

si458 commented 1 month ago

If you are comfortable doing so, change obj.user._id to user._id on that line, restart MeshCentral, then keep an eye out and see if it crashes again.

That line hasn't changed in 2 years: https://github.com/Ylianst/MeshCentral/commit/753b6c240a4050449267e6b10ca7d0692eb6e257#diff-534d32533813bb72f2e9474b37160f11b5a292ff14ba57f1483044c092bd5b57R751

BUT when checking the commit, there are lots of user._id references, and only the rdp and ssh lines use obj.user._id.

And after checking it here, obj.user seems to match user, so it might be that your setup isn't setting obj.user somewhere?
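One possible explanation for why the swap could matter is sketched below. It assumes that a session removes its `user` property when it closes while a database query is still in flight; this thread does not confirm that MeshCentral does so. The names `createSession`, `fakeDbGetAll`, `flushDb` and `close` are hypothetical and exist only for the illustration. A callback that reads `obj.user` at completion time fails in that case, while one that uses the local `user` captured earlier still works.

```javascript
// Queue of pending "database" callbacks, so the order of events is explicit.
const pending = [];
function fakeDbGetAll(callback) { pending.push(callback); } // query is now in flight
function flushDb() {
  // Deliver results for every pending query (made-up data).
  while (pending.length > 0) {
    pending.shift()(null, [{ _id: 'node//a', rdp: { 'user//admin': {} } }]);
  }
}

function createSession(user) {
  const obj = { user: user };
  // ASSUMPTION: closing the session drops the user reference.
  obj.close = function () { delete obj.user; };
  obj.listNodes = function (useLocal, done) {
    fakeDbGetAll(function (err, docs) {
      try {
        // useLocal = true reads the captured local, false reads obj.user.
        const id = useLocal ? user._id : obj.user._id;
        done(null, docs[0].rdp[id] != null ? 1 : 0);
      } catch (e) { done(e); }
    });
  };
  return obj;
}

function run(useLocal) {
  let outcome;
  const session = createSession({ _id: 'user//admin' });
  session.listNodes(useLocal, function (err, rdp) { outcome = err ? err.constructor.name : rdp; });
  session.close(); // disconnect before the query returns
  flushDb();       // the query completes afterwards
  return outcome;
}

const viaObj = run(false);
const viaLocal = run(true);
console.log(viaObj, viaLocal);
```

If something like this is happening, the crash would depend on a client disconnecting at just the wrong moment, which could fit the lack of a reproducible pattern.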

sheshko-as commented 1 month ago

Okay, I'll check it today and report the result. This may be useful to know: MeshCentral admins use RDP, and there can be up to 50 RDP connections at the same time.

sheshko-as commented 4 weeks ago

change obj.user._id to user._id, restart meshcentral, then keep an eye out and see if it crashes again

I think it helped: there have been no problems for three days. I will keep watching for a few more days and report back with the result.

sheshko-as commented 3 weeks ago

change obj.user._id to user._id, restart meshcentral, then keep an eye out and see if it crashes again

It helped

sheshko-as commented 3 weeks ago

change obj.user._id to user._id, restart meshcentral, then keep an eye out and see if it crashes again

I don't know if this is because of this or not, but yesterday the MeshCentral service was restarted twice with an error:

-------- 6/10/2024, 6:16:01 PM ---- 1.1.24 --------

<--- Last few GCs --->

[37251:0x5cc88b0] 2009001 ms: Mark-sweep (reduce) 2042.7 (2083.8) -> 2041.6 (2083.8) MB, 1863.0 / 0.0 ms (average mu = 0.059, current mu = 0.001) allocation failure; scavenge might not succeed
[37251:0x5cc88b0] 2010710 ms: Mark-sweep (reduce) 2042.8 (2083.8) -> 2041.6 (2083.8) MB, 1703.4 / 0.0 ms (average mu = 0.033, current mu = 0.003) allocation failure; scavenge might not succeed

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
1: 0xb9c310 node::Abort() [/usr/bin/node]
2: 0xaa27ee [/usr/bin/node]
3: 0xd73eb0 v8::Utils::ReportOOMFailure(v8::internal::Isolate, char const, bool) [/usr/bin/node]
4: 0xd74257 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate, char const, bool) [/usr/bin/node]
5: 0xf515d5 [/usr/bin/node]
6: 0xf524d8 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [/usr/bin/node]
7: 0xf629d3 [/usr/bin/node]
8: 0xf63848 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/bin/node]
9: 0xf3e19e v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/bin/node]
10: 0xf3f567 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/bin/node]
11: 0xf1fae0 v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [/usr/bin/node]
12: 0xf17554 v8::internal::FactoryBase::AllocateRawWithImmortalMap(int, v8::internal::AllocationType, v8::internal::Map, v8::internal::AllocationAlignment) [/usr/bin/node]
13: 0xf19808 v8::internal::FactoryBase::NewRawOneByteString(int, v8::internal::AllocationType) [/usr/bin/node]
14: 0xf22dcd v8::internal::Factory::NewStringFromUtf8(v8::base::Vector const&, v8::internal::AllocationType) [/usr/bin/node]
15: 0xd830c3 v8::String::NewFromUtf8(v8::Isolate, char const, v8::NewStringType, int) [/usr/bin/node]
16: 0xc877b9 node::StringBytes::Encode(v8::Isolate, char const, unsigned long, node::encoding, v8::Local*) [/usr/bin/node]
17: 0xb6fcdd [/usr/bin/node]
18: 0x1697e2f [/usr/bin/node]

RAM behavior at the time of the error: [image]

The same RAM behavior sometimes occurred at the time of the error "meshuser.js:792"

si458 commented 3 weeks ago

@sheshko-as that error looks like you have a memory leak somewhere? "JavaScript heap out of memory" means Node ran out of heap and aborted MeshCentral, which in turn restarted itself (as it should). I will upload the patch above, as that fixes your original issue, and close this issue.

Please can you open a new bug report for your memory crash issue? But keep an eye on it and see if the memory usage on your server climbs; there must be something not right with your setup? Also, can you add which database you are using with MeshCentral and what your VPS specs are?
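For watching whether memory climbs, a small sketch using Node's built-in `process.memoryUsage()` and `v8.getHeapStatistics()` is shown below. The helper names `heapSnapshotLine` and `startMemoryLog` are hypothetical. The code measures the process it runs in, so it would need to be loaded inside the server process to say anything about MeshCentral itself; how to do that is not covered here. In the GC log above the heap sat at roughly 2042 MB when allocation failed, so comparing `heapUsed` against the reported heap limit may show how close the process is to that point.

```javascript
const v8 = require('v8');

// Returns the current memory figures for this process, in whole megabytes.
function heapSnapshotLine() {
  const mem = process.memoryUsage();
  const limit = v8.getHeapStatistics().heap_size_limit;
  const toMB = (bytes) => Math.round(bytes / 1024 / 1024);
  return {
    rssMB: toMB(mem.rss),             // total resident memory
    heapUsedMB: toMB(mem.heapUsed),   // live JavaScript heap
    heapLimitMB: toMB(limit),         // V8 heap limit for this process
    heapUsedPct: Math.round((mem.heapUsed / limit) * 100)
  };
}

// Logs one line every intervalMs milliseconds without keeping the process alive.
function startMemoryLog(intervalMs) {
  const timer = setInterval(function () {
    console.log(new Date().toISOString(), JSON.stringify(heapSnapshotLine()));
  }, intervalMs);
  timer.unref();
  return timer;
}

const snapshot = heapSnapshotLine();
console.log(snapshot);
```

A steadily rising `heapUsedPct` over hours would point toward a leak, while a sudden jump would point toward a single large allocation.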

sheshko-as commented 3 weeks ago

the must be something not right with your setup? also can you add 'what database are u using with meshcentral and whats ur VPS specs?'

The contents of "config.json" are written above.

VPS specs: 4 vCPU cores at 2.2 GHz, 8 GB RAM, 20 GB SSD

Database:

mongod --version
db version v7.0.11
Build Info: {
    "version": "7.0.11",
    "gitVersion": "f451220f0df2b9dfe073f1521837f8ec5c208a8c",
    "openSSLVersion": "OpenSSL 3.0.2 15 Mar 2022",
    "modules": [],
    "allocator": "tcmalloc",
    "environment": {
        "distmod": "ubuntu2204",
        "distarch": "x86_64",
        "target_arch": "x86_64"
    }
}

sheshko-as commented 3 weeks ago

please can you open a new bug report with your memory crash issue?

ok

sheshko-as commented 3 weeks ago

@si458 I'm not sure that was the problem or that the change helped, so please hold off on releasing "fix obj.user._id undefined for rdp/ssh". I moved the server from the VPS to a dedicated server and need time to run additional tests. I will definitely write back when there is a result.