Ylianst / MeshCentral

A complete web-based remote monitoring and management web site. Once set up, you can install agents and perform remote desktop sessions to devices on the local network or over the Internet.
https://meshcentral.com
Apache License 2.0
4.26k stars, 570 forks

All devices on My Devices Screen will Bounce their connection. #6414

Closed: naelr closed this issue 1 month ago

naelr commented 1 month ago

Describe the bug: I have 17 devices on my MeshCentral server, a mix of Windows 11, Server 2019 and Linux boxes. My MeshCentral server is hosted at a datacenter with a solid 1 Gbit connection to the internet. As I sit at the My Devices screen, every client bounces its connection. All of them go offline for anywhere from 2 seconds to sometimes a minute or more. I know the machines are online and have access to the internet because I have another remote-access tool installed on most of them (SimpleHelp is the other remote software we use). I don't have it on all devices, and I see this bouncing on all devices.

To Reproduce: I don't know how to reproduce it; I just watch the My Devices screen and see the devices go offline and back online.

Expected behavior: I expect the machines to stay online, given a good connection between device and server.

Screenshots: These are all verified booted and online computers.

[Screenshot]

Moments later:

[Screenshot]

5 seconds later:

[Screenshot]

10 to 15 seconds later:

[Screenshot]

Most machines that go "offline" don't stay offline for more than a few seconds, and some only blink, but it is a consistent problem. I have one machine that does not have SimpleHelp on it, and it is doing the same thing.


Your config.json file


{
  "$schema": "https://raw.githubusercontent.com/Ylianst/MeshCentral/master/meshcentral-config-schema.json",
  "__comment1__": "This is a simple configuration file, all values and sections that start with underscore (_) are ignored. Edit a section and remove the _ in front of the name. Refer to the user's guide for details.",
  "__comment2__": "See node_modules/meshcentral/sample-config-advanced.json for a more advanced example.",
  "settings": {
    "MongoDb": "mongodb://127.0.0.1:27018",
    "_mongodbcol": "meshcentral",
    "cert": "mesh.ctcsolutions.tech",
    "WANonly": true,
    "_LANonly": true,
    "_sessionKey": "Something I am not telling",
    "port": 5101,
    "aliasPort": 443,
    "_redirPort": 5100,
    "_redirAliasPort": 80
  },
  "domains": {
    "": {
      "title": "CTC Solutions",
      "title2": "Remote",
      "_minify": true,
      "_newAccounts": true,
      "_userNameIsEmail": true,
      "_loginPicture": "CTCS-background.png",
      "welcomePicture": "CTCS-background.png",
      "certUrl": "https://mesh.ctcsolutions.tech:443"
    }
  },
  "_letsencrypt": {
    "__comment__": "Requires NodeJS 8.x or better, Go to https://letsdebug.net/ first before trying Let's Encrypt.",
    "email": "myemail@mydomain.com",
    "names": "myserver.mydomain.com",
    "skipChallengeVerification": true,
    "production": true
  }
}
bkessell commented 1 month ago

I'm having the same issue - I noticed that during a large file copy this actually interrupted the transfer. Desktop connection doesn't seem noticeably affected.

si458 commented 1 month ago

@bkessell I believe your issue might be caused by having WebRTC enabled! Set it to false, or remove the value from your config.json, and restart MeshCentral. I discovered this bug while attempting to fix WebRTC the other week.
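For reference, a minimal sketch of that change in config.json might look like this. It assumes the WebRTC key sits under "settings" (as in the configs quoted in this thread) and takes a boolean; every other key is omitted, so merge it into your existing settings section rather than replacing the file.

```json
{
  "settings": {
    "WebRTC": false
  }
}
```

Removing the key entirely is the alternative si458 mentions; either way, restart MeshCentral afterwards.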

@naelr are all the devices at the same site/external IP? Can you go into the My Server tab and click Show Errors? Does anything show?

Have you tried restarting the MeshAgent or restarting the machines themselves?

naelr commented 1 month ago

@si458

No, I have 3 different sites talking to the MeshCentral server, and it happens at all of them.

There is nothing new in the error log since 8/13/2024, which is about when I got the install and configuration set up.

I have restarted everything. The Windows servers I am monitoring get rebooted once a week.

si458 commented 1 month ago

Have you checked your reverse proxy server for errors?

You can try setting agentPong: 10 under settings in your config.json; this might help keep the connections alive.
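A minimal config.json fragment for this suggestion might look like the following. It assumes the value is an interval in seconds (si458 later describes it as "every x seconds"); only the new key is shown, so keep your existing settings alongside it.

```json
{
  "settings": {
    "agentPong": 10
  }
}
```

As with the other config.json changes in this thread, restart MeshCentral after editing the file.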

Also, you should add trustedProxy: "ipofreverseproxyhere" under settings in your config.json; this tells MeshCentral the real external IP addresses of your remote devices.
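A sketch of the trustedProxy setting, assuming the reverse proxy runs on the same host as MeshCentral so that its address is 127.0.0.1 (an illustrative placeholder; substitute your proxy's actual IP). The value is shown as a single string; one config later in this thread puts a comma-separated list of several entries in the same key.

```json
{
  "settings": {
    "trustedProxy": "127.0.0.1"
  }
}
```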

Br0kenSilos commented 1 month ago

I had this happening on my server a long time ago, and if I remember correctly I set AgentPong to 20 and set WebRTC to false. I am pretty sure that is what resolved the issue for me.

"$schema": "http://info.meshcentral.com/downloads/meshcentral-config-schema.json", "settings": { "cert": "mc.mydomain.com", "_WANonly": true, "_LANonly": true, "_sessionKey": "MyVerySecretKey", "port": 443, "_aliasPort": 443, "redirPort": 80, "_redirAliasPort": 80, "AgentPong": 20, "agentAliasDns": "mc.mydomain.com", "TLSOffload": false, "SelfUpdate": false, "AllowFraming": "false", "WebRTC": "false", "trustedproxy": "nginx-proxy-manager,127.0.0.1,CloudFlare", "mpshighsecurity": true, "maxInvalidLogin": {

naelr commented 1 month ago

Added WebRTC, AgentPong and trustedproxy to my config.json file, and it seems to be acting much better now. Nothing seems to have bounced for the last 5 minutes. Going to watch it for another day or so, and if it is resolved I will close the issue. Thanks, all!

DaanSelen commented 1 month ago

Just to clarify: the problem is a machine briefly "flashing" offline and then jumping straight back online? Because I've been having this as well. I use an Nginx reverse proxy in front of MeshCentral.

si458 commented 1 month ago

@DaanSelen have you tried adding the same settings listed above?

DaanSelen commented 1 month ago


Yes, I am testing right now.

bkessell commented 1 month ago

These changes worked great for me. A large file transfer finished and no more bouncing. Thanks!

Br0kenSilos commented 1 month ago

FWIW, I too am running MeshCentral behind Nginx Proxy Manager.

si458 commented 1 month ago

@bkessell basically, I think it's the AgentPong that helps.

WebSockets are supposed to send their own ping/pong every x seconds to keep the connection alive,

but what I've noticed is that this isn't always the case with the likes of remote locations or mobile connections.

The ping never gets a reply, so the WebSocket gets dropped.

So we added our own ping/pong method, which sends a ping/pong every x seconds.

AgentPong sends a ping to the remote device every x seconds to keep the WebSocket alive, but doesn't expect a reply.

AgentPing sends a ping to the remote device every x seconds BUT expects a pong back from it. If it doesn't receive the pong within the next x seconds, it drops the WebSocket connection, and the remote device is expected to reconnect by itself.
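To make the difference concrete, here is a hypothetical settings fragment that turns on AgentPong and leaves AgentPing disabled. It relies on the convention stated in the sample config's own comment, that keys starting with an underscore are ignored, so removing the underscore from _agentPing would enable the reply-checking variant. Both values are assumed to be intervals in seconds and are purely illustrative, not recommended defaults.

```json
{
  "settings": {
    "agentPong": 20,
    "_agentPing": 60
  }
}
```

The design trade-off follows from the description above: AgentPong only adds one-way traffic, while AgentPing also detects dead connections, at the cost of forcing a drop and reconnect when a pong goes missing, which may itself appear as a brief offline blip.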

DaanSelen commented 1 month ago

So with AgentPong, the WebSocket is kept alive instead of the connection being recreated, which is what causes the quick flicker of the agent?

si458 commented 1 month ago

Yes, AgentPong attempts to keep the WebSocket open.

Br0kenSilos commented 1 month ago

Thanks @si458 for that explanation of ping/pong. That's the best one I have found, and now it makes total sense.

DaanSelen commented 1 month ago

Looks like this works for me, solved.

DaanSelen commented 1 month ago

What is the argument against implementing a default of, say, 10 seconds or more?

si458 commented 1 month ago

@DaanSelen I'm not 100% sure; it's just the way the code is/looks! I'm guessing that back in the early days people never had issues with internet connections, proxies, etc., but nowadays people are very security conscious, things can happen and packets can get lost, so maybe it's needed.

Also, using these options adds extra packets/data/traffic on your network, as all your hosts would be sending ping/pong every x seconds, 24/7, all at the same time.
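A rough back-of-the-envelope count of that extra traffic, assuming each of N agents exchanges one keepalive message every T seconds and ignoring message size and TCP overhead. N = 150 and T = 10 are borrowed from the host count and agentPong value mentioned in this thread, purely as an illustration:

```latex
R = \frac{N}{T} = \frac{150}{10\,\text{s}} = 15\ \text{messages/s},
\qquad
R \times 86\,400\ \text{s/day} = 1\,296\,000\ \text{messages/day}
```

The total grows linearly with the number of agents and inversely with the interval, so doubling T halves the keepalive load.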

DaanSelen commented 1 month ago

Clear. I thought a reasonable default would fit, but your points are strong.

si458 commented 1 month ago

@DaanSelen take me, for example: on my production machine I look after 150 computers and have never had to use agentPing/agentPong! But most of the cases where agentPong fixes things seem to involve reverse proxies, so I'm guessing reverse proxies are NOT keeping the WebSockets open 24/7 and are dropping them. Thankfully, agentPong at least tells the reverse proxy NOT to close the WebSocket, because there is data being transferred!

DaanSelen commented 1 month ago


True, the proxies might be the problem, not MeshCentral. I suppose notifying users through documentation is enough. I can propose an addition to https://ylianst.github.io/MeshCentral/meshcentral/faq/

si458 commented 1 month ago

@DaanSelen PRs welcome! https://github.com/Ylianst/MeshCentral/blob/master/docs/docs/meshcentral/faq.md

DaanSelen commented 1 month ago


If you give me some, I will make one.

silversword411 commented 3 weeks ago

The number of possible places your stutters could be coming from is huge, and I'm sure this isn't a full list: Wi-Fi stutters/instability, packet retransmissions, VPN client session instability, PC TCP/UDP socket timeouts, LAN-to-router TCP session timeouts, router network management and firewall policies, Internet Service Provider (ISP) network issues, cell network optimizations, VPN server session drops, internet path latency/packet loss, reverse proxy timeouts, server-side load balancer issues, WebSocket timeouts (server-side), server-side resource limits, and WebSocket application logic.

Ping/pong just brute-forces traffic through the agent-to-server connection, generating continuous activity to try to keep the WebSocket session active.

Off the top of my head, I've seen PC sleep settings, bad routers, cell phone connections, Cradlepoint VPNs, routers with low session-state timeouts, and general internet latency all cause periodic stuttering. It's a nightmare to isolate.