Ylianst / MeshCentral

A complete web-based remote monitoring and management web site. Once set up, you can install agents and perform remote desktop sessions to devices on the local network or over the Internet.
https://meshcentral.com
Apache License 2.0

cloudflare issues: multiple attempts/black screen #5302

Closed billettg closed 2 weeks ago

billettg commented 11 months ago

Describe the bug

Clicking on the "Connect" button under "Desktop" or "Terminal" results in "Disconnected" approximately 9 times out of 10. The rest of the time it connects successfully. The disconnection is shown immediately after clicking "Connect".

To Reproduce

Steps to reproduce the behavior:

  1. Go to "Terminal"
  2. Click on "Connect"
  3. See "Disconnected"

Expected behavior

Connects successfully on every attempt.

Additional context

The problem seems to occur only when using CloudFlare, so I think the proxy is causing the websocket disconnection. The MeshCentral VM is hosted on the Hetzner platform. Others face the same issue (e.g. https://www.reddit.com/r/MeshCentral/comments/15y28x3/random_disconnects_behind_cloudflare/)

Your config.json file

{
  "settings": {
    "cert": "mesh.example.com",
    "wanonly": true,
    "port": 443,
    "aliasport": 443,
    "redirport": 80,
    "rediraliasPort": 80,
    "webrtc": true,
    "wscompression": true,
    "allowlogintoken": true,
    "trustedproxy": "CloudFlare"
  },
  "domains": {
    "": {
      "newaccounts": false,
      "usernameisemail": true,
      "certurl": "https://mesh.example.com"
    }
  }
}

vesector commented 11 months ago

+1

si458 commented 11 months ago

It could be a WebRTC issue? Have you tried with webrtc set to false?

si458 commented 11 months ago

also nodejs 12 is now EOL, please update node to the latest LTS (18)

vesector commented 11 months ago

I have tested with WebRTC both disabled and enabled, with the same behaviour.

One thing to note is that this issue started a few weeks ago.

billettg commented 11 months ago

Updated nodejs to 18.17.1 and same issue with or without webrtc enabled. I rebooted the server and checked nodejs version with the --version parameter.

The logs show connected then immediately disconnected. When it works I noticed that "Relay holding: * (::1) Authenticated" shows as well.

WEBREQUEST: (-) /meshrelay.ashx/.websocket?p=1&nodeid=node//iWMDZtF%24QteFec1amvHRz7nA8c4SqcXNrz7d7HN5JYxOTNwrOvK%24WOqkZ9XJjHuq&id=ma8286xee29&rauth=VEcPv1E75FdSda%24JPWUN7XeUM2plL7mDexVJ37kpZ%24v4selPtY3Kb%24Voz%40maz2MGG1E3IP4Pm9VcASvz3YSGyGo9CNWntAzVx3A%24BLwfSdTIPN4by%40o7
COOKIE: Decoded AESGCM cookie: {"ruserid":"user//gareth","x":"BQF0kfBA","time":1692713115000,"dtime":6268}
RELAY: Relay connected: ma8286xee29 (- -> -)
DISPATCH: DispatchEvent [ '', 'user//gareth' ]
RELAY: Relay disconnect: ma8286xee29 (8- --> -)
DISPATCH: DispatchEvent [ '', 'user//gareth', 'node//iWMDZtF$QteFec1amvHRz7nA8c4SqcXNrz7d7HN5JYxOTNwrOvK$WOqkZ9XJjHuq', 'mesh//DvVA69QdUY7bH8nbnnI4X@cSpkJOl$iIExyDvvBtFEudOUwUuEn5C2QNrHepdL6A' ]
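
For anyone comparing their own logs, here is a minimal Python sketch (standard library only) that pulls the query parameters out of a meshrelay.ashx request path like the one in the WEBREQUEST entry. The sample values are shortened placeholders, the function name `relay_params` is hypothetical, and the comments about what each parameter means are assumptions read off the log, not taken from MeshCentral's source.

```python
from urllib.parse import urlsplit, parse_qs

def relay_params(request_path: str) -> dict:
    """Return the query parameters of a relay request path as a flat dict."""
    query = urlsplit(request_path).query
    # parse_qs percent-decodes values (e.g. %24 -> $) and returns lists,
    # so keep only the first value for each key.
    return {key: values[0] for key, values in parse_qs(query).items()}

# Shortened, made-up values in the same shape as the WEBREQUEST entry.
sample = "/meshrelay.ashx/.websocket?p=1&nodeid=node//abc%24def&id=ma8286xee29&rauth=xyz%40123"
params = relay_params(sample)
print(params["id"])      # assumed: the relay session id that also appears in the RELAY entries
print(params["nodeid"])  # percent-decoded node identifier
```

Matching the `id` value against the "Relay connected" and "Relay disconnect" entries makes it easier to follow a single session when several users are connecting at once.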

jwiener3 commented 11 months ago

+1 I see the same issues, immediate disconnect for different hosts at different locations. All the same symptoms as stated above. I am proxying through Cloudflare, I am not sure if others are doing the same.

hiddenpcmaster commented 11 months ago

+1

dooley74 commented 11 months ago

Same here, just started a few days ago though. Running through Cloudflare as well.

jirijanata commented 11 months ago

+1 Also running through Cloudflare. Actually the problem first occurred for me after the upgrade from 1.1.8 to 1.1.10.

supra36 commented 11 months ago

+1 same, although it didn't start immediately after 1.1.8=>1.1.10 transition, it started giving issues gradually and now it affects all agents no matter the OS.

Real-time monitoring using graphs works fine, however it fails to establish any other type of connection, be it WebRTC, stream, RDP, terminal or file transfer. MeshCentral Router also fails to RDP, in the "Configuring remote session" phase.

Also, I'm running mesh through cloudflare too.

si458 commented 11 months ago

This sounds to me like a Cloudflare issue, not a MeshCentral issue, as nothing has changed recently to do with proxies or tunneling. Can anybody donate a domain/Cloudflare details so I could look at it in my free time?

supra36 commented 11 months ago

I took a look and filtered websocket-related issues in Google search from the last 7 days. These are the results.

https://community.cloudflare.com/t/websocket-not-stable-some-time-connect-some-time-no/547638

https://community.cloudflare.com/t/websocket-problem/547094

https://community.cloudflare.com/t/protected-web-socket-connection-dies-after-30-secs-cloudflare-ipv6-issue/547127

Unfortunately these haven't been answered.

Also, it seems that CloudFlare throttles websocket connections after they reach a certain number (or spike). Does anybody know if CloudFlare recently started limiting websocket connections, or changed anything related to these limits or to how they are handled?

https://developers.cloudflare.com/support/network/using-cloudflare-with-websockets/

vesector commented 11 months ago

Great find @supra36, I have the feeling it is Cloudflare related more than the actual update of MC to .10... I am not 100% sure though.

I did a tcpdump and saw Cloudflare sending packets with the [F] (FIN) and [R] (RST) flags while reproducing the issue.

Would be nice if someone from the community that experiences the same issue and has a Cloudflare business or higher tier subscription opens a ticket with their support to see if this gets an answer from their side... as using the free tier there is no chance of support afaik.

NiceGuyIT commented 11 months ago

There's a Cloudflare Developers server on Discord. Might be useful to ask around.

A post 3 days ago mentioned websocket disconnects when using IPv6 but not when using IPv4. Someone mentioned disabling IPv6 in Network -> IPv6 Compatibility.

vesector commented 11 months ago

Hi @NiceGuyIT,

Thank you for the suggestion.

I have disabled IPv6 through the API and the issue persists.

For anyone that wants to give it a try... I used PowerShell:

$headers=@{}
$headers.Add("Content-Type", "application/json")
$headers.Add("X-Auth-Email", "YOURCFEMAIL")
$headers.Add("X-Auth-Key", "YOURAPIKEY")
$response = Invoke-WebRequest -Uri 'https://api.cloudflare.com/client/v4/zones/YOURZONEID/settings/ipv6' -Method PATCH -Headers $headers -ContentType 'application/json' -Body '{ "value": "off" }'
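
For those not on Windows, a rough Python equivalent of the PowerShell call, using only the standard library. It uses the same URL, headers and body as the command quoted here; the function name `build_ipv6_off_request` is hypothetical, and the sketch only builds the request so it can be inspected before anything is sent.

```python
import json
import urllib.request

def build_ipv6_off_request(zone_id: str, email: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the PATCH request that sets the zone's IPv6 setting to off."""
    url = f"https://api.cloudflare.com/client/v4/zones/{zone_id}/settings/ipv6"
    body = json.dumps({"value": "off"}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "X-Auth-Email": email,
        "X-Auth-Key": api_key,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="PATCH")

# Same placeholders as in the PowerShell version; replace them with your own values.
request = build_ipv6_off_request("YOURZONEID", "YOURCFEMAIL", "YOURAPIKEY")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```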

supra36 commented 11 months ago

Just to let you know, I ditched cloudflare and used let's encrypt. It's now working fine.

billettg commented 11 months ago

> Just to let you know, I ditched cloudflare and used let's encrypt. It's now working fine.

I decided to provision another Ubuntu server and installed MC 1.1.10 with Let's Encrypt certificate rather than CloudFlare and it also works fine. I wonder what changes have been made on CloudFlare side to be dropping the websocket connections, it must be something recent. I'm sure their support would be able to comment but unfortunately we are on a free tier without any technical support.

vesector commented 11 months ago

As per the question raised by @frogweh on #5309, he seems to be using Nginx and not CloudFlare and is having the same issue... It would be good to test that, to see if it's reverse proxies in general or if we can isolate it completely to CF. Also, it would be good to try a version older than .10 so we can rule out the MC update completely. Will try to do this later this week.

si458 commented 11 months ago

@iribarrenjg if you read post #5309, he says he uses a Cloudflare certificate, and the logs show CF headers on the incoming connections, so he is indeed using Cloudflare! I do think there is an issue with Cloudflare.

mon5termatt commented 11 months ago

Having the same issue, also using cloudflare.

Vista2003 commented 11 months ago

+1 on 1.1.10 (Raspberry Pi OS 64 Bit via Cloudflare and Nginx Proxy Manager)

jwiener3 commented 11 months ago

Like others here, I changed my DNS records away from Cloudflare's proxy and things started working as expected. Also, this was happening before 1.1.10. I upgraded to 1.1.10 in hopes of it being a MeshCentral issue. Now I am trying to figure out how to lock down my environment without Cloudflare's rules. I know it is possible, but Cloudflare made it easy :).

si458 commented 11 months ago

@jwiener3 you can do an IP allow list for agents and clients which might help? https://github.com/Ylianst/MeshCentral/blob/b1d2d1aea96c5de48be210e89961347cb6b8b72b/meshcentral-config-schema.json#L592-L628

jwiener3 commented 11 months ago

> @jwiener3 you can do an IP allow list for agents and clients which might help?

Thanks, I will take a look at that.

mon5termatt commented 11 months ago

So I can just disable the proxy checkmark on cloudflare? I have my domain registered with them.

mon5termatt commented 11 months ago

(screenshot attached)

jwiener3 commented 11 months ago

> So I can just disable the proxy checkmark on cloudflare? I have my domain registered with them.

Yes that is what I did, and then I had to open up my ACL on the server that was hosting Meshcentral to allow connections from anywhere, as I only had it allowing connections from cloudflare IP space.
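
For illustration, the kind of ACL described here boils down to a CIDR membership test. Below is a minimal Python sketch using the standard `ipaddress` module. The ranges are documentation placeholders (RFC 5737 blocks), not Cloudflare's published ranges, and `ALLOWED_RANGES` and `is_allowed` are hypothetical names.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation blocks), NOT Cloudflare's real list.
ALLOWED_RANGES = [ipaddress.ip_network(cidr) for cidr in ("192.0.2.0/24", "198.51.100.0/24")]

def is_allowed(client_ip: str, allowed=ALLOWED_RANGES) -> bool:
    """Return True if client_ip falls inside any of the allowed networks."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in allowed)

print(is_allowed("192.0.2.10"))   # inside the first placeholder range
print(is_allowed("203.0.113.5"))  # outside both placeholder ranges
```

With a rule like this in place, turning the proxy off means agents and browsers arrive from their own addresses instead of the proxy's, so the check rejects them until the list is widened.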

appleimperio commented 11 months ago

+1 Docker on Raspberry Pi with Cloudflare tunnels. Today I updated the Cloudflare tunnel app and now it is not connecting at all.

jirijanata commented 11 months ago

Since yesterday I'm not able to connect anymore. A few days ago a few tries were needed, but now it doesn't work at all.

themanbornwithin commented 11 months ago

+1 for nginx reverse proxy and utilizing Cloudflare Proxy

mon5termatt commented 11 months ago

Disabled the Proxy Option, all agents are now offline.

mon5termatt commented 11 months ago

(image attached)

jwiener3 commented 11 months ago

> Disabled the Proxy Option, all agents are now offline.

You need to make sure they can reach your server through the IP you have in your DNS records.
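
A quick way to check this from a machine on the agent side is a plain TCP connection test against the server's hostname and port. A minimal Python sketch, assuming the server listens on 443 as in the config.json at the top of the thread; `can_connect` is a hypothetical helper name.

```python
import socket

def can_connect(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the name via DNS and then connects.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False

# Example (mesh.example.com is the placeholder hostname from the config above):
# print(can_connect("mesh.example.com"))
```

This only shows that the port is reachable through whatever DNS currently returns; it says nothing about whether the websocket upgrade itself succeeds.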

silversword411 commented 11 months ago

Anyone with deeper websocket knowledge...does this help? https://developers.cloudflare.com/durable-objects/api/hibernatable-websockets-api/#websocket-extensions

Br0kenSilos commented 11 months ago

I've been seeing this issue as well. Started a couple weeks ago but if I clicked Connect enough times, it would eventually go through. But as of a few days ago I cannot connect at all via Desktop, Terminal or Files. It just hangs at Setup... I can manage the devices through agent actions but cannot connect to them. It used to work fine and we have made no changes.

Using Cloudflare DNS > Cloudflare Tunnel > MC on Docker

si458 commented 11 months ago

Hey all, just an update: I've had a domain and details donated for testing. THANK YOU (you know who you are). I'm seeing the exact same thing, even with a SINGLE device, so I can't see this being a MULTIPLE websocket connections error, but maybe a Cloudflare issue itself. Sadly the free tier support won't help; I need the paid tier to get proper support from Cloudflare. Groan... For the moment, if people are using Cloudflare, switch your proxy mode OFF on the DNS entry, make sure you have Let's Encrypt set up or a valid SSL certificate for your domain, and it works a treat. It's still bloody weird.

micudaj commented 11 months ago

Same for me. I have CGNAT, so I need to use some type of tunnel or use a VPS.

Also using Cloudflare DNS > Cloudflare Tunnel > Traefik Reverse Proxy > MC on Docker

jirijanata commented 11 months ago

MeshCentral is not compute intensive. Even with many managed computers you can use the cheapest VPS you can find. For example, ionos.de offers a VPS for 1€/month: 1 GB RAM, 1 CPU, IPv4, 10 GB SSD. With Debian, a fresh install of MeshCentral leaves about 512 MB of free memory.

dooley74 commented 11 months ago

> MeshCentral is not compute intensive. Even with many managed computers you can use the cheapest VPS you can find. For example, ionos.de offers a VPS for 1€/month: 1 GB RAM, 1 CPU, IPv4, 10 GB SSD. With Debian, a fresh install of MeshCentral leaves about 512 MB of free memory.

While everything you said is true, it does nothing to help with the issue faced by many people in this thread. That is, unless you mean for folks to not run Mesh through Cloudflare, which in itself is very much not an ideal situation to put yourself into voluntarily.

jirijanata commented 11 months ago

I meant it as an alternative for the folks behind CGNAT who are hosting MeshCentral on their own servers with CloudFlare Tunnel. If someone is using CloudFlare Tunnel, then there is no simple solution for them to get MeshCentral working again (of course, a VPS with some sort of port forwarding is also an option).

si458 commented 11 months ago

@dooley74 sadly, at the moment, something Cloudflare has done with its websockets and proxying has messed up MeshCentral! So the only way is to use Cloudflare BUT DISABLE proxying and use direct access with Let's Encrypt or your own SSL certificate from Cloudflare. I've tried to debug this but it's very difficult and complex! While I can see the websocket connection hit my mesh server via Cloudflare, it seems MeshCentral then can't connect to the remote computer OR the client web UI via websockets. So I'm not sure if MeshCentral is trying to connect to the wrong websockets or something else.

nickvk commented 11 months ago

Unfortunately I can't disable the proxy for Cloudflare, because I can't port forward at home. That's why I need Cloudflared Tunnels. Do you think it helps if more people complain about it to Cloudflare? Or is it wiser to just move over to other tunneling software? Maybe even self-host it? (For the latter, does someone know a good alternative?)

si458 commented 11 months ago

@nickvk sadly if you need tunnels, you are really stuffed at the moment

You can try complaining to Cloudflare, but they won't help unless you have a paid account, or they'll blame your software

If MeshCentral is critical, I would host it in the cloud rather than at home

AWS do Lightsail for £5 a month, or there are others cheaper if you look around

dooley74 commented 11 months ago

> Unfortunately I can't disable the proxy for Cloudflare, because I can't port forward at home. That's why I need Cloudflared Tunnels. Do you think it helps if more people complain about it to Cloudflare? Or is it wiser to just move over to other tunneling software? Maybe even self-host it? (For the latter, does someone know a good alternative?)

I've got a colleague looking into this issue. If there is a way to find a workaround, he'll find it. If we get a fix we'll update here as soon as we can.

Edit: We have a paid account at CF, I'll lodge a ticket tomorrow and see if I get a response. Not holding my breath though...

nickvk commented 11 months ago

The weird thing is that I can still see and do some things, like reinstall the agent core, see the current processes and chat with the agent. Is that because the websockets are only being used in the "desktop", "terminal" and "files" methods?

si458 commented 11 months ago

> The weird thing is that I can still see and do some things, like reinstall the agent core, see the current processes and chat with the agent. Is that because the websockets are only being used in the "desktop", "terminal" and "files" methods?

This is because, if you look at the web dev console, the agent core, processes, chat etc. all use the control.ashx websocket, which sends commands from the UI to the MeshCentral server, then to the remote PC, and back via the same route.

Whereas remote control, files and terminal all use meshrelay.ashx, which creates a new websocket to relay the picture, files etc., totally separate from the main websocket!

si458 commented 11 months ago

NOTES FOR MYSELF AND OTHERS IF INTERESTED (im debugging)

meshrelay.js#864: obj.authenticated = false, so it's denying the connection and closing both websockets? user = null and obj.nouser is undefined, so maybe Cloudflare isn't passing cookies on the websocket?

EDIT2: it appears Cloudflare is connecting to meshrelay.ashx TWICE from different IP addresses and ports. The first connection is all correct with user details, but the second connection contains incorrect details, so MeshCentral assumes you sent the wrong details, hence you aren't authorised and it disconnects you. Trying to work out how to just ignore the invalid details if they come from Cloudflare, maybe?

EDIT3: ignore EDIT2. I checked my direct MeshCentral and this is expected: one connection from your browser, one connection from the remote computer you want to control.
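
The two-connection behaviour can be pictured as a rendezvous keyed by the session id: the first side to arrive is held, and the second side with the same id completes the pair. The Python sketch below is a toy model of that idea only. It is not how meshrelay.js is written, and the class and method names are hypothetical.

```python
class RelayRendezvous:
    """Toy model: pair two connections that present the same session id."""

    def __init__(self):
        self.waiting = {}  # session id -> first peer that arrived

    def connect(self, session_id: str, peer: str):
        """Hold the first peer; when the second arrives, return the pair."""
        first = self.waiting.pop(session_id, None)
        if first is None:
            self.waiting[session_id] = peer
            return None  # held until the other side shows up
        return (first, peer)

    def disconnect(self, session_id: str):
        """Drop a held peer, e.g. when its socket closes before pairing."""
        self.waiting.pop(session_id, None)

relay = RelayRendezvous()
print(relay.connect("ma8286xee29", "browser"))  # None: held, waiting for the agent
print(relay.connect("ma8286xee29", "agent"))    # ('browser', 'agent')
```

In this model, a held connection that is closed before its partner arrives never gets paired, so the later arrival simply becomes a new held connection.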

Matt-CyberGuy commented 11 months ago

+1 on this... this has been driving us nuts for weeks now

silversword411 commented 11 months ago

> this has been driving us nuts for weeks

You have a better timeline on exactly when it started?

Do you have a paid cloudflare plan we can use for a support ticket?

Matt-CyberGuy commented 11 months ago

We weren't paying attention to when it started, and it seemed intermittent, since some sites have the IP for our instance programmed into our clients' DNS, so some sites were working fine. But one of our clients was complaining it was unusable for him, so I started digging and realized it was a proxy-related issue.

It's just odd that it was working perfectly before.