Ylianst / MeshCentral

A complete web-based remote monitoring and management web site. Once set up, you can install agents and perform remote desktop sessions to devices on the local network or over the Internet.
https://meshcentral.com
Apache License 2.0

cloudflare issues: multiple attempts/black screen #5302

Closed billettg closed 1 month ago

billettg commented 1 year ago

Describe the bug

Clicking the "Connect" button under "Desktop" or "Terminal" results in "Disconnected" approximately 9 times out of 10. The remaining attempts connect successfully. The disconnection is shown immediately after clicking "Connect".

To Reproduce

Steps to reproduce the behavior:

  1. Go to "Terminal"
  2. Click on "Connect"
  3. See "Disconnected"

Expected behavior

Connects successfully on every attempt.

Additional context

The problem seems to occur only when using Cloudflare, so I think the proxy is causing the websocket disconnection. The MeshCentral VM is hosted on the Hetzner platform. Others face the same issue (e.g. https://www.reddit.com/r/MeshCentral/comments/15y28x3/random_disconnects_behind_cloudflare/)

Your config.json file

{
  "settings": {
    "cert": "mesh.example.com",
    "wanonly": true,
    "port": 443,
    "aliasport": 443,
    "redirport": 80,
    "rediraliasPort": 80,
    "webrtc": true,
    "wscompression": true,
    "allowlogintoken": true,
    "trustedproxy": "CloudFlare"
  },
  "domains": {
    "": {
      "newaccounts": false,
      "usernameisemail": true,
      "certurl": "https://mesh.example.com"
    }
  }
}

Vista2003 commented 1 year ago

From the looks of it, the issue is in the connection made from the server to the remote system, and is not related to the connection from the WebUI to the local client.

I have 2 subdomains pointed at the MeshCentral server: remote.example.com and meshcentral.example.com. remote.example.com is the one that connects to the remote systems, and it has now been moved away from Cloudflare's CDN. meshcentral.example.com is only for accessing the WebUI and still goes through Cloudflare's CDN. With remote.example.com now on the Nginx reverse proxy by itself, both domains are working correctly.

si458 commented 1 year ago

This is what I'm seeing as well when debugging it: the server sends 'c' to both the web browser client and the remote agent. The web browser receives 'c' and sends data back to MeshCentral, BUT MeshCentral doesn't always get that data. The remote agent does the same thing: it receives the 'c' command and sends data back to MeshCentral, but MeshCentral never gets the data. It is bloody weird.

gt2416 commented 1 year ago

I thought I was going crazy ! So glad I found this thread. I've been trying many things, but nothing works. Tried to proxy Caddy instead of meshcentral through the tunnel, tried directly, nothing. I use mesh at work and do not have any port forwards for security reasons. Looks like I'll have to wait for a fix.

si458 commented 1 year ago

The problem is there is no fix. We haven't changed anything; it's something Cloudflare have done/changed, so we are beholden to them!

Vista2003 commented 1 year ago

Since you're using it for work, could you open a support call with Cloudflare to see if they can look into this for us? I think a lot of us are using Cloudflare free for personal use.

si458 commented 1 year ago

@billettg please can you change the title of the issue to include 'cloudflare' to help others find out about this issue at the moment

gt2416 commented 1 year ago

I also use the free tier. I once asked them how much it would cost and explained that we have 2 webapps with maybe 10 people using them at the same time, max. They quoted me $5000/mo. I'm quite happy with the free tier lol. The person I talked to definitely didn't understand what I was saying; I think she thought we were hosting a cloud application and selling it or something.

AlphaIrri commented 1 year ago

@si458 I switched over to DNS only on cloudflare and to the Let'sEncrypt SSL. Now only the new test agent is connecting. Do I have to reinstall all agents to get them to work? I tried ignoreAgentHashCheck and removing TLSOffload already.

anthonyb800 commented 1 year ago

Ran into the same thing myself - I had to migrate my mesh to a new server as I was using CF tunnels on prem previously. Now it is in the cloud behind a reverse proxy and offloading the TLS to that. None of my agents connected for about 30 minutes. Could not figure out what the problem was, but turns out it was just DNS propagation (figures...). Once the new record populated to all the endpoints, the agents started to connect again one by one.

As long as your config file contains the cert line that points to your public Mesh URL if you are behind a reverse proxy, you should be fine as far as I know.

"cert": "mesh.example.com",

si458 commented 1 year ago

If you're using the same DNS name, just remove trustedproxy and tlsoffload if you are using MeshCentral directly without any load balancers or proxies (like nginx or traefik), I believe.

AlphaIrri commented 1 year ago

@si458 I am still going to be using NGINX PM for reverse proxy. If I remove TLSOffload, then I get a bad gateway error. Trustproxy is removed.

@anthonyb800 It's been over an hour and only the new agent is connecting. I'll check tomorrow to see if the agents start connecting.

billettg commented 1 year ago

I'm now using NPM as the reverse proxy with TLSOffload pointing to the proxy, a Let's Encrypt cert set in the SSL section, and "Websockets Support" enabled. Works fine for me. If you need any config let me know.
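
For anyone comparing their config.json against the setup described above, here is a minimal Python sketch that reports the reverse-proxy related settings. It assumes the key names mentioned in this thread (cert, tlsoffload, trustedproxy) and assumes key names can be matched case-insensitively, which is a guess based on the mixed casing in the configs posted here. The proxy address 192.0.2.10 is a made-up placeholder, not a value from anyone's setup.

```python
import json

# Key names as they appear in this thread; matching is done on
# lowercased names (assumption: casing does not matter).
PROXY_KEYS = ("cert", "tlsoffload", "trustedproxy")


def summarize_proxy_settings(config: dict) -> dict:
    """Return the proxy-related settings, or None for each one that is not set."""
    settings = {k.lower(): v for k, v in config.get("settings", {}).items()}
    return {key: settings.get(key) for key in PROXY_KEYS}


# Hypothetical config for illustration only: 192.0.2.10 stands in for
# the address of the reverse proxy.
example = json.loads("""
{
  "settings": {
    "cert": "mesh.example.com",
    "TLSOffload": "192.0.2.10",
    "port": 443
  }
}
""")

if __name__ == "__main__":
    print(summarize_proxy_settings(example))
```

A value of None for tlsoffload while still sitting behind NPM would match the "bad gateway" symptom reported earlier in the thread, though this sketch only reports what is set and does not validate anything.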

AlphaIrri commented 1 year ago

I'd like some help getting it configured, if you have time. That's my exact setup I'm trying to accomplish again. Is it okay to post my config here or do you want me to post elsewhere?

silversword411 commented 1 year ago

If you need any config let me know.

A writeup would be great...and also as an addition to: https://ylianst.github.io/MeshCentral/meshcentral/#nginx-reverse-proxy-setup

silversword411 commented 1 year ago

Some more info. Found this: https://ylianst.github.io/MeshCentral/meshcentral/#traefik-reverse-proxy-setup

And there's an "agentConfig": [ "webSocketMaskOverride=1" ] entry there that we haven't been able to trace back to any MeshCentral source code.

Also found an issue with a great agent debug process walkthrough for those playing along with troubleshooting: https://github.com/Ylianst/MeshCentral/issues/1046 which I'll eventually get written up for the docs

If you have something helpful to add, please chime in

si458 commented 1 year ago

just an update for all, i believe the issue is something to do with the meshagent connecting to cloudflare

if you create a local group, enable ssh, then setup a local device and use the web-ssh, you can ssh no problems

Also, if you change the remote device's DNS (either in the router, or in Cloudflare by disabling the proxied feature in DNS) so that your MeshCentral DNS name points directly to your MeshCentral IP, then suddenly your web browser (which can still go through the Cloudflare proxy) can connect to the remote device with no problems!

What's also weird is that you can use the console tab to run commands like agentupdate, info, osinfo, netinfo, etc. with no problems, because these commands use the control.ashx and agent.ashx websockets. It's only meshrelay.ashx that doesn't seem to work.

Sadly I'm not a C language developer and it's out of my league!

if you are a C developer, please look at https://github.com/ylianst/MeshAgent and see if you can work out why the websocket for meshrelay.ashx is disconnecting?

Matt-CyberGuy commented 1 year ago

also if you change the remote devices DNS (either in router or even in cloudflare by disabling the proxied feature in DNS) and make your meshcentral DNS name point directly to your meshcentral IP, then suddenly your web browser (which can still be relayed by cloudflare dns proxied) you can connect to the remote device with no problems!

Ya, this is what I was referencing for why it took us so long to notice the issue. Endpoints sitting behind networks we manage weren't having any issue since we have the IP of the FQDN pointing directly to our VPS via DNS. But systems either on the road, roaming, or on networks we don't manage were all having the connect/disconnect/connect/disconnect X 10 issue.

This is definitely an issue related to something Cloudflare changed. I've noticed a few other anomalous things with other services we run through their proxies as well. And for the record, we only use their free tier.

si458 commented 1 year ago

@Matt-CyberGuy it seems every user here is a free cloudflare user! I even went to the extreme of creating my own account and domain just to debug this issue, but I refuse to pay for a paid plan! Unless again anybody would like to donate money or a pro account? It's $25 for 1 month

jwiener3 commented 1 year ago

Unless again anybody would like to donate money or a pro account? It's $25 for 1 month

I would be willing to send you $5-10 to try and figure out the issue.

Matt-CyberGuy commented 1 year ago

I don't mind paying for it, but I'm in the middle of moving right now, so I don't have a lot of time to test stuff out. Will probably do it next week if no else gets to it first.

si458 commented 1 year ago

Thank you! You can use my github sponsors page https://github.com/sponsors/si458 or my PayPal page https://paypal.me/si458, one time or recurring it's totally up to yourself including amount!

gt2416 commented 1 year ago

I have opened a topic on cloudflare forums. Hopefully a cloudflare engineer or member will see it. https://community.cloudflare.com/t/meshcentral-agents-can-no-longer-connect-using-tunnel/551937

PrplHaz4 commented 1 year ago

Some more info. Found this: https://ylianst.github.io/MeshCentral/meshcentral/#traefik-reverse-proxy-setup

And there's a "agentConfig": [ "webSocketMaskOverride=1" ], there that we haven't been able to track back into any meshcentral source code.

Also found a issue with a great agent debug process walkthru for those playing along with troubleshooting: #1046 which I'll eventually get written up for the docs

If you have something helpful to add, please chime in

webSocketMaskOverride was removed in https://github.com/Ylianst/MeshAgent/commit/0ea6e28021875f4359413795f1aa8014f6d6ef29

That area was problematic with Traefik v1, where the agent would try to auto-update right after install, but it would fail and leave a broken core. The workaround was to disable updates. I'm not sure if it was resolved by upgrading Traefik to v2 or by the commit above, but it's not an issue for me anymore.

si458 commented 1 year ago

thanks for the catch @PrplHaz4 ! means we can update the docs etc as this option is no longer relevant!

si458 commented 1 year ago

just an update for the community thank you to those who donated money! i bought 1 months cloudflare pro to test (its per domain robbing sods!)

and it seems if you do INDEED have PRO then the remote control will work BUT only about 50-60% of the time? sometimes i had to click the connect button twice in a row to get it to connect

From what I've been able to figure out, the connections between your browser and MeshCentral through Cloudflare have no problems. The control.ashx websocket from the remote agent has no problems connecting either: you can clear core, upload core, run console commands, even get power stats etc., no problem.

BUT the minute you try to relay (remote desktop, files, terminal, or TCP in MeshCentral Router), the agent connects to MeshCentral but then instantly disconnects for no reason. From what we have 'googled', it might be that the MeshAgent isn't using the websocket in an RFC-compliant manner, and maybe Cloudflare is disconnecting it?

but again if you are a C or C++ or C# developer, please look at https://github.com/ylianst/MeshAgent and see if you can work out why the websocket for meshrelay.ashx is disconnecting?
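
For anyone who wants to look into the RFC angle, one requirement in RFC 6455 is that every frame sent from client to server must be masked, and the webSocketMaskOverride setting mentioned earlier suggests the agent has had special handling around masking. Whether masking is actually what Cloudflare objects to has not been established in this thread. The Python sketch below only illustrates what a compliant masked client frame looks like and how to check the mask bit in captured bytes; the function names are made up for this example and are not taken from the MeshAgent source.

```python
import os
import struct
from typing import Optional


def build_client_frame(payload: bytes, mask_key: Optional[bytes] = None) -> bytes:
    """Build one masked binary WebSocket frame (FIN=1, opcode=0x2) per RFC 6455."""
    if mask_key is None:
        mask_key = os.urandom(4)  # clients pick a fresh random key per frame
    if len(mask_key) != 4:
        raise ValueError("mask key must be exactly 4 bytes")

    first_byte = 0x80 | 0x2  # FIN bit set, binary opcode
    length = len(payload)
    if length < 126:
        header = bytes([first_byte, 0x80 | length])
    elif length < 65536:
        header = bytes([first_byte, 0x80 | 126]) + struct.pack("!H", length)
    else:
        header = bytes([first_byte, 0x80 | 127]) + struct.pack("!Q", length)

    # XOR every payload byte with the mask key, cycling through its 4 bytes.
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return header + mask_key + masked


def frame_is_masked(frame: bytes) -> bool:
    """Return True if the MASK bit (high bit of the second byte) is set."""
    if len(frame) < 2:
        raise ValueError("frame too short to contain a header")
    return bool(frame[1] & 0x80)


if __name__ == "__main__":
    frame = build_client_frame(b"c")
    print(frame.hex(), frame_is_masked(frame))
```

Running frame_is_masked over the first bytes of agent-to-server frames from a packet capture taken before TLS (for example on a test server without TLS) would show whether the agent masks its relay traffic.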

NiceGuyIT commented 1 year ago

Does it happen if you use WebRTC?

    "WebRTC": true,
    "webrtcConfig": {
      "iceServers": [
        { "urls": "stun:stun.services.mozilla.com" },
        { "urls": "stun:stun.l.google.com:19302" }
      ]
    },

si458 commented 1 year ago

@NiceGuyIT I've never been able to get WebRTC to work, but I just copied your config and sadly it seems to be the same thing. It still creates the meshrelay.ashx websocket fine in the browser, and the agent calls the meshrelay.ashx websocket too, but again the agent disconnects straight away for some unknown reason.

RJ-487 commented 1 year ago

Mesh assistant works just fine through CloudFlare for me, I can remote in with no problem. The issue on CloudFlare is when I try to remote into a machine that has an agent installed.

RJ-487 commented 1 year ago

Also, the android agent connects just fine when remoting to it.

si458 commented 1 year ago

Thanks for the hint @RJ-487! I haven't tried the assistant or android apps, only the meshagent!

RJ-487 commented 1 year ago

Are we sure that it’s still a cloud flare issue?

honestlai commented 1 year ago

If you bypass Cloudflare's proxying, the issue goes away immediately. We have hundreds of systems in our portal. The ones that have the server IP hardcoded or set in DNS have zero problems connecting. But every single system that references the proxy IPs for its connections has the issue described in this post. And most everyone here has reported they made no changes to their configs in MeshCentral or Cloudflare when the issues began.

There is likely a fix via coding for the Mesh agents, but because the project is no longer being supported by Intel, I think it's up to us to figure it out... ooor, we have to somehow figure out what changes were made on Cloudflare's side and figure out how to walk them back via Cloudflare's interface.

RJ-487 commented 1 year ago

Yeah I’ve been away from mesh central for 6+ months. I just got back into it and got everything set up with cloud flare and bam it doesn’t work lol. I’ve noticed that Ylian has been MIA and now I know why.

DollarStoreCPU commented 1 year ago

+1

RJ-487 commented 1 year ago

Has anyone played around with the Edge Certificate settings in CF?

pennyblack commented 1 year ago

this issue affects me, have tried 1.1.0 thru 1.1.10 and its still an issue.

RJ-487 commented 1 year ago

did you try anything under 1.1.0?

RJ-487 commented 1 year ago

its working perfectly fine going through cloudflare proxy with the android agent. its still a no go with windows and linux agents.

frogweh commented 1 year ago

I can confirm that going through my cloudflare proxy setup with edge SSL cert, through NPM it still doesn't work with installed agents, but it worked flawlessly with meshcentral assistant and requesting help. Why does that work fine, but the installed agent doesn't?

si458 commented 1 year ago

@frogweh We still don't know. I debugged it and the assistant uses the same websocket name and commands, as the agent does

The only thing I can think of is that Cloudflare changed something to do with websockets and RFC compliance, and maybe the mesh agent doesn't comply with it

Sadly I'm not a c++ or c# developer so I can't fully debug the agent!

RJ-487 commented 1 year ago

I'm on version 1.1.10 , I'm not sure about the agent but it says compiled on Dec 2022. Was that the last time the agent was updated?

si458 commented 1 year ago

Yes @RJ-487, that was the last time the agent was updated. The only thing you could try is running version 1.0.0 with noAgentUpdate set to 1. Once it's up and running, clear the core, update the core, then run agentupdate and let it roll back a few versions to see if it works, but I don't believe it will.

apyoungblood commented 1 year ago

+1

Cloudflare <--> SWAG (linuxserver.io/swag on docker hub, it's basically an nginx reverse proxy with certbot) <--> docker image jamesits/meshcentral2

MC version 1.1.10 currently, though I've tried 1.1.0, 1.1.9 all while these issue symptoms have been present.

Server started acting flaky around the same time most people mentioned about 2 weeks ago. You could still connect if you tried the connect button enough. Now there's no connection at all. I've tried temporarily pausing the Cloudflare proxy but it has not resolved the issue yet. I don't have any Android/mobile nodes connected to my Mesh to test that but I can affirm that all Windows, Linux, and macOS nodes are displaying these same symptoms.

@si458 I may take a look at the MeshAgent. I'm not officially a c++ developer but I do work with it for video game development projects as a hobby job.

Please let me know if you find out a fix/workaround or need help in any way. Like everyone here I imagine this is a high priority for your related admin work.

si458 commented 1 year ago

@apyoungblood have a crack at it! I've still got nowhere debugging it because I can't figure out how to debug the C code for the MeshAgent. The only TEMP solution is to upgrade from Cloudflare FREE to Cloudflare PRO, but then it's still only a 50-60% success rate for connections. That's better than the current 1% success rate with FREE.

EDIT: i also dont use cloudflare in the slightest, everything is self hosted in own datacenter, so i dont have the issue at all im just trying to fix/track it for the community!

RJ-487 commented 1 year ago

@si458 Thanks for all of your contributions.

techguy930 commented 1 year ago

Thank you! I was working on this for 3 days scratching my head, and now I finally got Nginx Proxy Manager working with my domain and Let's Encrypt certificates after simply turning off the Cloudflare proxy! :) I don't know how secure that is, but oh well, I just use MeshCentral for family :)

si458 commented 1 year ago

The difference between DNS proxied and non-proxied is simple.

Non-proxied is where the remote devices all connect directly to your server, because Cloudflare returns your server's IP address in DNS responses. With proxied, Cloudflare returns an IP address of their own servers, and all remote devices connect to those. Cloudflare then connects to your server (relaying/reverse proxy).

So in a way proxied is very helpful, as it hides your IP address and avoids direct connections to your server (apart from Cloudflare connecting to you).

But then you get problems like the current one, where your remote devices all connect to Cloudflare but things like websockets stop working because they changed something in their backend, and it affects everyone!

So using non-proxied is the best option at the moment if you use Cloudflare.
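
To check which of the two modes a given endpoint actually sees, you can resolve the server name on that machine and test whether the answer falls inside Cloudflare's address space. The Python sketch below is one way to do that. The IPv4 ranges are a snapshot of Cloudflare's published list as I know it and may be out of date, so treat them as an assumption and compare against Cloudflare's current list. mesh.example.com is the placeholder name used earlier in this thread.

```python
import ipaddress
import socket

# Snapshot of Cloudflare's published IPv4 ranges (assumption: may be stale).
CLOUDFLARE_IPV4 = [
    "173.245.48.0/20", "103.21.244.0/22", "103.22.200.0/22",
    "103.31.4.0/22", "141.101.64.0/18", "108.162.192.0/18",
    "190.93.240.0/20", "188.114.96.0/20", "197.234.240.0/22",
    "198.41.128.0/17", "162.158.0.0/15", "104.16.0.0/13",
    "104.24.0.0/14", "172.64.0.0/13", "131.0.72.0/22",
]
_NETWORKS = [ipaddress.ip_network(cidr) for cidr in CLOUDFLARE_IPV4]


def is_cloudflare_ip(address: str) -> bool:
    """Return True if the address falls inside one of the listed ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in _NETWORKS)


def resolve_ipv4(hostname: str) -> list:
    """Resolve a hostname to its IPv4 addresses using the local resolver."""
    infos = socket.getaddrinfo(hostname, 443, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    host = "mesh.example.com"  # placeholder: replace with your server name
    for addr in resolve_ipv4(host):
        mode = "proxied (Cloudflare IP)" if is_cloudflare_ip(addr) else "direct"
        print(f"{host} -> {addr}: {mode}")
```

Run it on the remote device rather than on the server, since the reports above suggest the agent's path is what matters: devices that resolve the name straight to the server IP were the ones that kept working.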

RJ-487 commented 1 year ago

Yep! I will either run mine through cloudflare proxy or run it on my local network only. I do not like having ports opened directly to the world.

It’s interesting how many mesh central servers you can find on Shodan.io without a proxy in front of them.

Yeah I know with cloud flare proxy it’s still somewhat exposed to the internet but you have some added protection using cloudflare , I like the country blocking feature , there is no reason that my mesh server should be needed outside of the US.

si458 commented 1 year ago

@RJ-487 from what others are saying, using the cloudflare dns proxied or cloudflare tunnels won't work as you won't be able to connect to remote devices, so you need to use direct access at the moment with ur own IP address and proxied disabled

RJ-487 commented 1 year ago

@si458 I'm still using the Cloudflare proxy, I'm just using Mesh Assistant instead of the mesh agent for the time being.