Closed southeasterntech closed 3 years ago
Do you know what version you were running before? There has not been an agent change in a while.
In "My Server" / "Console" type "agentstats" and let me know what you see.
Also, any errors in the general tab:
Also, add the following line to the config.json settings section:
"ignoreagenthashcheck": true
Then restart the server and let me know what you see.
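For anyone following along, the change above amounts to one extra key under "settings". Here is a minimal Python sketch, assuming config.json is plain JSON with a top-level "settings" object. The helper name `enable_hash_check_bypass` and the sample port value are made up for illustration and are not part of MeshCentral.

```python
import json

def enable_hash_check_bypass(config_text: str) -> str:
    """Return config.json text with "ignoreagenthashcheck": true under "settings"."""
    config = json.loads(config_text)
    # Create the settings section if it is missing (assumption: it is a JSON object).
    settings = config.setdefault("settings", {})
    settings["ignoreagenthashcheck"] = True
    return json.dumps(config, indent=2)

# Hypothetical, trimmed-down config used only to show the result.
sample = '{"settings": {"port": 443}}'
print(enable_hash_check_bypass(sample))
```

The server still needs a restart afterwards for the setting to take effect, as noted above.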
One more thing I would check: is it possible you're running two instances of MeshCentral at the same time? If instance 1 gets all the agent connections and you're looking at instance 2, then re-installing the agent makes the agent connect to instance 2.
One way to notice this is to compare the meshagent.msh of previous devices with the new meshagent.msh you're installing. If there is a difference in the ServerID or MeshServer lines, there is a problem. The .msh should look like this:
MeshName=Lab Computers
MeshType=2
MeshID=0xEDBE1BE37...B7DB6B4E7971EF34D36EBB6B875CF3D7DED1EE7CD5C
ServerID=D99362D5ED8...D707403E396CF0EF6DC2B3A42F735135FD
MeshServer=wss://central.mesh.meshcentral.com:443/agent.ashx
webSocketMaskOverride=1
Another issue that could have happened is that you removed the certificates in "meshcentral-data", causing your server to no longer be trusted by agents. Comparing the old and new .msh would indicate what is going on.
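The .msh comparison described above could be sketched roughly like this in Python. It assumes the file is simple key=value lines as in the sample. The function names and the shortened ServerID and MeshServer values are placeholders, not real data.

```python
def parse_msh(text: str) -> dict:
    """Parse the key=value lines of a meshagent.msh file into a dict."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values

def msh_differences(old_text: str, new_text: str, keys=("ServerID", "MeshServer")) -> dict:
    """Return {key: (old_value, new_value)} for each watched key that differs."""
    old, new = parse_msh(old_text), parse_msh(new_text)
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}

# Placeholder contents; in practice, read the two files from disk instead.
old_msh = "MeshName=Lab Computers\nServerID=AAAA1111\nMeshServer=wss://mesh.example.com:443/agent.ashx\n"
new_msh = "MeshName=Lab Computers\nServerID=CCCC2222\nMeshServer=wss://mesh.example.com:443/agent.ashx\n"

for key, (before, after) in msh_differences(old_msh, new_msh).items():
    print(f"{key} differs: {before} -> {after}")
```

An empty result means both files point at the same server identity, so the cause is probably elsewhere.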
Also, if you are running plugins or a reverse proxy, let me know.
Yes on the reverse proxy (Caddy). Here are the screens:
Thanks,
Shane D. Lewis
Oops, looks like the screens didn't attach... checking for duplicate instances now.
Also, agent hash check has been disabled for a couple of years now... I don't see much in the way of a second server running. Only the Mesh Service is running, with no separate node instance that I can tell... restarting the Caddy reverse proxy too... let's see what happens.
So far, nothing obviously incorrect. The "coreIsStableCount" is low. Looks like agents are trying to connect, authenticating correctly and then disconnecting. I would look at the reverse proxy first for sure. By the way, I am running v0.7.85 on MeshCentral.com with 10k devices. I don't run a reverse proxy, but I don't have any obvious concerns about that version.
If there is a connectivity issue and it's fixed, it will take 10 to 20 minutes for agents to slowly reconnect.
Caddy reboot didn't fix it.
But.... I go to three existing, not-connected computers and the mesh Server ID ends in ...B76A
However, when I open Mesh, copy the invite link and then attempt to run the installer, the installer Server ID ends in ...44E3
So it does indeed seem I have a second server running.... or a different server running.... I have no visible NodeJS instances that I can see, and I only have the Mesh Central Service running.... any ideas on how to spot the rogue server?
Is the entire hash different? Or just the ending? If it's a different server, the hash would be completely different. What counts is what is in the .msh file, maybe the UI is not showing the full hash.
Assuming the hash is completely different, the main problem is that you have a different "meshcentral-data" folder in your server. The root cause is that these two files in "meshcentral-data" are different from the original:
agentserver-cert-public.crt
agentserver-cert-private.key
So, if you find these two files in a previous backup, put them back in place of the current ones and restart the server, all the agents will see your server as correct and agree to connect.
Make sure to back up your current "meshcentral-data" before making any changes.
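The backup and file comparison above could be scripted along these lines. This is only a sketch: the two file names come from this thread, while the helper names, the timestamped backup naming and the example paths are assumptions made for illustration.

```python
import hashlib
import shutil
import time
from pathlib import Path

# The two files named above.
CERT_FILES = ("agentserver-cert-public.crt", "agentserver-cert-private.key")

def file_digest(path: Path) -> str:
    """SHA-256 of a file's bytes, used only to tell whether two files are identical."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_cert_files(current_dir, backup_dir) -> list:
    """Return the agentserver files that are missing or differ between two folders."""
    changed = []
    for name in CERT_FILES:
        a, b = Path(current_dir) / name, Path(backup_dir) / name
        if not (a.is_file() and b.is_file()) or file_digest(a) != file_digest(b):
            changed.append(name)
    return changed

def backup_folder(data_dir) -> Path:
    """Copy data_dir to a timestamped sibling folder and return the copy's path."""
    src = Path(data_dir)
    dest = src.with_name(f"{src.name}-backup-{time.strftime('%Y%m%d-%H%M%S')}")
    shutil.copytree(src, dest)
    return dest

# Example usage (paths are placeholders for your own setup):
# backup_folder("meshcentral-data")
# print(changed_cert_files("meshcentral-data", "/path/to/old/meshcentral-data"))
```

If `changed_cert_files` returns an empty list, the agentserver certificate did not change and the mismatch has another cause.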
Sorry for the kindergarten scrawl... (screenshots: before update, after update)
What could have caused a different set of certs\data folder to suddenly appear? We didn't restore an old backup, snapshot or anything that I know of... Granted we have techs doing stuff in the office but I'd have known if they'd have messed with this server. Thanks..
I mean I can go to the server backup from 2 weeks ago, copy these certs and bring them over...
Now that I think about it.... back in January, I copied the data folder from an old server running NeDB to this new server running MongoDB and it's been much more stable.... I wonder if something finally caught up to me, cert expiration or something?...no clue.
(screenshots: new server cert, old server)
This is certainly the issue. The "ServerID" is used by the agent to authenticate the server. When connecting, the agent will ask the server to prove it's the correct one, and the server must sign a random string given by the agent with the "agentserver" certificate. This can't be bypassed. The agents will refuse to continue the connection unless the server is correct.
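To illustrate why a regenerated certificate locks out existing agents, here is a toy Python model of the check described above. It is not MeshCentral's actual protocol or message format. It uses textbook RSA with tiny primes (insecure, for illustration only) and assumes the ServerID is a SHA-384 hash of the public key. The class and function names are hypothetical.

```python
import hashlib
import os

class ToyServer:
    """Stand-in for a server holding an "agentserver" key pair (toy RSA, NOT secure)."""

    def __init__(self, p: int, q: int, e: int):
        self.n, self.e = p * q, e
        self._d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (Python 3.8+)

    def public_key(self) -> bytes:
        return f"{self.n}:{self.e}".encode()

    def server_id(self) -> str:
        # Assumption for this sketch: the ID is a hash of the public key material.
        return hashlib.sha384(self.public_key()).hexdigest().upper()

    def sign(self, nonce: bytes) -> int:
        m = int.from_bytes(hashlib.sha256(nonce).digest(), "big") % self.n
        return pow(m, self._d, self.n)

def agent_accepts(pinned_server_id: str, server: ToyServer) -> bool:
    """Model of the agent side: check the pinned ID, then verify a signed nonce."""
    nonce = os.urandom(32)  # random string chosen by the agent
    presented_key = server.public_key()
    # 1. The presented key must hash to the ServerID stored in the agent's .msh.
    if hashlib.sha384(presented_key).hexdigest().upper() != pinned_server_id:
        return False
    # 2. The server must prove it holds the matching private key.
    n, e = (int(part) for part in presented_key.decode().split(":"))
    m = int.from_bytes(hashlib.sha256(nonce).digest(), "big") % n
    return pow(server.sign(nonce), e, n) == m

old_server = ToyServer(61, 53, 17)  # key pair the existing agents were installed against
new_server = ToyServer(89, 97, 5)   # regenerated key pair
pinned = old_server.server_id()     # what an already-installed agent has pinned

print(agent_accepts(pinned, old_server))  # True
print(agent_accepts(pinned, new_server))  # False
```

In this model the new server fails at step 1 before any signature is checked, which matches the symptom of agents connecting and then dropping.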
Your server can't have two "agentserver" certificates... so, it's going to have to be the old one or the new one.
Lastly, you can change the meshagent.msh file to what you like and restart the agent. No need to re-install the agent.
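As a rough sketch of that edit, the Python helper below rewrites selected key=value lines in .msh text and leaves the rest untouched. The function name and the placeholder values are made up. Locating the file on each device and restarting the agent service are left out, since those steps depend on the platform.

```python
def set_msh_values(msh_text: str, updates: dict) -> str:
    """Return .msh text with the given keys replaced; other lines are kept as-is."""
    remaining = dict(updates)
    lines = []
    for line in msh_text.splitlines():
        key, sep, _ = line.partition("=")
        if sep and key.strip() in remaining:
            # Replace the value of a key we were asked to update.
            line = f"{key.strip()}={remaining.pop(key.strip())}"
        lines.append(line)
    # Append any requested keys that were not already present.
    lines.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(lines) + "\n"

# Placeholder content; a real file would carry the full ServerID hash.
old_text = "MeshName=Lab Computers\nServerID=OLD-ID-PLACEHOLDER\n"
print(set_msh_values(old_text, {"ServerID": "NEW-ID-PLACEHOLDER"}))
```

Keep a copy of the original file before overwriting it, so the change can be reverted if the agent still refuses to connect.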
I can't help you with what happened, but you sure root caused the issue exactly right.
Also, can you check whether there is a "mesherrors.txt" file in "meshcentral-data"? If there is anything in there, it would be interesting to see. Maybe there is a hint.
So... if I swap in the certs from the old server.... then this would work... to the exclusion of anything that was working on the new server.... right? Here are the stats from this week.... you can see a massive drop in registrations... so maybe I should go back to a snapshot from a couple of days back and copy in the data folder...
Thanks so much Ylian... looks like we're close here.
You are exactly correct. If you put the old cert, the new agents will not connect anymore.
I'm going to go back to a snapshot from Mar 2 and grab the data folder and compare\drop that in and see what goes down... be interesting to see what changed.
So.... in the last 9 days...the certs are different.... pulling the data folder from Mar 2, the keys aren't the same. Any idea you can think of that could have caused the certs to regenerate? Swapped in the old data folder and many more devices are beginning to populate... thanks so much Ylian... We'll just have to go back and re-add the 40 or so newest agents. Thanks a million, as always you're incredible..
Glad I could help. I don't have any idea what could have happened, I have not gotten a report like this before. Can you take a look at "mesherrors.txt" in "meshcentral-data" and report back if there is anything in that file? Thanks.
Nothing in there at all other than a missing PNG file with our logo....which I just now restored from backup...... shrug. OK filing this one under the Bigfoot and Nessie folder..... Moving on to more important things. Thanks as always for the help. Shane
OK. Thanks. Still would have been nice to get a report on mesherrors.txt.
No problem, here's mesherrors.txt. It was working fine on 3/2/2021, so I didn't go back past that....
@Ylianst see above, thanks again
Suddenly without explanation, a vast majority of clients are now disconnected on 0.7.85. Running the installer again produces the update button and it reconnects. But I can't do that across 1000 EP's... any idea on how to fix? Restarting the server didn't help.