Closed: majorproblem closed this issue 2 years ago.
We are still working on bringing the cluster restore feature to the UI. Currently, when all nodes have been shut down, the cluster has to be restored manually: log in via ssh to the Galera Manager host and run the following command:
gmc cluster recover --name your_cluster_name
If you experience any issues, check the cluster state by running this command:
gmc cluster status --name your_cluster_name
Please let me know if this helps.
But my problem is that I cannot stop one node in a running cluster and then start it again; starting a node seems to be a one-time activity. It is fully reproducible:
I then (successfully) stopped the last node, "node 1", from the GUI and logged in on the Manager server to see if I could restart my cluster from the GUI. As per your suggestion I entered gmc cluster recover --name my_cluster_name
which yielded this:
Cluster ID: 7
{
  "id": "106",
  "description": "cluster start/recover (\"7\")",
  "createdAt": "2022-01-10T18:09:25.18618731Z",
  "stoppedAt": null,
  "executionInfo": {
    "status": "new"
  },
  "meta": {
    "cluster_id": "7"
  }
}
and then ran gmc cluster status --name my_cluster_name, which returned this information:
Cluster status: stopped
Host and nodes statuses:
+-----------+-------------+-----------+-------------+
| HOST NAME | HOST STATUS | NODE NAME | NODE STATUS |
+-----------+-------------+-----------+-------------+
| node 1 | healthy | node 1 | stopped |
| node 2 | healthy | node 2 | stopped |
| node 3 | healthy | node 3 | stopped |
+-----------+-------------+-----------+-------------+
So, I have no nodes running and no way to start the cluster again :-(
That's not expected behavior. Can you please send the output of
gmc job list
+-----+--------------------------------+---------+-----------+--------------------------------+-----------------------+--------------------------------+
| ID | DESCRIPTION | STATUS | PARENT ID | ERROR | INFO | CREATED |
+-----+--------------------------------+---------+-----------+--------------------------------+-----------------------+--------------------------------+
| 9 | delete cluster ("2") | success | | | | 2021-12-14T17:00:28.99822453Z |
| 10 | server access check | success | | | | 2021-12-14T17:01:30.678639277Z |
| 11 | delete cluster ("1") | success | | | | 2021-12-14T17:02:10.069412572Z |
| 12 | host delete (host="node | success | 11 | | | 2021-12-14T17:02:10.071764639Z |
| | 1",clusterID="1") | | | | | |
| 13 | cluster deploy ("awef") | success | | | empty cluster created | 2021-12-14T17:02:42.935367023Z |
| 14 | server access check | success | | | | 2021-12-14T17:04:00.366122977Z |
| 15 | server access check | success | | | | 2022-01-06T12:33:19.582997547Z |
| 16 | host deploy ("node 1") | success | | | | 2022-01-06T12:33:32.381890225Z |
| 17 | host firewall configuration | success | 16 | | | 2022-01-06T12:33:32.38576009Z |
| | ("node 1") | | | | | |
| 18 | node install ("node 1") | success | | | | 2022-01-06T12:33:34.461371925Z |
| 19 | server access check | failure | | unable to connect: ssh: | | 2022-01-06T12:36:29.536623978Z |
| | | | | handshake failed: ssh: unable | | |
| | | | | to authenticate, attempted | | |
| | | | | methods [none publickey], no | | |
| | | | | supported methods remain | | |
| 20 | server access check | success | | | | 2022-01-06T12:37:42.726444315Z |
| 21 | host deploy ("node 2") | success | | | | 2022-01-06T12:37:50.048107066Z |
| 22 | host firewall configuration | success | 21 | | | 2022-01-06T12:37:50.051540847Z |
| | ("node 2") | | | | | |
| 23 | node install ("node 2") | success | | | | 2022-01-06T12:37:52.879033436Z |
| 24 | server access check | success | | | | 2022-01-06T12:40:25.133050242Z |
| 25 | host deploy ("node 3") | success | | | | 2022-01-06T12:40:31.501822943Z |
| 26 | host firewall configuration | success | 25 | | | 2022-01-06T12:40:31.506323394Z |
| | ("node 3") | | | | | |
| 27 | node install ("node 3") | success | | | | 2022-01-06T12:40:32.275327907Z |
| 28 | node start (clusterID=3, | failure | | unit failed to start: | | 2022-01-06T12:52:05.633913254Z |
| | nodeID=1) | | | Unexpected error. Check the | | |
| | | | | server logs.: only stopped | | |
| | | | | node can start | | |
| 29 | node start (clusterID=3, | failure | | unit failed to start: | | 2022-01-06T12:52:16.760086813Z |
| | nodeID=2) | | | Unexpected error. Check the | | |
| | | | | server logs.: only stopped | | |
| | | | | node can start | | |
| 30 | node start (clusterID=3, | success | | | | 2022-01-06T12:52:30.603347321Z |
| | nodeID=3) | | | | | |
| 31 | node start (clusterID=3, | failure | | unit failed to start: | | 2022-01-06T12:53:49.830283465Z |
| | nodeID=2) | | | Unexpected error. Check the | | |
| | | | | server logs.: only stopped | | |
| | | | | node can start | | |
| 32 | delete cluster ("3") | success | | | | 2022-01-06T12:59:25.266585504Z |
| 33 | host delete (host="node | success | 32 | | | 2022-01-06T12:59:25.27630953Z |
| | 1",clusterID="3") | | | | | |
| 36 | host delete (host="node | success | 32 | | | 2022-01-06T12:59:25.276452763Z |
| | 2",clusterID="3") | | | | | |
| 34 | host delete (host="node | success | 32 | | | 2022-01-06T12:59:25.277494624Z |
| | 3",clusterID="3") | | | | | |
| 35 | node stop | success | 33 | | | 2022-01-06T12:59:25.281141856Z |
| | (nodeID="1",clusterID="3") | | | | | |
| 37 | node stop | success | 34 | | | 2022-01-06T12:59:25.28505108Z |
| | (nodeID="3",clusterID="3") | | | | | |
| 38 | node stop | success | 36 | | | 2022-01-06T12:59:25.289365895Z |
| | (nodeID="2",clusterID="3") | | | | | |
| 39 | cluster deploy ("test") | success | | | empty cluster created | 2022-01-06T13:01:31.896844609Z |
| 40 | server access check | success | | | | 2022-01-06T13:02:58.966928565Z |
| 41 | host deploy ("node 1") | success | | | | 2022-01-06T13:03:07.867029045Z |
| 42 | host firewall configuration | success | 41 | | | 2022-01-06T13:03:07.873102483Z |
| | ("node 1") | | | | | |
| 43 | node install ("node 1") | success | | | | 2022-01-06T13:03:10.501092911Z |
| 44 | node start (clusterID=4, | success | | | | 2022-01-06T13:08:48.885018583Z |
| | nodeID=1) | | | | | |
| 45 | server access check | success | | | | 2022-01-06T13:09:15.027418544Z |
| 46 | host deploy ("node 2") | success | | | | 2022-01-06T13:09:20.288689493Z |
| 47 | host firewall configuration | success | 46 | | | 2022-01-06T13:09:20.295008111Z |
| | ("node 2") | | | | | |
| 48 | node install ("node 2") | success | | | | 2022-01-06T13:09:22.025175368Z |
| 49 | node start (clusterID=4, | success | | | | 2022-01-06T13:16:23.681837766Z |
| | nodeID=2) | | | | | |
| 50 | node start (clusterID=4, | failure | | unit failed to start: Exit | | 2022-01-06T14:34:35.110809962Z |
| | nodeID=1) | | | status is not 0. Database | | |
| | | | | engine start failure? | | |
| 51 | node start (clusterID=4, | failure | | unit failed to start: Exit | | 2022-01-06T14:36:02.124717437Z |
| | nodeID=1) | | | status is not 0. Database | | |
| | | | | engine start failure? | | |
| 52 | node stop ("node 2") | success | | | | 2022-01-06T14:39:06.504050513Z |
| 53 | node start (clusterID=4, | failure | | unit failed to start: Exit | | 2022-01-06T14:39:23.917529765Z |
| | nodeID=2) | | | status is not 0. Database | | |
| | | | | engine start failure? | | |
| 54 | node start (clusterID=4, | failure | | unit failed to start: Exit | | 2022-01-06T14:40:00.539804083Z |
| | nodeID=2) | | | status is not 0. Database | | |
| | | | | engine start failure? | | |
| 55 | node start (clusterID=4, | failure | | unit failed to start: Exit | | 2022-01-06T14:40:12.638037309Z |
| | nodeID=1) | | | status is not 0. Database | | |
| | | | | engine start failure? | | |
| 56 | node start (clusterID=4, | failure | | unit failed to start: Exit | | 2022-01-06T14:43:34.097094871Z |
| | nodeID=2) | | | status is not 0. Database | | |
| | | | | engine start failure? | | |
| 57 | node start (clusterID=4, | failure | | unit failed to start: Exit | | 2022-01-06T14:47:30.256997405Z |
| | nodeID=2) | | | status is not 0. Database | | |
| | | | | engine start failure? | | |
| 58 | delete cluster ("4") | failure | | not all hosts were deleted, | | 2022-01-06T15:02:20.387223299Z |
| | | | | cluster will not be deleted | | |
| 60 | host delete (host="node | success | 58 | | | 2022-01-06T15:02:20.389843823Z |
| | 2",clusterID="4") | | | | | |
| 59 | host delete (host="node | failure | 58 | failed to delete node | | 2022-01-06T15:02:20.38984414Z |
| | 1",clusterID="4") | | | | | |
| 61 | node stop | failure | 59 | uninstallation failed: | | 2022-01-06T15:02:20.40636385Z |
| | (nodeID="1",clusterID="4") | | | failed to run script for | | |
| | | | | ubuntu:20.04/mysql:8.0: failed | | |
| | | | | to execute cluster config | | |
| | | | | script (RunScriptWithConn): | | |
| | | | | command failed | | |
| | | | | (stepName=__step_no_001, | | |
| | | | | commandId=1, | | |
| | | | | commandType=IncludeCommand): | | |
| | | | | command failed | | |
| | | | | (stepName=apt_get_autoremove, | | |
| | | | | commandId=6, | | |
| | | | | commandType=ExecCommand): | | |
| | | | | wait: remote command exited | | |
| | | | | without exit status or exit | | |
| | | | | signal | | |
| 62 | node stop | success | 60 | | | 2022-01-06T15:02:20.408354481Z |
| | (nodeID="2",clusterID="4") | | | | | |
| 63 | delete cluster ("4") | success | | | | 2022-01-06T15:06:05.072604373Z |
| 64 | host delete (host="node | success | 63 | | | 2022-01-06T15:06:05.074955175Z |
| | 1",clusterID="4") | | | | | |
| 65 | node stop | success | 64 | | | 2022-01-06T15:06:05.078104448Z |
| | (nodeID="1",clusterID="4") | | | | | |
| 66 | cluster deploy ("test") | success | | | empty cluster created | 2022-01-06T15:08:52.333177305Z |
| 67 | server access check | success | | | | 2022-01-06T15:10:22.011009753Z |
| 68 | host deploy ("node 1") | success | | | | 2022-01-06T15:10:27.104638313Z |
| 69 | host firewall configuration | success | 68 | | | 2022-01-06T15:10:27.109144212Z |
| | ("node 1") | | | | | |
| 70 | node install ("node 1") | success | | | | 2022-01-06T15:10:28.929870522Z |
| 71 | node start (clusterID=5, | success | | | | 2022-01-06T16:23:32.560556872Z |
| | nodeID=1) | | | | | |
| 72 | delete cluster ("5") | success | | | | 2022-01-06T16:48:50.272589336Z |
| 73 | host delete (host="node | success | 72 | | | 2022-01-06T16:48:50.280732108Z |
| | 1",clusterID="5") | | | | | |
| 74 | node stop | success | 73 | | | 2022-01-06T16:48:50.284353941Z |
| | (nodeID="1",clusterID="5") | | | | | |
| 75 | cluster deploy ("beaver") | success | | | empty cluster created | 2022-01-06T19:31:29.806967341Z |
| 76 | delete cluster ("6") | success | | | | 2022-01-06T19:31:45.300020993Z |
| 77 | cluster deploy ("beaver") | success | | | empty cluster created | 2022-01-06T19:32:07.275622781Z |
| 78 | server access check | failure | | unable to connect: ssh: | | 2022-01-06T19:33:57.347275253Z |
| | | | | handshake failed: ssh: unable | | |
| | | | | to authenticate, attempted | | |
| | | | | methods [none publickey], no | | |
| | | | | supported methods remain | | |
| 79 | server access check | failure | | unable to connect: ssh: | | 2022-01-06T19:35:32.191904397Z |
| | | | | handshake failed: ssh: unable | | |
| | | | | to authenticate, attempted | | |
| | | | | methods [none publickey], no | | |
| | | | | supported methods remain | | |
| 80 | server access check | failure | | unable to connect: ssh: | | 2022-01-06T19:35:48.302764093Z |
| | | | | handshake failed: ssh: unable | | |
| | | | | to authenticate, attempted | | |
| | | | | methods [none publickey], no | | |
| | | | | supported methods remain | | |
| 81 | server access check | failure | | unable to connect: ssh: | | 2022-01-06T19:42:17.999755029Z |
| | | | | handshake failed: ssh: unable | | |
| | | | | to authenticate, attempted | | |
| | | | | methods [none publickey], no | | |
| | | | | supported methods remain | | |
| 82 | server access check | failure | | unable to connect: ssh: | | 2022-01-06T19:42:50.969178229Z |
| | | | | handshake failed: ssh: unable | | |
| | | | | to authenticate, attempted | | |
| | | | | methods [none publickey], no | | |
| | | | | supported methods remain | | |
| 83 | server access check | failure | | unable to connect: ssh: | | 2022-01-06T19:42:56.987056906Z |
| | | | | handshake failed: ssh: unable | | |
| | | | | to authenticate, attempted | | |
| | | | | methods [none publickey], no | | |
| | | | | supported methods remain | | |
| 84 | server access check | success | | | | 2022-01-10T17:51:43.099680972Z |
| 85 | host deploy ("node 1") | success | | | | 2022-01-10T17:51:51.022437008Z |
| 86 | host firewall configuration | success | 85 | | | 2022-01-10T17:51:51.027549736Z |
| | ("node 1") | | | | | |
| 87 | node install ("node 1") | success | | | | 2022-01-10T17:51:52.356703405Z |
| 88 | server access check | success | | | | 2022-01-10T17:54:24.67824428Z |
| 89 | host deploy ("node 2") | success | | | | 2022-01-10T17:54:32.73176212Z |
| 90 | host firewall configuration | success | 89 | | | 2022-01-10T17:54:32.735064446Z |
| | ("node 2") | | | | | |
| 91 | node install ("node 2") | success | | | | 2022-01-10T17:54:35.153230349Z |
| 92 | node stop ("node 2") | success | | | | 2022-01-10T17:59:16.165431124Z |
| 93 | node start (clusterID=7, | success | | | | 2022-01-10T17:59:26.544775641Z |
| | nodeID=2) | | | | | |
| 94 | node start (clusterID=7, | success | | | | 2022-01-10T17:59:55.159525716Z |
| | nodeID=1) | | | | | |
| 95 | node stop ("node 2") | success | | | | 2022-01-10T18:00:16.739482771Z |
| 96 | node start (clusterID=7, | failure | | unit failed to start: Exit | | 2022-01-10T18:00:33.280804253Z |
| | nodeID=2) | | | status is not 0. Database | | |
| | | | | engine start failure? | | |
| 97 | server access check | success | | | | 2022-01-10T18:01:53.050717427Z |
| 98 | host deploy ("node 3") | success | | | | 2022-01-10T18:02:00.488042173Z |
| 99 | host firewall configuration | success | 98 | | | 2022-01-10T18:02:00.49253466Z |
| | ("node 3") | | | | | |
| 100 | node install ("node 3") | success | | | | 2022-01-10T18:02:02.726289807Z |
| 101 | node start (clusterID=7, | success | | | | 2022-01-10T18:05:17.256691938Z |
| | nodeID=3) | | | | | |
| 102 | node start (clusterID=7, | failure | | unit failed to start: Exit | | 2022-01-10T18:06:22.812881903Z |
| | nodeID=2) | | | status is not 0. Database | | |
| | | | | engine start failure? | | |
| 103 | node stop ("node 3") | success | | | | 2022-01-10T18:06:32.902753283Z |
| 104 | node start (clusterID=7, | failure | | unit failed to start: Exit | | 2022-01-10T18:06:55.472200751Z |
| | nodeID=3) | | | status is not 0. Database | | |
| | | | | engine start failure? | | |
| 105 | node stop ("node 1") | success | | | | 2022-01-10T18:07:55.47232691Z |
| 106 | cluster start/recover ("7") | failure | | failed to get the most | | 2022-01-10T18:09:25.18618731Z |
| | | | | advanced node | | |
| 107 | cluster status ("7") | success | | | | 2022-01-10T18:11:57.798975447Z |
| 108 | host status (host="node | success | 107 | | | 2022-01-10T18:11:57.800538838Z |
| | 3",clusterID="7") | | | | | |
| 109 | host status (host="node | success | 107 | | | 2022-01-10T18:11:57.800601814Z |
| | 1",clusterID="7") | | | | | |
| 110 | host status (host="node | success | 107 | | | 2022-01-10T18:11:57.801278642Z |
| | 2",clusterID="7") | | | | | |
| 111 | node status | success | 109 | | | 2022-01-10T18:11:58.333722819Z |
| | (nodeID="1",clusterID="7") | | | | | |
| 112 | node status | success | 108 | | | 2022-01-10T18:11:58.337823848Z |
| | (nodeID="3",clusterID="7") | | | | | |
| 113 | node status | success | 110 | | | 2022-01-10T18:11:58.350099758Z |
| | (nodeID="2",clusterID="7") | | | | | |
+-----+--------------------------------+---------+-----------+--------------------------------+-----------------------+--------------------------------+
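For context on job 106 above ("failed to get the most advanced node"): Galera saves each node's last committed state in grastate.dat, and recovery has to find the node with the highest committed seqno. A minimal sketch of reading that value, assuming the default data directory /var/lib/mysql; the sample file contents below are illustrative, not taken from this thread:

```shell
# Write an illustrative grastate.dat (made-up sample values;
# on a real node the file is typically /var/lib/mysql/grastate.dat).
cat > /tmp/grastate.dat <<'EOF'
# GALERA saved state
version: 2.1
uuid:    8bcf4a34-aedb-11ec-9d9f-7a29880bcd32
seqno:   1573
safe_to_bootstrap: 0
EOF

# The node with the highest seqno is the "most advanced" one;
# a seqno of -1 means the node was not shut down cleanly.
awk '/^seqno:/ {print $2}' /tmp/grastate.dat
```

Comparing this value across all nodes shows which one a recovery would bootstrap from; if every node reports -1, the "most advanced node" lookup has nothing to go on, which is one way it can fail.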
Thank you. Your problem requires deeper investigation. Could you please run this script:
cd ~
# admin, or your gm user name if you replaced the default user
export GM_USER=admin
export GM_PASSWORD=your_gm_password
export CLUSTER_NAME=your_gm_cluster_name
gmc job list 1>gmd.log 2>&1
gmc cluster logs get-all --name=${CLUSTER_NAME} --save-to=cluster.log 1>>gmd.log 2>&1
You will get two files: cluster.log and gmd.log. Could you please send these files to info@galeracluster.com?
Note: the logs have been received via email.
We have identified the issues to be fixed. We'll let you know as soon as we release the fixes.
We have released version 1.5.0 of Galera Manager, which includes a new GUI feature: Recover Cluster. We also fixed MySQL cluster recovery support. If a cluster is fully stopped, you now need to use this feature to start it. Hopefully, it solves your issue.
The latest update seems to have solved the problem, at least at a cursory examination. Will spend more time with it later this week and see if I can break it… ;-)
I am trying to load a Galera cluster with data using the Loading Physically method, as outlined at https://galeracluster.com/library/documentation/galera-manager-initializing-data.html.
I have Galera Manager freshly installed on Ubuntu Bionic Beaver (18.04 server) and one node running the same setup. I went with the defaults in the installer, adjusting only the keyboard localisation.
Installing Galera Manager was painless, as was setting up the first node. I then proceeded with the instructions and started the node from the ellipsis menu and, once synchronised, I then stopped it with the Stop command from the ellipsis menu. When I then tried to restart the node, just to see that it comes up OK before the next step, I got an error message stating that the Galera Manager failed to start it.
The log tab in Galera Manager is not updating, i.e., it shows events only up until the shutdown. None of my start attempts show in the log -- it remains at the position where it reads
[Note] /usr/sbin/mysqld: Shutdown complete
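When the Manager's log view stalls like this, the node's own MySQL error log usually records why the start attempts failed. A hedged sketch of pulling the most recent error entry; the log path and the sample log lines are assumptions for illustration, not taken from this thread:

```shell
# Simulate a MySQL error log; the sample entries below are illustrative only.
# On a real Ubuntu node the log is often /var/log/mysql/error.log.
cat > /tmp/mysql-error.log <<'EOF'
2022-01-10T18:07:55.000000Z 0 [Note] /usr/sbin/mysqld: Shutdown complete
2022-01-10T18:09:25.000000Z 0 [ERROR] [sample] illustrative start-failure entry
EOF

# Show the most recent [ERROR] entry, which is where the cause behind
# "Exit status is not 0. Database engine start failure?" would appear:
grep '\[ERROR\]' /tmp/mysql-error.log | tail -n 1
```

Checking this log directly on the node (via ssh) sidesteps the stalled log tab and usually narrows the failure down to a concrete engine-level error.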