codership / galera-manager-support

Galera Manager Support Repository

Nginx 502 Bad gateway. GMD 111: Connection refused) while connecting to upstream #70

Closed alokispandey closed 7 months ago

alokispandey commented 8 months ago

Hi, I recently managed to install GMD successfully using the installer. All services are active, but I am still getting a "502 Bad Gateway" error.

Installation Closing message:

Galera Manager installation finished. Enter http://10.112.48.69 in a web browser to access.
Please note, you chose to use an unencrypted http protocol, such connections are prone to several types of security issues. Always use only trusted networks when connecting to the service.
INFO[0077] Logs DB url: http://10.112.48.69:8081
IMPORTANT: ensure TCP ports 80, 8081 are open in firewall.
INFO[0077] Below you can see Logs DB credentials:
DB name: gmd
DB user: gmd
DB password: 9iKc2lXzdW
The installation log is located at /tmp/gm-installer.log

Service status : NGINX

● nginx.service - The nginx HTTP and reverse proxy server
   Loaded: loaded (/usr/lib/systemd/system/nginx.service; disabled; vendor preset: disabled)
   Active: active (running) since Sat 2023-11-04 06:03:11 GMT; 28min ago
  Process: 1089519 ExecStart=/usr/sbin/nginx (code=exited, status=0/SUCCESS)
  Process: 1089517 ExecStartPre=/usr/sbin/nginx -t (code=exited, status=0/SUCCESS)
  Process: 1089511 ExecStartPre=/usr/bin/rm -f /run/nginx.pid (code=exited, status=0/SUCCESS)
 Main PID: 1089520 (nginx)
    Tasks: 9 (limit: 205278)
   Memory: 11.6M
   CGroup: /system.slice/nginx.service
           ├─1089520 nginx: master process /usr/sbin/nginx
           ├─1089521 nginx: worker process
           ├─1089522 nginx: worker process
           ├─1089523 nginx: worker process
           ├─1089524 nginx: worker process
           ├─1089525 nginx: worker process
           ├─1089526 nginx: worker process
           ├─1089527 nginx: worker process
           └─1089528 nginx: worker process

Nov 04 06:03:11 wkflaspsitcdb01.idm.oam.mbnl systemd[1]: Starting The nginx HTTP and reverse proxy server...
Nov 04 06:03:11 wkflaspsitcdb01.idm.oam.mbnl nginx[1089517]: nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
Nov 04 06:03:11 wkflaspsitcdb01.idm.oam.mbnl nginx[1089517]: nginx: configuration file /etc/nginx/nginx.conf test is successf>
Nov 04 06:03:11 wkflaspsitcdb01.idm.oam.mbnl systemd[1]: Started The nginx HTTP and reverse proxy server.

Service status : GMD

● gmd.service - gmd - galera manager daemon
   Loaded: loaded (/usr/lib/systemd/system/gmd.service; enabled; vendor preset: disabled)
   Active: active (running) since Sat 2023-11-04 06:31:25 GMT; 17s ago
 Main PID: 1091659 (gmd)
    Tasks: 12 (limit: 205278)
   Memory: 20.8M
   CGroup: /system.slice/gmd.service
           └─1091659 /usr/bin/gmd run

Nov 04 06:31:25 wkflaspsitcdb01.idm.oam.mbnl systemd[1]: Started gmd - galera manager daemon.
Nov 04 06:31:25 wkflaspsitcdb01.idm.oam.mbnl gmd[1091659]: time="2023-11-04T06:31:25.755" level=info msg="Starting gmd" func=>
Nov 04 06:31:25 wkflaspsitcdb01.idm.oam.mbnl gmd[1091659]: time="2023-11-04T06:31:25.755" level=info msg="Listening on 127.0.>
Nov 04 06:31:25 wkflaspsitcdb01.idm.oam.mbnl gmd[1091659]: time="2023-11-04T06:31:25.755" level=info msg="ConfigDir = /var/li>
Nov 04 06:31:25 wkflaspsitcdb01.idm.oam.mbnl gmd[1091659]: time="2023-11-04T06:31:25.755" level=info msg="LogsDir = /var/log

Service status : influxd

influxdb.service - InfluxDB is an open-source, distributed, time series database
   Loaded: loaded (/usr/lib/systemd/system/influxdb.service; enabled; vendor preset: disabled)
   Active: active (running) since Sat 2023-11-04 06:03:11 GMT; 30min ago
     Docs: https://docs.influxdata.com/influxdb/
  Process: 1089450 ExecStart=/usr/lib/influxdb/scripts/influxd-systemd-start.sh (code=exited, status=0/SUCCESS)
 Main PID: 1089455 (influxd)
    Tasks: 13 (limit: 205278)
   Memory: 58.0M
   CGroup: /system.slice/influxdb.service
           └─1089455 /usr/bin/influxd

Nov 04 06:03:11 wkflaspsitcdb01.idm.oam.mbnl systemd[1]: Starting InfluxDB is an open-source, distributed, time series databa>
Nov 04 06:03:11 wkflaspsitcdb01.idm.oam.mbnl influxd-systemd-start.sh[1089456]: Command "print-config" is deprecated, use the>
Nov 04 06:03:11 wkflaspsitcdb01.idm.oam.mbnl influxd-systemd-start.sh[1089479]: Command "print-config" is deprecated, use the>
Nov 04 06:03:11 wkflaspsitcdb01.idm.oam.mbnl influxd-systemd-start.sh[1089493]: Command "print-config" is deprecated, use the>
Nov 04 06:03:11 wkflaspsitcdb01.idm.oam.mbnl influxd-systemd-start.sh[1089450]: InfluxDB started
Nov 04 06:03:11 wkflaspsitcdb01.idm.oam.mbnl systemd[1]: Started InfluxDB is an open-source, distributed, time series databas>

LOGS

[]# tail -f /var/log/gmd/default.log
{"channel-type":"app","file":"/go/pkg/cmd/run.go:64","func":"github.com/codership/galera-manager/pkg/cmd.(*RunCommand).Execute","level":"info","msg":"ConfigDir = /var/lib/gmd","time":"2023-11-04T06:24:22Z"}
{"channel-type":"app","file":"/go/pkg/cmd/run.go:65","func":"github.com/codership/galera-manager/pkg/cmd.(*RunCommand).Execute","level":"info","msg":"LogsDir = /var/log/gmd","time":"2023-11-04T06:24:22Z"}
{"channel-type":"app","file":"/go/pkg/cmd/run.go:60","func":"github.com/codership/galera-manager/pkg/cmd.(*RunCommand).Execute","level":"info","msg":"Starting gmd","time":"2023-11-04T06:24:52Z"}
{"channel-type":"app","file":"/go/pkg/cmd/run.go:61","func":"github.com/codership/galera-manager/pkg/cmd.(*RunCommand).Execute","level":"info","msg":"Listening on 127.0.0.1:8000","time":"2023-11-04T06:24:52Z"}

~]# ll /var/log/influxdb/
total 0

~]# tail -f /var/log/nginx/10.112.48.69.gmd.error.log
2023/11/04 06:13:54 [error] 1089521#0: *1 connect() failed (111: Connection refused) while connecting to upstream, client: 10.112.48.69, server: 10.112.48.69, request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:8000/", host: "10.112.48.69"

Setup & ENV

  1. Which domains, other than those already listed (the whitelisted domains are mentioned above), need to be whitelisted for GMD to install properly? Does GMD require access to github.com/codership/galera-manager/pkg/cmd to function?
    1. Judging by the GMD logs, GMD seems to need access to "github.com/codership/galera-manager/pkg/cmd.(*RunCommand)", but "https://github.com/codership/galera-manager/pkg/" returns a 404 (page not found) in the browser. Is this a bug?
  2. Please let me know if you need more details from my side.
esscz commented 8 months ago

Hello,

It appears that NGINX is encountering difficulties connecting to GMD on port 8000. Could you kindly verify if GMD is accessible on this port? A simple way to check is by using the telnet command:

telnet 127.0.0.1 8000

Thank you for your cooperation.

alokispandey commented 8 months ago

No, telnet returns a connection refused error, even though the gmd service status is active and online. The gmd logs are also not helpful.


esscz commented 8 months ago

Could you please verify that gmd is actively listening on port 8000 ? You can do this by executing either sudo ss -tulnp | grep 8000 or sudo netstat -tulnp | grep 8000
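
The same check can be scripted. This is a minimal sketch, assuming a POSIX shell with awk; `listens_on_port` is a hypothetical helper name, not something shipped with GMD. It reads `ss`-style output on standard input and succeeds only if a LISTEN line carries an address ending in the given port.

```shell
# Hypothetical helper, not part of GMD: succeed if ss-style output on
# stdin contains a LISTEN line with an address field ending in ":PORT".
listens_on_port() {
    awk -v port="$1" '
        /LISTEN/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ (":" port "$")) found = 1
            }
        }
        END { exit(found ? 0 : 1) }
    '
}

# Usage (commented out so that sourcing this snippet has no side effects):
#   sudo ss -tlnp | listens_on_port 8000 && echo "listening" || echo "not listening"
```

Matching on the end of the field keeps port 80 from matching 8000, and it works for both IPv4 and IPv6 listen addresses.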

esscz commented 8 months ago

It's also possible that SELinux is preventing the connection. Could you please execute the following commands to check the status of SELinux and to search for any relevant AVC denials?

sestatus

and

sudo ausearch -m avc -ts recent
alokispandey commented 8 months ago

GMD is not listening. I am not sure why the service shows as active. There is nothing in the logs, even after enabling debug mode.

[root@nhtlaspsitcdb01 ~]# sestatus
SELinux status: disabled
[root@nhtlaspsitcdb01 ~]# ausearch -m avc -ts recent
Email option is specified but /usr/lib/sendmail doesn't seem executable.
q_depth should be larger than 512 for safety margin

I tried running the gmd service manually using the command below:

[root@nhtlaspsitcdb01 ~]# /usr/bin/gmd --config-dir=/var/lib/gmd --logs-dir=/var/log/gmdv --log-format=json --log-level=debug run --bind-address=127.0.0.1:8000 --influxdb-url=http://gmd:ofrjVFL9XY@nhtlaspsitcdb01.idm.oam.mbnl:8081/gmd
{"file":"/go/pkg/cmd/run.go:60","func":"github.com/codership/galera-manager/pkg/cmd.(*RunCommand).Execute","level":"info","msg":"Starting gmd","time":"2023-11-07T08:14:47Z"}
{"file":"/go/pkg/cmd/run.go:61","func":"github.com/codership/galera-manager/pkg/cmd.(*RunCommand).Execute","level":"info","msg":"Listening on 127.0.0.1:8000","time":"2023-11-07T08:14:47Z"}
{"file":"/go/pkg/cmd/run.go:64","func":"github.com/codership/galera-manager/pkg/cmd.(*RunCommand).Execute","level":"info","msg":"ConfigDir = /var/lib/gmd","time":"2023-11-07T08:14:47Z"}
{"file":"/go/pkg/cmd/run.go:65","func":"github.com/codership/galera-manager/pkg/cmd.(*RunCommand).Execute","level":"info","msg":"LogsDir = /var/log/gmdv","time":"2023-11-07T08:14:47Z"}
**Get "http://checkip.amazonaws.com": dial tcp: lookup checkip.amazonaws.com: i/o timeout**
[root@nhtlaspsitcdb01 ~]#

I can see a timeout. Could that be the reason? If so, how do I set a proxy for GMD? Our internet access is behind a proxy, and setting an OS-level proxy would break other running applications.
alokispandey commented 8 months ago

Thanks, it's fixed. The issue seems to be that http_proxy was not set. I added my proxy details in

cat /usr/lib/systemd/system/gmd.service

[Unit]
Description=gmd - galera manager daemon
After=network.target

[Service]
EnvironmentFile=/etc/default/gmd
[Service]
Environment=https_proxy=http://XXXXXXX:3128
Environment=http_proxy=http://XXXXXl:3128
User=gmd
Group=gmd
LimitNOFILE=65536
Restart=on-failure
Type=simple
ExecStart=/usr/bin/gmd run $ARGS

[Install]
WantedBy=default.target

After restarting the gmd service, it started working.
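
Unit files under /usr/lib/systemd/system belong to the package and may be replaced on upgrade, so the same setting can also live in a systemd drop-in. This is a sketch only: `render_proxy_dropin`, the file name `proxy.conf`, and the proxy URL are placeholders, not anything the installer creates.

```shell
# Hypothetical helper: print a systemd drop-in that sets the proxy
# environment variables for a service. The proxy URL is the first argument.
render_proxy_dropin() {
    proxy_url="$1"
    printf '[Service]\n'
    printf 'Environment=http_proxy=%s\n' "$proxy_url"
    printf 'Environment=https_proxy=%s\n' "$proxy_url"
}

# Usage (commented out; needs root and a real proxy URL):
#   sudo mkdir -p /etc/systemd/system/gmd.service.d
#   render_proxy_dropin "http://proxy.example.com:3128" |
#       sudo tee /etc/systemd/system/gmd.service.d/proxy.conf >/dev/null
#   sudo systemctl daemon-reload
#   sudo systemctl restart gmd
```

systemd only picks up unit or drop-in changes after `systemctl daemon-reload`, so the reload has to come before the restart.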

alokispandey commented 8 months ago

Now I am getting an error while adding a node to the cluster:

Error

failed to execute cluster config script (RunScriptWithConn)

Process exited with status 4

esscz commented 8 months ago

Could you please attach the whole installation log? The reason will be mentioned several lines before the end of the file.

alokispandey commented 8 months ago

Attached: cluster-8-host-2.log

I am trying to monitor the existing Galera cluster, and:

  1. The steps to set up a repo in https://galeracluster.com/2021/02/using-galera-manager-to-monitor-your-existing-galera-clusters/ are not working. dnf install galera-4 reports an error:

Error: Transaction test error:
  file /etc/sysconfig/garb from install of galera-4-26.4.16-1.el8.x86_64 conflicts with file from package galera-25.3.35-1.module+el8.6.0+15949+4ba4ec26.x86_64
  file /usr/share/man/man8/garbd.8.gz from install of galera-4-26.4.16-1.el8.x86_64 conflicts with file from package galera-25.3.35-1.module+el8.6.0+15949+4ba4ec26.x86_64

  2. The repo uses centos; I tried replacing it with redhat, but still no luck.
alokispandey commented 8 months ago

I tried installing the "managed cluster" on a fresh node. Still no luck. Attaching full logs: cluster-9-host-1.log cluster-9.log

esscz commented 8 months ago

In both cases it's not possible to resolve hostname:

{"channel-type":"stdout","cluster-id":"9","file":"/go/pkg/log/iolog.go:37","func":"github.com/codership/galera-manager/pkg/log.(*IOLog).Write","host-id":"1","level":"info","msg":"failed: Name or service not known.\r\nwget: unable to resolve host address ‘downloads.mariadb.com’\r\n","ssh-host":"10.102.48.39","time":"2023-11-08T06:14:29Z"}

and

{"channel-type":"stdout","cluster-id":"8","file":"/go/pkg/log/iolog.go:37","func":"github.com/codership/galera-manager/pkg/log.(*IOLog).Write","host-id":"2","level":"info","msg":"Errors during downloading metadata for repository 'galera-manager':\r\n  - Curl error (6): Couldn't resolve host name for https://repo.galera-manager.com/nexus/repository/galera-manager-release/repodata/repomd.xml [Could not resolve host: repo.galera-manager.com]\r\n","ssh-host":"10.102.48.38","time":"2023-11-07T11:26:38Z"}

Could you please check the DNS settings?
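
To list every host name a deployment failed to resolve (useful when deciding what must be reachable through DNS, a firewall, or a proxy), the two error patterns quoted above can be pulled out of a host log. A rough sketch, assuming a sed that supports `-E`; `unresolved_hosts` is a hypothetical helper name.

```shell
# Hypothetical helper: read a log on stdin and print each host name that
# curl ("Could not resolve host: X") or wget ("unable to resolve host
# address X") reported as unresolvable, de-duplicated and sorted.
unresolved_hosts() {
    sed -nE \
        -e 's/.*Could not resolve host: ([A-Za-z0-9.-]+).*/\1/p' \
        -e 's/.*unable to resolve host address [^A-Za-z0-9]*([A-Za-z0-9.-]+).*/\1/p' |
        sort -u
}

# Usage (commented out; the file name is one of the attached host logs):
#   unresolved_hosts < cluster-9-host-1.log
```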

alokispandey commented 8 months ago

The system is behind a proxy. The installer should have an option to set a proxy, or otherwise should use the http_proxy environment variable. I tested this by exporting http_proxy="http://my.proxy.com:myport" and running curl against that URL; it works fine.

alokispandey commented 8 months ago

I managed to pass the proxy to the Galera installer by exporting the http_proxy variable in /etc/bashrc. The installer was then able to make progress, but it now fails on:

root@10.112.48.70# mysqladmin -u root status
Nov 08, 2023 14:42:06 | stdout | mysqladmin: connect to server at 'localhost' failed
Nov 08, 2023 14:42:06 | stdout | error: 'Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)'
Nov 08, 2023 14:42:06 | stdout | Check that mysqld is running and that the socket: '/var/lib/mysql/mysql.sock' exists!
Nov 08, 2023 14:42:06 | galera-manager | mysqld is apparently not running
Nov 08, 2023 14:42:06 | galera-manager | setting cluster-wide Lock (to avoid race conditions with the first node)
Nov 08, 2023 14:42:06 | galera-manager | starting as a first node
Nov 08, 2023 14:42:06 | galera-manager | checking grastate
Nov 08, 2023 14:42:06 | galera-manager | running start script
Nov 08, 2023 14:42:06 | galera-manager | root@10.112.48.70# echo -n "test"
Nov 08, 2023 14:42:06 | stdout | test
Nov 08, 2023 14:42:06 | galera-manager | Default Galera version is 4
Nov 08, 2023 14:42:06 | galera-manager | Including custom config directory from my.cnf
Nov 08, 2023 14:42:06 | galera-manager | Writing to /etc/mysql/wsrep/conf.d/99.galera.cnf
Nov 08, 2023 14:42:06 | stdout | 10.112.48.70:22$ bash -c '[ -f /var/lib/mysql/grastate.dat ] && sed -i '"'"'s/safe_to_bootstrap: .*/safe_to_bootstrap: 1/'"'"' /var/lib/mysql/grastate.dat || true'
Nov 08, 2023 14:42:06 | galera-manager | Will fix grastate.dat (if required)
Nov 08, 2023 14:42:06 | galera-manager | Running the first node in the cluster
Nov 08, 2023 14:42:06 | stdout | 10.112.48.70:22$ galera_new_cluster
Nov 08, 2023 14:42:07 | stdout | Job for mariadb.service failed because the control process exited with error code.
Nov 08, 2023 14:42:07 | stdout | See "systemctl status mariadb.service" and "journalctl -xe" for details.
Nov 08, 2023 14:42:07 | galera-manager | Got an error and attepts = 0

Full console logs: galera-console output.txt

Host logs: cluster-12-host-2.log

I hope the above logs give you more insight into what is going wrong.

alokispandey commented 8 months ago

I tried on a fresh redhat 8 node with mysql-8 instead of mariadb. It is still failing at the same point.

, 2023 13:52:46 | galera-manager | checking node status
Nov 09, 2023 13:52:46 | galera-manager | root@10.112.48.70# mysqladmin -u root status
Nov 09, 2023 13:52:46 | stdout | mysqladmin: connect to server at 'localhost' failed
Nov 09, 2023 13:52:46 | stdout | error: 'Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (111)'
Nov 09, 2023 13:52:46 | stdout | Check that mysqld is running and that the socket: '/var/lib/mysql/mysql.sock' exists!
Nov 09, 2023 13:52:46 | galera-manager | mysqld is apparently not running
Nov 09, 2023 13:52:46 | galera-manager | setting cluster-wide Lock (to avoid race conditions with the first node)
Nov 09, 2023 13:52:46 | galera-manager | starting as a first node
Nov 09, 2023 13:52:46 | galera-manager | checking grastate
Nov 09, 2023 13:52:46 | galera-manager | running start script
Nov 09, 2023 13:52:46 | galera-manager | root@10.112.48.70# echo -n "test"
Nov 09, 2023 13:52:46 | stdout | test
Nov 09, 2023 13:52:46 | galera-manager | Default Galera version is 4
Nov 09, 2023 13:52:46 | galera-manager | Including custom config directory from my.cnf
Nov 09, 2023 13:52:46 | galera-manager | Writing to /etc/mysql/wsrep/conf.d/99.galera.cnf
Nov 09, 2023 13:52:46 | galera-manager | Will fix grastate.dat (if required)
Nov 09, 2023 13:52:46 | stdout | 10.112.48.70:22$ bash -c '[ -f /var/lib/mysql/grastate.dat ] && sed -i '"'"'s/safe_to_bootstrap: .*/safe_to_bootstrap: 1/'"'"' /var/lib/mysql/grastate.dat || true'
Nov 09, 2023 13:52:46 | galera-manager | Running the first node in the cluster
Nov 09, 2023 13:52:46 | stdout | 10.112.48.70:22$ bash -c 'systemctl set-environment MYSQLD_OPTS="--wsrep-new-cluster" && systemctl start mysqld && systemctl unset-environment MYSQLD_OPTS'
Nov 09, 2023 13:52:48 | stdout | Job for mysqld.service failed because the control process exited with error code.
Nov 09, 2023 13:52:48 | stdout | See "systemctl status mysqld.service" and "journalctl -xe" for details.
Nov 09, 2023 13:52:48 | galera-manager | Got execution failure, will retry (attempts left 2)
Nov 09, 2023 13:52:48 | galera-manager | Got an error and attepts = 3
Nov 09, 2023 13:52:53 | stdout | 10.112.48.70:22$ bash -c 'systemctl set-environment MYSQLD_OPTS="--wsrep-new-cluster" && systemctl start mysqld && systemctl unset-environment MYSQLD_OPTS'
Nov 09, 2023 13:52:55 | stdout | Job for mysqld.service failed because the control process exited with error code.
Nov 09, 2023 13:52:55 | stdout | See "systemctl status mysqld.service" and "journalctl -xe" for details.
Nov 09, 2023 13:52:55 | galera-manager | Got an error and attepts = 2
Nov 09, 2023 13:52:55 | galera-manager | Got execution failure, will retry (attempts left 1)
Nov 09, 2023 13:53:00 | stdout | 10.112.48.70:22$ bash -c 'systemctl set-environment MYSQLD_OPTS="--wsrep-new-cluster" && systemctl start mysqld && systemctl unset-environment MYSQLD_OPTS'
Nov 09, 2023 13:53:03 | stdout | Job for mysqld.service failed because the control process exited with error code.
Nov 09, 2023 13:53:03 | stdout | See "systemctl status mysqld.service" and "journalctl -xe" for details.
Nov 09, 2023 13:53:03 | galera-manager | Got an error and attepts = 1
Nov 09, 2023 13:53:03 | galera-manager | SshHost.RunScript error: command failed (stepName=run_cluster_first, commandId=3, commandType=ExecCommand): Process exited with status 1failed to execute cluster config script (RunScriptWithConn)
github.com/codership/galera-manager/pkg/internal/sshcmd.(*Host).RunScriptWithConn
    /go/pkg/internal/sshcmd/executor.go:115
github.com/codership/galera-manager/pkg/internal/sshcmd.(*Host).RunScript
    /go/pkg/internal/sshcmd/executor.go:171
github.com/codership/galera-manager/pkg/internal/mgmt/units.(*Node).Start
    /go/pkg/internal/mgmt/units/node.go:483
github.com/codership/galera-manager/pkg/internal/mgmt.(*Nodes).Start.func1
    /go/pkg/internal/mgmt/nodes.go:180
github.com/codership/galera-manager/pkg/internal/jobs.(*Processor).Execute.func1
    /go/pkg/internal/jobs/processor.go:90
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1594
Nov 09, 2023 13:53:03 | galera-manager | Exit status is not 0. Database engine start failure?
Nov 09, 2023 13:53:03 | galera-manager | error starting the node

ayurchen commented 8 months ago

Hi.

The key is this line:

 See "systemctl status mysqld.service" and "journalctl -xe" for details.

That means that mysqld failed to start for its own internal reasons (most likely misconfiguration), and we must see its error log (the output of the aforementioned commands may be helpful, but limited).

Please post the mysqld error log (normally /var/lib/mysql/mysqld.err or /var/log/mysql/error.log).
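
A small sketch for collecting that log on the node, assuming only the two locations named above; `first_existing` is a hypothetical helper, and the actual path depends on the server configuration.

```shell
# Hypothetical helper: print the first argument that exists as a regular
# file; return non-zero if none of them do.
first_existing() {
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage (commented out; run as root on the database node):
#   log="$(first_existing /var/lib/mysql/mysqld.err /var/log/mysql/error.log)" &&
#       tail -n 100 "$log"
#   journalctl -u mysqld --no-pager -n 100
```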

alokispandey commented 8 months ago

Hi Alexey, the problem is that when deploying a node with galera-manager, the DB service is not configured properly. The installer searches for a package which is not part of its repo:

All matches were filtered out by modular filtering for argument: galera
Nov 10, 2023 04:45:44 | stdout | Package socat-1.7.4.1-1.el8.x86_64 is already installed.
Nov 10, 2023 04:45:44 | stdout | Error: Unable to find a match: galera

Available repos:

[root@nhtlaspsitcdb03 ~]# dnf repolist
Updating Subscription Management repositories.
repo id                                   repo name
ansible-2.8-for-rhel-8-x86_64-rpms        Red Hat Ansible Engine 2.8 for RHEL 8 x86_64 (RPMs)
ansible-2.9-for-rhel-8-x86_64-rpms        Red Hat Ansible Engine 2.9 for RHEL 8 x86_64 (RPMs)
codeready-builder-for-rhel-8-x86_64-rpms  Red Hat CodeReady Linux Builder for RHEL 8 x86_64 (RPMs)
epel                                      Extra Packages for Enterprise Linux 8 - x86_64
galera-manager                            Galera Manager
influxdb                                  InfluxDB Repository - RHEL 8
rhel-8-for-x86_64-appstream-rpms          Red Hat Enterprise Linux 8 for x86_64 - AppStream (RPMs)
rhel-8-for-x86_64-baseos-rpms             Red Hat Enterprise Linux 8 for x86_64 - BaseOS (RPMs)
rhel-8-for-x86_64-supplementary-rpms      Red Hat Enterprise Linux 8 for x86_64 - Supplementary (RPMs)
[root@nhtlaspsitcdb03 ~]#

byte commented 8 months ago

This suggests you're missing the repository for Galera: the galera-manager repo is there, but you're not seeing the Galera server packages. Can you paste the contents of the galera-manager repo (and, for good measure, the influxdb repo) and ensure that everything they reference can be accessed through your firewall?
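
One way to check reachability is to pull the URLs out of the repo files and probe each one with curl. This is a sketch under assumptions: `repo_urls` is a hypothetical helper, it only looks at `baseurl` and `gpgkey` lines with http(s) URLs, and it does not expand dnf variables such as `$basearch`.

```shell
# Hypothetical helper: print the distinct http(s) URLs found in baseurl=
# and gpgkey= lines of the given .repo files (or stdin if none are given).
repo_urls() {
    sed -nE 's/^[[:space:]]*(baseurl|gpgkey)[[:space:]]*=[[:space:]]*(https?:[^[:space:]]+).*/\2/p' "$@" |
        sort -u
}

# Usage (commented out; needs network access and honours http_proxy/https_proxy):
#   for url in $(repo_urls /etc/yum.repos.d/galera-manager.repo /etc/yum.repos.d/influxdb.repo); do
#       curl -sS -o /dev/null -I --max-time 10 "$url" && echo "ok   $url" || echo "FAIL $url"
#   done
```

Without `-f`, curl reports success for any HTTP response, so this only tells you whether the host can be reached, not whether the repository metadata is valid.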

alokispandey commented 8 months ago

Hi,

I've made a fresh start on 3 vanilla redhat 8 nodes. The Galera installation was successful, but adding nodes is still failing.

Deployment logs from Galera console:

Nov 21, 2023 10:53:01 | galera-manager | Default Galera version is 4
Nov 21, 2023 10:53:01 | galera-manager | Including custom config directory from my.cnf
Nov 21, 2023 10:53:01 | stdout         | 10.102.48.39:22$ bash -c '[ -f /var/lib/mysql/grastate.dat ] && sed -i '"'"'s/safe_to_bootstrap: .*/safe_to_bootstrap: 1/'"'"' /var/lib/mysql/grastate.dat || true'
Nov 21, 2023 10:53:01 | galera-manager | Will fix grastate.dat (if required)
Nov 21, 2023 10:53:01 | galera-manager | Running the first node in the cluster
Nov 21, 2023 10:53:01 | stdout         | 10.102.48.39:22$ galera_new_cluster
Nov 21, 2023 10:53:01 | stdout         | Job for mariadb.service failed because the control process exited with error code.
Nov 21, 2023 10:53:01 | stdout         | See "systemctl status mariadb.service" and "journalctl -xe" for details.
Nov 21, 2023 10:53:01 | galera-manager | Got an error and attepts = 0
Nov 21, 2023 10:53:01 | galera-manager | SshHost.RunScript error: command failed (stepName=run_cluster_first, commandId=3, commandType=ExecCommand): Process exited with status 1failed to execute cluster config script (RunScriptWithConn)
github.com/codership/galera-manager/pkg/internal/sshcmd.(*Host).RunScriptWithConn
    /go/pkg/internal/sshcmd/executor.go:115
github.com/codership/galera-manager/pkg/internal/sshcmd.(*Host).RunScript
    /go/pkg/internal/sshcmd/executor.go:171
github.com/codership/galera-manager/pkg/internal/mgmt/units.(*Node).Start
    /go/pkg/internal/mgmt/units/node.go:483
github.com/codership/galera-manager/pkg/internal/mgmt.(*Nodes).Start.func1
    /go/pkg/internal/mgmt/nodes.go:180
github.com/codership/galera-manager/pkg/internal/jobs.(*Processor).Execute.func1
    /go/pkg/internal/jobs/processor.go:90
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1594
Nov 21, 2023 10:53:01 | galera-manager | Exit status is not 0. Database engine start failure?
Nov 21, 2023 10:53:01 | galera-manager | error starting the node

The mariadb service status shows a permission issue with telegraf:


-21T10:54:17Z I! [inputs.execd] Starting process: /usr/local/bin/mysql_wsrep [-config /etc/telegraf/mysql_wsrep-telegraf-plugin.conf]
-21T10:54:17Z E! [inputs.execd] stderr: "Err loading input: open /etc/telegraf/mysql_wsrep-telegraf-plugin.conf: permission denied"
-21T10:54:17Z E! [inputs.execd] Process /usr/local/bin/mysql_wsrep exited: exit status 1
-21T10:54:17Z I! [inputs.execd] Restarting in 10s...

I changed ownership with chown -R telegraf: /etc/telegraf and ran systemctl restart mariadb.service, but it is still failing. Output of systemctl status mariadb.service:

● mariadb.service - MariaDB 10.6.16 database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/mariadb.service.d
           └─migrated-from-my.cnf-settings.conf
   Active: failed (Result: exit-code) since Tue 2023-11-21 10:55:54 GMT; 34s ago
     Docs: man:mariadbd(8)
           https://mariadb.com/kb/en/library/systemd/
  Process: 7967 ExecStart=/usr/sbin/mariadbd $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION (code=exited, status=1/FAILURE)
  Process: 7838 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= ||   VAR=`cd /usr/bin/..; /usr/bin/galera_recovery`; >
  Process: 7836 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
 Main PID: 7967 (code=exited, status=1/FAILURE)

Nov 21 10:55:53 nhtlaspsitcdb03.idm.oam.mbnl systemd[1]: Starting MariaDB 10.6.16 database server...
Nov 21 10:55:54 nhtlaspsitcdb03.idm.oam.mbnl sh[7840]: WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
Nov 21 10:55:54 nhtlaspsitcdb03.idm.oam.mbnl mariadbd[7967]: [99B blob data]
Nov 21 10:55:54 nhtlaspsitcdb03.idm.oam.mbnl mariadbd[7967]: Fatal error in defaults handling. Program aborted
Nov 21 10:55:54 nhtlaspsitcdb03.idm.oam.mbnl systemd[1]: mariadb.service: Main process exited, code=exited, status=1/FAILURE
Nov 21 10:55:54 nhtlaspsitcdb03.idm.oam.mbnl systemd[1]: mariadb.service: Failed with result 'exit-code'.
Nov 21 10:55:54 nhtlaspsitcdb03.idm.oam.mbnl systemd[1]: Failed to start MariaDB 10.6.16 database server.

The mariadb.service drop-in file is blank:

 cat /etc/systemd/system/mariadb.service.d/migrated-from-my.cnf-settings.conf
# converted using /usr/bin/mariadb-service-convert
#

[Service]

I believe it is a broken installer?
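
"Fatal error in defaults handling" is what the server prints when it cannot process its option files. One possible cause, which is an assumption here and not confirmed in this thread, is an `!includedir` directive that points at a directory which does not exist on the node. A quick sketch to test for that; `missing_includedirs` is a hypothetical helper, and the option file paths in the usage line are typical RHEL locations rather than something taken from the logs.

```shell
# Hypothetical helper: print every !includedir target in the given option
# file(s) that does not exist as a directory.
missing_includedirs() {
    sed -nE 's/^[[:space:]]*!includedir[[:space:]]+([^[:space:]]+).*/\1/p' "$@" |
        while IFS= read -r dir; do
            [ -d "$dir" ] || printf '%s\n' "$dir"
        done
}

# Usage (commented out; option file locations are an assumption):
#   missing_includedirs /etc/my.cnf /etc/my.cnf.d/*.cnf
```

If this prints nothing, the cause lies elsewhere, and the full message hidden behind "[99B blob data]" in the journal is the next thing to look at.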

Sharing Repository details as requested:

Galera client:


[root@nhtlaspsitcdb03 ~]# cat /etc/yum.repos.d/
galera-manager.repo  influxdb.repo        mariadb.repo         redhat.repo
[root@nhtlaspsitcdb03 ~]# cat /etc/yum.repos.d/{g,i,m}*.repo
[galera-manager]
name = Galera Manager
baseurl = https://repo.galera-manager.com/nexus/repository/galera-manager-release
gpgcheck = 0
[influxdb]
name = InfluxDB Repository
baseurl = https://repos.influxdata.com/rhel/8/x86_64/stable
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdata-archive_compat.key

[mariadb-main]
name = MariaDB Server
baseurl = https://dlm.mariadb.com/repo/mariadb-server/10.6/yum/rhel/8/x86_64
gpgkey = file:///etc/pki/rpm-gpg/MariaDB-Server-GPG-KEY
gpgcheck = 1
enabled = 1
module_hotfixes = 1

Galera Server:

[root@~]# ls /etc/yum.repos.d/
epel-modular.repo  epel.repo  epel-testing-modular.repo  epel-testing.repo  galera-manager.repo  influxdb.repo  redhat.repo
[root@nhtlaspsitcdb04 ~]# cat /etc/yum.repos.d/{g,i}*.repo
[galera-manager]
name = Galera Manager
baseurl = https://repo.galera-manager.com/nexus/repository/galera-manager-release
gpgcheck = 0
[influxdb]
name = InfluxDB Repository - RHEL $releasever
baseurl = https://repos.influxdata.com/rhel/8/$basearch/stable
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdata-archive_compat.key
alokispandey commented 7 months ago

Finally, it works

https://galeracluster.com/2023/11/galera-manager-november-2023-release-now-includes-deployment-and-monitoring-for-percona-xtradb-cluster-pxc-8-0/ "gm-installer version 1.11.0" finally works. However, it was not a one-click deployment, and I had to make some amendments. I am sharing the details below in case someone finds them useful.

Galera-manager installer issues

Fails to manage HTTP proxy

Most systems run behind a proxy, and the gm-installer fails to use the defined proxy. After installation, the GMD service fails to start because it needs outbound internet access at startup (the manual run earlier in this thread timed out on http://checkip.amazonaws.com). Ensure the systemd service file has the proxy defined using the "Environment" parameter as shown below:

cat /usr/lib/systemd/system/gmd.service
[Unit]
Description=gmd - galera manager daemon
After=network.target

[Service]
EnvironmentFile=/etc/default/gmd
User=gmd
Group=gmd
LimitNOFILE=65536
Restart=on-failure
Type=simple
ExecStart=/usr/bin/gmd run $ARGS
Environment=https_proxy=http://my.env.proxy.com:proxyport
Environment=http_proxy=http://my.env.proxy.com:proxyport

[Install]
WantedBy=default.target

Galera-manager deployment issues

GM installer fails to use HTTP proxy defined in /etc/profile or exported in http_proxy variable

If you are deploying to nodes behind a proxy, make sure to export http_proxy and https_proxy in /etc/bashrc so that the GM installer uses them during deployment. After deployment, you can clean this up and define the proxy in the dnf repo files instead.
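
For the clean-up step, dnf accepts a `proxy` option per repository section. The sketch below adds one to every section of a repo file; `add_repo_proxy` is a hypothetical helper, the proxy URL is a placeholder, and the script does not check whether a section already has a `proxy` line.

```shell
# Hypothetical helper: copy a .repo file from stdin to stdout, adding a
# "proxy = URL" line right after every [section] header.
add_repo_proxy() {
    awk -v proxy="$1" '
        { print }
        /^\[.*\]$/ { print "proxy = " proxy }
    '
}

# Usage (commented out; review the result before copying it back as root):
#   add_repo_proxy "http://my.env.proxy.com:3128" \
#       < /etc/yum.repos.d/mariadb.repo > /tmp/mariadb.repo
```

Setting `proxy` once in the `[main]` section of /etc/dnf/dnf.conf is an alternative if every repository on the node should go through the same proxy.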

Galera-manager after deployment issues

1. GM installer fails to start services

Once GM is installed, and while performing db deployments, GM fails to start the service at the final step. Ensure below

2. Other Errors

To fix it use "alter table" :

eg: 
•   ALTER TABLE mysql.column_stats MODIFY hist_type enum('SINGLE_PREC_HB','DOUBLE_PREC_HB','JSON_HB');
•   ALTER TABLE mysql.column_stats MODIFY histogram longblob;

To fix it follow https://github.com/influxdata/telegraf/issues/7968

best of luck..!