codership / galera-manager-support

Galera Manager Support Repository
No monitoring or logs in new install 1.8.4 / Ubuntu 22.04 + Mariadb 10.11.7 #100

Open LittleDuke opened 7 months ago

LittleDuke commented 7 months ago

I'm new to Galera Manager. This is a new install on Ubuntu 22.04 with MariaDB (Server version: 10.11.7-MariaDB-1:10.11.7+maria~ubu2204 mariadb.org binary distribution).

Galera Manager shows 0/3 Joined and "node's agent is offline".

THE CLUSTER ITSELF IS WORKING.

Here is the output from:

systemctl status telegraf

● telegraf.service - Telegraf
     Loaded: loaded (/lib/systemd/system/telegraf.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2024-03-30 16:31:46 CDT; 1h 13min ago
       Docs: https://github.com/influxdata/telegraf
   Main PID: 982 (telegraf)
      Tasks: 23 (limit: 38396)
     Memory: 188.2M
        CPU: 4min 22.828s
     CGroup: /system.slice/telegraf.service
             ├─ 982 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
             └─1236 /usr/local/bin/mysql_wsrep -config /etc/telegraf/mysql_wsrep-telegraf-plugin.conf

Mar 30 17:45:01 galera47 telegraf[982]: 2024-03-30T22:45:01Z E! [inputs.execd] stderr: "2024/03/30 17:45:01 E! Error in plugin: Error 1054: Unknown column 'status' in 'where clause'"
Mar 30 17:45:02 galera47 telegraf[982]: 2024-03-30T22:45:02Z E! [inputs.execd] stderr: "2024/03/30 17:45:02 E! Error in plugin: Error 1054: Unknown column 'status' in 'where clause'"
Mar 30 17:45:03 galera47 telegraf[982]: 2024-03-30T22:45:03Z E! [inputs.execd] stderr: "2024/03/30 17:45:03 E! Error in plugin: Error 1054: Unknown column 'status' in 'where clause'"
Mar 30 17:45:04 galera47 telegraf[982]: 2024-03-30T22:45:04Z E! [inputs.execd] stderr: "2024/03/30 17:45:04 E! Error in plugin: Error 1054: Unknown column 'status' in 'where clause'"
Mar 30 17:45:05 galera47 telegraf[982]: 2024-03-30T22:45:05Z E! [inputs.execd] stderr: "2024/03/30 17:45:05 E! Error in plugin: Error 1054: Unknown column 'status' in 'where clause'"
Mar 30 17:45:06 galera47 telegraf[982]: 2024-03-30T22:45:06Z E! [inputs.execd] stderr: "2024/03/30 17:45:06 E! Error in plugin: Error 1054: Unknown column 'status' in 'where clause'"
Mar 30 17:45:07 galera47 telegraf[982]: 2024-03-30T22:45:07Z E! [inputs.execd] stderr: "2024/03/30 17:45:07 E! Error in plugin: Error 1054: Unknown column 'status' in 'where clause'"
Mar 30 17:45:08 galera47 telegraf[982]: 2024-03-30T22:45:08Z E! [inputs.execd] stderr: "2024/03/30 17:45:08 E! Error in plugin: Error 1054: Unknown column 'status' in 'where clause'"
Mar 30 17:45:09 galera47 telegraf[982]: 2024-03-30T22:45:09Z E! [inputs.execd] stderr: "2024/03/30 17:45:09 E! Error in plugin: Error 1054: Unknown column 'status' in 'where clause'"
Mar 30 17:45:10 galera47 telegraf[982]: 2024-03-30T22:45:10Z E! [inputs.execd] stderr: "2024/03/30 17:45:10 E! Error in plugin: Error 1054: Unknown column 'status' in 'where clause'"

LittleDuke commented 7 months ago

Here are some entries from syslog:

Mar 30 17:59:09 galera47 gma[967]: time="2024-03-30T17:59:09-05:00" level=info msg="Connecting (used scheme http)..."
Mar 30 17:59:09 galera47 gma[967]: time="2024-03-30T17:59:09-05:00" level=info msg="Creating agentcom stream..."
Mar 30 17:59:09 galera47 gma[967]: time="2024-03-30T17:59:09-05:00" level=error msg="Failed to create stream: rpc error: code = Unavailable desc = connection error: desc = \"error reading server preface: http2: frame too large\""
Mar 30 17:59:09 galera47 gma[967]: time="2024-03-30T17:59:09-05:00" level=error msg="Error while serving: rpc error: code = Unavailable desc = connection error: desc = \"error reading server preface: http2: frame too large\"; sleeping for 1 second"
Mar 30 17:59:10 galera47 telegraf[982]: 2024-03-30T22:59:10Z E! [inputs.execd] stderr: "2024/03/30 17:59:10 E! Error in plugin: Error 1054: Unknown column 'status' in 'where clause'"
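
A note for readers who hit the same gma error. One common reading of "http2: frame too large" while reading the server preface is that the endpoint answered with something other than HTTP/2 (for example plain HTTP/1.x text, or a TLS record), and the client parsed the first three bytes of that answer as a frame length. Whether that is what happened on this system is an assumption; the thread does not confirm it. The Python sketch below only illustrates the arithmetic. The sample replies are hypothetical, not captured from the reporter's hosts, and the function names are made up for illustration.

```python
# Sketch: how an HTTP/2 client can misread a non-HTTP/2 reply as an oversized frame.
# Every HTTP/2 frame begins with a 9-byte header whose first 3 bytes are the
# payload length (big-endian). 16384 is the protocol's default maximum frame size.

DEFAULT_MAX_FRAME_SIZE = 16384  # HTTP/2 default SETTINGS_MAX_FRAME_SIZE


def apparent_frame_length(first_bytes: bytes) -> int:
    """Interpret the first 3 bytes of a reply as an HTTP/2 frame length."""
    if len(first_bytes) < 3:
        raise ValueError("need at least 3 bytes")
    return int.from_bytes(first_bytes[:3], "big")


def looks_too_large(first_bytes: bytes) -> bool:
    """True if the apparent length exceeds the default maximum frame size."""
    return apparent_frame_length(first_bytes) > DEFAULT_MAX_FRAME_SIZE


# Hypothetical replies (assumptions for illustration only):
http1_reply = b"HTTP/1.1 400 Bad Request\r\n"            # plain HTTP/1.x answer
tls_alert = bytes([0x15, 0x03, 0x03, 0x00, 0x02])        # start of a TLS alert record
empty_settings = bytes([0, 0, 0, 0x04, 0, 0, 0, 0, 0])   # valid empty SETTINGS frame

for name, data in [
    ("HTTP/1.x", http1_reply),
    ("TLS alert", tls_alert),
    ("HTTP/2 SETTINGS", empty_settings),
]:
    print(name, apparent_frame_length(data), looks_too_large(data))
# HTTP/1.x 4740180 True
# TLS alert 1377027 True
# HTTP/2 SETTINGS 0 False
```

The bytes "HTT" decode to 0x485454 = 4740180, far above 16384, which is why a plain-text HTTP/1.x answer can surface as "frame too large" rather than as a clearer protocol error.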

LittleDuke commented 7 months ago

Seems to be related to this:

https://github.com/influxdata/telegraf/pull/10486

byte commented 7 months ago

@LittleDuke I just deployed via AWS a 3-node Galera Cluster with MariaDB and it all "just works".

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.4 LTS
Release:        22.04
Codename:       jammy

MariaDB [(none)]> select version();
+-----------------------------------------+
| version()                               |
+-----------------------------------------+
| 10.11.7-MariaDB-1:10.11.7+maria~ubu2204 |
+-----------------------------------------+
1 row in set (0.000 sec)

While that is true, I do see errors in the output of:

systemctl status telegraf

● telegraf.service - Telegraf
     Loaded: loaded (/lib/systemd/system/telegraf.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2024-04-11 10:47:04 UTC; 3h 56min ago
       Docs: https://github.com/influxdata/telegraf
   Main PID: 5732 (telegraf)
      Tasks: 19 (limit: 4666)
     Memory: 67.7M
        CPU: 3min 48.290s
     CGroup: /system.slice/telegraf.service
             ├─5732 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegra>
             └─5741 /usr/local/bin/mysql_wsrep -config /etc/telegraf/mysql_wsrep-telegraf-plugin.conf

Apr 11 14:43:50 ip-172-31-29-243 telegraf[5732]: 2024-04-11T14:43:50Z E! [inputs.execd] stderr: "2024/04/11 14:43>
Apr 11 14:43:51 ip-172-31-29-243 telegraf[5732]: 2024-04-11T14:43:51Z E! [inputs.execd] stderr: "2024/04/11 14:43>
Apr 11 14:43:52 ip-172-31-29-243 telegraf[5732]: 2024-04-11T14:43:52Z E! [inputs.execd] stderr: "2024/04/11 14:43>
Apr 11 14:43:53 ip-172-31-29-243 telegraf[5732]: 2024-04-11T14:43:53Z E! [inputs.execd] stderr: "2024/04/11 14:43>

[Screenshot 2024-04-11 at 22 46 18]

byte commented 7 months ago

A self-installed setup, providing my own hosts, was also successful.

[Screenshot 2024-04-11 at 22 58 06]

LittleDuke commented 7 months ago

@byte Not as helpful as you might think saying "it works here so I don't know what your problem is" ;-)

Is the "Live Monitoring : OFFLINE" helpful in diagnosing the problem?

[Screen Shot 2024-04-11 at 11 14 23]

byte commented 7 months ago

> @byte Not as helpful as you might think saying "it works here so I don't know what your problem is" ;-)
>
> Is the "Live Monitoring : OFFLINE" helpful in diagnosing the problem?
>
> [Screen Shot 2024-04-11 at 11 14 23]

Believe me, "works for me" isn't meant to be dismissive. I am trying to be helpful on this free forum, @LittleDuke.

I have seen live monitoring go offline in the past, but it has been quite some time since that last happened. Thank you for the screenshot.

LittleDuke commented 7 months ago

@byte I appreciate the consideration, knowing full well that I'm getting what I'm paying for with FOSS -- it's "free like a puppy" :-) -- I'm not at all opposed to paying for licenses and support, of course, and have asked for quotes from both Codership and MariaDB. I get that the system is wildly complex and the installer operates like PFM. I'm just reporting what I'm seeing, or in this case what I'm NOT seeing, in case someone else is having the same issue...

LittleDuke commented 7 months ago

UPDATE: The system doesn't seem to like having SSL terminated in front of it, at least not with a simple HAProxy setup that forwards traffic to port 80.

I was able to get monitoring working by reinstalling GM, enabling the Let's Encrypt certbot, and then routing traffic to it directly.
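
For anyone debugging a similar proxy setup, it can help to look at the very first bytes an endpoint sends back and see which protocol they resemble. The Python sketch below is a rough heuristic, not part of Galera Manager or HAProxy. The function name and the sample byte strings are hypothetical, and capturing the bytes from a real endpoint is left out.

```python
def classify_first_bytes(data: bytes) -> str:
    """Guess which protocol produced the first bytes of a server reply.

    Returns one of "http1", "tls", "http2", or "unknown". Heuristic only.
    """
    # HTTP/1.x status lines are plain text, e.g. b"HTTP/1.1 400 Bad Request".
    if data.startswith(b"HTTP/1."):
        return "http1"
    # TLS records start with a content type (0x14-0x17) and major version 0x03.
    if len(data) >= 2 and 0x14 <= data[0] <= 0x17 and data[1] == 0x03:
        return "tls"
    # An HTTP/2 server preface is a SETTINGS frame: the type byte (offset 3)
    # is 0x04 and the 4-byte stream identifier (offsets 5-8) is zero.
    if len(data) >= 9 and data[3] == 0x04 and data[5:9] == b"\x00\x00\x00\x00":
        return "http2"
    return "unknown"


# Hypothetical samples (assumptions for illustration only):
samples = {
    "plain HTTP/1.x reply": b"HTTP/1.1 400 Bad Request\r\n",
    "TLS handshake record": bytes([0x16, 0x03, 0x01, 0x00, 0x05]),
    "HTTP/2 SETTINGS frame": bytes([0, 0, 0, 0x04, 0, 0, 0, 0, 0]),
}
for label, raw in samples.items():
    print(label, "->", classify_first_bytes(raw))
```

If an agent that expects HTTP/2 is pointed at an address where this kind of check says "http1" or "tls", a mismatch between what the proxy forwards and what the agent speaks is a plausible place to start looking.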

byte commented 7 months ago

> UPDATE: The system doesn't seem to like having SSL terminated in front of it, at least not with a simple HAProxy setup that forwards traffic to port 80.
>
> I was able to get monitoring working by reinstalling GM, enabling the Let's Encrypt certbot, and then routing traffic to it directly.

Hi there, thank you for pointing this out. So it now works with a full SSL setup, as long as there is no proxy in front of it.

This is very true of Galera Manager: it needs to be set up strictly within its supported parameters. I'm going to keep this issue open, because the developers should really fix it.

Thanks again.

And good work on getting quotes from Codership and MariaDB! Best way to support free software! Thank you.