ganglia / monitor-core

Ganglia Monitoring core
BSD 3-Clause "New" or "Revised" License

Anybody built a working RPM for CentOS 8? #314

Open mathog opened 4 years ago

mathog commented 4 years ago

Has anybody successfully built ganglia RPMs for CentOS 8? There was a version in EPEL for CentOS 7 but they do not have one for 8.

I used the method shown below. gmond appears to be collecting data (when run with -d 20 -f, it even lists the node name), but it won't share it. It logs requests as they arrive and returns a data structure. However, gstat shows 0 for all host fields and says there are no hosts running gexec. (Yes, gmond.conf was set to gexec = yes.) telnet to the port returns the DTD (which includes "!ELEMENT HOSTS EMPTY") and an empty CLUSTER element instead of host information. There is only the one node in this "cluster"; perhaps that is the issue?

Built and tested like so:

wget https://download.fedoraproject.org/pub/fedora/linux/development/rawhide/Everything/source/tree/Packages/g/ganglia-3.7.2-30.fc32.src.rpm
dnf install perl-Pod-Html-1.22.02-416.el8.noarch
dnf install rsync rrdtool-devel rpcgen libtirpc-devel libmemcached-devel libconfuse-devel 
wget https://download.fedoraproject.org/pub/fedora/linux/development/rawhide/Everything/source/tree/Packages/l/libart_lgpl-2.3.21-23.fc32.src.rpm
rpmbuild --rebuild libart_lgpl-2.3.21-23.fc32.src.rpm
dnf install /root/rpmbuild/RPMS/x86_64/libart_lgpl-2.3.21-23.el8.x86_64.rpm /root/rpmbuild/RPMS/x86_64/libart_lgpl-devel-2.3.21-23.el8.x86_64.rpm
dnf install git
rpmbuild --rebuild ganglia-3.7.2-30.fc32.src.rpm
dnf install /root/rpmbuild/RPMS/x86_64/ganglia-3.7.2-30.el8.x86_64.rpm /root/rpmbuild/RPMS/x86_64/ganglia-gmetad-3.7.2-30.el8.x86_64.rpm /root/rpmbuild/RPMS/x86_64/ganglia-gmond-3.7.2-30.el8.x86_64.rpm
#modify /etc/ganglia/gmond.conf to: gexec = yes
#use appropriate network device for this route
cat >/etc/sysconfig/network-scripts/route-eno1 <<EOD
239.2.11.71 via 0.0.0.0 dev eno1
EOD
systemctl enable gmond.service
systemctl start gmond.service
#system rebooted
route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.0.1     0.0.0.0         UG    100    0        0 eno1
192.168.0.0     0.0.0.0         255.255.255.0   U     100    0        0 eno1
239.2.11.71     0.0.0.0         255.255.255.255 UH    100    0        0 eno1
ps -ef | grep ganglia
#ganglia   3198     1  0 15:24 ?        00:00:00 /usr/sbin/gmetad -d 1
#let gmond run for several minutes.
gstat
CLUSTER INFORMATION
       Name: unspecified
      Hosts: 0
Gexec Hosts: 0
 Dead Hosts: 0
  Localtime: Thu Mar 26 17:19:18 2020

There are no hosts running gexec at this time
telnet 192.168.0.121 8649
Trying 192.168.0.121...
Connected to 192.168.0.121.
Escape character is '^]'.
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<!DOCTYPE GANGLIA_XML [
   <!ELEMENT GANGLIA_XML (GRID|CLUSTER|HOST)*>
      <!ATTLIST GANGLIA_XML VERSION CDATA #REQUIRED>
      <!ATTLIST GANGLIA_XML SOURCE CDATA #REQUIRED>
   <!ELEMENT GRID (CLUSTER | GRID | HOSTS | METRICS)*>
      <!ATTLIST GRID NAME CDATA #REQUIRED>
      <!ATTLIST GRID AUTHORITY CDATA #REQUIRED>
      <!ATTLIST GRID LOCALTIME CDATA #IMPLIED>
   <!ELEMENT CLUSTER (HOST | HOSTS | METRICS)*>
      <!ATTLIST CLUSTER NAME CDATA #REQUIRED>
      <!ATTLIST CLUSTER OWNER CDATA #IMPLIED>
      <!ATTLIST CLUSTER LATLONG CDATA #IMPLIED>
      <!ATTLIST CLUSTER URL CDATA #IMPLIED>
      <!ATTLIST CLUSTER LOCALTIME CDATA #REQUIRED>
   <!ELEMENT HOST (METRIC)*>
      <!ATTLIST HOST NAME CDATA #REQUIRED>
      <!ATTLIST HOST IP CDATA #REQUIRED>
      <!ATTLIST HOST LOCATION CDATA #IMPLIED>
      <!ATTLIST HOST TAGS CDATA #IMPLIED>
      <!ATTLIST HOST REPORTED CDATA #REQUIRED>
      <!ATTLIST HOST TN CDATA #IMPLIED>
      <!ATTLIST HOST TMAX CDATA #IMPLIED>
      <!ATTLIST HOST DMAX CDATA #IMPLIED>
      <!ATTLIST HOST GMOND_STARTED CDATA #IMPLIED>
   <!ELEMENT METRIC (EXTRA_DATA*)>
      <!ATTLIST METRIC NAME CDATA #REQUIRED>
      <!ATTLIST METRIC VAL CDATA #REQUIRED>
      <!ATTLIST METRIC TYPE (string | int8 | uint8 | int16 | uint16 | int32 | uint32 | float | double | timestamp) #REQUIRED>
      <!ATTLIST METRIC UNITS CDATA #IMPLIED>
      <!ATTLIST METRIC TN CDATA #IMPLIED>
      <!ATTLIST METRIC TMAX CDATA #IMPLIED>
      <!ATTLIST METRIC DMAX CDATA #IMPLIED>
      <!ATTLIST METRIC SLOPE (zero | positive | negative | both | unspecified) #IMPLIED>
      <!ATTLIST METRIC SOURCE (gmond) 'gmond'>
   <!ELEMENT EXTRA_DATA (EXTRA_ELEMENT*)>
   <!ELEMENT EXTRA_ELEMENT EMPTY>
      <!ATTLIST EXTRA_ELEMENT NAME CDATA #REQUIRED>
      <!ATTLIST EXTRA_ELEMENT VAL CDATA #REQUIRED>
   <!ELEMENT HOSTS EMPTY>
      <!ATTLIST HOSTS UP CDATA #REQUIRED>
      <!ATTLIST HOSTS DOWN CDATA #REQUIRED>
      <!ATTLIST HOSTS SOURCE (gmond | gmetad) #REQUIRED>
   <!ELEMENT METRICS (EXTRA_DATA*)>
      <!ATTLIST METRICS NAME CDATA #REQUIRED>
      <!ATTLIST METRICS SUM CDATA #REQUIRED>
      <!ATTLIST METRICS NUM CDATA #REQUIRED>
      <!ATTLIST METRICS TYPE (string | int8 | uint8 | int16 | uint16 | int32 | uint32 | float | double | timestamp) #REQUIRED>
      <!ATTLIST METRICS UNITS CDATA #IMPLIED>
      <!ATTLIST METRICS SLOPE (zero | positive | negative | both | unspecified) #IMPLIED>
      <!ATTLIST METRICS SOURCE (gmond) 'gmond'>
]>
<GANGLIA_XML VERSION="3.7.2" SOURCE="gmond">
<CLUSTER NAME="unspecified" LOCALTIME="1585268298" OWNER="unspecified" LATLONG="unspecified" URL="unspecified">
</CLUSTER>
</GANGLIA_XML>
Connection closed by foreign host.
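
The same check can be scripted rather than read by eye. A minimal sketch, assuming gmond emits one HOST element per line as in the dump above; the function name count_gmond_hosts is made up for illustration, and nc is assumed to be installed:

```shell
#!/usr/bin/env bash
# Count <HOST ...> elements in gmond XML read from stdin.
# The pattern '<HOST ' (with the trailing space) skips the DTD lines
# ('<!ELEMENT HOST', '<!ATTLIST HOST'), the '<HOSTS' summary element,
# and closing '</HOST>' tags.
# grep -c counts matching lines, so this assumes one HOST element per line.
count_gmond_hosts() {
    # grep exits non-zero when nothing matches but still prints 0.
    grep -c '<HOST ' || true
}

# Example (address and port taken from the transcript above):
#   nc 192.168.0.121 8649 | count_gmond_hosts
```

A result of 0 corresponds to the empty CLUSTER element shown above: gmond answered, but it has no hosts recorded, not even itself.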

More info. A laptop running Ubuntu 18.04 LTS is on the same network. Started ganglia on it with:


sudo apt-get install ganglia-monitor rrdtool gmetad
!installed  version 3.6.0-7
!modify /etc/ganglia/gmond.conf to "gexec=yes"
!stop and restart gmond on the laptop
gstat -a
!or on original node
gstat -i 192.168.0.185 -a
CLUSTER INFORMATION
       Name: unspecified
      Hosts: 2
Gexec Hosts: 2
 Dead Hosts: 0
  Localtime: Mon Mar 30 14:25:15 2020

CLUSTER HOSTS
Hostname                     LOAD                       CPU              Gexec
 CPUs (Procs/Total) [     1,     5, 15min] [  User,  Nice, System, Idle, Wio]

192.168.0.121
    4 (    2/  433) [  0.02,  0.02,  0.03] [   2.3,   0.0,   0.5,  97.2,   0.0] ON
porthog
    2 (    0/  310) [  0.14,  0.08,  0.10] [   0.2,   0.0,   1.5,  98.0,   0.2] ON

!but query gmond on "poweredge" from self and see:
CLUSTER INFORMATION
       Name: unspecified
      Hosts: 0
Gexec Hosts: 0
 Dead Hosts: 0
  Localtime: Mon Mar 30 14:26:14 2020

There are no hosts up at this time

!or from laptop with
gstat -i 192.168.0.121 -a
Unable to get hostlist from 192.168.0.121 8649!

So apparently the 3.7.2 gmond can send but not receive information. There are no firewalls running and SELinux makes no difference. The gstat from 3.6.0 and the gstat from 3.7.2 show the same information (more or less) when they query the same gmond server.
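
A plain TCP probe from the other machine can help separate "the connection never gets through" from "gmond answered with an empty cluster". This is only a sketch: probe_tcp is a hypothetical helper name, and it assumes bash and the coreutils timeout command are available.

```shell
#!/usr/bin/env bash
# Report whether a TCP connection to host:port can be opened within 3 seconds.
# Uses bash's /dev/tcp redirection, so no telnet or nc is required.
probe_tcp() {
    local host="$1" port="$2"
    # Inside the inner bash, $0 is the host and $1 is the port.
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
        echo "open"
    else
        echo "unreachable"
    fi
}

# Example, run from the laptop against the CentOS 8 node:
#   probe_tcp 192.168.0.121 8649
```

If this prints "unreachable" from a remote machine while telnet on the node itself still connects, something between the two is filtering the port.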

mathog commented 4 years ago

Solved it.

The problem was that "firewalld" was running, and its configuration was not in any way visible with "iptables --list". The RPMs built from the Fedora src.rpms did not add the firewalld rules needed to accept incoming connections. Without those rules gmond could still talk to gmond on other systems, but nothing could reach it.
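
To see what firewalld itself is allowing, ask it directly. The sketch below assumes gmond's default port 8649 for both UDP and TCP; missing_gmond_ports is a made-up helper that only looks at explicit port entries, so it will not notice a port opened through a service definition or a port range.

```shell
#!/usr/bin/env bash
# Read the output of `firewall-cmd --list-ports` on stdin and print
# each gmond port/protocol pair that is NOT listed.
# Assumption: gmond uses its default port 8649 for UDP and TCP.
missing_gmond_ports() {
    local listed want
    # Flatten to one space-padded line so every entry can be matched whole.
    listed=" $(tr '\n' ' ') "
    for want in 8649/udp 8649/tcp; do
        case "$listed" in
            *" $want "*) ;;          # already open
            *) echo "$want" ;;       # not listed
        esac
    done
}

# Example, as root on the CentOS 8 node:
#   firewall-cmd --state        # prints "running" when firewalld is active
#   firewall-cmd --list-all     # shows services (such as cockpit) as well as ports
#   firewall-cmd --list-ports | missing_gmond_ports
```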

I only figured this out because there was for some reason a firewalld rule for "cockpit" so that nmap from another machine showed port 9090 even though lsof on the CentOS 8 machine had nothing on that port. cockpit was not running or installed, and never had been.

So add the appropriate ports to the firewalld configuration. (Forget about iptables for this on CentOS 8. firewalld there uses the nftables backend by default, so iptables --list still runs but it does not actually show the firewalld rules!)
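
For reference, a minimal sketch of what "add the appropriate ports" might look like. It assumes gmond's default port 8649 for both UDP and TCP in the default zone; adjust if gmond.conf says otherwise. The helper gmond_firewalld_commands is hypothetical and only prints the commands, so they can be reviewed before being run.

```shell
#!/usr/bin/env bash
# Print the firewall-cmd invocations that open gmond's port.
# Nothing is changed until the output is piped to a shell as root.
gmond_firewalld_commands() {
    local port="${1:-8649}" proto   # 8649 is gmond's default port
    for proto in udp tcp; do
        echo "firewall-cmd --permanent --add-port=${port}/${proto}"
    done
    # --permanent only edits the saved configuration; reload to apply it.
    echo "firewall-cmd --reload"
}

# Review first, then apply as root:
#   gmond_firewalld_commands
#   gmond_firewalld_commands | sh
```

Afterwards, gstat -a on the node and gstat -i 192.168.0.121 -a from another machine should show whether hosts are now being recorded.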