Closed dalehamel closed 9 years ago
ping @vincentbernat since i've had some contact with you, and read your blog post
Oh, I thought you were posting on the mailing list.
The problem is quite odd. Could you check with strace whether you see anything when you try to walk the MIB? The goal is to see if the master agent is sending requests. You should have 3 keepalived processes. The first one can be ignored. To find the VRRP process, use lsof -n -p PID. The VRRP process has some raw sockets. You should also check which file descriptor belongs to the AgentX socket (this is the Unix socket). Then, with strace, see if there is some activity on this socket.
If not, there is a configuration problem with the master agent.
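One way to shortlist the relevant file descriptors from lsof output is sketched below. The helper names (unix_socket_fds, raw_socket_fds) are made up for illustration, and the sketch assumes the default lsof column layout (FD in the fourth column, TYPE in the fifth). lsof cannot say which Unix socket is the AgentX one, so the result is only a list of candidates.

```shell
#!/bin/sh
# Hypothetical helpers: both read `lsof -n -p PID` output on stdin.
# Assumes default lsof columns: COMMAND PID USER FD TYPE ...

# Print the numeric FDs whose TYPE is "unix" (AgentX socket candidates).
unix_socket_fds() {
    awk '$5 == "unix" { fd = $4; sub(/[a-zA-Z]+$/, "", fd); print fd }'
}

# Print the numeric FDs whose TYPE is "raw" (these mark the VRRP process).
raw_socket_fds() {
    awk '$5 == "raw" { fd = $4; sub(/[a-zA-Z]+$/, "", fd); print fd }'
}

# Usage (PID is whichever keepalived process is being inspected):
#   lsof -n -p "$PID" | unix_socket_fds
#   lsof -n -p "$PID" | raw_socket_fds
```

A process for which raw_socket_fds prints nothing is probably not the VRRP process.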
Tell me if I need to expand on this information.
> Oh, I thought you were posting on the mailing list.
I thought I submitted it, I must have screwed that up - my apologies.
In my case, it looks like it's this process, as it has the raw socket you mentioned:
keepalive 12378 root 0u CHR 1,3 0t0 1041 /dev/null
keepalive 12378 root 1u CHR 1,3 0t0 1041 /dev/null
keepalive 12378 root 2u CHR 1,3 0t0 1041 /dev/null
keepalive 12378 root 3u unix 0xffff880096d8e580 0t0 1368064 socket
keepalive 12378 root 4r FIFO 0,8 0t0 1368572 pipe
keepalive 12378 root 5w FIFO 0,8 0t0 1368572 pipe
keepalive 12378 root 6u netlink 0t0 1368575 ROUTE
keepalive 12378 root 7u netlink 0t0 1368576 ROUTE
keepalive 12378 root 8u pack 1368577 0t0 RARP type=SOCK_RAW
keepalive 12378 root 9u pack 1368578 0t0 IPV6 type=SOCK_RAW
keepalive 12378 root 10r FIFO 0,8 0t0 1368579 pipe
keepalive 12378 root 11w FIFO 0,8 0t0 1368579 pipe
keepalive 12378 root 12r FIFO 0,8 0t0 1368580 pipe
keepalive 12378 root 13w FIFO 0,8 0t0 1368580 pipe
keepalive 12378 root 14u unix 0xffff880099613100 0t0 1368581 socket
keepalive 12378 root 15u raw 0t0 1368598 00000000:0070->00000000:0000 st=07
keepalive 12378 root 16u raw 0t0 1368599 00000000:0070->00000000:0000 st=07
Running strace (filtering out gettimeofday and select since they were spamming):
strace -p 12378 2>&1 | grep -v 'gettime\|select'
Process 12378 attached
sendmsg(16, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("172.29.24.50")}, msg_iov(1)=[{"E\300\0$\0K\0\0\377p2\363\254\35\26\377\254\35\0302!5\n\0\0\1\324\311\0\0\0\0"..., 36}], msg_controllen=0, msg_flags=0}, 0) = 36
sendmsg(16, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("172.29.24.50")}, msg_iov(1)=[{"E\300\0$\0L\0\0\377p2\362\254\35\26\377\254\35\0302!5\n\0\0\1\324\311\0\0\0\0"..., 36}], msg_controllen=0, msg_flags=0}, 0) = 36
sendmsg(16, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("172.29.24.50")}, msg_iov(1)=[{"E\300\0$\0M\0\0\377p2\361\254\35\26\377\254\35\0302!5\n\0\0\1\324\311\0\0\0\0"..., 36}], msg_controllen=0, msg_flags=0}, 0) = 36
sendmsg(16, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("172.29.24.50")}, msg_iov(1)=[{"E\300\0$\0N\0\0\377p2\360\254\35\26\377\254\35\0302!5\n\0\0\1\324\311\0\0\0\0"..., 36}], msg_controllen=0, msg_flags=0}, 0) = 36
sendmsg(16, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("172.29.24.50")}, msg_iov(1)=[{"E\300\0$\0O\0\0\377p2\357\254\35\26\377\254\35\0302!5\n\0\0\1\324\311\0\0\0\0"..., 36}], msg_controllen=0, msg_flags=0}, 0) = 36
sendmsg(16, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("172.29.24.50")}, msg_iov(1)=[{"E\300\0$\0P\0\0\377p2\356\254\35\26\377\254\35\0302!5\n\0\0\1\324\311\0\0\0\0"..., 36}], msg_controllen=0, msg_flags=0}, 0) = 36
sendmsg(16, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("172.29.24.50")}, msg_iov(1)=[{"E\300\0$\0Q\0\0\377p2\355\254\35\26\377\254\35\0302!5\n\0\0\1\324\311\0\0\0\0"..., 36}], msg_controllen=0, msg_flags=0}, 0) = 36
sendmsg(16, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("172.29.24.50")}, msg_iov(1)=[{"E\300\0$\0R\0\0\377p2\354\254\35\26\377\254\35\0302!5\n\0\0\1\324\311\0\0\0\0"..., 36}], msg_controllen=0, msg_flags=0}, 0) = 36
sendmsg(16, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("172.29.24.50")}, msg_iov(1)=[{"E\300\0$\0S\0\0\377p2\353\254\35\26\377\254\35\0302!5\n\0\0\1\324\311\0\0\0\0"..., 36}], msg_controllen=0, msg_flags=0}, 0) = 36
sendmsg(16, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("172.29.24.50")}, msg_iov(1)=[{"E\300\0$\0T\0\0\377p2\352\254\35\26\377\254\35\0302!5\n\0\0\1\324\311\0\0\0\0"..., 36}], msg_controllen=0, msg_flags=0}, 0) = 36
It doesn't seem like anything is happening here. I ran snmpwalk numerous times as below:
snmpwalk -v2c -cpublic localhost .1.3.6.1.4.1.9586.100.5
The strace traffic seems to just be keepalived pinging its peer, seems like the snmp calls never actually make it there... : /
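To check specifically for AgentX traffic instead of reading the whole trace, the output can be narrowed to I/O syscalls whose first argument is the Unix socket's descriptor. This is only a sketch: the helper name fd_activity is hypothetical, and the usage comment assumes descriptor 14 is the AgentX socket, which the lsof listing suggests but does not prove.

```shell
#!/bin/sh
# Hypothetical helper: keep only strace lines for I/O syscalls on one fd.
# $1 is the descriptor number; strace output is read on stdin.
fd_activity() {
    grep -E "^(read|write|recv|send|recvfrom|sendto|recvmsg|sendmsg)\($1,"
}

# Usage, assuming fd 14 is the AgentX socket of PID 12378:
#   strace -p 12378 2>&1 | fd_activity 14
# No output while snmpwalk is running suggests the master agent never
# forwards the request to keepalived.
```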
To make the strace less noisy, I disabled my check scripts temporarily:
vrrp_instance nat_instance {
    debug 2
    interface eth0
    state BACKUP
    virtual_router_id 53
    priority 10
    unicast_src_ip 172.29.22.255
    unicast_peer {
        172.29.24.50
    }
}
Something worth noting though, select seems to be continuously timing out:
select(1024, [4 6 10 12 14 15], [], [], {0, 20160}) = 0 (Timeout)
select(1024, [4 6 10 12 14 15], [], [], {0, 999922}) = 0 (Timeout)
select(1024, [4 6 10 12 14 15], [], [], {0, 630799}) = 0 (Timeout)
select(1024, [4 6 10 12 14 15], [], [], {0, 368098}) = 0 (Timeout)
select(1024, [4 6 10 12 14 15], [], [], {0, 999965}) = 0 (Timeout)
select(1024, [4 6 10 12 14 15], [], [], {0, 999962}) = 0 (Timeout)
select(1024, [4 6 10 12 14 15], [], [], {0, 999909}) = 0 (Timeout)
select(1024, [4 6 10 12 14 15], [], [], {0, 999941}) = 0 (Timeout)
select(1024, [4 6 10 12 14 15], [], [], {0, 999960}) = 0 (Timeout)
Though I don't think any of those fds are the raw socket?
select() timing out is expected if nothing is received (it times out every second to be able to send VRRP packets). You should get something on file descriptor 14. Could you try with this very simple configuration for the master agent?
rocommunity public
master agentx
This way, no access control or filtering will be done.
Yup, that did it.
Thanks for your help! :heart:
I've literally been banging my head against this problem all day. @vincentbernat
I did miss your original configuration. The public community has a limited view systemonly. You could also add the appropriate OID to view systemonly included.
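As a sketch of that second option, the helper below extends the systemonly view to the Keepalived subtree that was being walked (.1.3.6.1.4.1.9586.100.5). The helper name is hypothetical and the file path in the usage comment is an assumption (the usual location on Ubuntu). snmpd has to be restarted afterwards to pick up the change.

```shell
#!/bin/sh
# Hypothetical helper: append a "view systemonly included OID" line to an
# snmpd.conf file unless that exact line is already present.
# $1 = path to snmpd.conf, $2 = OID subtree to expose.
add_systemonly_view() {
    line="view systemonly included $2"
    grep -qxF "$line" "$1" || printf '%s\n' "$line" >> "$1"
}

# Usage (path assumed; run as root, then restart snmpd):
#   add_systemonly_view /etc/snmp/snmpd.conf .1.3.6.1.4.1.9586.100.5
#   service snmpd restart
```

Compared with the wide-open two-line configuration, this keeps the restricted view in place and exposes only the extra subtree.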
Makes sense! Thanks for the assistance!
On Wednesday, August 26, 2015, Vincent Bernat notifications@github.com wrote:
> I did miss your original configuration. The public community has a limited view systemonly. You could also add the appropriate OID to view systemonly included.
— Reply to this email directly or view it on GitHub https://github.com/acassen/keepalived/issues/186#issuecomment-135158501.
I've been trying to set up SNMP support for Keepalived to create a highly available pair of NAT nodes. Below is my configuration:
Unfortunately, even though keepalived seems to register itself with the SNMP daemon according to syslog:
Or by running keepalived in the foreground:
And I can even see it in the SNMPv2-MIB::sysORDescr table:
I get nothing when I try to walk it:
Even if I try the root OID, I still get nothing:
I'm running v1.2.13, but I have the same issues when I try with 1.2.19 (only it doesn't complain about duplicate registration, as they fixed that bug). I am on Ubuntu 14.04, with apparmor disabled for debugging this. I have tried numerous other versions (v1.2.6-1.2.13; I can't get 1.2.5 or earlier to compile), with the same problem.
Here is my snmpd.conf and my snmp daemon config.
I've started reading through the keepalived source code, and it looks like the SNMP support has support for sending traps on state transitions, as in vrrp_state_become_master, but I'm not interested (much) in traps, I'm more interested in polling the current state of keepalived, which it looks like is supposed to be registered in snmp_agent_init by calling snmp_register_mib.
The call to register the MIBs seems to succeed, but I can never actually retrieve any values.
I built the latest master with some debug prints. Curiously, it seems that 'vrrp_snmp_instance' is never called when I try to snmpget.
From my understanding of the source code, it looks like 'vrrp_vars' in vrrp_snmp.c contains a list of the different OIDs as 'variable8' structures, which contain a function pointer for which function should be called.
The KEEPALIVED-VRRP MIB registers vrrp_vars with snmpd by calling 'register_mib' in 'vrrp_snmp_agent_init'.
If my understanding of how this works is correct, I would assume that when snmpd receives the request, it would call the function which it has received a reference to in this struct.
However, the functions are never called; I've placed both breakpoints and prints in them to check.
I'm at my wits' end for how to debug this further, but the underlying issue is that the MIB doesn't seem to register properly, in that it is empty, and the functions that it is supposed to call are never called.
Any help would be appreciated!