NagiosEnterprises / ncpa

Nagios Cross-Platform Agent

Solaris11 ( sparc ) agent crash after some time. #859

Closed Pigi-102 closed 5 months ago

Pigi-102 commented 2 years ago

Hi,
we are trying to use NCPA on Solaris 11.4 on SPARC, but after starting it runs fine for an hour or less and then crashes.

I've "trussed" it and can see that when it crashes this is the output from truss:

7251/1:         write(11, "160303\0 B02\0\0 >0303E7".., 1342)   = 1342
7251/1:         read(11, 0x10139C273, 5)                        Err#11 EAGAIN
7251/1:         port_associate(7, 4, 0x0000000A, 0x00000001, 0x00000000) = 0
7251/1:         port_associate(7, 4, 0x0000000B, 0x00000001, 0x00000000) = 0
7251/1:         port_getn(7, 0x1012A4C00, 64, 1, 0xFFFFFFFF7FFFC1D0) = 1 [0]
7251/1:         read(11, "160303\0 F", 5)                       = 5
7251/1:         read(11, "10\0\0 B A04 aE8 : D oA3".., 70)      = 70
7251/1:         getpid()                                        = 7251 [7208]
7251/1:         getpid()                                        = 7251 [7208]
7251/1:         read(11, "140303\001", 5)                       = 5
7251/1:         read(11, "01", 1)                               = 1
7251/1:         read(11, "160303\0 (", 5)                       = 5
7251/1:         read(11, " ] S SE0\bAA > 106DA 2AA".., 40)      = 40
7251/1:         getpid()                                        = 7251 [7208]
7251/1:         getpid()                                        = 7251 [7208]
7251/1:         write(11, "160303\0BA04\0\0B6\0\01C".., 242)    = 242
7251/1:         read(11, 0x10139C273, 5)                        Err#11 EAGAIN
7251/1:             Incurred fault #5, FLTACCESS  %pc = 0xFFFFFFFF649039A0
7251/1:               siginfo: SIGBUS BUS_ADRALN addr=0xFFFFFFFF649039A0
7251/1:             Received signal #10, SIGBUS [caught]
7251/1:               siginfo: SIGBUS BUS_ADRALN addr=0xFFFFFFFF649039A0
7251/1:         lwp_sigmask(SIG_SETMASK, 0x00000200, 0x00000000, 0x00000000, 0x00000000) = 0xFFBFFEFF [0xFFFFFFFF]
7251/1:         sigaction(SIGBUS, 0xFFFFFFFF7FFFAEF8, 0x00000000) = 0
7251/1:         setcontext(0xFFFFFFFF7FFFAB60)
7251/1:             Incurred fault #5, FLTACCESS  %pc = 0xFFFFFFFF649039A0
7251/1:               siginfo: SIGBUS BUS_ADRALN addr=0xFFFFFFFF649039A0
7251/1:             Received signal #10, SIGBUS [default]
7251/1:               siginfo: SIGBUS BUS_ADRALN addr=0xFFFFFFFF649039A0

Any idea what to look for?

Thanks

Pierluigi

jomann09 commented 2 years ago

Hello, did you ever get this working? Unfortunately I'm not very familiar with Solaris, so I don't have many ideas as to why it would be crashing. Generally, a random crash like this happens because the agent can't handle something unexpected, such as a drive disappearing or some other change it doesn't anticipate.
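
Since NCPA is a Python program, one generic way to learn more than truss shows is Python's standard `faulthandler` module, which installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL and dumps the Python traceback when one of them arrives. The following is only a sketch: the log path and the helper name `enable_crash_traceback` are hypothetical, and whether and where this can be hooked into a packaged NCPA build is an assumption that has not been verified here.

```python
import faulthandler
import sys

# Hypothetical log location; pick any path the agent's user can write to.
CRASH_LOG_PATH = "/tmp/ncpa_crash_traceback.log"


def enable_crash_traceback(path=CRASH_LOG_PATH):
    """Ask Python to dump the tracebacks of all threads to `path` on a fatal signal."""
    # The file must stay open for as long as the handler is enabled,
    # so the caller should keep a reference to the returned object.
    log_file = open(path, "a")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    crash_log = enable_crash_traceback()
    print("faulthandler enabled:", faulthandler.is_enabled(), file=sys.stderr)
```

If the crash comes from native code called by Python, the dumped traceback would at least show which Python call was active at the time of the fault.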

topinet commented 1 year ago

I can confirm this issue on two SPARC S7-2L servers running Solaris 11.4.

One fails with the same trace as above, while the other fails with:

/1: read(22, 0x10154B904, 8192)         = 0
/1:     Incurred fault #6, FLTBOUNDS  %pc = 0xFFFFFFFF79503974
/1:       siginfo: SIGSEGV SEGV_MAPERR addr=0x00000010
/1:     Received signal #11, SIGSEGV [caught]
/1:       siginfo: SIGSEGV SEGV_MAPERR addr=0x00000010
/1: lwp_sigmask(SIG_SETMASK, 0x00000400, 0x00000000, 0x00000000, 0x00000000) = 0xFFBFFEFF [0xFFFFFFFF]
/1: sigaction(SIGSEGV, 0xFFFFFFFF7FFF5E68, 0x00000000) = 0
/1: setcontext(0xFFFFFFFF7FFF5AD0)
/1:     Incurred fault #6, FLTBOUNDS  %pc = 0xFFFFFFFF79503974
/1:       siginfo: SIGSEGV SEGV_MAPERR addr=0x00000010
/1:     Received signal #11, SIGSEGV [default]
/1:       siginfo: SIGSEGV SEGV_MAPERR addr=0x00000010
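
For anyone comparing traces across hosts, a small helper along these lines can pull the fault records out of a saved truss log. It is only a sketch: the name `summarize_faults` and the `SAMPLE` text are made up for illustration, and the regular expression assumes the `siginfo:` line layout seen in the two traces above.

```python
import re
from collections import Counter

# Matches truss lines such as:
#   7251/1:   siginfo: SIGBUS BUS_ADRALN addr=0xFFFFFFFF649039A0
SIGINFO_RE = re.compile(r"siginfo:\s+(SIG\w+)\s+(\w+)\s+addr=(0x[0-9A-Fa-f]+)")


def summarize_faults(truss_text):
    """Count each distinct (signal, code, address) triple found in truss output."""
    counts = Counter()
    for line in truss_text.splitlines():
        match = SIGINFO_RE.search(line)
        if match:
            signal_name, code, addr = match.groups()
            counts[(signal_name, code, int(addr, 16))] += 1
    return counts


# Shortened sample built from the lines posted in this thread.
SAMPLE = """\
7251/1:   Incurred fault #5, FLTACCESS  %pc = 0xFFFFFFFF649039A0
7251/1:     siginfo: SIGBUS BUS_ADRALN addr=0xFFFFFFFF649039A0
/1:     Incurred fault #6, FLTBOUNDS  %pc = 0xFFFFFFFF79503974
/1:       siginfo: SIGSEGV SEGV_MAPERR addr=0x00000010
"""

if __name__ == "__main__":
    for (signal_name, code, addr), n in summarize_faults(SAMPLE).items():
        print(f"{signal_name} {code} addr={addr:#x} seen {n} time(s)")
```

The two hosts report different faults: `BUS_ADRALN` is the code for an invalid address alignment, while `SEGV_MAPERR` means the address is not mapped, and a very low address such as `0x10` is usually consistent with an access through a null pointer.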

topinet commented 1 year ago

One week after downgrading NCPA to version 2.3.1, it has not crashed.

Seems to be a problem with version 2.4.0.

ne-bbahn commented 5 months ago

This is a duplicate of closed issue #963.