ConSol-Monitoring / snclient

SNClient+ - Cross platform monitoring agent
MIT License
48 stars 9 forks source link

Update task_check_system.go #60

Closed jframeau closed 9 months ago

jframeau commented 9 months ago

Do not delete interface but old values only using useful Trim function.

Otherwise got nil pointer dereference.

runtime/debug/stack.go:24 +0x5e

pkg/snclient.(Agent).logPanicRecover(0x0?) pkg/snclient@v0.0.0-00010101000000-000000000000/snclient.go:673 +0x168 panic({0xb3cc40?, 0x11b5da0?}) runtime/panic.go:914 +0x21f pkg/snclient.(Counter).GetLast(...) pkg/snclient@v0.0.0-00010101000000-000000000000/counter.go:91 pkg/snclient.(CheckNetwork).Check(0xc0001ae800, {0xc0000e4930?, 0x1?}, 0x1?, 0xc000014240, {0x1?, 0x1?, 0xc0002bc200?}) pkg/snclient@v0.0.0-00010101000000-000000000000/check_network.go:107 +0xdb3 pkg/snclient.(Agent).runCheck(0x50fb0d?, {0xd1c098, 0xc0000e4930}, {0xc0001a94c0, 0xd}, {0xc0002bc1f0, 0x1, 0x1}) pkg/snclient@v0.0.0-00010101000000-000000000000/snclient.go:727 +0x375 pkg/snclient.(Agent).RunCheckWithContext(0xd1bf48?, {0xd1c098?, 0xc0000e4930?}, {0xc0001a94c0?, 0xc0002bc1f0?}, {0xc0002bc1f0?, 0x0?, 0xc0003fbcb0?}) pkg/snclient@v0.0.0-00010101000000-000000000000/snclient.go:688 +0x2a pkg/snclient.(Agent).RunCheck(0xc00019ab70?, {0xc0001a94c0, 0xd}, {0xc0002bc1f0, 0x1, 0x1}) pkg/snclient@v0.0.0-00010101000000-000000000000/snclient.go:683 +0xad pkg/snclient.(HandlerNRPE).ServeTCP(0xc000196d08, 0x8589ca442?, {0xd20698?, 0xc000157880}) pkg/snclient@v0.0.0-00010101000000-000000000000/listen_nrpe.go:116 +0x5f4 pkg/snclient.(Listener).handleTCPCon(0xc0001b0120, {0xd20698, 0xc000157880}, {0x7f12471f8830, 0xc000196d08}) pkg/snclient@v0.0.0-00010101000000-000000000000/listener.go:336 +0x322 pkg/snclient.(Listener).startListenerTCP.func1({0xd20698?, 0xc000157880?}) pkg/snclient@v0.0.0-00010101000000-000000000000/listener.go:316 +0x6b created by pkg/snclient.(Listener).startListenerTCP in goroutine 36 pkg/snclient@v0.0.0-00010101000000-000000000000/listener.go:312 +0x2b6 [2023-11-26 17:12:22.461][Error][pid:538522][snclient:674] *****

sni commented 9 months ago

thanks for looking into this, i have a fix already but could not push it because of another thing. Will do that later.

jframeau commented 9 months ago

ok, thx, I'll wait then for your code.

sni commented 9 months ago

i'd say it should be fixed with: c6e4304306b81cd2091657516253341a081f3f6b

jframeau commented 9 months ago

checkout this morning, no more panics but all in/out bandwidths have 0 values:

OK: wlp0s20f3 >0 B/s <0 B/s |'wlp0s20f3_traffic_in'=26685123042c 'wlp0s20f3_traffic_out'=1539736726c OK: wlp0s20f3 >0 B/s <0 B/s |'wlp0s20f3_traffic_in'=26685213532c 'wlp0s20f3_traffic_out'=1539776710c OK: wlp0s20f3 >0 B/s <0 B/s |'wlp0s20f3_traffic_in'=26685297287c 'wlp0s20f3_traffic_out'=1539822498c OK: wlp0s20f3 >0 B/s <0 B/s |'wlp0s20f3_traffic_in'=26685375250c 'wlp0s20f3_traffic_out'=1539865861c OK: wlp0s20f3 >0 B/s <0 B/s |'wlp0s20f3_traffic_in'=26685479370c 'wlp0s20f3_traffic_out'=1539924722c

jfr

Le dim. 26 nov. 2023 à 20:06, Sven Nierlein @.***> a écrit :

i'd say it should be fixed with: c6e4304 https://github.com/ConSol-Monitoring/snclient/commit/c6e4304306b81cd2091657516253341a081f3f6b

— Reply to this email directly, view it on GitHub https://github.com/ConSol-Monitoring/snclient/pull/60#issuecomment-1826870531, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADZTQ27PN7BVGFAJVPPMNLYGOHKBAVCNFSM6AAAAAA726S6USVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRWHA3TANJTGE . You are receiving this because you modified the open/close state.Message ID: @.***>

jframeau commented 9 months ago

hmm, re applying my patch on task_check_system.go does the job :-)

Le lun. 27 nov. 2023 à 10:04, Jean-François Rameau @.***> a écrit :

checkout this morning, no more panics but all in/out bandwidths have 0 values:

OK: wlp0s20f3 >0 B/s <0 B/s |'wlp0s20f3_traffic_in'=26685123042c 'wlp0s20f3_traffic_out'=1539736726c OK: wlp0s20f3 >0 B/s <0 B/s |'wlp0s20f3_traffic_in'=26685213532c 'wlp0s20f3_traffic_out'=1539776710c OK: wlp0s20f3 >0 B/s <0 B/s |'wlp0s20f3_traffic_in'=26685297287c 'wlp0s20f3_traffic_out'=1539822498c OK: wlp0s20f3 >0 B/s <0 B/s |'wlp0s20f3_traffic_in'=26685375250c 'wlp0s20f3_traffic_out'=1539865861c OK: wlp0s20f3 >0 B/s <0 B/s |'wlp0s20f3_traffic_in'=26685479370c 'wlp0s20f3_traffic_out'=1539924722c

jfr

Le dim. 26 nov. 2023 à 20:06, Sven Nierlein @.***> a écrit :

i'd say it should be fixed with: c6e4304 https://github.com/ConSol-Monitoring/snclient/commit/c6e4304306b81cd2091657516253341a081f3f6b

— Reply to this email directly, view it on GitHub https://github.com/ConSol-Monitoring/snclient/pull/60#issuecomment-1826870531, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADZTQ27PN7BVGFAJVPPMNLYGOHKBAVCNFSM6AAAAAA726S6USVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRWHA3TANJTGE . You are receiving this because you modified the open/close state.Message ID: @.***>

sni commented 9 months ago

weird.. Well, your patch simply removes the cleanup, but interfaces need to be cleaned up because docker and such tools create one-time throw away interfaces all the time. I'll have a look.

jframeau commented 9 months ago

ah ... didn't think of these ephemeral interfaces.

I'll have a look too on a server with docker running on.

Le lun. 27 nov. 2023 à 10:19, Sven Nierlein @.***> a écrit :

weird.. Well, your patch simply removes the cleanup, but interfaces need to be cleaned up because docker and such tools create one-time throw away interfaces all the time. I'll have a look.

— Reply to this email directly, view it on GitHub https://github.com/ConSol-Monitoring/snclient/pull/60#issuecomment-1827441915, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADZTQ52X736DPTJYYJD3PDYGRLKPAVCNFSM6AAAAAA726S6USVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRXGQ2DCOJRGU . You are receiving this because you modified the open/close state.Message ID: @.***>

jframeau commented 9 months ago

A little better:

// clean old values and ephemeral interfaces
for _, key := range c.snc.Counter.Keys("net") {
    c.snc.Counter.Get("net", key).Trim()
    if l := c.snc.Counter.Get("net", key).data.Len(); l == 0 {
        c.snc.Counter.Delete("net", key)
    }
}

But don't like the direct access to data attribute of CounterSet class.

Many tests with one-time interface, bandwidth computation is ok.

Le lun. 27 nov. 2023 à 10:45, Jean-François Rameau @.***> a écrit :

ah ... didn't think of these ephemeral interfaces.

I'll have a look too on a server with docker running on.

Le lun. 27 nov. 2023 à 10:19, Sven Nierlein @.***> a écrit :

weird.. Well, your patch simply removes the cleanup, but interfaces need to be cleaned up because docker and such tools create one-time throw away interfaces all the time. I'll have a look.

— Reply to this email directly, view it on GitHub https://github.com/ConSol-Monitoring/snclient/pull/60#issuecomment-1827441915, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADZTQ52X736DPTJYYJD3PDYGRLKPAVCNFSM6AAAAAA726S6USVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRXGQ2DCOJRGU . You are receiving this because you modified the open/close state.Message ID: @.***>

sni commented 9 months ago

found it, interfaces had been removed all the time, instead of when the last value is older than one hour. Should be fine now.

jframeau commented 9 months ago

Better, yes :-)

But I don't understand why the "for _, key := range c.snc.Counter.Keys("net") {" ..." is inside the more global loop "for key, val := range netdata {" ?

The evictor loop should be run once per udpate, not for each interface found in netdata. Or am I missing something ?

Le lun. 27 nov. 2023 à 20:00, Sven Nierlein @.***> a écrit :

found it, interfaces had been removed all the time, instead of when the last value is older than one hour. Should be fine now.

— Reply to this email directly, view it on GitHub https://github.com/ConSol-Monitoring/snclient/pull/60#issuecomment-1828439753, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADZTQZOBO7AHL6L4AW3ADLYGTPLNAVCNFSM6AAAAAA726S6USVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRYGQZTSNZVGM . You are receiving this because you modified the open/close state.Message ID: @.***>

sni commented 9 months ago

But I don't understand why the "for _, key := range

oh thanks, good catch. It doesn't has to be. I moved it outside that loop with 7707f3fbd1bf04334aa04e23c7a65740918eb628

jframeau commented 9 months ago

Nice, works like a charm now !

thx

Le mar. 28 nov. 2023 à 09:50, Sven Nierlein @.***> a écrit :

But I don't understand why the "for _, key := range

oh thanks, good catch. It doesn't has to be. I moved it outside that loop with 7707f3f https://github.com/ConSol-Monitoring/snclient/commit/7707f3fbd1bf04334aa04e23c7a65740918eb628

— Reply to this email directly, view it on GitHub https://github.com/ConSol-Monitoring/snclient/pull/60#issuecomment-1829359306, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADZTQ5SBHDMSKAMXYA4GILYGWQVHAVCNFSM6AAAAAA726S6USVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRZGM2TSMZQGY . You are receiving this because you modified the open/close state.Message ID: @.***>