cosmonic-labs / netreap

A Cilium controller implementation for Nomad
https://netreap.io
Apache License 2.0

[BUG] panic: runtime error: invalid memory address or nil pointer dereference #38

Closed: iamredbull closed this issue 9 months ago

iamredbull commented 9 months ago

Nomad: 1.5.6, Consul: v1.14.7, Netreap: 0.2.0, Cilium: v1.14.5

Hello, I am getting this error. What could be the cause? The node also runs Nomad jobs that use Consul Connect. Netreap is installed on ~250 hosts, and every Netreap allocation sometimes hits this error and restarts, on average after about 2 hours.

Full debug log from a Netreap job that hit the error:

2024-02-07T11:56:58.231Z    DEBUG   netreap/main.go:120 Starting node reaper
2024-02-07T11:56:58.231Z    DEBUG   reapers/nodes.go:107    Beginning reconciliation
2024-02-07T11:56:58.231Z    DEBUG   reapers/nodes.go:108    Getting nomad node list
2024-02-07T11:56:58.420Z    DEBUG   reapers/nodes.go:119    Finished constructing list of all nodes {"nodes": {all my hosts...}
2024-02-07T11:56:58.420Z    DEBUG   reapers/nodes.go:121    Fetching cilium nodes from consul
2024-02-07T11:56:58.547Z    DEBUG   netreap/main.go:131 Starting endpoint reaper
2024-02-07T11:56:58.547Z    DEBUG   reapers/endpoints.go:117    Starting reconciliation
2024-02-07T11:56:58.549Z    DEBUG   reapers/endpoints.go:125    checking each endpoint  {"endpoints-total": 2}
2024-02-07T11:56:58.549Z    DEBUG   reapers/endpoints.go:132    Skipping endpoint that is not associated with a container   {"endpoint-id": 803}
2024-02-07T11:56:58.549Z    DEBUG   reapers/endpoints.go:132    Skipping endpoint that is not associated with a container   {"endpoint-id": 136}
2024-02-07T11:56:58.549Z    DEBUG   reapers/endpoints.go:166    Finished reconciliation
2024-02-07T11:56:58.579Z    INFO    reapers/nodes.go:56 Waiting for leader election
2024-02-07T11:56:58.636Z    DEBUG   netreap/main.go:142 starting policy poller
2024-02-07T11:56:58.636Z    INFO    policy_poller   policy/policy.go:41 starting Consul watch for key: netreap.io/policy
2024-02-07T11:56:58.637Z    DEBUG   reapers/endpoints.go:98 Got events from Allocation topic. Handling...   {"event-count": 1}
2024-02-07T11:56:58.638Z    DEBUG   reapers/endpoints.go:201    Allocation has no IP address, ignoring  {"event-type": "AllocationUpdated", "event-index": 5083350, "container-id": "374c32a3-f85e-0ed6-2c83-6748b0a41d2a"}
2024-02-07T11:56:58.663Z    INFO    policy_poller   policy/policy.go:98 loaded new policy
2024-02-07T11:57:02.684Z    DEBUG   reapers/endpoints.go:98 Got events from Allocation topic. Handling...   {"event-count": 1}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x116a45f]

goroutine 46 [running]:
github.com/cosmonic/netreap/reapers.(*EndpointReaper).handleAllocationUpdated(0xc0003ddcc0, {{0xc00050b4c0, 0xa}, {0xc0003d4eb8, 0x11}, {0xc000a926f0, 0x24}, {0xc0003dc600, 0x2, 0x4}, ...})
    /netreap/reapers/endpoints.go:225 +0x6ff
created by github.com/cosmonic/netreap/reapers.(*EndpointReaper).Run.func1
    /netreap/reapers/endpoints.go:103 +0x8c5

Another Netreap job that hit the same error:

2024-02-07T12:17:59.471Z    DEBUG   netreap/main.go:120 Starting node reaper
2024-02-07T12:17:59.471Z    DEBUG   reapers/nodes.go:107    Beginning reconciliation
2024-02-07T12:17:59.471Z    DEBUG   reapers/nodes.go:108    Getting nomad node list
2024-02-07T12:18:00.016Z    DEBUG   reapers/nodes.go:119    Finished constructing list of all nodes {"nodes": {all my hosts...}
2024-02-07T12:18:00.017Z    DEBUG   reapers/nodes.go:121    Fetching cilium nodes from consul
2024-02-07T12:18:00.167Z    DEBUG   netreap/main.go:131 Starting endpoint reaper
2024-02-07T12:18:00.167Z    DEBUG   reapers/endpoints.go:117    Starting reconciliation
2024-02-07T12:18:00.169Z    DEBUG   reapers/endpoints.go:125    checking each endpoint  {"endpoints-total": 2}
2024-02-07T12:18:00.169Z    DEBUG   reapers/endpoints.go:132    Skipping endpoint that is not associated with a container   {"endpoint-id": 803}
2024-02-07T12:18:00.169Z    DEBUG   reapers/endpoints.go:132    Skipping endpoint that is not associated with a container   {"endpoint-id": 136}
2024-02-07T12:18:00.169Z    DEBUG   reapers/endpoints.go:166    Finished reconciliation
2024-02-07T12:18:00.199Z    INFO    reapers/nodes.go:56 Waiting for leader election
2024-02-07T12:18:00.256Z    DEBUG   netreap/main.go:142 starting policy poller
2024-02-07T12:18:00.256Z    INFO    policy_poller   policy/policy.go:41 starting Consul watch for key: netreap.io/policy
2024-02-07T12:18:00.257Z    DEBUG   reapers/endpoints.go:98 Got events from Allocation topic. Handling...   {"event-count": 1}
2024-02-07T12:18:00.258Z    DEBUG   reapers/endpoints.go:201    Allocation has no IP address, ignoring  {"event-type": "AllocationUpdated", "event-index": 5083706, "container-id": "374c32a3-f85e-0ed6-2c83-6748b0a41d2a"}
2024-02-07T12:18:00.283Z    INFO    policy_poller   policy/policy.go:98 loaded new policy
2024-02-07T12:18:02.030Z    DEBUG   reapers/endpoints.go:98 Got events from Allocation topic. Handling...   {"event-count": 1}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x116a45f]

goroutine 89 [running]:
github.com/cosmonic/netreap/reapers.(*EndpointReaper).handleAllocationUpdated(0xc000507cc0, {{0xc00060f650, 0xa}, {0xc000682690, 0x11}, {0xc000908480, 0x24}, {0xc00037c600, 0x2, 0x4}, ...})
    /netreap/reapers/endpoints.go:225 +0x6ff
created by github.com/cosmonic/netreap/reapers.(*EndpointReaper).Run.func1
    /netreap/reapers/endpoints.go:103 +0x8c5

protochron commented 9 months ago

@iamredbull thanks for reporting this! Fortunately it was a simple fix, so I can cut a patch release which should fix the problem for you.
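
For readers who land here with the same stack trace: a panic of this shape (`addr=0x0` inside an event handler) generally means the handler dereferenced a pointer that was nil in the incoming event. The thread does not show the actual patch, so the following is only a minimal sketch of the usual guard pattern, not the Netreap fix. The `Allocation` and `NetworkStatus` types and the `allocationAddress` helper are hypothetical stand-ins invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// NetworkStatus and Allocation are hypothetical stand-ins for an event
// payload; they are NOT Netreap's or Nomad's real types.
type NetworkStatus struct {
	Address string
}

type Allocation struct {
	ID            string
	NetworkStatus *NetworkStatus // may be nil, e.g. before networking is set up
}

var errNoAddress = errors.New("allocation has no IP address")

// allocationAddress returns the allocation's IP address. It returns an
// error instead of panicking when any pointer along the path is nil.
func allocationAddress(alloc *Allocation) (string, error) {
	if alloc == nil || alloc.NetworkStatus == nil || alloc.NetworkStatus.Address == "" {
		return "", errNoAddress
	}
	return alloc.NetworkStatus.Address, nil
}

func main() {
	events := []*Allocation{
		{ID: "a", NetworkStatus: &NetworkStatus{Address: "10.0.0.5"}},
		{ID: "b"}, // NetworkStatus is nil
		nil,       // the event carried no allocation at all
	}
	for _, alloc := range events {
		addr, err := allocationAddress(alloc)
		if err != nil {
			// Skip the event rather than crash the whole process.
			fmt.Println("skipping event:", err)
			continue
		}
		fmt.Println("handling endpoint at", addr)
	}
}
```

Checking every pointer on the access path before use turns a process-wide crash into a skipped event, which matters here because the handler runs in its own goroutine and an unrecovered panic there terminates the whole program.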