elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

[Linux] SIGSEGV: segmentation violation during cgo execution of cgoLookupIP and getaddrinfo #41398

Open cmacknz opened 1 day ago

cmacknz commented 1 day ago

We have an internal example of multiple Beats failing shortly after startup with a segmentation fault in cgo code. The exact code path leading to the crash is not yet clear because the fault occurs inside cgo, but the stack trace is attached.

{"log.level":"info","@timestamp":"2024-10-18T15:10:23.373Z","message":"running under elastic-agent, per-beat lockfiles disabled","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-monitoring","type":"filestream"},"log":{"source":"filestream-monitoring"},"service.name":"filebeat","ecs.version":"1.6.0","log.origin":{"file.line":443,"file.name":"instance/beat.go","function":"github.com/elastic/beats/v7/libbeat/cmd/instance.(*Beat).launch"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-10-18T15:10:23.374Z","message":"Starting stats endpoint","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-monitoring","type":"filestream"},"log":{"source":"filestream-monitoring"},"log.logger":"api","log.origin":{"file.line":69,"file.name":"api/server.go","function":"github.com/elastic/beats/v7/libbeat/api.(*Server).Start"},"service.name":"filebeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-10-18T15:10:23.374Z","message":"Syscall filter successfully installed","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-monitoring","type":"filestream"},"log":{"source":"filestream-monitoring"},"log.logger":"seccomp","log.origin":{"file.line":125,"file.name":"seccomp/seccomp.go","function":"github.com/elastic/beats/v7/libbeat/common/seccomp.loadFilter"},"service.name":"filebeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-10-18T15:10:23.374Z","message":"Beat info","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-monitoring","type":"filestream"},"log":{"source":"filestream-monitoring"},"service.name":"filebeat","system_info":{"beat":{"path":{"config":"/opt/Elastic/Agent/data/elastic-agent-8.15.2-621bbc/components","data":"/opt/Elastic/Agent/data/elastic-agent-8.15.2-621bbc/run/filestream-monitoring","home":"/opt/Elastic/Agent/data/elastic-agent-8.15.2-621bbc/components","logs":"/opt/Elastic/Agent/data/elastic-agent-8.15.2-621bbc/components/logs"},"type":"filebeat","uuid":"5a0b058b-04d4-4e07-b5cd-3a4aef38a2f7"},"ecs.version":"1.6.0"},"log.logger":"beat","log.origin":{"file.line":1385,"file.name":"instance/beat.go","function":"github.com/elastic/beats/v7/libbeat/cmd/instance.logSystemInfo"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-10-18T15:10:23.374Z","message":"Build info","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-monitoring","type":"filestream"},"log":{"source":"filestream-monitoring"},"log.logger":"beat","log.origin":{"file.line":1394,"file.name":"instance/beat.go","function":"github.com/elastic/beats/v7/libbeat/cmd/instance.logSystemInfo"},"service.name":"filebeat","system_info":{"build":{"commit":"26daf71e4ec87172523af7f0e916cba9f79dc0d0","libbeat":"8.15.2","time":"2024-09-19T09:24:35.000Z","version":"8.15.2"},"ecs.version":"1.6.0"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-10-18T15:10:23.374Z","message":"Go runtime info","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-monitoring","type":"filestream"},"log":{"source":"filestream-monitoring"},"log.logger":"beat","log.origin":{"file.line":1397,"file.name":"instance/beat.go","function":"github.com/elastic/beats/v7/libbeat/cmd/instance.logSystemInfo"},"service.name":"filebeat","system_info":{"ecs.version":"1.6.0","go":{"arch":"amd64","max_procs":8,"os":"linux","version":"go1.22.6"}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-10-18T15:10:23.375Z","message":"Host info","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-monitoring","type":"filestream"},"log":{"source":"filestream-monitoring"},"system_info":{"ecs.version":"1.6.0","host":{"architecture":"x86_64","boot_time":"2024-10-18T11:12:02+02:00","containerized":false,"id":"3fe2439e8486446eabcfaac351556a64","ip":["127.0.0.1","::1","10.0.0.45","fd00::9250:6d5f:2a99:b767","fe80::2078:f5bd:8159:2e29","10.0.0.47","fd00::9402:7f04:e6ae:472c","fe80::14c1:3059:f370:301a"],"kernel_version":"6.11.3-arch1-1","mac":["f8:75:a4:52:86:80","f8:75:a4:52:86:7f","24:41:8c:35:dd:51"],"name":"antiope","native_architecture":"x86_64\n","os":{"build":"rolling","family":"arch","major":0,"minor":0,"name":"Arch Linux","patch":0,"platform":"arch","type":"linux","version":""},"timezone":"CEST","timezone_offset_sec":7200}},"log.logger":"beat","log.origin":{"file.line":1403,"file.name":"instance/beat.go","function":"github.com/elastic/beats/v7/libbeat/cmd/instance.logSystemInfo"},"service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-10-18T15:10:23.375Z","message":"Process info","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-monitoring","type":"filestream"},"log":{"source":"filestream-monitoring"},"log.logger":"beat","log.origin":{"file.line":1432,"file.name":"instance/beat.go","function":"github.com/elastic/beats/v7/libbeat/cmd/instance.logSystemInfo"},"service.name":"filebeat","system_info":{"ecs.version":"1.6.0","process":{"capabilities":{"ambient":null,"bounding":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read","perfmon","bpf","checkpoint_restore"],"effective":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read","perfmon","bpf","checkpoint_restore"],"inheritable":null,"permitted":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read","perfmon","bpf","checkpoint_restore"]},"cwd":"/opt/Elastic/Agent/data/elastic-agent-8.15.2-621bbc/run/filestream-monitoring","exe":"/opt/Elastic/Agent/data/elastic-agent-8.15.2-621bbc/components/agentbeat","name":"agentbeat","pid":611948,"ppid":600393,"seccomp":{"mode":"filter","no_new_privs":true},"start_time":"2024-10-18T17:10:22.500+0200"}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-10-18T15:10:23.376Z","message":"Setup Beat: filebeat; Version: 8.15.2","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-monitoring","type":"filestream"},"log":{"source":"filestream-monitoring"},"log.origin":{"file.line":341,"file.name":"instance/beat.go","function":"github.com/elastic/beats/v7/libbeat/cmd/instance.(*Beat).createBeater"},"service.name":"filebeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-10-18T15:10:23.376Z","message":"Metrics endpoint listening on: /opt/Elastic/Agent/data/tmp/xTEtpJ7117ppc6OYvJCaYHbDW8mLjXGe.sock (configured: unix:///opt/Elastic/Agent/data/tmp/xTEtpJ7117ppc6OYvJCaYHbDW8mLjXGe.sock)","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-monitoring","type":"filestream"},"log":{"source":"filestream-monitoring"},"service.name":"filebeat","ecs.version":"1.6.0","log.logger":"api","log.origin":{"file.line":71,"file.name":"api/server.go","function":"github.com/elastic/beats/v7/libbeat/api.(*Server).Start.func1"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-10-18T15:10:23.376Z","message":"Output is configured through Central Management","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-monitoring","type":"filestream"},"log":{"source":"filestream-monitoring"},"service.name":"filebeat","ecs.version":"1.6.0","log.origin":{"file.line":373,"file.name":"instance/beat.go","function":"github.com/elastic/beats/v7/libbeat/cmd/instance.(*Beat).createBeater"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-10-18T15:10:23.378Z","message":"Beat name: antiope","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-monitoring","type":"filestream"},"log":{"source":"filestream-monitoring"},"log.logger":"publisher","log.origin":{"file.line":105,"file.name":"pipeline/module.go","function":"github.com/elastic/beats/v7/libbeat/publisher/pipeline.LoadWithSettings"},"service.name":"filebeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-18T15:10:23.381Z","message":"SIGSEGV: segmentation violation","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-monitoring","type":"filestream"},"log":{"source":"filestream-monitoring"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-18T15:10:23.381Z","message":"PC=0x0 m=4 sigcode=1 addr=0x0","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-monitoring","type":"filestream"},"log":{"source":"filestream-monitoring"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-10-18T15:10:23.381Z","message":"signal arrived during cgo execution","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-monitoring","type":"filestream"},"log":{"source":"filestream-monitoring"},"ecs.version":"1.6.0"}

cgo_segfault.json

elasticmachine commented 1 day ago

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

cmacknz commented 1 day ago

Possibly relates to:

mauri870 commented 1 day ago

Looking briefly at the logs, I can see references to functions such as net.cgoLookupHostIP, which belongs to the cgo-based DNS resolver in Go's net package. We could opt in to the netgo (pure Go) resolver instead.
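For illustration, one way a Go program can opt in to the pure Go resolver for its own lookups is the `PreferGo` field on `net.Resolver`. This is a minimal sketch, not how Beats configures DNS: the helper names `newPureGoResolver` and `lookupHost` and the 5 second timeout are assumptions made up for this example.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newPureGoResolver is a hypothetical helper. It returns a resolver that asks
// the net package to use its built-in Go DNS client rather than the cgo path
// through getaddrinfo.
func newPureGoResolver() *net.Resolver {
	return &net.Resolver{PreferGo: true}
}

// lookupHost resolves host with the pure Go resolver. The 5 second timeout is
// an arbitrary value chosen for this sketch.
func lookupHost(host string) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return newPureGoResolver().LookupHost(ctx, host)
}

func main() {
	// An IP literal is returned as-is, so this runs without network access.
	addrs, err := lookupHost("127.0.0.1")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(addrs)
}
```

`PreferGo` only affects lookups made through that resolver value. Building with the `netgo` build tag or setting GODEBUG=netdns=go changes the default for the whole process instead.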

Edit: The crash seems to be triggered in the call to reflect.implements https://github.com/elastic/go-ucfg/blob/4fd3937/initializer.go#L39C29-L39C39

rdner commented 1 day ago

Does the issue happen if GODEBUG=netdns=go is set?

mauri870 commented 1 day ago

> Does the issue happen if GODEBUG=netdns=go is set?

Also wondering about this. The cgo resolver uses threads, so in high-contention scenarios the netgo resolver might perform better by leveraging goroutines.

cmacknz commented 1 day ago

> Does the issue happen if GODEBUG=netdns=go is set?

Confirmed that setting GODEBUG=netdns=go stops this from happening.
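As a sketch of how a supervising process could apply this workaround when it launches a child, the helper below merges `netdns=go` into an environment list while keeping any other GODEBUG settings. The function name `withNetDNSGo` is hypothetical and this is not taken from the Elastic Agent code; it only shows the string handling involved.

```go
package main

import (
	"fmt"
	"strings"
)

// withNetDNSGo is a hypothetical helper. It returns a copy of env in which
// GODEBUG contains netdns=go. Other GODEBUG settings are kept, and an
// existing netdns value is replaced.
func withNetDNSGo(env []string) []string {
	const key = "GODEBUG="
	out := make([]string, 0, len(env)+1)
	found := false
	for _, kv := range env {
		if !strings.HasPrefix(kv, key) {
			out = append(out, kv)
			continue
		}
		found = true
		var kept []string
		for _, opt := range strings.Split(strings.TrimPrefix(kv, key), ",") {
			// Drop empty entries and any previous netdns setting.
			if opt != "" && !strings.HasPrefix(opt, "netdns=") {
				kept = append(kept, opt)
			}
		}
		kept = append(kept, "netdns=go")
		out = append(out, key+strings.Join(kept, ","))
	}
	if !found {
		out = append(out, key+"netdns=go")
	}
	return out
}

func main() {
	env := withNetDNSGo([]string{"PATH=/usr/bin", "GODEBUG=gctrace=1,netdns=cgo"})
	fmt.Println(env) // prints [PATH=/usr/bin GODEBUG=gctrace=1,netdns=go]
}
```

With `os/exec`, the result could be assigned to `cmd.Env`, for example `cmd.Env = withNetDNSGo(os.Environ())`, before starting the child process.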

rdner commented 22 hours ago

There is a chance that https://github.com/elastic/beats/pull/41402 will fix it. The PR updates glibc from 2.28 to 2.31.