elastic / otel-profiling-agent

The production-scale datacenter profiler (C/C++, Go, Rust, Python, Java, NodeJS, .NET, PHP, Ruby, Perl, ...)
Apache License 2.0

load program: argument list too long #29

Open sungrasslin opened 1 month ago

sungrasslin commented 1 month ago

I tried to run the agent, but it reported an error. I tested on openEuler (kernel 4.19.90).

./otel-profiling-agent -collection-agent=11.0.1.200:11000 -disable-tls -verbose   -bpf-log-level=0
time="2024-05-07T16:21:41.712614957+08:00" level=debug msg="Config:"
time="2024-05-07T16:21:41.713944783+08:00" level=debug msg="bpf-log-level: 0"
time="2024-05-07T16:21:41.714489878+08:00" level=debug msg="bpf-log-size: 65536"
time="2024-05-07T16:21:41.714844795+08:00" level=debug msg="cache-directory: /var/cache/otel/profiling-agent"
time="2024-05-07T16:21:41.715587205+08:00" level=debug msg="collection-agent: 11.0.1.200:11000"
time="2024-05-07T16:21:41.715962906+08:00" level=debug msg="config: /etc/otel/profiling-agent/agent.conf"
time="2024-05-07T16:21:41.716214605+08:00" level=debug msg="copyright: false"
time="2024-05-07T16:21:41.716436016+08:00" level=debug msg="disable-tls: true"
time="2024-05-07T16:21:41.716785813+08:00" level=debug msg="map-scale-factor: 0"
time="2024-05-07T16:21:41.717052825+08:00" level=debug msg="no-kernel-version-check: false"
time="2024-05-07T16:21:41.717499231+08:00" level=debug msg="probabilistic-interval: 1m0s"
time="2024-05-07T16:21:41.717709826+08:00" level=debug msg="probabilistic-threshold: 100"
time="2024-05-07T16:21:41.717866628+08:00" level=debug msg="project-id: 1"
time="2024-05-07T16:21:41.718047638+08:00" level=debug msg="secret-token: abc123"
time="2024-05-07T16:21:41.718288634+08:00" level=debug msg="t: all"
time="2024-05-07T16:21:41.718490844+08:00" level=debug msg="tags: "
time="2024-05-07T16:21:41.718877745+08:00" level=debug msg="tracers: all"
time="2024-05-07T16:21:41.719518458+08:00" level=debug msg="v: true"
time="2024-05-07T16:21:41.719868559+08:00" level=debug msg="verbose: true"
time="2024-05-07T16:21:41.720101154+08:00" level=debug msg="version: false"
time="2024-05-07T16:21:41.720364662+08:00" level=info msg="Starting OTEL profiling agent 1.0.0 (revision OTEL-review, build timestamp N/A)"
time="2024-05-07T16:21:41.786557253+08:00" level=debug msg="Validated tags: "
time="2024-05-07T16:21:41.825674390+08:00" level=debug msg="Traffic to 11.0.1.200 is routed from 10.0.2.15"
time="2024-05-07T16:21:41.828227928+08:00" level=error msg="Unable to get host metadata for config: unable to open /proc/sys/kernel/bpf_stats_enabled: open /proc/sys/kernel/bpf_stats_enabled: no such file or directory"
time="2024-05-07T16:21:41.828849137+08:00" level=debug msg="Reading the configuration"
time="2024-05-07T16:21:41.829340743+08:00" level=debug msg="Done setting configuration"
time="2024-05-07T16:21:41.829902847+08:00" level=debug msg="Determining tracers to include"
time="2024-05-07T16:21:41.830438854+08:00" level=debug msg="Tracer string: all"
time="2024-05-07T16:21:41.830999454+08:00" level=info msg="Interpreter tracers: perl,php,python,hotspot,ruby"
time="2024-05-07T16:21:41.831561173+08:00" level=info msg="Automatically determining environment and machine ID ..."
time="2024-05-07T16:21:44.838773933+08:00" level=debug msg="Environment tester (azure) failed: failed to get azure metadata: Get \"http://169.254.169.254/metadata/instance/compute?api-version=2020-09-01&format=json\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
time="2024-05-07T16:21:54.431239546+08:00" level=debug msg="Environment tester (aws) failed: failed to get aws metadata: EC2MetadataRequestError: failed to get EC2 instance identity document\ncaused by: RequestError: send request failed\ncaused by: Get \"http://169.254.169.254/latest/dynamic/instance-identity/document\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
time="2024-05-07T16:21:55.439542972+08:00" level=debug msg="Environment tester (gcp) failed: failed to get GCP metadata: Get \"http://169.254.169.254/computeMetadata/v1/instance/id\": dial tcp 169.254.169.254:80: i/o timeout"
time="2024-05-07T16:21:55.500070495+08:00" level=debug msg="Using MAC: 0x563412005452"
time="2024-05-07T16:21:55.500789001+08:00" level=info msg="Environment: hardware, machine ID: 0x9eead8d68aa7f8e0"
time="2024-05-07T16:21:55.501188814+08:00" level=info msg="Assigned ProjectID: 1 HostID: 2227831381648996576"
time="2024-05-07T16:21:55.554270124+08:00" level=debug msg="Traffic to 11.0.1.200 is routed from 10.0.2.15"
time="2024-05-07T16:21:55.557283668+08:00" level=error msg="Unable to get host metadata: unable to open /proc/sys/kernel/bpf_stats_enabled: open /proc/sys/kernel/bpf_stats_enabled: no such file or directory"
time="2024-05-07T16:21:58.085100042+08:00" level=debug msg="Size of eBPF map exe_id_to_9_stack_deltas: 65536"
time="2024-05-07T16:21:58.108411365+08:00" level=debug msg="Size of eBPF map exe_id_to_15_stack_deltas: 65536"
time="2024-05-07T16:21:58.134409812+08:00" level=debug msg="Size of eBPF map exe_id_to_12_stack_deltas: 65536"
time="2024-05-07T16:21:58.152563258+08:00" level=debug msg="Size of eBPF map stack_delta_page_to_info: 65536"
time="2024-05-07T16:21:58.183999587+08:00" level=debug msg="Size of eBPF map exe_id_to_8_stack_deltas: 65536"
time="2024-05-07T16:21:58.203253447+08:00" level=debug msg="Size of eBPF map exe_id_to_18_stack_deltas: 65536"
time="2024-05-07T16:21:58.229573099+08:00" level=debug msg="Size of eBPF map exe_id_to_20_stack_deltas: 65536"
time="2024-05-07T16:21:58.268821838+08:00" level=debug msg="Size of eBPF map exe_id_to_19_stack_deltas: 65536"
time="2024-05-07T16:21:58.302900795+08:00" level=debug msg="Size of eBPF map exe_id_to_11_stack_deltas: 65536"
time="2024-05-07T16:21:58.325655006+08:00" level=debug msg="Size of eBPF map exe_id_to_13_stack_deltas: 65536"
time="2024-05-07T16:21:58.347174897+08:00" level=debug msg="Size of eBPF map exe_id_to_14_stack_deltas: 65536"
time="2024-05-07T16:21:58.373934555+08:00" level=debug msg="Size of eBPF map exe_id_to_21_stack_deltas: 65536"
time="2024-05-07T16:21:58.458982697+08:00" level=debug msg="Size of eBPF map exe_id_to_10_stack_deltas: 65536"
time="2024-05-07T16:21:58.480097687+08:00" level=debug msg="Size of eBPF map exe_id_to_16_stack_deltas: 65536"
time="2024-05-07T16:21:58.504682723+08:00" level=debug msg="Size of eBPF map exe_id_to_17_stack_deltas: 65536"
time="2024-05-07T16:21:58.530875781+08:00" level=debug msg="Size of eBPF map pid_page_to_mapping_info: 1048576"
time="2024-05-07T16:21:58.613319392+08:00" level=error msg="load program: argument list too long"
time="2024-05-07T16:21:58.613850103+08:00" level=error msg="Failed to load eBPF tracer: failed to load eBPF code: failed to load eBPF programs: failed to load unwind_native"

If I change the flags to "-bpf-log-level=2 -bpf-log-size=8388608", I get this error instead:

time="2024-05-07T16:18:51.318253683+08:00" level=error msg="load program: invalid argument"
time="2024-05-07T16:18:51.319925706+08:00" level=error msg="Failed to load eBPF tracer: failed to load eBPF code: failed to load eBPF programs: failed to load unwind_stop"
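
For reference, the two "load program" messages above are the standard strerror texts for errno values returned by the program-load syscall. The sketch below maps such a message back to its errno name; `errno_name_for` is a hypothetical helper written for illustration and is not part of the agent. It assumes a Linux host, where the strerror texts match the ones in the logs.

```python
import errno
import os
from typing import Optional


def errno_name_for(message: str) -> Optional[str]:
    """Map the tail of a 'load program: ...' log message to an errno name.

    Hypothetical helper: it compares the text after the last ': ' with the
    platform's strerror() strings, ignoring case.
    """
    text = message.rsplit(": ", 1)[-1].strip().lower()
    for code, name in errno.errorcode.items():
        if os.strerror(code).lower() == text:
            return name
    return None


# The two messages quoted in the logs above.
print(errno_name_for("load program: argument list too long"))
print(errno_name_for("load program: invalid argument"))
```

On Linux this resolves the first message to E2BIG and the second to EINVAL. EINVAL is generic and does not by itself say which part of the load request was rejected.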

rockdaboot commented 1 month ago

When you built otel-profiling-agent, what number of instructions was reported for the eBPF programs? This is the output from my machine:

Instruction counts for tracer.ebpf.x86:

.text has 0 instructions
perf_event/unwind_dotnet has 3597 instructions
perf_event/unwind_hotspot has 3073 instructions
tracepoint/sched/sched_switch has 1159 instructions
tracepoint/syscalls/sys_enter_read has 22 instructions
perf_event/unwind_stop has 445 instructions
perf_event/native_tracer_entry has 426 instructions
perf_event/unwind_native has 3995 instructions
perf_event/unwind_perl has 2646 instructions
perf_event/unwind_php has 2563 instructions
perf_event/unwind_python has 3739 instructions
perf_event/unwind_ruby has 3255 instructions
tracepoint/sched/sched_process_exit has 111 instructions
tracepoint/syscalls/sys_enter_bpf has 25 instructions
raw_tracepoint/sys_enter has 29 instructions
perf_event/unwind_v8 has 3317 instructions

Total instructions: 28402

The maximum number of instructions per program is 4096 on kernel 4.19, and the native unwinder is very close to that. If you used a different compiler version, maybe your build exceeded the instruction limit?
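
To make the comparison concrete, here is a minimal sketch that parses lines in the "has N instructions" format shown above and reports how much headroom each program has under the 4096 limit. The `headroom` function and the `BPF_MAXINSNS` constant name are illustrative assumptions, not part of the agent's build tooling, and the sample lines are copied from the output in this comment.

```python
import re

# Per-program instruction limit cited above for kernel 4.19.
BPF_MAXINSNS = 4096

# Sample lines copied from the build output quoted in this thread.
BUILD_OUTPUT = """\
perf_event/unwind_native has 3995 instructions
perf_event/unwind_python has 3739 instructions
perf_event/unwind_stop has 445 instructions
"""


def headroom(output, limit=BPF_MAXINSNS):
    """Return {program: limit - instruction_count} for every parsed line."""
    pattern = re.compile(r"^(\S+) has (\d+) instructions$", re.MULTILINE)
    return {name: limit - int(count) for name, count in pattern.findall(output)}


# List the tightest programs first; a negative value means the limit is exceeded.
for name, free in sorted(headroom(BUILD_OUTPUT).items(), key=lambda kv: kv[1]):
    status = "OVER LIMIT" if free < 0 else f"{free} instructions of headroom"
    print(f"{name}: {status}")
```

With the numbers above, unwind_native is only 101 instructions below the limit, so a small difference in code generation could plausibly push it over. Running the same check against the output of your own build would show whether that happened.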