FritsHoogland / postgres-bpftrace


ERROR: Failed to open perf buffer #1

Closed seadba closed 1 year ago

seadba commented 1 year ago

this looks very interesting, thanks - i do get the following error when running as root in the container

postgres@e2ec579ba3b5:~$ sudo ./postgres-bpftrace/wait-histograms-pid.bt
Attaching 183 probes...
perf_event_open: Permission denied (check your kernel for PERF_COUNT_SW_BPF_OUTPUT support, 4.4 or newer)
ERROR: Failed to open perf buffer

5.15.0-86-generic

FritsHoogland commented 1 year ago

Found: https://github.com/iovisor/bcc/issues/363

My container:

postgres@3fafb87551e4:~$ uname -a
Linux 3fafb87551e4 6.1.29-0-virt #1-Alpine SMP PREEMPT_DYNAMIC Wed, 17 May 2023 14:22:15 +0000 x86_64 GNU/Linux

As mentioned in the bcc issue, it seems to be a mix-up of kernel headers and bcc?

FritsHoogland commented 1 year ago

Can you try it with the container? My hunch at the moment is the mixup.

seadba commented 1 year ago

thanks for fast response - im in the container at the moment

root@2799ad3ba053:/home/postgres/postgres-bpftrace# uname -a
Linux 2799ad3ba053 5.15.0-86-generic #96-Ubuntu SMP Wed Sep 20 08:23:49 UTC 2023 x86_64 GNU/Linux

seadba commented 1 year ago

maybe i grabbed the wrong source

seadba commented 1 year ago

ok, looks good

@wait_event_hist[135843, client, CLIENT_READ]:
[32, 64) 1 |@@@@@@@@@@@@@@@@@ |
[64, 128) 0 | |
[128, 256) 0 | |
[256, 512) 1 |@@@@@@@@@@@@@@@@@ |
[512, 1K) 0 | |
[1K, 2K) 0 | |
[2K, 4K) 0 | |
[4K, 8K) 0 | |
[8K, 16K) 1 |@@@@@@@@@@@@@@@@@ |
[16K, 32K) 2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[32K, 64K) 3 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[64K, 128K) 0 | |
[128K, 256K) 0 | |
[256K, 512K) 1 |@@@@@@@@@@@@@@@@@

seadba commented 1 year ago

i added SYS_ADMIN to the run cmd

docker run --rm -it --cap-add SYS_ADMIN --cap-add CAP_SYS_RESOURCE --cap-add CAP_BPF --cap-add CAP_PERFMON --cap-add CAP_SYS_PTRACE

FritsHoogland commented 1 year ago

I don't understand the OS version. In the docker file, the image that is used is:

FROM debian:stable-slim

Which should get you Debian 12 (bookworm), which comes with kernel 6.1.

Or did you not use the Dockerfile, but your own container?

FritsHoogland commented 1 year ago

In the docker directory in the source there is Dockerfile, and you can use a few simple scripts:

seadba commented 1 year ago

i took the source from here

git clone https://github.com/FritsHoogland/postgres-bpftrace.git

FritsHoogland commented 1 year ago

@seadba In the root directory of the source there are the bpftrace commands.

But the question is: did you use Docker build from the docker directory, or did you use your own? If you did use your own, please retry with the build, run and exec commands in it. That should work.

seadba commented 1 year ago

sorry, its a bit early out here - yes i followed your cmds from here https://databaseperformance.hashnode.dev/test-drive-postgresql-with-wait-event-probes-on-docker?source=more_articles_bottom_blogs

seadba commented 1 year ago

good tool, thanks very much

Attaching 19 probes...
PostgreSQL statement execution analyzer.
Time in microseconds (us).
pid :Phase :time to phase :time in phase : query
------|-----------|--------------|--------------|------
[155734] execute : ( 0) : 258550845:
[155824] execute : ( 0) : 258551494:
[155822] execute : ( 0) : 258551856:
[155826] execute : ( 0) : 258552988:
[155826] execute : ( 0) : 258553001:
[155833] rewrite : ( 258562939) : 20: BEGIN;
[155778]Query start: : : INSERT INTO pgbench_history ( tid, bid, aid, delta, mtime) VALUE
[155778] parse : ( 10) : 30: INSERT INTO pgbench_history ( tid, bid, aid, delta, mtime) VALUE

FritsHoogland commented 1 year ago

That's weird...the build script uses the docker file in the directory, which should use Debian. Can you verify if the docker file you got has FROM debian:stable-slim at the start of the file??

seadba commented 1 year ago

:/data/depot/bpf/postgres-bpftrace/docker# ls
build docker.md README.md
Dockerfile.postgres_17_modified exec run wait_event.patch

:/data/depot/bpf/postgres-bpftrace/docker# head Dockerfile.postgres_17_modified
FROM debian:stable-slim

FritsHoogland commented 1 year ago

If your cloned repo is current with GitHub, then the build command should build a debian:stable-slim based container, as the Dockerfile you listed shows, agree? Such a container should look like this:

➜ ./run
postgres@a09695d1bf75:~$ uname -a
Linux a09695d1bf75 6.1.29-0-virt #1-Alpine SMP PREEMPT_DYNAMIC Wed, 17 May 2023 14:22:15 +0000 x86_64 GNU/Linux
postgres@a09695d1bf75:~$ cat /etc/issue
Debian GNU/Linux 12 \n \l

postgres@a09695d1bf75:~$

The ENTRYPOINT uses the postgres user, so you should use sudo to run the bpftrace scripts. Of course you can also do sudo su to become root (in general, I believe root should not be used, even in test and development).

seadba commented 1 year ago

its interesting - i just pulled it today, first time - i get this

uname -a

Linux 2799ad3ba053 5.15.0-86-generic #96-Ubuntu SMP Wed Sep 20 08:23:49 UTC 2023 x86_64 GNU/Linux

root@2799ad3ba053:/home/postgres/postgres-bpftrace# cat /etc/issue
Debian GNU/Linux 12 \n \l

not sure -

build script

docker info || exit

docker build --progress plain --rm -t pg17_modified --file Dockerfile.postgres_17_modified .

docker build --rm -t pg17_modified --file Dockerfile.postgres_17_modified .

FritsHoogland commented 1 year ago

Oh. Wait a minute.

Can you run docker info before logging in to the container? I think it picks the kernel version from the host from which it instantiates the container. In my case this is colima so I can emulate x86_64 on aarch64, which uses kernel version 6.1.29-0-virt.

Maybe you use it on ubuntu?

(docker info | grep Kernel)

seadba commented 1 year ago

docker info | grep Kernel

Kernel Version: 5.15.0-86-generic

FritsHoogland commented 1 year ago

(please paste terminal output in between triple backticks to prevent them from looking weird)

I think that's that mystery solved: the kernel is the kernel from the host from where the container is running, not from the container image itself. That is the reason for the "weird" kernel version.
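A minimal sketch of how that can be checked, assuming Docker is installed and that pulling `debian:stable-slim` is acceptable. The helper names `docker_info_kernel` and `same_kernel` are made up for illustration and are not part of this repository:

```shell
#!/bin/sh
# Hypothetical helpers, not part of postgres-bpftrace.

# Read `docker info` output on stdin and print the value of its
# "Kernel Version:" line.
docker_info_kernel() {
    awk -F': ' '/Kernel Version/ { print $2; exit }'
}

# Compare two kernel release strings and say whether they match.
same_kernel() {
    if [ "$1" = "$2" ]; then
        echo "same kernel: $1"
    else
        echo "different kernels: $1 vs $2"
    fi
}

# Usage, on the machine where the docker client runs:
#   engine_kernel=$(docker info 2>/dev/null | docker_info_kernel)
#   container_kernel=$(docker run --rm debian:stable-slim uname -r)
#   same_kernel "$engine_kernel" "$container_kernel"
```

Both values should match, because a container shares the kernel of whatever runs the Docker engine (a VM in the case of colima), regardless of the distribution inside the image.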

Different kernel versions have different capabilities in them: https://man7.org/linux/man-pages/man7/capabilities.7.html

I actually made sure I picked the lowest necessary capabilities and did not use CAP_SYS_ADMIN, because it's an overloaded capability.
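To see which of the capabilities mentioned in this thread a process actually holds, the effective capability mask in `/proc/self/status` can be decoded. This is only a sketch: `has_cap` and `report_caps` are hypothetical helper names, and the bit numbers are the ones defined in `linux/capability.h`.

```shell
#!/bin/sh
# Hypothetical helpers, not part of postgres-bpftrace.

# has_cap MASK BIT: succeed if bit BIT is set in the hex capability mask MASK
# (MASK is the value of the CapEff line, without a 0x prefix).
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# report_caps MASK: print, for each capability from the docker run command
# earlier in this thread, whether it is present in MASK.
# Bit numbers come from linux/capability.h.
report_caps() {
    for pair in SYS_PTRACE:19 SYS_ADMIN:21 SYS_RESOURCE:24 PERFMON:38 BPF:39; do
        name=${pair%%:*}
        bit=${pair##*:}
        if has_cap "$1" "$bit"; then
            echo "CAP_$name yes"
        else
            echo "CAP_$name no"
        fi
    done
}

# Usage, inside the container:
#   report_caps "$(awk '/^CapEff:/ { print $2 }' /proc/self/status)"
```

Since the bpftrace scripts are run through sudo, the mask of interest is the one root gets inside the container, so the usage line is best run from a root shell.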

seadba commented 1 year ago

ok, thanks - 1 last question - are there any additional steps i would need to take to run this externally? it runs fine in the container, but raises this error when run against a cluster on the server - i did update the postgres bin dir wait-histograms-pid.bt
ERROR:couldn't get argument 0 for /usr/lib/postgresql/14/bin/postgres::waiteventstart t

FritsHoogland commented 1 year ago

bpftrace itself has some requirements: it uses LLVM to compile scripts to BPF bytecode, and uses BCC for interacting with the Linux BPF system and Linux dynamic tracing (kprobes, uprobes and tracepoints).

Using the postgres wait event tracing requires modifications to the postgres source code. If your source is not the latest (version 17) from GitHub, you have to insert the changes manually, and then compile and install the source. That is an important downside, but there is no other way: the function in which the wait event info is manipulated is inlined, which means the compiler inserts its body into every function that calls it, so it no longer exists as a separate function in compiled postgres that a probe could attach to. That is the reason for the probe and the custom modification.
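One way to check whether a given postgres binary carries the wait event probes is to ask bpftrace to list what it can attach to and filter the result. This is a rough sketch under assumptions: the thread does not spell out the exact probe names or probe type, so the filter only looks for names containing "wait" and "event", and `wait_event_probes` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper, not part of postgres-bpftrace.

# Read a probe listing on stdin and keep only the lines whose name mentions
# "wait" followed by "event" (case-insensitive). The real probe names are an
# assumption here; adjust the pattern to what the patch actually adds.
wait_event_probes() {
    grep -i 'wait.*event'
}

# Usage, against the binary from the error message above:
#   bin=/usr/lib/postgresql/14/bin/postgres
#   sudo bpftrace -l "usdt:$bin:*"   | wait_event_probes
#   sudo bpftrace -l "uprobe:$bin:*" | wait_event_probes
```

If neither listing shows anything for a stock distribution binary, that would be consistent with the explanation above: without the patch there is nothing for the scripts to attach to.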

If you took the source code for your version (probably 14), inserted the changes manually, and then compiled and installed the resulting executables and libraries, it should work.

seadba commented 1 year ago

great , thanks again for all the help - very good tool much appreciated