gitpod-io / gitpod

The developer platform for on-demand cloud development environments to create software faster and more securely.
https://www.gitpod.io
GNU Affero General Public License v3.0

Support rr debugger (record-replay) by allowing the syscall `perf_event_open` in Gitpod workspaces #9687

Open jankeromnes opened 2 years ago

jankeromnes commented 2 years ago

Is your feature request related to a problem? Please describe

Debugging software with rr in Gitpod currently doesn't work:

# Install rr
$ cd /tmp && wget https://github.com/rr-debugger/rr/releases/download/5.5.0/rr-5.5.0-Linux-$(uname -m).deb && sudo dpkg -i rr-5.5.0-Linux-$(uname -m).deb

# Try rr with any binary
$ cd - && rr record ./binary
rr needs /proc/sys/kernel/perf_event_paranoid <= 1, but it is 2.
Change it to 1, or use 'rr record -n' (slow).
Consider putting 'kernel.perf_event_paranoid = 1' in /etc/sysctl.d/10-rr.conf.
See 'man 8 sysctl', 'man 5 sysctl.d' (systemd systems)
and 'man 5 sysctl.conf' (non-systemd systems) for more details.

Initially reported by William Durand from Mozilla: https://twitter.com/couac/status/1521092130890031105

Describe the behaviour you'd like

I suspect this fails because Gitpod's seccomp profile disables the syscall perf_event_open by default.

I also believe that we could allow perf_event_open in Gitpod, provided there aren't any major security issues.

This would allow Gitpod users to benefit from the powerful and popular record-replay debugger rr.

Describe alternatives you've considered

Additional context

To work properly, rr needs:

... as well as a seccomp profile that allows:

Sources:

jankeromnes commented 2 years ago

William also correctly pointed out that we might want to make sure rr actually supports AMD CPUs first:

I am thinking that we should probably make sure that rr actually supports the AMD CPU first. Ideally we would verify that we can record a trace on a host machine and then within Docker (with seccomp=unconfined).

willdurand commented 2 years ago

rr needs /proc/sys/kernel/perf_event_paranoid <= 1, but it is 2.
Change it to 1, or use 'rr record -n' (slow).
Consider putting 'kernel.perf_event_paranoid = 1' in /etc/sysctl.d/10-rr.conf.

FWIW, this first warning cannot be solved currently. Creating the /etc/sysctl.d/10-rr.conf file and reloading sysctl will skip the config file:

gitpod /tmp $ echo 'kernel.perf_event_paranoid = 1' | sudo tee /etc/sysctl.d/10-rr.conf
kernel.perf_event_paranoid = 1

gitpod /tmp $ cat /etc/sysctl.d/10-rr.conf
kernel.perf_event_paranoid = 1

gitpod /tmp $ sudo sysctl --system
[...]

* Applying /etc/sysctl.d/10-rr.conf ...
sysctl: setting key "kernel.perf_event_paranoid", ignoring: Read-only file system

[...]

We can use record -n apparently, though. That being said, with 5.5.0 (installed as described in the issue above), there is another error:

gitpod /tmp $ rr --version
rr version 5.5.0

gitpod /tmp $ rr record -n /usr/bin/ls
[FATAL /home/roc/rr/rr/src/PerfCounters_x86.h:104:compute_cpu_microarch()] AMD CPU type 0xf10 unknown

What do we do now? We build rr ourselves and we try again:

gitpod /tmp/obj $ /usr/local/bin/rr record -n /usr/bin/ls
[FATAL /tmp/rr/src/PerfCounters.cc:224:start_counter() errno: EPERM] Failed to initialize counter
=== Start rr backtrace:
/usr/local/bin/rr(_ZN2rr13dump_rr_stackEv+0x5d)[0x55bf3dfa28b6]
/usr/local/bin/rr(_ZN2rr15notifying_abortEv+0x16)[0x55bf3dfa2815]
/usr/local/bin/rr(_ZN2rr12FatalOstreamD1Ev+0x34)[0x55bf3ddda3d6]
/usr/local/bin/rr(+0x408234)[0x55bf3de0b234]
/usr/local/bin/rr(+0x4083f8)[0x55bf3de0b3f8]
/usr/local/bin/rr(+0x40a01c)[0x55bf3de0d01c]
/usr/local/bin/rr(+0x40a58f)[0x55bf3de0d58f]
/usr/local/bin/rr(_ZN2rr12PerfCounters23default_ticks_semanticsEv+0x21)[0x55bf3de0d74f]
/usr/local/bin/rr(_ZN2rr7SessionC2Ev+0x107)[0x55bf3df2bbff]
/usr/local/bin/rr(_ZN2rr13RecordSessionC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS6_SaIS6_EESD_RKNS_20DisableCPUIDFeaturesENS0_16SyscallBufferingEiNS_7BindCPUES8_PKNS_9TraceUuidEbb+0x65)[0x55bf3de27211]
/usr/local/bin/rr(_ZN2rr13RecordSession6createERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EESB_RKNS_20DisableCPUIDFeaturesENS0_16SyscallBufferingEhNS_7BindCPUERKS7_PKNS_9TraceUuidEbbb+0xc3d)[0x55bf3de26cdf]
/usr/local/bin/rr(+0x416872)[0x55bf3de19872]
/usr/local/bin/rr(_ZN2rr13RecordCommand3runERSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x40f)[0x55bf3de1a7d9]
/usr/local/bin/rr(main+0x27d)[0x55bf3dfbeb2f]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f1a066ee0b3]
/usr/local/bin/rr(_start+0x2e)[0x55bf3dcec6de]
=== End rr backtrace
Aborted (core dumped)

khuey commented 2 years ago

rr needs /proc/sys/kernel/perf_event_paranoid <= 1, but it is 2.
Change it to 1, or use 'rr record -n' (slow).
Consider putting 'kernel.perf_event_paranoid = 1' in /etc/sysctl.d/10-rr.conf.

FWIW, this first warning cannot be solved currently. Creating the /etc/sysctl.d/10-rr.conf file and reloading sysctl will skip the config file:

kernel.perf_event_paranoid is a single global config setting for the whole kernel. It can't be set inside a container.
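Since the value can only be changed on the host, the most a workspace can do is read it and compare it with the threshold that rr prints in its error message. A minimal Python sketch of that check, assuming the standard procfs path; the function name `paranoid_level_ok` and its `path` parameter are illustrative and not part of rr or Gitpod:

```python
from pathlib import Path

# Threshold taken from rr's own error message ("<= 1").
RR_MAX_PARANOID = 1


def paranoid_level_ok(path="/proc/sys/kernel/perf_event_paranoid"):
    """Return (level, ok), where ok means level <= RR_MAX_PARANOID.

    `path` is a parameter only so the check can be pointed at a test file;
    inside a container the real file reflects the host kernel's setting.
    """
    level = int(Path(path).read_text().strip())
    return level, level <= RR_MAX_PARANOID
```

Calling `paranoid_level_ok()` in a workspace that shows the error above should return `(2, False)`, in which case `rr record -n` is the only option left from inside the container.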

What do we do now? We build rr ourselves and we try again:

gitpod /tmp/obj $ /usr/local/bin/rr record -n /usr/bin/ls
[FATAL /tmp/rr/src/PerfCounters.cc:224:start_counter() errno: EPERM] Failed to initialize counter
[...]

This is perf_event_open(2) being disallowed by the seccomp policy.
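Whether a seccomp filter is active at all can be confirmed from inside the workspace by reading the `Seccomp` field of `/proc/self/status` (0 = disabled, 1 = strict, 2 = filter). A small Python sketch; the helper name `seccomp_mode` is made up for illustration:

```python
def seccomp_mode(status_text):
    """Extract the Seccomp field from the text of /proc/<pid>/status.

    Returns 0 (disabled), 1 (strict mode), 2 (filter mode), or None if
    the field is missing (for example, a kernel built without seccomp).
    """
    for line in status_text.splitlines():
        # "Seccomp_filters:" also exists on newer kernels; the colon in
        # the prefix keeps it from matching here.
        if line.startswith("Seccomp:"):
            return int(line.split(":", 1)[1].strip())
    return None
```

Usage would be `seccomp_mode(open("/proc/self/status").read())`. A result of 2 only says that some filter is installed; it does not say which syscalls the filter blocks, so it supports the diagnosis above without proving it.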

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

willdurand commented 2 years ago

William also correctly pointed out that we might want to make sure rr actually supports AMD CPUs first:

I am thinking that we should probably make sure that rr actually supports the AMD CPU first. Ideally we would verify that we can record a trace on a host machine and then within Docker (with seccomp=unconfined).

This issue is still valid but without access to the Gitpod "hardware" (see quote above), there isn't a lot external contributors can do at the moment.

GitMensch commented 1 year ago

The CPU type was added to rr (https://github.com/rr-debugger/rr/issues/2872), so the biggest remaining part is the Docker configuration (and so far I've only seen an option to adjust the dockerd arguments, but not the arguments for docker run).

Please add --cap-add=SYS_PTRACE --security-opt seccomp=unconfined as documented in https://github.com/rr-debugger/rr/wiki/Docker.

khuey commented 1 year ago

CAP_SYS_PTRACE probably isn't necessary these days.

GitMensch commented 1 year ago

CAP_SYS_PTRACE probably isn't necessary these days.

Can you please retest and adjust the rr wiki? Any insight if it is possible to adjust the security policy with an option to dockerd?

khuey commented 1 year ago

CAP_SYS_PTRACE probably isn't necessary these days.

Can you please retest and adjust the rr wiki?

That's not really a priority for me.

Any insight if it is possible to adjust the security policy with an option to dockerd?

You can create your own seccomp profile e.g.

{
  "defaultAction": "SCMP_ACT_ALLOW",
  "architectures": [
    "SCMP_ARCH_X86_64",
    "SCMP_ARCH_X86"
  ]
}

and then do dockerd --seccomp-profile=/path/to/that/file.json

Somebody could actually spend the time to come up with a minimal seccomp profile for rr itself but that's a non-trivial amount of work.

GitMensch commented 1 year ago

The question is (@jankeromnes ?): is there any reason to not allow perf_event_open by default, for example by adding --security-opt seccomp=unconfined to docker?

jankeromnes commented 1 year ago

@GitMensch I believe it should be safe to always allow perf_event_open by default, for example by slightly adjusting Gitpod's seccomp profile to allow this syscall in workspaces.

However, we wouldn't want to use --security-opt seccomp=unconfined for Gitpod workspaces, because this would enable all possible syscalls, some of which might harm the isolation between Gitpod workspaces.

So, instead of entirely disabling seccomp in Gitpod, we should consider each syscall separately (for example, when it can unlock super cool use cases like rr debugging in Gitpod, just like when we enabled gdb debugging in Gitpod) and assess its added_value / potential_risk trade-off.

GitMensch commented 1 year ago

I'm totally fine with that.

I believe it should be safe to always allow perf_event_open by default, for example by slightly adjusting Gitpod's seccomp profile to allow this syscall in workspaces.

So... I guess this is on the schedule now?

jankeromnes commented 1 year ago

So... I guess this is on the schedule now?

It is not yet on the schedule. For it to be, we need to lobby Gitpod's workspace team into picking up this issue (hi @kylos101! 👋 😇)

sg- commented 1 year ago

Any progress on getting perf_event_open added by default? I'd love to be able to run perf on my programs inside a gitpod development container.

GitMensch commented 1 year ago

Sadly there is still no option to use rr or at least perf stat in gitpod containers, is there?

jankeromnes commented 1 year ago

@GitMensch Sadly there isn't yet. However, let's keep this issue open until there is. 👍

GitMensch commented 12 months ago

@jankeromnes You've previously said

I believe it should be safe to always allow perf_event_open by default, for example by slightly adjusting Gitpod's seccomp profile to allow this syscall in workspaces.

and I agree, so... who is this issue now waiting on? Is @kylos101 "from Gitpod's workspace team" the right (and possibly only) person?

If I understood this correctly, this would add perf stat and friends and would at least be a start for testing rr.

GitMensch commented 10 months ago

From SO:

The problem is that Docker by default blocks a list of system calls, including perf_event_open, which perf relies heavily on. Official docker reference: https://docs.docker.com/engine/security/seccomp/

Solution:

  • Download the standard seccomp (secure compute) file for docker. It's a json file.
  • Find "perf_event_open" (it appears only once) and delete it.
  • Add a new entry in syscalls section: { "names": [ "perf_event_open" ], "action": "SCMP_ACT_ALLOW" },
  • Add the following to your command to run the container: --security-opt seccomp=path/to/default.json

This possibly is not enough for rr, but should be the necessary start to at least run perf.
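The manual edit described in the quoted steps could also be scripted. Below is a rough Python sketch under a few assumptions: the profile uses the current Docker layout, with a top-level "syscalls" list whose entries carry a "names" array, and the function name `allow_perf_event_open` is made up for illustration. It removes perf_event_open from any existing rule and appends a rule that allows it unconditionally.

```python
import copy

SYSCALL = "perf_event_open"


def allow_perf_event_open(profile):
    """Return a patched copy of a Docker-style seccomp profile (a dict).

    The input is left untouched. Any existing mention of SYSCALL is
    removed, then one unconditional SCMP_ACT_ALLOW rule is appended.
    """
    patched = copy.deepcopy(profile)
    rules = []
    for rule in patched.get("syscalls", []):
        if "names" in rule:
            rule["names"] = [n for n in rule["names"] if n != SYSCALL]
            if not rule["names"]:
                # The rule listed only perf_event_open; drop it entirely.
                continue
        rules.append(rule)
    rules.append({"names": [SYSCALL], "action": "SCMP_ACT_ALLOW"})
    patched["syscalls"] = rules
    return patched
```

To use it, load the downloaded profile with `json.load`, pass it through the function, write the result with `json.dump`, and point `--security-opt seccomp=...` at the written file. This only automates the steps quoted above; it says nothing about what else rr may need.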

GitMensch commented 3 weeks ago

@jankeromnes Can you please try the steps outlined above for adding minimal perf counter support to Gitpod? This missing feature is the main reason I don't develop on Gitpod.