facebookexperimental / hermit

Hermit launches Linux x86_64 programs in a special, hermetically isolated sandbox to control their execution. Hermit translates normal, nondeterministic behavior into deterministic, repeatable behavior. This can be used for various applications, including replay debugging, reproducible artifacts, chaos-mode concurrency testing, and bug analysis.

panic - 'vdso symbol __vdso_clock_getres's real size is 10 bytes, but trying to replace it with 16 bytes' #16

Closed androm3da closed 1 year ago

androm3da commented 1 year ago

Describe the bug

I hit this panic using hermit built from 95b3ac7be2c464fc7be595c8548f4dd11167600b. The development node I'm allocated is a VM that models only a very limited PMU. That may not be related to the vdso panic, though.

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.6 LTS
Release:    18.04
Codename:   bionic
$ cat /proc/version
Linux version 5.4.0-120-generic (buildd@lcy02-amd64-037) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #136~18.04.1-Ubuntu SMP Fri Jun 10 18:00:44 UTC 2022
2022-11-23T01:04:51.638065Z  WARN reverie_ptrace::perf: Pmu bugs detected: HardwareCountersNotWorking { actual_events: 0, expected_min_events: 500, config: 5308625 }
thread 'main' panicked at 'vdso symbol __vdso_clock_getres's real size is 10 bytes, but trying to replace it with 16 bytes', /local/mnt/workspace/install/rust/git/checkouts/reverie-9a587e40a0d7d3be/6f03658/reverie-ptrace/src/vdso.rs:162:17

hermit_trace.log

arjo129 commented 1 year ago

I have the same problem on Ubuntu 22.04

Linux version 5.15.0-53-generic (buildd@lcy02-amd64-047) (gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #59-Ubuntu SMP Mon Oct 17 18:53:30 UTC 2022

It does feel like the error is coming from reverie though.

EspenG commented 1 year ago

I see a very similar issue. The trace log is attached: trace.txt

Environment:

$ uname -a
Linux eg 5.15.0-46-generic #49-Ubuntu SMP Thu Aug 4 18:03:25 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Ubuntu 22.04.1 LTS

samth commented 1 year ago

This is really a bug in reverie (it reproduces when running the reverie strace implementation on ls). It affects both clock_gettime and gettimeofday for me, but not clock_getres. I tested this by commenting out the entries for those two symbols in VDSO_SYMBOLS in reverie-ptrace/src/vdso.rs, after which I was able to strace /bin/ls successfully.

samth commented 1 year ago

The following change fixes reverie for me: https://github.com/samth/reverie/commit/a7f6cae58c8dbdedb1708ff2d12dc76cffaa4c69 but I cannot be sure if it's right.

jasonwhite commented 1 year ago

I have an upstream fix for this in Reverie that should be getting synced into the repo within the next hour or so. @samth was right on the money with the fix. The NOP padding at the end of the VDSO patches was unnecessary and can just be removed. I also have a fix for the getcpu vdso patch. I'll update this thread when the fix is in.
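To make the size mismatch concrete, here is a minimal sketch in Rust. It is not Reverie's actual code. It assumes the replacement stub is a bare "mov eax, nr; syscall; ret" sequence (8 bytes) and that the earlier behavior padded it to 16 bytes with single-byte NOPs. The names syscall_stub, pad_with_nops, and fits are hypothetical.

```rust
/// x86_64 Linux syscall numbers for two of the affected vdso functions.
const SYS_CLOCK_GETTIME: u32 = 228;
const SYS_CLOCK_GETRES: u32 = 229;

/// Hypothetical replacement stub: `mov eax, nr; syscall; ret`.
fn syscall_stub(nr: u32) -> Vec<u8> {
    let mut code = vec![0xb8]; // mov eax, imm32 (opcode byte)
    code.extend_from_slice(&nr.to_le_bytes()); // 32-bit immediate, little endian
    code.extend_from_slice(&[0x0f, 0x05]); // syscall
    code.push(0xc3); // ret
    code
}

/// Pads `code` with single-byte NOPs (0x90) until it is `len` bytes long.
/// Code that is already `len` bytes or longer is returned unchanged.
fn pad_with_nops(mut code: Vec<u8>, len: usize) -> Vec<u8> {
    if code.len() < len {
        code.resize(len, 0x90);
    }
    code
}

/// A patch fits only if it is no longer than the symbol it replaces.
fn fits(patch: &[u8], symbol_size: usize) -> bool {
    patch.len() <= symbol_size
}
```

Under these assumptions, the padded 16-byte patch does not fit the 10-byte __vdso_clock_getres reported above, while the unpadded 8-byte stub does.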

Thanks @androm3da for reporting this issue! Keep those bug reports coming! :)

samth commented 1 year ago

I don't think this actually fixes the issue. The problem is now 5 bytes vs 8 bytes, but it still errors. You need the changes to handle things being up to 8-byte aligned that are in my patch, or something different (I still don't know if that change is right).

jasonwhite commented 1 year ago

@samth Not quite sure I follow. Does https://github.com/facebookexperimental/reverie/commit/debce82715d2e44636e1b6dbf350412212429b14 not fix the issue? The vdso patches are now 8 bytes instead of 16 (not 5 bytes).

samth commented 1 year ago

Right, that commit does not fix the problem. On my system the original vdso entry is 5 bytes.

jasonwhite commented 1 year ago

Ohh, now I understand. I thought you meant the patch was 5 bytes. That's a pretty small vdso entry size. What distro+version are you running? And what is the kernel version? I'd like to see what those entries are actually doing. Maybe we don't really need to patch them.

samth commented 1 year ago

It's the same machine as this issue: https://github.com/facebookexperimental/hermit/issues/18

Ubuntu 22.10 and 5.19.0 is the short answer.

EspenG commented 1 year ago

The fix does not work for me either.

thread 'main' panicked at 'vdso symbol __vdso_clock_gettime's real size is 5 bytes, but trying to replace it with 8 bytes', /home/eg/.cargo/git/checkouts/reverie-9a587e40a0d7d3be/c448d10/reverie-ptrace/src/vdso.rs:148:17

jasonwhite commented 1 year ago

I was able to reproduce on an Ubuntu 22.04 VM. This is the disassembly of gettimeofday and clock_gettime:

0000000000000bd0 <__vdso_gettimeofday@@LINUX_2.6>:
 bd0:   e9 4b fe ff ff          jmp    a20 <LINUX_2.6@@LINUX_2.6+0xa20>
 bd5:   66 66 2e 0f 1f 84 00    data16 cs nopw 0x0(%rax,%rax,1)
 bdc:   00 00 00 00

0000000000000c10 <__vdso_clock_gettime@@LINUX_2.6>:
 c10:   e9 9b fb ff ff          jmp    7b0 <LINUX_2.6@@LINUX_2.6+0x7b0>
 c15:   66 66 2e 0f 1f 84 00    data16 cs nopw 0x0(%rax,%rax,1)
 c1c:   00 00 00 00 

In these implementations, it's just a jmp to another internal function. Seems like the inner function didn't get inlined. Luckily, since both of these functions are aligned to 16 bytes via padding, they should be safe to patch. A real fix is landing soon.
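The alignment argument can be sketched in Rust as follows. This is an illustration of the reasoning only, not the fix that landed in Reverie. It assumes that every byte between the end of a symbol and the next alignment boundary is padding, which holds for the two functions in the disassembly above but is not guaranteed in general. The names align_up and usable_size are hypothetical.

```rust
/// Rounds `value` up to the next multiple of `align` (which must be a power of two).
fn align_up(value: usize, align: usize) -> usize {
    assert!(align.is_power_of_two());
    (value + align - 1) & !(align - 1)
}

/// Bytes available for a patch at `symbol_addr`: the symbol itself plus the
/// padding that follows it, up to the next `align`-byte boundary.
/// Assumes those trailing bytes really are padding and not live code.
fn usable_size(symbol_addr: usize, symbol_size: usize, align: usize) -> usize {
    align_up(symbol_addr + symbol_size, align) - symbol_addr
}
```

For the addresses shown above, usable_size(0xbd0, 5, 16) and usable_size(0xc10, 5, 16) both evaluate to 16, so an 8-byte patch fits even though each symbol's own code is only the 5-byte jmp.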

androm3da commented 1 year ago

I am using commit 159c343e0b2522b57531cc3db83ae02833336c3e and I'm on Ubuntu 20.04.

$ cat /proc/version
Linux version 5.4.0-122-generic (buildd@lcy02-amd64-095) (gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)) #138-Ubuntu SMP Wed Jun 22 15:00:31 UTC 2022

I ran cargo clean && cargo build && ./target/debug/hermit run --chaos ls, and it seems to fail the same way it did before. Would those steps pick up the reverie fix mentioned above (https://github.com/facebookexperimental/reverie/commit/fa44c91a66b948989462913d2f7d67a7db328694), or do I need to purge some cache somewhere?

jasonwhite commented 1 year ago

@androm3da Try deleting Cargo.lock and doing the build again. (I don't think cargo clean will delete it.) Then, Cargo should pull down the latest commit.

Also, for reference, the commit with the fix is https://github.com/facebookexperimental/reverie/commit/5478e47a25a2aae3f2211bead790a1249630d04f.

androm3da commented 1 year ago

"Try deleting Cargo.lock and doing the build again"

This did the trick, tyvm @jasonwhite