DynamoRIO / dynamorio

Dynamic Instrumentation Tool Platform

NetBSD port #2888

Open krytarowski opened 6 years ago

krytarowski commented 6 years ago

Hello,

I'm researching options to port dynamorio to NetBSD.

  1. I'm looking for an estimate of how much work the port might need.

  2. Which kernel/userland properties are a must? For example, ptrace(2) operations for the injector.

  3. How does dynamorio compare to pintool and valgrind in terms of features and performance? Porting pintool looks impossible for now as it's closed-source, and valgrind seems to be hostile to !Linux.

  4. Is there anything else that I should be aware of before "recon by fire" porting?

derekbruening commented 6 years ago

Thank you for your interest in contributing. Xref the MacOS port #58. Given that the basics now work on Mac, I don't think a BSD port would be a huge undertaking, but it will take some work, especially to get full private loader support for nice C++ clients.

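The OS dependencies include:

  1. Thread-local storage: on x86 a segment register is stolen. This is the major pain point on Mac 64-bit as the kernel there does not let user mode set up a segment.

  2. Memory query: /proc/self/maps on Linux but procfs probably can't be relied upon on *BSD. Is procstat_getvmmap an alternative or is that only on FreeBSD?

  3. Signals: delivery has to be emulated so precise details have to be matched

  4. Thread creation and other operations

  5. Executable file parsing: if there are any ELF differences vs Linux

  6. Syscall monitoring for those related to control flow or the address space

  7. Injection: ptrace is actually not used by default but various details of the kernel loader are relied upon

  8. Kernel wait queues: futex on Linux, Mach semaphores on Mac

  9. Private loader: still not there on Mac. Requires emulating the app's dynamic loader.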

> How does dynamorio compare to pintool and valgrind in terms of features and performance. Porting pintool looks impossible for now as it's closed-source, and valgrind seems to be hostile to !Linux.

You can see slides comparing DR and Pin from our last tutorial (slides 199+): http://dynamorio.org/tutorial-cgo17.html

The biggest difference between DR and Pin, besides DR being open-source and supporting ARM and AArch64, is that DR supports arbitrary code stream modifications, while Pin supports only callouts. This means that a Pintool has less control over its final performance, while a DR tool can tune its instrumentation for better performance.

derekbruening commented 6 years ago

DR vs Valgrind: on SPEC CPU2006, DR's slowdown is 1.2x vs 4.6x for Nulgrind (Valgrind running its no-op tool), IIRC, so base DR is 3x-4x faster (4.6 / 1.2 ≈ 3.8). On multithreaded apps the performance difference is even larger since Valgrind serializes all the app threads. Valgrind also does not have a general tool-building API like DR and Pin do.

krytarowski commented 6 years ago

> Thread-local storage: on x86 a segment register is stolen. This is the major pain point on Mac 64-bit as the kernel there does not let user mode set up a segment.

Is it fine to call libc/loader functions?

> Memory query: /proc/self/maps on Linux but procfs probably can't be relied upon on *BSD. Is procstat_getvmmap an alternative or is that only on FreeBSD?

We have the same function in libutil, called in the same way, but the resulting format differs a little. We can also get the same data directly through sysctl(3).
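
For comparison, here is a minimal sketch of what consuming the Linux format involves: parsing one line of /proc/self/maps. The `map_entry` struct and `parse_maps_line` function are hypothetical names for illustration, not DynamoRIO code, and the sketch uses libc's `sscanf` for brevity, which code running inside DR could not rely on. A NetBSD backend would fill the same kind of record from the libutil or sysctl(3) data instead.

```c
#include <stdio.h>

/* One parsed mapping; field names are illustrative, not DynamoRIO's. */
struct map_entry {
    unsigned long start;
    unsigned long end;
    char perms[5];   /* e.g. "r-xp" plus terminating NUL */
    char path[256];  /* empty string for anonymous mappings */
};

/* Parse one line in Linux /proc/<pid>/maps format:
 *   start-end perms offset major:minor inode [path]
 * Returns 1 on success, 0 if the line does not match. */
static int parse_maps_line(const char *line, struct map_entry *out)
{
    unsigned long offset, inode;
    unsigned int dev_major, dev_minor;
    int n;

    out->path[0] = '\0';
    n = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %255[^\n]",
               &out->start, &out->end, out->perms,
               &offset, &dev_major, &dev_minor, &inode, out->path);
    /* The path is optional, so 7 converted fields are enough. */
    return n >= 7;
}
```

Keeping the parsed record separate from the text format is one way to let each OS supply its own source of mappings behind the same interface.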

> Signals: delivery has to be emulated so precise details have to be matched

I will research it. Linux and BSD signals differ in semantics, but these things are rather standard.
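
To make "precise details" concrete, the sketch below shows the standard POSIX interface an application sees: an SA_SIGINFO handler that receives the signal number, a `siginfo_t`, and a pointer to the interrupted context. A tool that emulates delivery has to hand the application's handler these same three pieces in the layout its OS uses. The names `on_signal` and `install_handler` are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t last_signo = 0;
static volatile sig_atomic_t saw_context = 0;

/* SA_SIGINFO-style handler: gets the signal number, a siginfo_t, and a
 * pointer to the interrupted machine context (a ucontext_t in practice). */
static void on_signal(int signo, siginfo_t *info, void *ucontext)
{
    (void)info;
    last_signo = signo;
    saw_context = (ucontext != NULL);
}

/* Install on_signal for `signo`; returns 0 on success, -1 on error. */
static int install_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

Calling `install_handler(SIGUSR1)` and then `raise(SIGUSR1)` runs the handler synchronously in a single-threaded program, which is a quick way to inspect what each OS passes in.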

> Thread creation and other operations

POSIX threads? In the context of our process? We can have both native threads (LWP) and POSIX ones. The libpthread library shall be preferred in order to not reinvent the wheel.

It's more difficult to start a thread from another program; we would need to use ptrace(2) to rewrite .text.

> Executable file parsing: if there are any ELF differences vs Linux

Nothing special that I'm aware of, except different ELF notes, lack of DT_GNU_HASH, different RPATH/RUNPATH semantics.
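
As a small illustration of the DT_GNU_HASH point, a symbol-lookup path that cannot assume the GNU hash table has to check the dynamic section and fall back to the classic DT_HASH table. The sketch below assumes a 64-bit ELF and an `<elf.h>` providing `Elf64_Dyn`; `pick_hash_tag` is a hypothetical helper, not DynamoRIO code.

```c
#include <elf.h>

#ifndef DT_GNU_HASH
#define DT_GNU_HASH 0x6ffffef5 /* GNU extension; some <elf.h> versions lack it */
#endif

/* Scan a DT_NULL-terminated dynamic array and report which symbol hash
 * table to use: DT_GNU_HASH if present, else DT_HASH, else DT_NULL. */
static Elf64_Sxword pick_hash_tag(const Elf64_Dyn *dyn)
{
    Elf64_Sxword found = DT_NULL;

    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == DT_GNU_HASH)
            return DT_GNU_HASH;
        if (dyn->d_tag == DT_HASH)
            found = DT_HASH;
    }
    return found;
}
```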

> Syscall monitoring for those related to control flow or the address space

I will research how it is done on Linux.
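
As a starting point, the idea can be sketched as a filter over syscall numbers. The set below is an illustrative subset using Linux syscall names, not DynamoRIO's actual list, and `needs_monitoring` is a hypothetical function; a NetBSD port would need the equivalent set for its own syscall table.

```c
#include <sys/syscall.h>

/* Returns 1 if `sysnum` changes the address space or control flow and so
 * would have to be observed by the tool, else 0.
 * Illustrative subset for Linux only, not DynamoRIO's real list. */
static int needs_monitoring(long sysnum)
{
    switch (sysnum) {
    /* Address space changes. */
    case SYS_mmap:
    case SYS_munmap:
    case SYS_mprotect:
    case SYS_mremap:
    case SYS_brk:
        return 1;
    /* Control flow: new threads, new images, exits, signal setup. */
    case SYS_clone:
    case SYS_execve:
    case SYS_exit:
    case SYS_exit_group:
    case SYS_rt_sigaction:
    case SYS_rt_sigreturn:
        return 1;
    default:
        return 0;
    }
}
```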

> Injection: ptrace is actually not used by default but various details of the kernel loader are relied upon

I expect that this one will be challenging.

> Kernel wait queues: futex on Linux, Mach semaphores on Mac

No futexes or similar user-space primitives on NetBSD (this is part of the reason why we prefer libpthread).
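
For reference, this is roughly the Linux primitive a NetBSD port would need a substitute for: a wait that sleeps only while a word still holds an expected value, and a wake on the same word. The sketch is Linux-only and goes through libc's `syscall()` wrapper for brevity; `futex_wait` and `futex_wake` are local helper names, since glibc exposes no futex function of its own.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sleep while *addr == expected. Returns 0 when woken, or -1 with errno
 * set (EAGAIN if *addr already differs from expected). */
static long futex_wait(uint32_t *addr, uint32_t expected)
{
    return syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

/* Wake up to `count` waiters on addr; returns how many were woken. */
static long futex_wake(uint32_t *addr, int count)
{
    return syscall(SYS_futex, addr, FUTEX_WAKE, count, NULL, NULL, 0);
}
```

The compare-then-sleep step happens atomically inside the kernel, which is the property any replacement wait queue has to preserve to avoid lost wakeups.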

> Private loader: still not there on Mac. Requires emulating the app's dynamic loader.

I will see.

derekbruening commented 6 years ago

> > Thread-local storage: on x86 a segment register is stolen. This is the major pain point on Mac 64-bit as the kernel there does not let user mode set up a segment.
>
> Is it fine to call libc/loader functions?

No. There is no libc or loader in the address space at all during DR initialization, and later it is not safe to use any library code that the app might use. Raw system calls should be used. This is why it is best if the kernel exposes full segment functionality to user space.
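
A minimal sketch of what "raw system calls" means in practice, assuming Linux on x86-64: the syscall instruction is issued directly with inline assembly, so no libc code is executed. `raw_syscall0` and `raw_getpid` are hypothetical names, not DynamoRIO's, and the non-x86-64 branch is only a placeholder that does go through libc so the example still builds elsewhere.

```c
#include <sys/syscall.h> /* only for the SYS_getpid number */

#if defined(__linux__) && defined(__x86_64__)
/* Zero-argument system call: number in rax, result returned in rax.
 * The syscall instruction clobbers rcx and r11. */
static long raw_syscall0(long number)
{
    long ret;

    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "a"(number)
                     : "rcx", "r11", "memory");
    return ret;
}
#else
/* Placeholder for other targets: this path uses libc and is NOT raw. */
#include <unistd.h>
static long raw_syscall0(long number)
{
    return syscall(number);
}
#endif

/* Example use: query the process ID without calling getpid(). */
static long raw_getpid(void)
{
    return raw_syscall0(SYS_getpid);
}
```

A port would need one such entry sequence per architecture, using the target kernel's own calling convention and syscall numbers.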

> > Thread creation and other operations
>
> POSIX thread? In the context of our process? We can have both native threads (LWP) and POSIX ones. The libpthread library shall be preferred in order to not reinvent the wheel.

Whatever forms of kernel threads the app can create need to be monitored: injected into on creation, resources cleaned up on destruction, managed if there is any kernel-mediated suspension. User-space-only pseudo-threads (e.g., most fiber implementations) do not need special treatment.