cadets / freebsd-old

FreeBSD src tree http://www.FreeBSD.org/

Create a system call to allow the user space to call dtrace_probe() #67

Open gvnn3 opened 7 years ago

gvnn3 commented 7 years ago

The LOOM code would like a place to point a probe from user space. We will add a new system call that takes a set of arguments and fires a known dtrace probe point, likely in the sdt provider, to get that arbitrary data out of the program, under LOOM's direction.

rwatson commented 7 years ago

I assume we can't use the existing trap mechanism used by USDT and friends because there would not be a way to differentiate those probes firing from Loom probes..? Or do USDT traps always pass a unique probe ID, in which case in fact the same mechanism could be used via different probe IDs..?

trombonehero commented 7 years ago

The issue with existing trap mechanisms is that they rely heavily on low-level details that aren't available at the IR level of abstraction. With USDT we effectively perform a system call, but with a trap-then-introspect-on-arguments-defined-elsewhere mechanism that assumes knowledge of offsets within ELF sections, etc. This isn't really compatible with the current approach, where we have a set of values in hand that we just want to pass to the kernel directly, but it's especially incompatible with the longer-term vision for "IR everywhere", where some applications may never hit an ELF binary format but instead be JITted directly.

rwatson commented 7 years ago

I guess I was taking the view that the trap used by USDT currently was just a trapping mechanism, and USDT and a new provider could share it. But if that's not the case, a new system call can certainly be added. What is the current "calling convention" for the USDT trap?

trombonehero commented 7 years ago

@bkidney can probably provide more details / correct me if I'm wrong, but I believe that the USDT convention is:

  1. (at process startup) the process registers some DOF (in a special ELF section) that describes probe locations,
  2. (at probe enable time) DTrace changes NOPs to int 3 instructions,
  3. (at probe firing time) an int 3 instruction is executed,
  4. the USDT provider inspects the previously-registered DOF section, using the PC of the interrupting instruction as a key to find the probe name and argument locations,
  5. the provider copies in arguments from the DOF-specified location and
  6. the probe fires in the kernel.
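Step 4 is the part that ties the trap to the ELF/DOF layout. A minimal user-space sketch of that PC-keyed lookup, assuming a sorted table of probe sites; `struct probe_site`, `lookup_site` and the addresses are hypothetical and not taken from the DTrace sources:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record: one entry per probe site described by the DOF. */
struct probe_site {
	uintptr_t   pc;     /* address of the patched instruction */
	const char *name;   /* probe name */
	int         nargs;  /* number of arguments to copy in */
};

/* Illustrative table, sorted by pc so that it can be binary-searched. */
static const struct probe_site sites[] = {
	{ 0x401000, "request-start", 2 },
	{ 0x401040, "request-done",  1 },
	{ 0x4010a0, "cache-miss",    3 },
};

/* bsearch() comparator: the key is a PC, the element is a probe_site. */
int
site_cmp(const void *key, const void *elem)
{
	uintptr_t pc = *(const uintptr_t *)key;
	const struct probe_site *s = elem;

	if (pc < s->pc)
		return (-1);
	if (pc > s->pc)
		return (1);
	return (0);
}

/* Use the trapping PC as the key; NULL means the trap is not a known probe. */
const struct probe_site *
lookup_site(uintptr_t trap_pc)
{
	return (bsearch(&trap_pc, sites, sizeof(sites) / sizeof(sites[0]),
	    sizeof(sites[0]), site_cmp));
}
```

The lookup only works if something has already told the kernel where every probe site lives, which is exactly the ELF-level knowledge that is not available at the IR level.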

We're pondering an approach in which we replace all of the above with a system call that takes explicit argument values and forwards them via (initially) an SDT probe.

rwatson commented 7 years ago

Another question: one reason to use a NOP->INT3 binary rewrite is to minimise userspace overhead for an infrequently used instrumentation point, perhaps at the cost of some greater expense in processing the instrumentation in the kernel due to PC-based lookup. Do you see a different frequency-of-use tradeoff for the new mechanism? Would the proposed system-call arguments simply consist of some identifier (for the userspace probe point) and a (slightly shortened) set of direct arguments to submit to DTrace?
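One possible shape for such an argument block, sketched in user space. It assumes the site identifier occupies one of the five argument slots that `dtrace_probe()` accepts after the probe ID, leaving four for data; `struct loom_probe_args`, `loom_probe` and `fake_probe` are hypothetical names, and `fake_probe` only records its arguments in place of firing a real probe:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define LOOM_MAX_ARGS 4	/* assumption: one of five slots carries the site ID */

/* Hypothetical argument block passed from user space. */
struct loom_probe_args {
	uint32_t  site_id;              /* identifies the userspace probe point */
	uint32_t  nargs;                /* number of valid entries in args[] */
	uintptr_t args[LOOM_MAX_ARGS];  /* direct argument values */
};

/* Stand-in for the kernel probe: remembers what it was last called with. */
struct fired_record {
	int       count;
	uintptr_t a[5];
};
struct fired_record last_fired;

void
fake_probe(uintptr_t a0, uintptr_t a1, uintptr_t a2, uintptr_t a3,
    uintptr_t a4)
{
	last_fired.count++;
	last_fired.a[0] = a0;
	last_fired.a[1] = a1;
	last_fired.a[2] = a2;
	last_fired.a[3] = a3;
	last_fired.a[4] = a4;
}

/* Validate the block, then forward the ID and values to the single probe. */
int
loom_probe(const struct loom_probe_args *uap)
{
	uintptr_t a[LOOM_MAX_ARGS] = { 0 };
	uint32_t i;

	if (uap == NULL || uap->nargs > LOOM_MAX_ARGS)
		return (EINVAL);
	for (i = 0; i < uap->nargs; i++)
		a[i] = uap->args[i];
	fake_probe(uap->site_id, a[0], a[1], a[2], a[3]);
	return (0);
}
```

Unused argument slots are zero-filled, so a consumer cannot see stale values from an earlier call.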

dstolfa commented 7 years ago

From a slightly different perspective in terms of usage: addition of the syscall could possibly allow for an easier way of tracing userspace from the perspective of the host in the context of virtualization. However, one thing that comes to mind is: would the syscall in this case map the probe IDs from the process-local IDs to global IDs in the DTrace framework to prevent situations where the process might be malicious and notify of random probes firing throughout the system?
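A rough sketch of that mapping, assuming each process has a table filled in when its probes are registered. A process can then only name indices into its own table, never a global ID directly. `struct proc_probe_map`, `map_local_id` and the ID values are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#define GLOBAL_ID_NONE 0u	/* assumption: 0 is never a valid global ID */

/* Hypothetical per-process table: index = process-local ID. */
struct proc_probe_map {
	size_t          nprobes;
	const uint32_t *global_ids;
};

/* Translate a process-local ID; out-of-range IDs map to "no probe". */
uint32_t
map_local_id(const struct proc_probe_map *m, uint32_t local_id)
{
	if (m == NULL || local_id >= m->nprobes)
		return (GLOBAL_ID_NONE);
	return (m->global_ids[local_id]);
}
```

Passing a raw global ID such as 1041 as the local ID falls outside the table and is rejected, so a process cannot make an arbitrary probe elsewhere in the system appear to fire.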

bkidney commented 7 years ago

@rwatson To start, I will use a global flag that turns the instrumentation on or off. The flag will be hard-coded so that we can get some measurements of the overhead. The idea for the future is to have a method for enabling and disabling it at runtime. More thought needs to go into that mechanism.
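A minimal sketch of the flag-gated site, with a counter standing in for the real probe so that the two configurations can be compared. `loom_enabled`, `fire_probe` and `instrumented_site` are hypothetical names; the flag is an ordinary variable here, fixed in the source for now, so that a later runtime switch could flip it:

```c
#include <stdint.h>

/* Set in the source for overhead measurements. */
int loom_enabled = 1;

/* Stand-in for the probe: counts how many times it fired. */
unsigned long probes_fired;

void
fire_probe(uintptr_t arg)
{
	(void)arg;
	probes_fired++;
}

/* An instrumented site costs one load and one branch when disabled. */
void
instrumented_site(uintptr_t arg)
{
	if (loom_enabled)
		fire_probe(arg);
}
```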

@dstolfa We will be looking at creating a per-thread or per-process namespace to uniquely identify the source of the probe. The current thought is to use the module name in the tuple. This should be generated in a way that makes the identity hard to guess, in order to thwart attempts to use it maliciously.
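One way to read "hard to guess" is sketched below, assuming the identity is a random token issued when the module registers and presented again on each probe call. The token length, `struct loom_registration` and `token_matches` are hypothetical, and token generation (which would need a kernel random source) is left out:

```c
#include <stddef.h>

#define LOOM_TOKEN_LEN 16	/* assumption: 128 bits of random identity */

/* Hypothetical registration record kept on the kernel side. */
struct loom_registration {
	int           pid;
	unsigned char token[LOOM_TOKEN_LEN];
};

/*
 * Accept a probe request only if it comes from the registering process
 * and presents the same token. The comparison touches every byte so that
 * its timing does not reveal how many leading bytes matched.
 */
int
token_matches(const struct loom_registration *r, int pid,
    const unsigned char *presented)
{
	unsigned char diff = 0;
	size_t i;

	if (r == NULL || presented == NULL || r->pid != pid)
		return (0);
	for (i = 0; i < LOOM_TOKEN_LEN; i++)
		diff |= (unsigned char)(r->token[i] ^ presented[i]);
	return (diff == 0);
}
```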

gvnn3 commented 7 years ago

There is a version of this in the dtrace-gnn-syscall-probe branch.

dstolfa commented 7 years ago

@gvnn3 From the implementation it would seem that as of now, a single kernel SDT probe is being called every time. Is this an initial version that will be followed up with probe-mapping code from the context of a given process, or was this the intended behaviour?