JakWai01 / lurk

A pretty (simple) alternative to strace
Apache License 2.0

lurk ls -l hangs #24

Closed sigmaSd closed 1 year ago

sigmaSd commented 1 year ago

lurk ls works

but lurk ls -l hangs. The last syscalls before it hangs look like this:

[16980] futex(0x7FFFF7D800E0, 129, 2147483647, 0x0, 140737353468592, 64) = 0
[16980] getdents64(4, 0x5555555AE730, 32768) = 0
[16980] close(4) = 0
[16980] epoll_ctl(6, 3, 5, 0x7FFFFFFFD16C) = 0
[16980] openat(4294967196, "/proc/sys/kernel/random/boot_id", 524544) = 4
[16980] read(4, "serialnumber-azezae..", 38) = 37
[16980] read(4, "", 1) = 0
[16980] close(4) = 0
[16980] timerfd_settime(7, 1, 0x7FFFFFFFD170) = 0

Using gdb, I can see it's stuck in wait4:

#0  0x00007f19da6f8707 in wait4 () from /usr/lib/libc.so.6
#1  0x000055cb93b4401c in nix::sys::wait::waitpid<core::option::Option<nix::unistd::Pid>> (pid=..., options=...) at src/sys/wait.rs:319
#2  0x000055cb93b44162 in nix::sys::wait::wait () at src/sys/wait.rs:336
#3  0x000055cb939529df in lurk_cli::Tracer::run_tracer (self=0x7ffc8347bf18) at src/lib.rs:98
#4  0x000055cb9392c36c in lurk::main () at src/main.rs:35

sigmaSd commented 1 year ago

Also, a random thought: what about adding an integration test like this? (Is this useful?)

#[cfg(test)]
mod tests {

    #[test]
    fn smoke() -> anyhow::Result<()> {
        cat()?;
        ls()?;
        // lurk(&["ls", "-l"])?; // doesn't work yet

        Ok(())
    }

    fn lurk(args: &[&str]) -> anyhow::Result<String> {
        let output = std::process::Command::new("cargo")
            .arg("r")
            .arg("--")
            .args(args)
            .output()?;
        Ok(String::from_utf8(output.stdout)?)
    }

    fn cat() -> anyhow::Result<()> {
        // The trace should show cat opening the file it was asked to print.
        assert!(lurk(&["cat", "/etc/hosts"])?
            .lines()
            .any(|line| line.contains("openat") && line.contains("/etc/hosts")));
        Ok(())
    }

    fn ls() -> anyhow::Result<()> {
        // The trace should show ls opening the current directory.
        assert!(lurk(&["ls"])?
            .lines()
            .any(|line| line.contains("openat") && line.contains("\".\"")));
        Ok(())
    }
}

JakWai01 commented 1 year ago

Thanks for opening the issue. I think lurk could definitely use some sort of test suite and I think those integration tests are a great start. I'll check if I can find the problem as soon as I find the time.

tramasys commented 1 year ago

This happens because ls -l uses the epoll_pwait2 syscall. Currently lurk only supports syscalls from 0 to 334. Newer syscalls defined in the range of 424 - 451 are not handled by lurk.

if registers.orig_rax >= 336 {
    continue;
}

These lines in the run_tracer function skip the rest of the loop iteration. The tracee stays stopped because the parent never reaches the next PTRACE_SYSCALL request (here done via ptrace::syscall at the end of the loop) and instead blocks in wait. Hence the tracee hangs indefinitely. Removing these lines wouldn't matter though, as the epoll_pwait2 syscall is also not implemented in the x86_64.rs file.
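
To make the hang concrete, here is a minimal sketch that models the loop without any real ptrace calls. Everything in it (MockTracee, WaitResult, trace, the resume_unknown flag) is hypothetical and made up for illustration; it is not lurk's code, and it collapses the syscall-entry and syscall-exit stops into a single stop. It only shows that a `continue` placed before the resume step leaves the tracee stopped, so the next wait can never return.

```rust
/// Result of waiting on the toy tracee (hypothetical, not a nix type).
enum WaitResult {
    SyscallStop(u64),  // tracee stopped at a syscall with this number
    Exited,            // tracee ran to completion
    WouldBlockForever, // tracee is still stopped: a real wait() would hang here
}

/// Toy stand-in for a traced process: it stops at every syscall and only
/// advances after the tracer resumes it.
struct MockTracee {
    pending: Vec<u64>, // syscall numbers the tracee will issue, in order
    next: usize,
    stopped: bool,
}

impl MockTracee {
    fn new(syscalls: &[u64]) -> Self {
        Self { pending: syscalls.to_vec(), next: 0, stopped: false }
    }

    /// Stands in for wait(): reports the next stop, or the hang.
    fn wait(&mut self) -> WaitResult {
        if self.stopped {
            return WaitResult::WouldBlockForever;
        }
        match self.pending.get(self.next) {
            Some(&nr) => {
                self.next += 1;
                self.stopped = true;
                WaitResult::SyscallStop(nr)
            }
            None => WaitResult::Exited,
        }
    }

    /// Stands in for the PTRACE_SYSCALL request that lets the tracee continue.
    fn resume(&mut self) {
        self.stopped = false;
    }
}

/// Runs the toy tracing loop. Returns the logged syscall numbers, or
/// Err(nr) with the syscall number at which the tracer would hang.
/// With resume_unknown = false the loop mirrors the early `continue`.
fn trace(syscalls: &[u64], resume_unknown: bool) -> Result<Vec<u64>, u64> {
    let mut tracee = MockTracee::new(syscalls);
    let mut logged = Vec::new();
    let mut last = 0;
    loop {
        match tracee.wait() {
            WaitResult::Exited => return Ok(logged),
            WaitResult::WouldBlockForever => return Err(last),
            WaitResult::SyscallStop(nr) => {
                last = nr;
                if nr >= 336 {
                    if resume_unknown {
                        tracee.resume(); // skip logging, but let the tracee run
                    }
                    continue; // without the resume above, the tracee stays stopped
                }
                logged.push(nr);
            }
        }
        tracee.resume();
    }
}
```

Calling trace(&[257, 441, 3], false) (openat, epoll_pwait2, close on x86_64) returns Err(441), the simulated hang, while the same call with true returns Ok(vec![257, 3]): the unsupported syscall is skipped in the log but the tracee is still resumed.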

The gap in syscall numbers from 334 to 424 exists because of the effort to sync up syscall numbers between architectures (more info here and here).

sigmaSd commented 1 year ago

Thanks for the explanation

Maybe it's better if it crashes. At least then it's easy to tell what's missing in lurk, and it might even motivate people to add support for missing syscalls.
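
A softer variant of the same idea, sketched under assumptions: instead of crashing or silently skipping, an unsupported number could be printed under a placeholder name so the gap shows up in the trace. The functions syscall_name and describe and the tiny table below are hypothetical and are not lurk's actual lookup; only the x86_64 syscall numbers are real.

```rust
/// Tiny illustrative table of x86_64 syscall numbers (hypothetical helper;
/// a real table would cover every supported syscall).
fn syscall_name(nr: u64) -> Option<&'static str> {
    match nr {
        0 => Some("read"),
        3 => Some("close"),
        217 => Some("getdents64"),
        257 => Some("openat"),
        _ => None,
    }
}

/// Returns a printable name, falling back to a placeholder that makes
/// the missing syscall number obvious in the output.
fn describe(nr: u64) -> String {
    match syscall_name(nr) {
        Some(name) => name.to_string(),
        None => format!("unknown_syscall_{}", nr),
    }
}
```

With this table, describe(257) yields "openat" and describe(441) yields "unknown_syscall_441", which tells a reader exactly which number still needs support.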

tramasys commented 1 year ago

Probably, but I'm in the process of rewriting the main tracing loop for lurk anyway, so it's not much more work to add the missing 27 syscalls on top of that. Expect a PR in a few days; I'll make sure to reference this issue as well.