1C4nfaN opened 1 year ago
The timestamps in your log show that py-spy was attached to 479 for more than 15 minutes which is way too much. This is probably a bug in py-spy.
As an immediate workaround I can suggest you use py-spy with the --nonblocking flag, which might make it a bit less accurate, but will prevent any side effects on the profiled application.
Hi @Jongy,
Thanks for your suggestion, but using the --nonblocking flag is not enough for what I want to get.
If this is a bug, do you have any suggestions for how we could try to fix it in this case?
If this is a bug, do you have any suggestions for how we could try to fix it in this case?
I spent a few minutes looking in the relevant code now.
impl ThreadLock {
    fn new(tid: nix::unistd::Pid) -> Result<ThreadLock, nix::Error> {
        ptrace::attach(tid)?;
        while let wait::WaitStatus::Stopped(_, sig) = wait::waitpid(tid, Some(wait::WaitPidFlag::WSTOPPED | wait::WaitPidFlag::__WALL))? {
            if sig == nix::sys::signal::Signal::SIGSTOP {
                break;
            }
            debug!("reinjecting non-SIGSTOP signal {} to {}", sig, tid);
            ptrace::cont(tid, sig)?;
        }
        debug!("attached to thread {}", tid);
        Ok(ThreadLock { tid })
    }
}
py-spy, per your stack trace, is blocked in this waitpid() call. I'm acquainted with this code because I actually wrote it 2 years back :sweat_smile: see here.
We are trying (in remoteprocess & py-spy) to PTRACE_ATTACH the process, then reinject non-SIGSTOP signals. The way I understand the semantics of PTRACE_ATTACH, a SIGSTOP must follow it. However, what we're seeing here is that thread 479 got numerous non-SIGSTOP signals over 17 minutes, but no SIGSTOP! I don't know how that can happen - that's the bug. What happens meanwhile is that some of the threads are already PTRACE_ATTACHed (in this case, 432 and 460), so the process is in a bad state: some of its threads are blocked unexpectedly.
I don't have any concrete idea how to solve it - I'll keep thinking and post here if I find anything. A workaround I can suggest is to add a basic timeout to the logic: if we fail to lock all threads in, say, 500us * Nthreads, py-spy could bail, detach all threads it has already attached, and consider this sample an error. I don't suppose this problem reproduces consistently (as you said @1C4nfaN, "sometimes", and this seems like a tough race condition), so perhaps this solution will be enough.
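Roughly, the bounded version could look something like this - just a sketch of the idea, not the actual remoteprocess API; the function name, the 50us polling interval and the Errno-based error type are placeholders of mine, assuming a recent nix:

use std::time::{Duration, Instant};
use nix::errno::Errno;
use nix::sys::{ptrace, signal::Signal, wait};
use nix::unistd::Pid;

// Hypothetical: attach to a single thread, but enforce a deadline instead of
// blocking in waitpid() forever.
fn attach_with_timeout(tid: Pid, deadline: Duration) -> Result<(), nix::Error> {
    ptrace::attach(tid)?;
    let start = Instant::now();
    let flags = wait::WaitPidFlag::WNOHANG | wait::WaitPidFlag::WSTOPPED | wait::WaitPidFlag::__WALL;
    loop {
        match wait::waitpid(tid, Some(flags))? {
            // The stop we are waiting for.
            wait::WaitStatus::Stopped(_, Signal::SIGSTOP) => return Ok(()),
            // Same reinjection as today, just bounded by the deadline below.
            wait::WaitStatus::Stopped(_, sig) => ptrace::cont(tid, sig)?,
            // StillAlive (WNOHANG, nothing reported yet) or anything else: keep polling.
            _ => {}
        }
        if start.elapsed() > deadline {
            // Bail out: detach this thread; the caller would also detach the
            // threads it already locked and count this sample as an error.
            let _ = ptrace::detach(tid, None);
            return Err(Errno::ETIMEDOUT);
        }
        std::thread::sleep(Duration::from_micros(50));
    }
}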
To debug this further, I'd run perf record -e signal:* alongside and see the interaction - is a SIGSTOP generated? What else happens around that time?
@1C4nfaN if you manage to reproduce it consistently, please post here instructions to do so, it will greatly help me (or anyone else debugging) since this is very esoteric.
A workaround I can suggest is to add a basic timeout to the logic: if we fail to lock all threads in, say, 500us * Nthreads, py-spy could bail, detach all threads it has already attached, and consider this sample an error.
Hi @Jongy, thanks for your help! I think it's good advice to consider timeouts in this case, but in another test, I see that:
[2023-04-11T04:02:05.106454671Z DEBUG remoteprocess::linux] reinjecting non-SIGSTOP signal SIGCONT to 509
[2023-04-11T04:02:05.267191159Z DEBUG remoteprocess::linux] reinjecting non-SIGSTOP signal SIGCONT to 509
[2023-04-11T04:02:05.427691376Z DEBUG remoteprocess::linux] reinjecting non-SIGSTOP signal SIGCONT to 509
[2023-04-11T04:02:05.588338448Z DEBUG remoteprocess::linux] reinjecting non-SIGSTOP signal SIGCONT to 509
[2023-04-11T04:02:05.748893669Z DEBUG remoteprocess::linux] reinjecting non-SIGSTOP signal SIGCONT to 509
[2023-04-11T04:02:05.909496956Z DEBUG remoteprocess::linux] reinjecting non-SIGSTOP signal SIGCONT to 509
[2023-04-11T04:02:06.070129688Z DEBUG remoteprocess::linux] reinjecting non-SIGSTOP signal SIGCONT to 509
[2023-04-11T04:02:06.230674664Z DEBUG remoteprocess::linux] reinjecting non-SIGSTOP signal SIGCONT to 509
[2023-04-11T04:02:06.391133380Z DEBUG remoteprocess::linux] reinjecting non-SIGSTOP signal SIGCONT to 509
[2023-04-11T04:02:06.551662220Z DEBUG remoteprocess::linux] reinjecting non-SIGSTOP signal SIGCONT to 509
[2023-04-11T04:02:06.712171494Z DEBUG remoteprocess::linux] reinjecting non-SIGSTOP signal SIGCONT to 509
[2023-04-11T04:02:06.872638318Z DEBUG remoteprocess::linux] reinjecting non-SIGSTOP signal SIGCONT to 509
[2023-04-11T04:02:07.033214994Z DEBUG remoteprocess::linux] reinjecting non-SIGSTOP signal SIGCONT to 509
[2023-04-11T04:02:07.193885198Z DEBUG remoteprocess::linux] reinjecting non-SIGSTOP signal SIGCONT to 509
[2023-04-11T04:02:07.354498283Z DEBUG remoteprocess::linux] reinjecting non-SIGSTOP signal SIGCONT to 509
[2023-04-11T04:02:07.515260753Z DEBUG remoteprocess::linux] reinjecting non-SIGSTOP signal SIGCONT to 509
[2023-04-11T04:02:07.675824288Z DEBUG remoteprocess::linux] reinjecting non-SIGSTOP signal SIGCONT to 509
[2023-04-11T04:02:07.836527054Z DEBUG remoteprocess::linux] reinjecting non-SIGSTOP signal SIGCONT to 509
[2023-04-11T04:02:07.997126241Z DEBUG remoteprocess::linux] reinjecting non-SIGSTOP signal SIGCONT to 509
[2023-04-11T04:02:08.157658180Z DEBUG remoteprocess::linux] reinjecting non-SIGSTOP signal SIGCONT to 509
[2023-04-11T04:02:08.318285370Z DEBUG remoteprocess::linux] reinjecting non-SIGSTOP signal SIGCONT to 509
[2023-04-11T04:02:08.478942352Z DEBUG remoteprocess::linux] reinjecting non-SIGSTOP signal SIGCONT to 509
[2023-04-11T04:02:08.639499228Z DEBUG remoteprocess::linux] reinjecting non-SIGSTOP signal SIGCONT to 509
[2023-04-11T04:02:08.800293246Z DEBUG remoteprocess::linux] reinjecting non-SIGSTOP signal SIGCONT to 509
It's not blocking, but it keeps triggering waitpid and ptrace::cont in the while loop, which really confuses me...
I checked the python process sampled by py-spy: the pid is 432, and 460/479/481/509 are its threads. What's interesting is that there is a process I didn't pay attention to before that monitors 432's CPU usage, and it may periodically call signal.kill(432, SIGSTOP)/signal.kill(432, SIGCONT) on 432. I think there might be a race condition here (although I really haven't figured out why).
I'd like to ask your opinion on using PTRACE_SEIZE instead of PTRACE_ATTACH. I saw some blogs describing this usage, but I'm not really sure if it's valid...
If there are any new discoveries, we can discuss them here. Thanks again ❤️
By the way, I have another simple error scenario here.
Below are the contents of /proc/{pid}/stat for two processes.
1981866 (python3) S 1981743 1981866 1981743 0 -1 1077936384 34709484 16140 105 0 72770 37613 11 9 20 0 17 0 372814758 4442140672 55046 18446744073709551615 94888328609792 94888328612516 140731140400576 0 0 0 0 16781312 17642 0 0 0 17 79 0 0 16 0 0 94888330710392 94888330711088 94888354955264 140731140409715 140731140409755 140731140409755 140731140415457 0
1984043 (probe(1981866)) S 1981866 1981866 1981743 0 -1 4194368 1324 0 0 0 169 99 0 0 20 0 3 0 372915682 4424044544 33799 18446744073709551615 94888328609792 94888328612516 140731140400576 0 0 0 0 16781312 17642 0 0 0 17 63 0 0 0 0 0 94888330710392 94888330711088 94888354955264 140731140409715 140731140409755 140731140409755 140731140415457 0
When we try to use py-spy dump --pid 1984043, we get the following error: Error: Failed to parse /proc/1984043/stat.
I think the problem lies in how this code (in remoteprocess) handles the '))' case, i.e. a comm that itself contains ')':
fn get_active_status(stat: &[u8]) -> Option<u8> {
    // find the first ')' character, and return the active status
    // field which comes after it
    let mut iter = stat.iter().skip_while(|x| **x != b')');
    match (iter.next(), iter.next(), iter.next()) {
        (Some(b')'), Some(b' '), ret) => ret.map(|x| *x),
        _ => None
    }
}
I thought I'd make a note of the issue here. I'm not very familiar with the code that actually executes in process 1984043, but I will try to check this, and if necessary I'd be willing to fix it when I have time.
It's not blocking, but it keeps triggering waitpid and ptrace::cont in the while loop, which really confuses me...
That's the same scenario. What you put here is just another manifestation of the bug.
I'll elaborate more on what I wrote in my previous comment - py-spy attempts to lock all, e.g., 6 threads of your Python process. It successfully locks 3, and on the 4th, due to a bug, it remains forever in the "reinjecting non-SIGSTOP" loop for thread 4. Meanwhile, threads 5 and 6 are not locked yet, so they are running "fine", but threads 1, 2 and 3 are already locked - PTRACE_ATTACHed.
What's interesting is that there is a process I didn't pay attention to before that monitors 432's CPU usage, and it may periodically call signal.kill(432, SIGSTOP)/signal.kill(432, SIGCONT) on 432. I think there might be a race condition here (although I really haven't figured out why).
This definitely might be related. I have some idea in mind... I'll need to try it out - perhaps, if the process got a SIGSTOP (when a process gets a signal, any thread in it might receive it randomly, but SIGSTOP behaves differently because it's handled in the kernel) and py-spy then tries to PTRACE_ATTACH while the thread is already in the stopped state, the ptrace-induced SIGSTOP is never generated and py-spy waits in vain.
I might be able to try it out later today, will write here if so.
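If anyone wants to poke at this theory before I do, a tiny harness along these lines could help - purely illustrative (the sleep duration and cleanup are arbitrary choices of mine), it just reports what waitpid() sees when we attach to an already-stopped target:

use std::time::Duration;
use nix::sys::{ptrace, signal, wait};
use nix::unistd::Pid;

// Hypothetical repro: put the target into group-stop first (like the external
// monitor does), then PTRACE_ATTACH and see whether the attach-induced SIGSTOP
// is ever reported.
fn try_repro(pid: Pid) -> Result<(), nix::Error> {
    signal::kill(pid, signal::Signal::SIGSTOP)?;
    std::thread::sleep(Duration::from_millis(10));

    ptrace::attach(pid)?;
    // If the theory is right, this waitpid() may block instead of reporting SIGSTOP.
    let status = wait::waitpid(pid, Some(wait::WaitPidFlag::WSTOPPED | wait::WaitPidFlag::__WALL))?;
    println!("first wait status after attach: {:?}", status);

    // Clean up so the target keeps running.
    let _ = ptrace::detach(pid, None);
    let _ = signal::kill(pid, signal::Signal::SIGCONT);
    Ok(())
}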
I'd like to ask your opinion on using PTRACE_SEIZE instead of PTRACE_ATTACH. I saw some blogs describing this usage, but I'm not really sure if it's valid...
PTRACE_SEIZE does not stop the process - this is the equivalent of passing --nonblocking.
When we try to use py-spy dump --pid 1984043, we get the following error: Error: Failed to parse /proc/1984043/stat.
This is a separate problem indeed; it seems that the function doesn't cope with process comms that have ) in them. Not a very common case for Python processes, but I suppose there's no reason not to support it properly. The logic needs to be "parse from the first ( until the matching )", or maybe just "parse from the last )" instead of "from the first )".
Can you create a separate issue for that?
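For reference, a minimal sketch of the "last )" variant - not the actual remoteprocess code, just an illustration of the suggested logic:

// Rough sketch: take the field after the *last* ')' so comms that themselves
// contain ')' (e.g. "probe(1981866)") don't confuse the parser.
fn get_active_status(stat: &[u8]) -> Option<u8> {
    let close = stat.iter().rposition(|x| *x == b')')?;
    match (stat.get(close + 1).copied(), stat.get(close + 2).copied()) {
        (Some(b' '), Some(status)) => Some(status),
        _ => None,
    }
}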
I might be able to try it out later today, will write here if so.
Thank you so much for your time~
PTRACE_SEIZE does not stop the process - this is the equivalent of passing --nonblocking.
That's right - I forgot to mention earlier that I was referencing the following statement from the ptrace man page:
"Since Linux 3.4, PTRACE_SEIZE can be used instead of PTRACE_ATTACH. PTRACE_SEIZE does not stop the attached process. If you need to stop it after attach (or at any other time) without sending it any signals, use PTRACE_INTERRUPT command."
so maybe PTRACE_SEIZE + PTRACE_INTERRUPT could be used instead of PTRACE_ATTACH?
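Something like this is what I had in mind - just a sketch based on my reading of the man page, assuming a nix version that exposes ptrace::seize and ptrace::interrupt; I haven't verified the exact stop bookkeeping:

use nix::sys::{ptrace, wait};
use nix::unistd::Pid;

// Hypothetical: attach without the implicit SIGSTOP, then stop explicitly.
fn seize_and_stop(tid: Pid) -> Result<(), nix::Error> {
    // PTRACE_SEIZE attaches without stopping the thread and without relying on
    // the SIGSTOP that PTRACE_ATTACH generates.
    ptrace::seize(tid, ptrace::Options::empty())?;
    // PTRACE_INTERRUPT then stops it; the stop is reported as a ptrace event
    // rather than a signal-delivery stop, so it shouldn't race with SIGSTOP/
    // SIGCONT sent by other processes (this is the part I'm unsure about).
    ptrace::interrupt(tid)?;
    loop {
        match wait::waitpid(tid, Some(wait::WaitPidFlag::__WALL))? {
            // The interrupt (PTRACE_EVENT_STOP) arrived.
            wait::WaitStatus::PtraceEvent(_, _, _) => return Ok(()),
            // Some other signal got there first; reinject and keep waiting.
            wait::WaitStatus::Stopped(_, sig) => ptrace::cont(tid, sig)?,
            _ => {}
        }
    }
}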
This is a separate problem indeed; it seems that the function doesn't cope with process comms that have ) in them. Not a very common case for Python processes, but I suppose there's no reason not to support it properly. The logic needs to be "parse from the first ( until the matching )", or maybe just "parse from the last )" instead of "from the first )". Can you create a separate issue for that?
Sure, I'll create a separate issue.
When py-spy runs with py-spy record --pid {pid} --native -r 97 -d 10 -f raw -o xxx.txt, I tried printing the stacks, as below:
Thread 3 (LWP 805052):
#0 sccp () at ../src_musl/src/thread/__syscall_cp.c:11
#1 0x00007f72dcd2359f in read () at ../src_musl/src/unistd/read.c:6
#2 0x00007f72dcc23552 in nix::unistd::read::h0aa062cf7d2053b6 ()
#3 0x00007f72dcafe652 in std::sys_common::backtrace::__rust_begin_short_backtrace::hcc9258a46d3c1d8f ()
#4 0x00007f72dcaef386 in core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h991851875476ab0b ()
#5 0x00007f72dccf9533 in _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once::h0032718f1ee1e442 () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/alloc/src/boxed.rs:1872
#6 _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once::h5bc95b757ddfa29b () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/alloc/src/boxed.rs:1872
#7 std::sys::unix::thread::Thread::new::thread_start::h069db4c5e748eedf () at library/std/src/sys/unix/thread.rs:108
#8 0x00007f72dcd221a3 in start () at ../src_musl/src/thread/pthread_create.c:192
#9 0x00007f72dcd23214 in __clone () at ../src_musl/src/thread/x86_64/clone.s:22
Backtrace stopped: frame did not save the PC
Thread 2 (LWP 805051):
#0 sccp () at ../src_musl/src/thread/__syscall_cp.c:11
#1 0x00007f72dcd2359f in read () at ../src_musl/src/unistd/read.c:6
#2 0x00007f72dcb7b4b7 in maps_next () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:181
#3 0x00007f72dcb7b81b in _Ux86_64_get_elf_image () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:181
#4 0x00007f72dcb7a9a8 in get_unwind_info () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:181
#5 0x00007f72dcb7ab00 in _UPT_find_proc_info () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:181
#6 0x00007f72dcb80359 in fetch_proc_info () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:181
#7 0x00007f72dcb81740 in find_reg_state () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:181
#8 0x00007f72dcb819bc in _Ux86_64_dwarf_step () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:181
#9 0x00007f72dcb7cf5f in _Ux86_64_step () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:181
#10 0x00007f72dcb627ed in _$LT$remoteprocess..linux..libunwind..Cursor$u20$as$u20$core..iter..traits..iterator..Iterator$GT$::next::h8d6ba6e958cd1c39 () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:181
#11 0x00007f72dca69779 in py_spy::python_spy::PythonSpy::_get_pthread_id::h58fdef68ab857262 () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:181
#12 0x00007f72dca6648b in py_spy::python_spy::PythonSpy::_get_os_thread_id::h807756e474dc43d7 () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:181
#13 0x00007f72dca603b4 in py_spy::python_spy::PythonSpy::_get_stack_traces::h6a885e6fcc0586bc () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:181
#14 0x00007f72dca57ece in py_spy::python_spy::PythonSpy::get_stack_traces::h70c11f2d38dbd8c4 () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:181
#15 0x00007f72dcafd0d6 in std::sys_common::backtrace::__rust_begin_short_backtrace::h415d421592b1dec3 () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:181
#16 0x00007f72dcaef007 in core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h1748460e0434bbc0 () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:181
#17 0x00007f72dccf9533 in _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once::h0032718f1ee1e442 () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/alloc/src/boxed.rs:1872
#18 _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once::h5bc95b757ddfa29b () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/alloc/src/boxed.rs:1872
#19 std::sys::unix::thread::Thread::new::thread_start::h069db4c5e748eedf () at library/std/src/sys/unix/thread.rs:108
#20 0x00007f72dcd221a3 in start () at ../src_musl/src/thread/pthread_create.c:192
#21 0x00007f72dcd23214 in __clone () at ../src_musl/src/thread/x86_64/clone.s:22
Backtrace stopped: frame did not save the PC
Thread 1 (LWP 805050):
#0 0x00007f72dcd1eb7e in __syscall6 () at ../src_musl/arch/x86_64/syscall_arch.h:59
#1 syscall () at ../src_musl/src/misc/syscall.c:20
#2 0x00007f72dccf6775 in std::sys::unix::futex::futex_wait::h65564c44ef71c5aa () at library/std/src/sys/unix/futex.rs:61
#3 0x00007f72dcce98b9 in std::sys_common::thread_parker::futex::Parker::park::h1857a2d51b91109b () at library/std/src/sys_common/thread_parker/futex.rs:52
#4 std::thread::park::he5cdd1e0067c0814 () at library/std/src/thread/mod.rs:929
#5 0x00007f72dccf22a2 in std::sync::mpsc::blocking::WaitToken::wait::h7da894c0826b7e66 () at library/std/src/sync/mpsc/blocking.rs:67
#6 0x00007f72dca6df78 in std::sync::mpsc::stream::Packet$LT$T$GT$::recv::h6d1e110c42068000 () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:181
#7 0x00007f72dcb014af in std::sync::mpsc::Receiver$LT$T$GT$::recv::hb203c62da94a10a9 () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:181
#8 0x00007f72dcb0d05f in py_spy::run_spy_command::h059726825bd45700 () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:181
#9 0x00007f72dcb0ed14 in py_spy::main::h523879457f122e9a () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:181
#10 0x00007f72dcafe5d3 in std::sys_common::backtrace::__rust_begin_short_backtrace::h961c1afc5ca03793 () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:181
#11 0x00007f72dca7e209 in std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::hd9822804426d1ce6 () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:181
#12 0x00007f72dcce8fa1 in core::ops::function::impls::_$LT$impl$u20$core..ops..function..FnOnce$LT$A$GT$$u20$for$u20$$RF$F$GT$::call_once::h18fd15a330c116b9 () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/ops/function.rs:280
#13 std::panicking::try::do_call::he918a328b783a420 () at library/std/src/panicking.rs:492
#14 std::panicking::try::h88ecc043ee71c652 () at library/std/src/panicking.rs:456
#15 std::panic::catch_unwind::h074c8fe23976f009 () at library/std/src/panic.rs:137
#16 std::rt::lang_start_internal::_$u7b$$u7b$closure$u7d$$u7d$::he26a080dffc4764a () at library/std/src/rt.rs:128
#17 std::panicking::try::do_call::hf9be9f327f382512 () at library/std/src/panicking.rs:492
#18 std::panicking::try::h163c4b2413cf9165 () at library/std/src/panicking.rs:456
#19 std::panic::catch_unwind::h58a40b305e945d4e () at library/std/src/panic.rs:137
#20 std::rt::lang_start_internal::h183e60933887f95a () at library/std/src/rt.rs:128
#21 0x00007f72dcb0fb02 in main () at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:181
In my understanding, Thread 1 waits on recv to get stacks from Thread 2, and Thread 2 is in maps_next getting the native stack, but what I don't quite understand is where Thread 3 is spawned and what it is doing - could you give me some details?
If the delay of record samples is pretty serious, could that come from Thread 2?
Thanks~
When trying to sample a python process with the following command
py-spy record --pid {pid} --native -r 97 -d 10 -f raw -o xxx.txt
the py-spy process sometimes blocks for a long time with no results. I tried to analyze it, and I found that the thread was blocked in waitpid while locking the python thread. When run with the environment variable RUST_LOG=debug, I got:
Looks like it's blocking while locking the python thread, but this doesn't always happen; sometimes I can successfully get the results.
I would like to know if there is a known reason or if there is any suggestion for me, thank you very much!