benfred / py-spy

Sampling profiler for Python programs
MIT License

Error: failed to get os threadid #490

Closed baohongmin closed 2 years ago

baohongmin commented 2 years ago

    py-spy top --native --pid 229875
    Error: failed to get os threadid

py-spy 0.3.11

Jongy commented 2 years ago

This means py-spy failed to get the native thread ID. This can happen for several reasons, depending on the OS you are using. Which system are you running py-spy on?

In any case, the direct trigger for this error is --native. If you remove the flag, the error shouldn't occur, so try running without it if you can do without native traces.

baohongmin commented 2 years ago

Hi Jongy, thanks for your response. My OS information is below:

    Linux icx08 4.18.0-305.12.1.el8_4.x86_64 #1 SMP Wed Aug 11 01:59:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
    gcc version 8.5.0 20210514 (Red Hat 8.5.0-10) (GCC)
    Python-3.6.5
    Linux distribution: CentOS Linux | 8 | libc version: glibc-2.28

I am profiling a process that runs inside a Docker container. Without the --native flag it works, but I want to trace the native stack (C/C++ extensions).

Jongy commented 2 years ago

Ah, py-spy doesn't support getting the OS thread ID for dockerized processes. See _get_os_thread_id impl for linux:

    #[cfg(all(target_os="linux", unwind))]
    fn _get_os_thread_id<I: InterpreterState>(&mut self, python_thread_id: u64, interp: &I) -> Result<Option<Tid>, Error> {
....
        // likewise this doesn't yet work for profiling processes running inside docker containers from the host os
        if self.dockerized {
            return Ok(None);
        }

I think that's the issue.

This is actually something we've been tackling but I don't have a solution ready yet.

In the meantime, I suggest running py-spy inside the container, that is, in the same PID namespace.

For example, if the host PID is 229875, the PID inside the container is 40, and the container is named my_app, you can copy py-spy into the container (use the static musl build):

    docker cp ./py-spy my_app:/py-spy

then run it (note: privileged is required):

    docker exec -it --privileged my_app /py-spy top --native --pid 40

I think that'll work (at least, it will avoid the OS thread ID issue).
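The workaround needs the PID as seen inside the container (40 in the example), while the host only shows 229875. A minimal sketch of one way to map between them, assuming a Linux kernel of 4.1 or later that exposes the NSpid field in /proc/<pid>/status; the helper names parse_nspid and container_pid are made up for illustration and are not part of py-spy:

```python
from typing import List, Optional


def parse_nspid(status_text: str) -> Optional[List[int]]:
    """Return the PIDs on the NSpid line, outermost PID namespace first."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            # Fields are whitespace separated: "NSpid:  229875  40"
            return [int(field) for field in line.split()[1:]]
    return None  # the field is absent on kernels that do not expose it


def container_pid(host_pid: int) -> Optional[int]:
    """Read /proc/<host_pid>/status on the host and return the innermost PID."""
    with open(f"/proc/{host_pid}/status") as status_file:
        pids = parse_nspid(status_file.read())
    return pids[-1] if pids else None


if __name__ == "__main__":
    sample = "Name:\tpython\nNSpid:\t229875\t40\n"
    print(parse_nspid(sample))  # [229875, 40]
```

Run on the host, container_pid(229875) would return the value to pass to --pid inside the container, provided the process sits in a nested PID namespace.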

baohongmin commented 2 years ago

Thanks, Jongy. Yes, it runs well when I run py-spy inside the container.

Jongy commented 2 years ago

Glad it helped :)

benfred commented 2 years ago

Fwiw, with python 3.11 we can get the OS thread id directly from python, and will be able to grab it from a dockerized process from the host container. We still won't be able to do native profiling from the host into the container though -
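For readers unfamiliar with the two kinds of thread ID involved, here is a small sketch of the distinction as seen from inside Python. It uses threading.get_native_id(), which is available from Python 3.8 on supported platforms; it only illustrates the two identifiers and is not how py-spy obtains them, since py-spy reads the target process from outside. The helper names are made up for illustration.

```python
import threading


def thread_ids() -> dict:
    """Return both identifiers for the calling thread."""
    return {
        # Python-level identifier; on Linux this is the pthread handle value.
        "ident": threading.get_ident(),
        # Kernel-assigned thread ID, the value tools such as top or ps show.
        "native_id": threading.get_native_id(),
    }


def ids_from_worker() -> dict:
    """Collect the identifiers from inside a freshly started thread."""
    result = {}
    worker = threading.Thread(target=lambda: result.update(thread_ids()))
    worker.start()
    worker.join()
    return result


if __name__ == "__main__":
    ids = ids_from_worker()
    print(f"ident={ids['ident']:#x} native_id={ids['native_id']}")
```

On Linux the first value is typically a large number that prints like 0x7f... in hex, while the second is a small integer of the same kind as a PID.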

rkooo567 commented 1 year ago

I also hit the same error: https://github.com/ray-project/ray/issues/30566

In our case, though, we run py-spy inside a Docker container, so I am not sure how to debug this. Any pointers on where to look?

rkooo567 commented 1 year ago

I found that when I don't specify --native, this is returned:

Thread 0x7FB1278F5740 (active): "MainThread"
    main_loop (ray/_private/worker.py:763)
    <module> (ray/_private/workers/default_worker.py:233)
Thread 860 (idle): "ray_import_thread"
    wait (threading.py:300)
    _wait_once (grpc/_common.py:106)
    wait (grpc/_common.py:148)
    result (grpc/_channel.py:735)
    _poll_locked (ray/_private/gcs_pubsub.py:255)
    poll (ray/_private/gcs_pubsub.py:391)
    _run (ray/_private/import_thread.py:69)
    run (threading.py:870)
    _bootstrap_inner (threading.py:926)
    _bootstrap (threading.py:890)
Thread 864 (idle): "AsyncIO Thread: default"
    run (threading.py:870)
    _bootstrap_inner (threading.py:926)
    _bootstrap (threading.py:890)
Thread 866 (idle): "Thread-2"
    run (threading.py:870)
    _bootstrap_inner (threading.py:926)
    _bootstrap (threading.py:890)
Thread 0x7F9F815EB700 (active)
Thread 39212 (idle): "Thread-19"
    channel_spin (grpc/_channel.py:1258)
    run (threading.py:870)
    _bootstrap_inner (threading.py:926)
    _bootstrap (threading.py:890)

Is this related to the fact that there is a thread, 0x7F9F815EB700, that doesn't seem to be a Python thread?
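To make the pattern in a dump like this easier to see, the following sketch splits the thread headers into those printed in hex and those printed in decimal. It assumes, without confirmation from this thread, that hex headers are Python-level thread identifiers shown when no OS thread ID was resolved and that decimal headers are OS thread IDs. The regex and the classify_threads helper are illustrative only.

```python
import re
from typing import List, Tuple

# Matches header lines such as
#   Thread 860 (idle): "ray_import_thread"
#   Thread 0x7F9F815EB700 (active)
HEADER = re.compile(r'^Thread (0x[0-9A-Fa-f]+|\d+) \((\w+)\)(?:: "(.*)")?$')


def classify_threads(dump: str) -> List[Tuple[str, str, str]]:
    """Return (id, 'hex' or 'decimal', name) for each thread header in a dump."""
    rows = []
    for line in dump.splitlines():
        match = HEADER.match(line)
        if match:
            ident, _state, name = match.groups()
            kind = "hex" if ident.startswith("0x") else "decimal"
            rows.append((ident, kind, name or ""))  # unnamed threads get ""
    return rows


if __name__ == "__main__":
    sample = (
        'Thread 0x7FB1278F5740 (active): "MainThread"\n'
        "    main_loop (ray/_private/worker.py:763)\n"
        'Thread 860 (idle): "ray_import_thread"\n'
        "Thread 0x7F9F815EB700 (active)\n"
    )
    for row in classify_threads(sample):
        print(row)
```

Applied to the dump above, this would flag both "MainThread" and the unnamed thread as hex entries, which are the two threads worth looking at first.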

benfred commented 1 year ago

@rkooo567 that looks pretty odd to me. I'm unsure why py-spy managed to figure out the native thread ID in some cases but not others. Is there a way I can run this myself to investigate (a Docker container with a Python script to run, etc.)?