benfred / py-spy

Sampling profiler for Python programs
MIT License
12.76k stars 428 forks source link

python 3.7.3 / rhel 7: panic on dump --locals #224

Closed mattp- closed 4 years ago

mattp- commented 4 years ago

thread 'main' panicked at 'byte index 18446744073709551614 is out of bounds of `.*`', src/libcore/str/mod.rs:2051:9
stack backtrace:
   0:           0x6f2324 - backtrace::backtrace::libunwind::trace::h0d3b97a6b64193be
                               at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
   1:           0x6f2324 - backtrace::backtrace::trace_unsynchronized::hb78a971d8547111e
                               at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
   2:           0x6f2324 - std::sys_common::backtrace::_print_fmt::ha1f41287e6ef2374
                               at src/libstd/sys_common/backtrace.rs:77
   3:           0x6f2324 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h1130808f9197ccb5
                               at src/libstd/sys_common/backtrace.rs:61
   4:           0x72542c - core::fmt::write::h4fa8b44d73117031
                               at src/libcore/fmt/mod.rs:1028
   5:           0x6ef747 - std::io::Write::write_fmt::hc08a67176e074177
                               at src/libstd/io/mod.rs:1412
   6:           0x6f489e - std::sys_common::backtrace::_print::h33a300037b4dc1ed
                               at src/libstd/sys_common/backtrace.rs:65
   7:           0x6f489e - std::sys_common::backtrace::print::h2ceebd24f74a6808
                               at src/libstd/sys_common/backtrace.rs:50
   8:           0x6f489e - std::panicking::default_hook::{{closure}}::h599b8688602f475a
                               at src/libstd/panicking.rs:188
   9:           0x6f4591 - std::panicking::default_hook::h0085905ac97017e0
                               at src/libstd/panicking.rs:205
  10:           0x6f4f9b - std::panicking::rust_panic_with_hook::h780f8e9bbe4fc091
                               at src/libstd/panicking.rs:464
  11:           0x6f4b3e - std::panicking::continue_panic_fmt::h1444a364e5f24a1a
                               at src/libstd/panicking.rs:373
  12:           0x6f4a26 - rust_begin_unwind
                               at src/libstd/panicking.rs:302
  13:           0x72121e - core::panicking::panic_fmt::h760360c16b67cc83
                               at src/libcore/panicking.rs:139
  14:           0x722b14 - core::str::slice_error_fail::ha0e10d93ebb88084
                               at src/libcore/str/mod.rs:0
  15:           0x433c56 - core::str::traits::<impl core::slice::SliceIndex<str> for core::ops::range::RangeTo<usize>>::index::{{closure}}::h077970c6194892b8
  16:           0x447caa - py_spy::python_data_access::format_variable::hd9644476a05f6378
  17:           0x446ac3 - py_spy::python_data_access::format_variable::hd9644476a05f6378
  18:           0x446c44 - py_spy::python_data_access::format_variable::hd9644476a05f6378
  19:           0x446ac3 - py_spy::python_data_access::format_variable::hd9644476a05f6378
  20:           0x446c44 - py_spy::python_data_access::format_variable::hd9644476a05f6378
  21:           0x4ad3c8 - py_spy::python_spy::PythonSpy::_get_stack_traces::hd98a2c333b63a32b
  22:           0x4a28ec - py_spy::python_spy::PythonSpy::get_stack_traces::h386da63dba8344f4
  23:           0x49765e - py_spy::dump::print_traces::h8adf110e08ce4bc8
  24:           0x466769 - py_spy::run_spy_command::he9e192defef1e373
  25:           0x469c2e - py_spy::main::he2a48b4fd3934d5e
  26:           0x487343 - std::rt::lang_start::{{closure}}::h47dbd41d95b3d679
  27:           0x6f49c3 - std::rt::lang_start_internal::{{closure}}::hdb12a3ead60a8960
                               at src/libstd/rt.rs:48
  28:           0x6f49c3 - std::panicking::try::do_call::h75828266d21559c3
                               at src/libstd/panicking.rs:287
  29:           0x6fba4a - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:78
  30:           0x6f531b - std::panicking::try::h2533a62d10a8f001
                               at src/libstd/panicking.rs:265
  31:           0x6f531b - std::panic::catch_unwind::hd9ecf47e7cd9602d
                               at src/libstd/panic.rs:396
  32:           0x6f531b - std::rt::lang_start_internal::h93ee4050c8419278
                               at src/libstd/rt.rs:47
  33:           0x46b7f2 - main

not sure how to provide more information here, dump without --locals works. Im pointing it at a tornado app.

mattp- commented 4 years ago

by the way, despite this issue your tool just helped me isolate a huge performance bug on production in a matter of minutes. amazingly useful! thank you for it 👍

benfred commented 4 years ago

Thanks for the bug report!

I just spent some time looking at this and I think I've found a minimal reproduction here. Trying to go dump --locals with a function looking like this will fail with the same error you reported:

def local_variable_lookup():
    local6 = ("-" * 115, {"key" : {"key": {"key":"value"}}})
    time.sleep(100000)

It requires a somewhat specific set of circumstances: nested objects at least two deep with the innermost objects being strings and some previous variable that has a string representation of the correct length. This manages to blow through the code that's trying to limit the length of a line - and cause us to try to slice off a negative amount from a string causing the panic you saw.

I have a fix here https://github.com/benfred/py-spy/pull/225 .

I'm glad that py-spy is coming in useful - even with this bug. The fix will be in the next version.

mattp- commented 4 years ago

@benfred not entirely familiar with rust, just wondering if it would be possible on a stacktrace panic to ensure regardless that the target process receives a sigcont? I accidentally left a process in a sigstop state when this happened.

benfred commented 4 years ago

The process should be resumed even after a panic already: I'm not sending a SIGSTOP to pause it, but instead I'm doing a ptrace attach. The corresponding ptrace detach takes place inside a drop trait, which should get called during the panic.

What OS are you running? I just tried this on OSX and linux, and the process wasn't left stopped after causing a panic in py-spy using that code above.

benfred commented 4 years ago

Fix for the panic is in v0.3.3

mattp- commented 4 years ago

im not actually sure, i tried reproducing but wasn't able to. going to close this since it was fixed 👍 thanks