cknd / stackprinter

Debugging-friendly exceptions for Python
MIT License

Fail when using taichi (Exception: Picked an invalid source context) #45

Closed WangWei90 closed 2 years ago

WangWei90 commented 3 years ago

Your package is so useful that I no longer use IPython. Thank you!

I started using taichi recently and encountered an error. A minimal example is listed below.

import stackprinter
import taichi as ti

stackprinter.set_excepthook(style="darkbg2")

@ti.kernel
def my_kernel() -> float:
    return "22"

my_kernel()
cknd commented 3 years ago

Thank you for the friendly bug report! Could you paste the full output of the error?

cknd commented 3 years ago

Never mind, I reproduced it here:

[Taichi] mode=release
[Taichi] preparing sandbox at /var/folders/fw/knp6j91n6nz7v68dphcwyfh80000gp/T/taichi-tg0jluo_
[Taichi] version 0.7.15, llvm 10.0.0, commit cff542ce, osx, python 3.8.5
[Taichi] materializing...
Stackprinter failed while formatting <FrameInfo blatest.py, line 7, scope my_kernel>:
  File "/Users/c/Dropbox/projects/tracebacks/stackprinter/frame_formatting.py", line 224, in select_scope
    raise Exception("Picked an invalid source context: %s" % info)
Exception: Picked an invalid source context: [7], [8], dict_keys([8, 9])
cknd commented 3 years ago

I think I found something. For some reason, the line number that the Python interpreter reports for that particular frame lies before the first line of the frame's code scope (as discovered by the inspect module, which ultimately relies on frame.f_code.co_firstlineno). In other words, the interpreter contradicts itself.
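To make the mismatch concrete, here is a minimal sketch that assumes only the standard frame attributes `f_lineno` and `f_code.co_firstlineno`. The helper names and the stand-in frame object are made up for illustration and are not part of stackprinter; the values 6 and 7 are illustrative, loosely based on the warning message shown further down.

```python
import sys
from types import SimpleNamespace


def reports_line_before_scope(frame):
    """Return True if the frame's current line lies before the
    first line of its own code object (the contradictory case)."""
    return frame.f_lineno < frame.f_code.co_firstlineno


def check_own_frame():
    # An ordinary frame: the executing line is inside the function body,
    # so it cannot come before the function's first line.
    return reports_line_before_scope(sys._getframe())


# Hypothetical stand-in for a troubled frame (not a real frame object):
# it claims to be executing line 6 although its code starts at line 7.
fake_frame = SimpleNamespace(
    f_lineno=6,
    f_code=SimpleNamespace(co_firstlineno=7),
)

print(check_own_frame())
print(reports_line_before_scope(fake_frame))
```

For ordinary Python code the check should come out false; the stand-in object only mimics the contradictory state without needing taichi installed.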

I have a workaround in #46 that detects this situation, prints a warning, and moves the displayed lineno to the beginning of the frame's available source block. I'm only half-happy with that, but it fixes the crash and seems like reasonable behavior.

(Interestingly, I could only reproduce this bug with taichi so far, so I assume it's an edge case where the interpreter confuses itself during the heavy introspection & dynamic rewriting of the stack that taichi seems to be doing.)

If you're able to test the workaround locally (by doing pip uninstall stackprinter; pip install -e /path/to/a/clone/of/this/repository), I'd be grateful to hear how it goes on your real code!

On my side, I can run your test now and get a traceback with all sorts of stuff in it 🙂 (Note the "problematic" frame in the middle)

     taichi_kernel.define = <bound method PyCapsule.define of <taichi_core.KernelProxy o
                             bject at 0x10d184830>>
     taichi_ast_generator = <function 'Kernel.materialize.<locals>.taichi_ast_generator'
                              kernel.py:357>
    ..................................................

File "/Users/c/.py3/lib/python3.8/site-packages/taichi/lang/kernel.py", line 365, in taichi_ast_generator
    357  def taichi_ast_generator():
 (...)
    361          raise TaichiSyntaxError(
    362              "Kernels cannot call other kernels. I.e., nested kernels are not allowed. Please check if you have direct/indirect invocation of kernels within kernels. Note that some methods provided by the Taichi standard library may invoke kernels, and please move their invocations to Python-scope."
    363          )
    364      self.runtime.inside_kernel = True
--> 365      compiled()
    366      self.runtime.inside_kernel = False
    ..................................................
     TaichiSyntaxError = <class 'taichi.lang.exception.TaichiSyntaxError'>
     self.runtime.inside_kernel = True
     compiled = <function 'my_kernel' test_taichi.py:7>
    ..................................................

File "test_taichi.py", line 7, in my_kernel
    6    # // Stackprinter: This frame reported a line number outside its reported code scope. Line 6 reported, but guessing 7 instead.
--> 7    def my_kernel() -> float:
    8        return "22"
    ..................................................
     float = <class 'float'>
    ..................................................

File "/Users/c/.py3/lib/python3.8/site-packages/taichi/lang/expr.py", line 33, in __init__
    10   def __init__(self, *args, tb=None):
 (...)
    29                   if isinstance(arg, np.ndarray):
    30                       arg = arg.dtype(arg)
    31               except:
    32                   pass
--> 33               self.ptr = impl.make_constant_expr(arg).ptr
    34       else:
    ..................................................
     self = <ti.Expr>
     args = ('22', )
     tb = None
     arg = '22'
     np.ndarray = <class 'numpy.ndarray'>
WangWei90 commented 3 years ago

Thank you for the quick fix, I will evaluate this later today 😄

WangWei90 commented 3 years ago

I can confirm this workaround is working. Thank you!

$ uname -a
Darwin appledeMacBook-Pro.local 20.3.0 Darwin Kernel Version 20.3.0: Thu Jan 21 00:07:06 PST 2021; root:xnu-7195.81.3~1/RELEASE_X86_64 x86_64 i386 MacBookPro11,4 Darwin
WangWei90 commented 3 years ago

(Interestingly, I could only reproduce this bug with taichi so far, so I assume it's an edge case where the interpreter confuses itself during the heavy introspection & dynamic rewriting of the stack that taichi seems to be doing.)

I think so. Here is another example where stackprinter fails 🤣

import stackprinter
import taichi as ti

stackprinter.set_excepthook(style="darkbg2")

@ti.func
def wrong_syntax_in_taichi(value: ti.template()):
    for i in value:
        print(i)
    return

@ti.kernel
def my_kernel(a:float, b:float) -> float:
    var = ti.Vector([a, b])
    wrong_syntax_in_taichi(var)
    return a + b

my_kernel(12, 12)

Maybe I am asking too much here. I do not know whether stackprinter should handle these situations.

WangWei90 commented 3 years ago

I also created an issue in the taichi repository.

https://github.com/taichi-dev/taichi/issues/2216

cknd commented 3 years ago

Interesting, in your second example, the troubled frame looks like this:

I could extend my workaround and guess a different lineno here too -- may still be better than crashing?

cknd commented 3 years ago

If you pull the catch_wrong_linenos branch, your second example now also prints a warning instead of crashing : )

(It still feels wrong that my workaround just guesses the nearest line number in the available source scope without fully understanding the cause of the mismatch. At least the message also includes the originally reported lineno, so it's not completely obfuscating the situation for the reader..?)
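The "guess the nearest line" idea can be sketched roughly as follows. This is not the actual code from the catch_wrong_linenos branch; the function name and signature are assumptions made for illustration, and the warning text merely borrows the wording from the traceback pasted earlier in this thread.

```python
def guess_displayable_lineno(reported_lineno, available_linenos):
    """Pick a line number that exists in the frame's available source.

    Returns a (lineno, warning) tuple. The warning is None when the
    reported line number is already part of the available source.
    """
    available = sorted(available_linenos)
    if not available:
        raise ValueError("no source lines available for this frame")

    if reported_lineno in available:
        return reported_lineno, None

    # Fall back to the closest available line. On a tie, min() keeps the
    # first candidate in sorted order, i.e. the smaller line number.
    guess = min(available, key=lambda n: abs(n - reported_lineno))
    warning = (
        "This frame reported a line number outside its reported code scope. "
        "Line %d reported, but guessing %d instead." % (reported_lineno, guess)
    )
    return guess, warning


lineno, warning = guess_displayable_lineno(6, [7, 8])
print(lineno)
print(warning)
```

Returning the warning alongside the guess keeps the originally reported line number visible, so the caller can decide where to show it.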

WangWei90 commented 3 years ago

Thank you! I will take some time to investigate this issue.