beartype / beartype

Unbearably fast near-real-time hybrid runtime-static type-checking in pure Python.
https://beartype.readthedocs.io
MIT License
2.61k stars 57 forks source link

Implementing `get_callable_filename_or_placeholder` #18

Closed Heliotrop3 closed 3 years ago

Heliotrop3 commented 3 years ago

Task List

Introductions

Taken from beartype.beartype._util.utilcallable

#FIXME: Implement us up
#FIXME: Unit test us up.
def get_callable_filename_or_placeholder(func: Callable) -> str:
    '''
    Absolute filename of the uncompiled Python script or module physically
    declaring the passed callable if any *or* the placeholder string
    ``"<string>"`` implying this callable to have been dynamically declared
    in-memory otherwise.

    Parameters
    ----------
    func : Callable
        Callable to be inspected.

    Returns
    ----------
    str
        Either:

        * If this callable is physically declared by an uncompiled Python
          script or module, the absolute filename of this script or module.
        * Else, the placeholder string ``"<string>"`` implying this callable to
          have been dynamically declared in-memory.

    See Also
    ----------
    :func:`inspect.getsourcefile`
        Inefficient stdlib function strongly inspiring this implementation,
        which has been highly optimized for use by the performance-sensitive
        :func:`beartype.beartype` decorator.
    '''

Note: inspect.getfile is heavily referenced throughout. It does the heavy lifting of inspect.getsourcefile

Down the Rabbit Hole

As noted by in the official documentation, callable can throw false positives. Curiosity and Google led to a StackOverflow post with a comment declaring the best way check is EAFP. As a following comment points out, there was a brief interlude from Python 3.0 to 3.2 where callable was not in Python. It's reintroduction was, according to the devs, a quality of life revert aimed at addressing long-distance calls: calls to objects far from their point of summoning.

Note: There's a more recent post pertaining to the state of callablein Python 3.10 that's worth keeping track of.

Design Notes

Detecting False Positives

Whats a false-positive? According to inspect.getfile:

Note: Functions/Methods aren't the only objects containing co_filename. inspect.getfile includes Frame and Traceback objects which beartype would omit as they are not callable.

Detecting Compiled File

This occurs in inspect.getsourcefile which attempts to confirm the file exists via os.path.exists. If that evaluates to false it checks linecache.cache for the file path. We could follow suit or we could forgo the existential questioning and return a boolean indicating whether the file extension is a byte-code suffix.

def is_compiled(file_path: str) -> bool:
    # Return True if the extension is a byte-code suffix
    # Otherwise return False 

Dynamically Allocated Callables.

...I have not been able to find how that's determined. I was able to find more articles on Python's garbage collection than I ever expected to. This will probably be one of those facepalm moments.

Dealing with built-ins

Built-in methods and functions count as...well, methods and functions... Which means they pass through a callable check only to cause trouble later on. Why? "Built-in" means compiled from their C source into the underlying Python framework - no .py file to get. Catching them is pretty easy we just need to add an additional check:

def get_callable_filename_or_placeholder(func: Callable) -> str:
    ...
    if isinstance(func, types.BuiltinFunctionType):
        # Handle things here

This is different than how inspect handles built-ins; inspect determines whether an object is built-in using deductive reasoning: "The Callable failed to hit a nested return. Therefore, it must be a built-in type".

Closing Note

Further investigation is required into the post pertaining to the state of callable in Python 3.10.

leycec commented 3 years ago

Bravo! A deep dive on the callable() builtin goes shockingly deep on stdlib internals. This is a lot to masticate my toothless gums on. Let's start with...

Curiosity and Google led to a StackOverflow post with a comment declaring the best way check is EAFP.

Triggered.

EAFP is one of two historical trends in Python that I'd love to see erased into an alternate timeline. The other, of course, is duck typing. Thanks to MyPy & Friends, ...this is us duck typing is going away. That just leaves EAFP.

EAFP is bad because it's slow, because exception handling in Python is slow. Interestingly, it's technically only the except ...: branch of EAFP that's slow; the try: branch is significantly faster, because it's only the handling part of exception handling that's slow. Unless an exception is actually raised, the try: branch only incurs minimal overhead.

Most Pythonistas don't care about performance – or publicly claim they don't, anyway. We do, because performance is our only raison d'être. This means we only EAFP when we either:

Of course, sometimes we have no other choice. Testing whether an arbitrary module is importable or whether an arbitrary object is hashable are the canonical examples. Thanks to the existence of the __hash__() dunder method that can be overridden to raise arbitrary exceptions, the only robust means of implementing the latter test is:

def is_object_hashable(obj: object) -> bool:
    try:
        hash(obj)
        return True
    except:
        return False

What can you do, you know?

Note: inspect.getfile is heavily referenced throughout. It does the heavy lifting of inspect.getsourcefile

Excellent, my dark acolyte. Excellent. Let us cut out the perfidious middle man! With respect to get_callable_filename_or_placeholder(), we only care about inspect.getfile(). Good catch, it is.

We'll probably never want inspect.getsourcefile() here, because that would imply we are doing something perfidious with source code, which we eventually intend to do (e.g., vile and meddlesome AST transformations) in a completely separate package that is not beartype. But... I digress.

As noted by in the official documentation, callable can throw false positives.

Is this still true? Fascinating and horrible if so. We suspect this is only true for callable classes overriding the __call__() dunder method, which we are currently comfortable with ignoring, because we don't even support callable classes yet.

Ignoring callable classes, the callable() builtin should absolutely be reliable. Should. Unless there's black and eldritch magic afoot in CPython's C implementation that I'm blissfully unaware of, which there probably is.

Whats a false-positive? According to inspect.getfile, function/method objects lacking __code__.co_filename and classes whose __module__ has a __file__ evaluate to None.

Just to be unambiguous, but those edge cases aren't actually false positives with respect to whether or not those objects are callable; those objects are still callable, so there's no false positive. They're just in-memory rather than on-disk, which is absolutely okay and rather common in the dynamic hellscape of interpreted Python.

Wrapper functions generated by the @beartype decorator are a sane example. They're callable functions (which is good), but currently lack __code__.co_filename attributes (which is bad), because they're defined in-memory. In fact, solving the latter issue is exactly why the get_callable_filename_or_placeholder() getter was scoped out in the first place! We have now gone recursive.

The Forest Rather Than the Trees View™ of this whole thing is that the traceback for exception messages raised by wrapper functions generated by the @beartype decorator is currently non-human-readable. Those functions are listed as residing in the non-existing <string> file (which is bad). Moreover, the line number responsible for raising that traceback's exception is listed as the non-existing ??? line (which is even worse).

Solving the former issue requires getting and copying the __code__.co_filename attribute from the decorated callable onto the wrapper function generated by the @beartype for that callable. Enter get_callable_filename_or_placeholder() stage left, folks.

Solving the latter issue requires considerably more care. If my failing memory serves, we need to:

  1. Synthesize a fake in-memory private beartype module – possibly a unique module for each wrapper function generated by the @beartype decorator, but hopefully not.
  2. Add each wrapper function to that module.
  3. Register that module with the stdlib linecache module, updated on each addition to that module.

Behold! It is brutally ugly, but it's not our fault. This is what I tell myself.

Note: Functions/Methods aren't the only objects containing co_filename. inspect.getfile includes Frame and Traceback objects which beartype would omit as they are not callable.

Right. Super catch, Master Helmsman Heliotrope. You continue to have the helm.

This occurs in inspect.getsourcefile which attempts to confirm the file exists via os.path.exists. If that evaluates to false it checks linecache.cache for the file path. We could follow suit or we could forgo the existential questioning...

Yes. Existential questioning is slow and so is inspect.getsourcefile, so let's not walk that lethargic road of inexorable doom. Again, The Forest Rather Than the Trees View™ of this whole thing is just to generate human-readable tracebacks – a very non-mission-critical feature. Generating perfectly accurate, complete, and comprehensive tracebacks that account for all possible edge cases (however unlikely) would be noice for certain definitions of noice, but never at the expense of... well, anything else that actually matters.

...and return a boolean indicating whether the file extension is a byte-code suffix.

Sure! We could try hard. Or we could be lazy. I suggest we be lazy, because my drooping middle-aged gut is the standard I live by. Let's do absolutely nothing and just accept the value of the co_filename attribute as is. In 99.9999999% of real-world cases give or take a digit of accuracy, that value will be what you expect: the absolute filename of the .py-suffixed source module defining that callable.

In the unlikely event that's not the case... well, that's tragic and deplorable and makes my grubby toes curl with indignation, but that's hardly the End of the World As We Know It. Heck, it's not even /r/Python-worthy news – and they'll accept anything there!

I was able to find more articles on Python's garbage collection than I ever expected to.

heh

</face_twitch>

Catching them is pretty easy we just need to add an additional check...

Oh, my vigorous and innocent and easily deceived Padawan. To be young again! Seriously, you're awesome. The above statement, however, is not.

The set of the types of all builtin C-based callables is larger than just types.BuiltinFunctionTypesignificantly larger. Python internally differentiates between an excrutiating number of different builtin C-based callables. I've seen bound method wrappers, unbound class method descriptors, unbound dunder method wrapper descriptors, and unbound non-dunder method descriptors the horrifying likes of which I pray you never personally witness.

This is why the beartype.cave.CallableTypes tuple exists. I suspect you didn't know about that thing, because that thing is currently undocumented outside that module (which is bad). If we really need to differentiate C-based callables from non-C-based callables at some future point, it'd be trivial to define a new beartype.cave.CallableCTypes tuple reducing to just the set of the types of all builtin C-based callables.

Phew.

Well, that escalated quick.

Heliotrop3 commented 3 years ago

Note: inspect.getfile is heavily referenced throughout. It does the heavy lifting of inspect.getsourcefile

Excellent, my dark acolyte. Excellent. Let us cut out the perfidious middle man! With respect to get_callable_filename_or_placeholder(), we only care about inspect.getfile(). Good catch, it is.

....

Note: Functions/Methods aren't the only objects containing co_filename. inspect.getfile includes Frame and Traceback objects >>which beartype would omit as they are not callable.

Right. Super catch, Master Helmsman Heliotrope!

Do we want to use inspect.getfile()? It performs 3 checks which won't make it past callable():

The way inspect.getfile() is built means we'd always have to make those checks even though they would never be true. Likewise, inspect.getfile() uses hasattr() to check a Class's __module__. hasattr() uses EAP internally.

Whats a false-positive? According to inspect.getfile, function/method objects lacking __code__.co_filename and classes whose __module__ has a __file__ evaluate to None.

Just to be unambiguous, but those edge cases aren't actually false positives with respect to whether or not those objects are callable; those objects are still callable, so there's no false positive. They're just in-memory rather than on-disk, which is absolutely okay and rather common in the dynamic hellscape of interpreted Python.

Hmmm, so those are the cases where we'd want to return "\<string>"?

Heliotrop3 commented 3 years ago

@leycec, could we tag this issue as a low priority? I would like to get this implemented but as you say:

The Forest Rather Than the Trees View™ of this whole thing is just to generate human-readable tracebacks – a very non-mission-critical feature.

At some point this might get bumped but for now, correct me if I'm wrong, but placing this on the back-burner seems appropriate.

If helpful, I can submit a pull request with the progress I made on my implementation so we, or whoever else, isn't starting from scratch. (Assuming it's accepted of course).

leycec commented 3 years ago

Do we want to use inspect.getfile()?

That's a defly nope. The inspect module is horrible! Okay, it's amazing, because someone already did all the hard work for us for free. I'm a big fan of that, being intrinsically lazy by genetic design.

But it's exactly as you've diagnosed: inspect is sooo friggin' slow, thanks in large part to unnecessarily coercing everything and more into an abstract object-oriented representation that really no one asked for. You will be unsurprised to learn I'm brutally ripping all of the remaining inspect calls out of beartype and would prefer not adding any more on pain of my own personal crucifixion.

Hmmm, so those are the cases where we'd want to return ""?

You are the brilliance I seek in the world. In other words, yes.

...could we tag this issue as a low priority?

This is an appropriate idea. So it has been spoken, so it shall be.

If helpful, I can submit a pull request with the progress I made on my implementation so we, or whoever else, isn't starting from scratch.

This is an excruciatingly good idea. Let nothing that is golden go wasted.

...when you find the time, on your initiative, at your leisure, as you please. Your time is valuable. Thanks yet again for all the deep mysticism, Tyler. You keep me on my toes like a pirouetting code ballerina. :ballet_shoes:

leycec commented 3 years ago

Ugh! GitHub's medieval issue tracker (seemingly unearthed from an ancient steampunk civilization) has no built-in support for prioritizing or sorting issues. Instead, I just made a new later label.

Welcome to 2021: The Exciting World of the Future, Today.

Heliotrop3 commented 3 years ago

...when you find the time, on your initiative, at your leisure, as you please. Your time is valuable. Thanks yet again for all the deep mysticism, Tyler. You keep me on my toes like a pirouetting code ballerina. 🩰

Float like a butterfly, sting like bee - We're all about ruthless efficiency. I don't know what the future holds for beartype, but its growth inducing, interesting, and the feedback isn't laced with condescension - don't really know what more one could want from a project. Likewise, the research you've already done is a great springboard. I mean, really, I'm basically taking your notes, researching, and supplying my findings, thoughts, and potentially face-palm inducing questions. All in the pursuit of understanding how to translate the design needs into discussion points and eventually into code.

If helpful, I can submit a pull request with the progress I made on my implementation so we, or whoever else, isn't starting from scratch.

This is an excruciatingly good idea. Let nothing that is golden go wasted.

I'll aim to get this taken care of by the coming Sunday.

leycec commented 3 years ago

I don't know what the future holds for beartype...

Please Odin let it be money.

...but its growth inducing...

I couldn't help myself.

...and the feedback isn't laced with condescension...

Please repeatedly punch my avatar if I ever do that.

It angers me deep inside that this may have happened to you before, though. If that ever happens again on GitHub, summon me with a Level 9 Summoning Sigil (okay, so it's just @leycec) and I'll pathetically poke at them my acidic forked tongue, penetrating evil eye, and feeble but sarcastic witticisms.

...and supplying my findings, thoughts, and potentially face-palm inducing questions.

Bro. You defly go beyond that humble and pithy self-synopsis. You routinely send up peer-reviewed doctoral theses on esoteric edge-case behaviour that impress me even as they frighten me. If you're not the Senior Software Architect of the next best Bay Area Startup by 2027, I shall eat a favourite Australian Tilley hat without ketchup on a live Twitch stream.

Please make that happen so I don't have to follow up. Also... been there, done that. The money's great. The hours will kill your very immortal soul and make you regret your decision to become human. Do it for five years, burn out like everyone else, and then become a grizzled Canadian living in an uninsulated cottage down by the river. :+1:

I'll aim to get this taken care of by the coming Sunday.

Bro. You take time, please. You play board games. You binge Season 3 of Stargate: SG-1. Then get back to us. I'm sending up the next stable release of beartype tonight, so there's no rush. The next stable release is months away. We have all the time in the world.

Unless we don't. But let's pretend we do.