Splitting huge cells in nbs

DeanLight commented 1 year ago

Some of the cells in the nbs (ie engine in nb 07) are huuuuuuuuge and hard to read/debug because of this.

To fix this we can use fastcore's patch decorator https://stackoverflow.com/questions/73316102/fastai-fastcore-patch-decorator-vs-simple-monkey-patching

loayshaqir1 commented 1 year ago

Fastcore's @patch decorator is a little bit problematic when working with abstract classes, I'll show you a problem and suggest two solutions you can choose one or suggest another solution. consider the following code:

from abc import ABC, abstractmethod
from fastcore.basics import patch

class A(ABC):
    @abstractmethod
    def foo(self):
        pass

class B(A):
    def __init__(self):
        pass

@patch
def foo(self : A):
    print("foo patched to A")

if __name__ == '__main__':
    a = B()

Executing the code yields this error:

TypeError: Can't instantiate abstract class B with abstract methods foo

The reason for that problem is that in Python each class that is inheriting from an abstract class he starts with an attribute called __abstractmethods__ which is a frozenset that contains a set of functions that should be implemented by the child class, this attribute gets updated only in the scope of the class definition and at the end the set should be empty i.e the child class has implemented all the abstractmethods, since we are using patch to separate functions into different cells, Python doesn't see that we actually implemented the required functions.

Possible solutions: 1- "define the functions with pass" just to show the interpreter that we actually implemented the functions and then override the implementation by using patch, the above code will look like this:

class B(A):
    def __init__(self):
        pass
    def foo(self):
        pass

@patch
def foo(self: B):
    print("foo patched to B")

2- We can build our own decorator that does exactly what should've been done based on the above explanation:

class A(ABC):
    @abstractmethod
    def foo(self):
        pass

class B(A):
    def __init__(self):
        pass

def spanner_patch(func):
    cls = next(iter(func.__annotations__.values()))
    # Convert the frozen set to a regular set to be able to edit
    abstracts_needed = set(cls.__abstractmethods__)
    abstracts_needed.discard(func.__name__)
    cls.__abstractmethods__ = abstracts_needed
    setattr(cls, func.__name__, func)
    return func

@spanner_patch
def foo(self: B):
    print("foo patched to B")

DeanLight commented 1 year ago

In your example, you tried to patch the function to A, not to B. I assume that was a typo?

if so and this bug still works. Lets go with option 2, Just make sure you are supporting all of the stuff that patch supports. It might be easier to look at their implementation and copy it and change what you need to make it work. https://github.com/fastai/fastcore/blob/master/fastcore/basics.py#L938 And you can open an issue in fastcore on this problem with your proposed solution. Just dont call the function spanner_patch just cause this is the spanner workbench, just call it patch.

If you see that this is too much, you can also try to factor out enough logic out of the class into helper functions, but i worry that this solution too might be more work than neccessary, so you'll have to see what makes more sense.

loayshaqir1 commented 1 year ago

Alright, so we went with option 2. regarding supporting all of the stuff that patch supports, it wasn't that easy to go to their code and copy the function because it used many functions of their own which will lead us to actually copy a lot more functions than necessary. I think that the easiest way was to just call the patch decorator after we do "our job" of taking care of the abstractmethods issue. so now our decorator looks like this:

#| export
def wrapped_patch(func : Callable, *args, **kwargs) -> Callable:
    """
    Applies fastcore's `patch` decorator and removes `func` from `cls.__abstractsmethods__` in case <br>
    `func` is an `abstractmethods`
    """
    cls = next(iter(get_type_hints(func).values()))
    abstracts_needed = set(cls.__abstractmethods__)
    abstracts_needed.discard(func.__name__)
    cls.__abstractmethods__ = abstracts_needed

    # Apply the original `patch` decorator
    patch(*args, **kwargs)(func)

In case you would ask why the decorator doesn't return a function, that is because the decorator gets called only once, unlike the usual behavior of decorators, once the decorator is called func is patched to the class and it gets called from within an instance of the class, and that's the ideal behavior for us as we wouldn't want that the decorator get called each time as one time is enough for us.

But now we have another problem popped up that i would like to suggest a solution to and you tell me if you agree: Lets have the engine code for example, for better documentation we embedded example cells in between the code cells, It looked like that:

SqliteEngine (Huge cell containing all the functions)
show_doc(SqliteEngine.some_func)
Example # Showing how to use `some_func`

that way it was so easy for the user to see the example right below the documentation, but now after patching i assume that you expect me to call show_doc right below each code cell implementing some function and then an example, but we can't really do that as all of the engine examples need an instance of the class in order to run, and we can't get that instance as we haven't patched all of the functions yet. My suggested solution is to first patch all the functions, and then treat them exactly as we did before. which will make the notebook look like this:

SqliteEngine (Small cell containing only static methods and necessaries)
patched_func1
patched_func2
...
show_doc(SqliteEngine.patched_func1)
example # Showing how to use `patched_func1`
show_doc(SqliteEngine.patched_func2)
example # Showing how to use `patched_func2`

with that solution it is still the same user experience we had before from the documentation side, but the only problem is that it creates a gap between the code of the patched function cell, and calling show_doc on that function. What do you think?

DeanLight commented 1 year ago

I think your solution is good. The tradeoff is true of course but its good enough and there is no good way to avoid it.

The fact that its not a perfect solution is not an issue since it just make more visible the cost of having too much responsibility for a single class / lack of intermediate abstraction layers.

DeanLight / spannerlib

Splitting huge cells in nbs #99