chalk-diagrams / chalk

A declarative drawing API in Python
MIT License
282 stars 13 forks source link

Factor out into files #92

Closed srush closed 2 years ago

srush commented 2 years ago

This is a proposal for https://github.com/danoneata/chalk/issues/89 .

I tried a couple different options, but I am still not a master at the python type system.

upsides ->

danoneata commented 2 years ago

Thanks @srush! I'll need to think a bit about the changes. Most likely, this is the right direction, but I got somewhat confused by the Diagram -- DiagramBase distinction: Diagram plays more the role of a "base" (abstract) class, but I guess you've preferred the shorter name for it since it's used as a type throughout the code base?

As a further step, I've tried to see how the refactoring of the backend would look like with the Visitor pattern (I think the other fold-like functions on the Diagram AST, such as get_envelope or get_trace can be refactored similarly). The code is on this branch, see the chalk/backend folder. While I also find the pattern a bit convoluted, it does help with decoupling the functionality. it remains to be seen how much of the type safety we lose when undergoing such a change (I haven't yet attempted to make mypy happy). I'd be curious to know what is your feel towards such a restructuring.

You mention as a downside:

doesn't solve the problem of constructing diagrams.

I'm not sure I see what you mean; would you mind expanding on this point?

srush commented 2 years ago

the Diagram -- DiagramBase distinction: Diagram plays more the role of a "base" (abstract) class, but I guess you've preferred the shorter name for it since it's used as a type throughout the code base?

So I think I disagree, although these terms are muddied in python. Diagram is a protocol which seem most similar to a Java Interface, (Python calls it static duck typing). Where DiagramBase feels like a Java abstract base class to me since it has several unimplemented functions (get_envelope, etc.) so you shouldn't instantiate it, but its children.

That being said, it shouldn't be necessary to have to create a Diagram interface, since all diagram inherit from DiagramBase. Ideally that would be the main type everywhere. But I couldn't figure out a way to have the functions be in different files without declaring that interface. Practically, Diagram and BaseDiagram shouldn't be different.

Most likely, this is the right direction

Another option would be to have a "UserDiagram" with all these user methods. And then have it hold a "Diagram" which is implemented with just the core class hierarchy. It's a little weird that core functions like Compose are mixed in with user functions like show_origin.

As a further step, I've tried to see how the refactoring of the backend would look like with the Visitor pattern (I think the other fold-like functions on the Diagram AST, such as get_envelope or get_trace can be refactored similarly).

I like this a lot! This definitely seems right.

doesn't solve the problem of constructing diagrams. I'm not sure I see what you mean; would you mind expanding on this point?

Since Diagram is an interface/protocol, it can't have constructors. So you have to call them Primitive.from_shape. This is fine, but it means that you have to import Primitive into some of the factored out classes.

danoneata commented 2 years ago

So I think I disagree, although these terms are muddied in python. Diagram is a protocol which seem most similar to a Java Interface, (Python calls it static duck typing). Where DiagramBase feels like a Java abstract base class to me since it has several unimplemented functions (get_envelope, etc.) so you shouldn't instantiate it, but its children.

I see! Thank you for the explanations! I do agree with you; I might have just expected Diagram to have been named DiagramProtocol or Diagramable or something along these lines (but for brevity Diagram is fine). And maybe part of my initial confusion stemmed from the fact that Diagram and DiagramBase shouldn't in principle have been distinct (as you also said), but it was necessary to make the imports work.

More to the point, I think __init__.py could also be refactored in the light of the last changes: some functions are now redundant (e.g., beside, atop could be imported from chalk.juxtapose) and some could also be moved in the corresponding files (e.g., rectangle to shape.py, cat to juxtapose.py; should we rename juxtapose.py to combinators.py, to align with diagrams?)

I like this a lot! This definitely seems right.

Okay, cool! In the meantime, I've continued with the refactoring of the envelope, trace and subdiagram functionality on the other branch.

Another option would be to have a "UserDiagram" with all these user methods. And then have it hold a "Diagram" which is implemented with just the core class hierarchy. It's a little weird that core functions like Compose are mixed in with user functions like show_origin.

Aha, interesting, althought not exactly sure I can visualize how the code will look under this alternative approach :slightly_smiling_face: Just to make sure: have you already tried this variant and prefer the current one, right?

srush commented 2 years ago

I agree. I will re-sort functions into files to match closely the structure of Diagrams, and remove functions from init that can be removed. I will keep the directory structure flat though, and not seperate 2d and 3d (since I don't think 3d is in our plans).

Besides that, I think maybe we can merge this in. It doesn't get in the way of your branch right?

danoneata commented 2 years ago

Yes, I think we can merge this. I don't have a better solution than the one you proposed and the merging doesn't affect my branch.

I assume the alternative approach that you mentioned would look something along these lines (see code below)? There would be a user-facing PublicDiagram (in the __init__.py), which holds an internal private Diagram. The issue being that there is a lot of wrapping and unwrapping of types going on in the PublicDiagram and probably also duplication of the documentation (e.g., the beside function should probably be documented in both __init__.py and combinators.py)?

# types.py
from dataclasses import dataclass

class Diagram:
    pass

class Empty(Diagram):
    pass

@dataclass
class Compose(Diagram):
    d1: Diagram
    d2: Diagram

# combinators.py
from chalk.types import Diagram
from chalk.envelope import get_envelope

def beside(d1: Diagram, d2: Diagram) -> Diagram:
    # uses get_envelope(d1)

# envelope.py
from chalk.types import Diagram, Empty, Compose

class Envelope:
    pass

def get_envelope(d: Diagram) -> Envelope:
    if isinstance(d, Empty):
        pass
    elif isinstance(d, Compose):
        pass

# __init__.py
from dataclasses import dataclass
from chalk.types import Diagram, Empty
import chalk.combinators

@dataclass
class PublicDiagram(Diagram):
    diagram: Diagram

    def beside(self: PublicDiagram, other: PublicDiagram) -> PublicDiagram:
        return PublicDiagram(chalk.combinators.beside(self.diagram, other.diagram))

def empty() -> PublicDiagram:
    return PublicDiagram(Empty())
srush commented 2 years ago

Yes, that is exactly what I was thinking. I think this abstraction is nicer. But I am not sure it is worth it. Either way, it still requires the the diagram protocol, so we could switch to it later. (I think there are only a couple core diagram functions, compose, transform, etc that would need to be repeated)

One other todo. The thing I like best about your visitor change is the potential to split out the backend renderers. But it would be nice to do that for primitive Shapes as well. But maybe we can reduce the number of shapes first. If paths/trails have arc_betweens then Arc, Circle, Rect, etc all go away. (This seems easier than implementing full bezier curves, which is non trivial and not that useful).

danoneata commented 2 years ago

But it would be nice to do that for primitive Shapes as well.

You mean to move the render, render_svg, render_tikz logic of each shape to the corresponding backend files, right? I think that could be done, yes.

But maybe we can reduce the number of shapes first. If paths/trails have arc_betweens then Arc, Circle, Rect, etc all go away. (This seems easier than implementing full bezier curves, which is non trivial and not that useful).

Yes, that makes, sense!