SwiftDocOrg / CommonMark

Create, parse, and render Markdown text according to the CommonMark specification
MIT License
179 stars 10 forks source link

Added support for DOM walking using a visitor #13

Closed regexident closed 3 years ago

regexident commented 4 years ago

While accessing certain nodes in a document via node.children works alright for shallow document structures it becomes cumbersome as soon as you are interested in some more interesting queries such as "collect all links, from anywhere in the document".

(Writing a visitor pattern in time of a pandemic might not have been the most responsible thing, but I did wear a :mask:, I promise.)

regexident commented 4 years ago

The reason all func visit(…: …) methods on protocol Visitor return -> VisitorContinueKind, regardless of whether the passed-in type actually has children to visit, is to make the API more resilient against future changes.

mattt commented 4 years ago

@regexident What a great addition to this library! I was in such a rush to jump to HTML-rendered output and XPath queries for swift-doc, that I neglected to appreciate the beauty of the CommonMark AST for what it was. Really inspiring stuff, and I'm a big fan of the API naming consistency with SwiftSyntax.

Is there anything left to do before this is ready for review?

regexident commented 4 years ago

I made it into a draft for now as I wanted to try getting support for subtype/override semantics, which needed me to change some stuff.

This would allow for visiting super-types, which can be super useful:

func visit(node: Node) -> …
func visit(block: Block) -> …
func visit(containerBlock: ContainerBlock) -> …
func visit(leafBlock: LeafBlock) -> …
func visit(inline: Inline) -> …

But with that one would either have to gave these higher-level visit methods have no say in controlling the walk via VisitorContinueKind, or extend VisitorContinueKind with an case .inherit and implement some kind of overriding.

I am currently working on the latter as that would further more allow for really nice (i.e. need for very little boiler-plate when implementing Visitor) support for building visitors that, e.g. with the exception of Paragraph skip all Blocks:

final class TestVisitor: Visitor {
    var numberOfTexts: Int = 0

    func visit(block: Block) -> VisitorContinueKind {
        return .skipChildren
    }

    func visit(paragraph: Paragraph) -> VisitorContinueKind {
        return .visitChildren
    }

    func visit(text: Text) -> VisitorContinueKind {
        self.numberOfTexts += 1

        return .visitChildren
    }
}

protocol Visitor's default implementations of func visit(…:) would default to .inherit, instead of .visitChildren, with self.defaultContinueKind being the implicit root.

The order of precedence would be (a > b meaning a can override b):

T > Inline > LeafBlock > ContainerClock > Block > Node

Inline, LeafBlock and ContainerClock should be disjoint sets, but given that we cannot guarantee this from a type-system perspective I defined an order that should reflect the nesting in a document, akin to "cascading" style sheets.

All of this complexity should remain fairly hidden in the package and require minimal configuration.

mattt commented 4 years ago

@regexident Just a quick heads-up: We're in the process of changing how containers are implemented (#11). I'll try to integrate that in the next day or two so that it doesn't block your work here.

regexident commented 4 years ago

We might also want to have at least a generic "end" event for Visitor:

public protocol Visitor {
    // ...
    func didEndVisit()
}

So that a Visitor could maintain an internal stack during traversal such as for inserting closing tags while building HTML from a document.

mattt commented 4 years ago

We might also want to have least add a generic "end" event for Visitor [...] so that a Visitor could maintain an internal stack during traversal such as for inserting closing tags while building HTML from a document.

The way this is done in SwiftSyntax is with visitPost(_:) complements to visit(_:). So the last method that a visitor calls would be visitPost(_:) for the Document.

regexident commented 4 years ago

Yeah. My only concern would be that it would double the API-surface.

My thought was that if you needed info about the node post visit you could just keep a stack of nodes. That would certainly not be as efficient as having visitPost(…:) , but due to nodes being classes should be reasonably efficient memory-wise.

That being said visitPost(…:) would certainly be cleaner.

Anyway, I would be fine with either. I'll happily leave that decision up to you. 🙂

regexident commented 4 years ago

Done.

Lemme know what kind of visitPost(…:) API you would prefer and I'll quickly add it. 🙂

Also, rather than merge now I'd say let's wait for https://github.com/SwiftDocOrg/CommonMark/issues/11 first, and I'll rebase once that's been merged.

mattt commented 4 years ago

Lemme know what kind of visitPost(…:) API you would prefer and I'll quickly add it. 🙂

While it would offer a smaller API surface area, I think using a single method for all "did visit" delegate methods has a strong likelihood of causing confusion. For example, a user may mistakenly see visit for text elements as a specialization for visit for inline elements, and expect only one of them to be called; so one "did visit" callback instead of two.

Given the strong example set by SwiftSyntax and the clarity of their adopted approach, I think visitPost is a better option — in spite of its API surface area.

If you're particularly concerned about API surface area for consumers, one option might be to borrow the approach used by SwiftSyntax with its AnyVisitor class

Also, rather than merge now I'd say let's wait for #11 first, and I'll rebase once that's been merged.

Agreed!

regexident commented 4 years ago

I just added the func visitPosts. 🙂

regexident commented 4 years ago

With https://github.com/SwiftDocOrg/CommonMark/pull/14 closed (🙁) this PR should be ready to merge now.

mattt commented 4 years ago

@regexident Thanks for the reminder. Working to get this merged now!

regexident commented 4 years ago

Should be resolved. 🙂

gonzalonunez commented 4 years ago

Hi!

I'd love to try this, is there anything else needed to get this merged? Happy to be of any help that I can.

@mattt I don't expect you to remember 😀, but we met briefly over dinner with some folks when you gave a presentation at Airbnb! I'm now at Primer (where we almost crossed paths I believe!), and the Education team writes projects in Markdown.

I'm exploring using CommonMark to traverse the AST (for some custom components) + render these projects natively in a UICollectionView.

regexident commented 3 years ago

Hey @mattt, is there anything else needed to get this PR merged?

regexident commented 3 years ago

Hate to bother you, @mattt, but any chance to get this merged at some point? (I have two projects where this would be tremendously helpful, but I'd rather not maintain a fork.)

mattt commented 3 years ago

@regexident Thank you for the final nudge on this and your extraordinary patience in waiting for this to get merged. I'll cut a new minor release with this (and your wonderful memory leak fixes) ASAP.

Edit: This is now live in 0.5.0

regexident commented 3 years ago

Thank you so much @mattt!