MichaelHatherly / CommonMark.jl

A CommonMark-compliant Markdown parser for Julia.
Other
84 stars 11 forks source link

RFC: implement invalid reference link callbacks #48

Open mortenpi opened 2 years ago

mortenpi commented 2 years ago

This is one possible implementation of user-provided callbacks for invalid reference links (https://github.com/MichaelHatherly/CommonMark.jl/issues/40#issuecomment-1085406385). For now, I decided to go with a Parser-level option. But rather than implementing a new field, I decided to integrate this with the existing .refmap, to minimize the code changes. But this is more of a PoC and I think the API should be improved (e.g. adding/removing callbacks means interacting with an internal field, which I think should avoided).

However, what I really wanted to do was to implement this as an option to LinkRule (and ImageRule). But I couldn't figure out how to do that without re-engineering a lot of the general code. Basically, I would imagine you can just replace the default LinkRule with LinkRule(mycallback) and then mycallback(s) would get called somehow.

But, when the parsing functions get called (parse_close_bracket in particular), it does not have access to the actual LinkRule object. I thought about passing it via a closure, but my impression is that these parse_* functions should not carry state (e.g. there will be confusion if ImageRule and LinkRule both try to add a function with different state, since they both trigger on the same character). I also thought about just looping over p.rules and finding the corresponding rule object, but parse_close_bracket does not have access to the Parser object (just InlineParser).

The next iteration was to then have a way of declaring that a LinkRule can "register" some sort of new hooks (I don't think the existing *_rule / *_modifier functions could be re-used for this) when you enable! it. But that still requires support on the Parser object level, so I felt it's not too different from the approach I took right now.

I do feel that doing this on the rule level would be better than on the parser level. But it feels like that would require some more general changes (e.g. expanding the call signature of the parse_* functions). And so I am a little unsure what the best approach would be for this.

MichaelHatherly commented 2 years ago

should not carry state

Agreed, stateless if at all possible otherwise we'll just make things too confusing in the long-term.

I do feel that doing this on the rule level would be better than on the parser level. But it feels like that would require some more general changes

Yes, that would be the right long-term approach to this.

mortenpi commented 1 year ago

Any thoughts on how to approach it though? I don't see any obvious way to pass the LinkRule, or some of its fields, to the parse_close_bracket function. Right now, I am starting to lean towards having a way to pass the LinkRule object along in the Rule object, store it in the .inline_parsers field by making it ::Dict{Char, Vector{Tuple{Function,Any}}}, and then pass it to the parse_* function when it gets called as the third argument.

In principle, this would work in general for any rule then, just it would mostly be ignored, and the same thing could be implemented for modifiers and for the block parsers. Not entirely sure about the performance implications, but I could try benchmarking it. It feels like the parse_inline function should be anyway type-unstable because of the generic ::Function objects, so adding another type-unstable variable maybe won't have much of an effect. That said, things basically become stateful again this way.. just not a closure.

Another issue is still that both LinkRule and ImageRule reuse the same parse_* functions, and so they would override each others state. But, as far as I can tell, only these rules do that. Maybe this should be disallowed anyway (i.e. throw an error if this is false)? This would imply combining LinkRule and ImageRule into a single rule though.

mortenpi commented 1 year ago

I noticed that there is actually some precedent to having the parser functions be stateful closure, with the ji from JuliaInterpolationRule being used in the parser function:

https://github.com/MichaelHatherly/CommonMark.jl/blob/7e6138c82ba784ac349829d9635faf4267fea479/src/extensions/interpolation.jl#L24-L44

MichaelHatherly commented 1 year ago

I noticed that there is actually some precedent to having the parser functions be stateful closure, with the ji from JuliaInterpolationRule being used in the parser function:

Yeah, there's that one. If it makes for a cleaner implementation then feel free to take a similar approach to that.