Closed tlienart closed 3 years ago
be able to ignore the conversion of indented blocks
The spec does bake-in a little bit with regards to indentation as far as I know, but on trying to disable indented code it doesn't appear to break much. So might be possible to achieve that fairly easily.
julia> p = CommonMark.Parser();
julia> text =
"""
indented
"""
" indented\n"
julia> ast = CommonMark.parse(p, text)
Node(CommonMark.Document)
julia> CommonMark.ast_dump(ast)
Document
CodeBlock
"indented\n"
julia> pop!(p.block_starts)
indented_code_block (generic function with 1 method)
julia> ast = CommonMark.parse(p, text)
Node(CommonMark.Document)
julia> CommonMark.ast_dump(ast)
Document
Paragraph
Text
"indented"
Whether disabling it causes odd behaviour elsewhere I couldn't say for certain just yet. Would be worth trying out though.
I would absolutely love to get rid of indented code blocks completely from the syntax, but alas that's never going to be a realistic goal, it makes parsing quite a bit more complex than it needs to be. https://johnmacfarlane.net/beyond-markdown.html was good reading while putting this package together.
(minor thing)
{{...}}
are converted by the Markdown module into html entities
Those aren't converted. I thought for a moment that I might have implemented that wrong, but babelmark doesn't either if I'm not mistaken.
If those are the things you'd need to get going then you might have some success, hopefully!
Thanks for the code, awesome I look forward to trying this out over the weekend :)
Sorry for briefly re-opening this, the trick you suggested does not work anymore but this:
CommonMark.indented_code_block(::CommonMark.Parser, ::CommonMark.Node) = 0
seems to work fine. It preserves base markdown formatting that happens in the indentation for instance:
julia> p = CommonMark.Parser();
julia> t = """
ab **and**
cd
ef **gh**
foo `bb`
bar
end"""
"ab **and**\n\n cd\n ef **gh**\n foo `bb`\n bar\n\nend"
julia> CommonMark.ast_dump(p(t))
Document
Paragraph
Text
"ab "
Strong
Text
"and"
Paragraph
Text
"cd"
SoftBreak
""
Text
"ef "
Strong
Text
"gh"
SoftBreak
""
Text
"foo "
Code
"bb"
SoftBreak
""
Text
"bar"
Paragraph
Text
"end"
apart from the fact that it's a dirty trick, do you see anything that could go terribly wrong from the CommonMark.jl perspective if I use this? thanks!
Couldn't say with any certainty whether that will do anything terribly bad, worth trying it out, though parsing CM can be pretty weird at times. You can avoid the type piracy on that method (I assume you're redefining it within the REPL (and Franklin)? by doing something like the following:
struct SkipIndented end
block_rule(::SkipIndented) = Rule((p, c) -> 0, 8, "")
julia> p = enable!(disable!(Parser(), IndentedCodeBlockRule()), SkipIndented())
Parser(Node(CommonMark.Document))
This swaps out the indented block parser with your definition without the piracy. (I was surprised that it didn't actually break anything at all.)
julia> CommonMark.ast_dump(p(t))
Document
Paragraph
Text
"ab "
Strong
Text
"and"
Paragraph
Text
"cd"
SoftBreak
""
Text
"ef "
Strong
Text
"gh"
SoftBreak
""
Text
"foo "
Code
"bb"
SoftBreak
""
Text
"bar"
Paragraph
Text
"end"
This needs a tiny patch to CM, since there's a bug in disable!
that this uncovered (which I'll push tomorrow):
diff --git a/src/parsers/rules.jl b/src/parsers/rules.jl
index a20a582..2626c3e 100644
--- a/src/parsers/rules.jl
+++ b/src/parsers/rules.jl
@@ -50,7 +50,7 @@ function disable!(p::AbstractParser, rules::Union{Tuple, Vector})
empty!(p.inline_parser.inline_parsers)
empty!(p.inline_parser.modifiers)
filter!(f -> f ∉ rules, p.rules)
- return enable!(p, p.rules)
+ return enable!(p, copy(p.rules))
end
disable!(p::AbstractParser, rule) = disable!(p, [rule])
Cool thank you! I'll use your trick
That fix with be tagged shortly: https://github.com/MichaelHatherly/CommonMark.jl/commit/438032d3f338166978e623f865aabb6d1127be9f.
I'll note here that I'd regard what I used to make that trick work can be regarded as public interface and won't be disappearing at random. The previous trick with push!
was definitely "doing bad things".
Hello Michael,
I'm interested in slowly moving over to CommonMark.jl as it's likely to solve a bunch of sneaky issues in Franklin. You mentioned on Discourse you'd be happy to discuss inclusion of extensions that would be relevant to franklin (thanks!). At a high level the parsing works as follows:
The resolution of "plain text" blocks is passed to Julia's Markdown for things like links, lists, ... other blocks that are processed first by Franklin include "command blocks", "div blocks", math blocks, code blocks etc. I don't think it would make a lot of sense to use CommonMark.jl for those yet.
The first obvious step is to use CommonMark.jl for the "level 0" instead of Julia's Markdown. I had already looked into using GithubMarkdown.jl (which wraps around pandoc) for this but there is one key thing I must be able to do: be able to ignore the conversion of indented blocks. (I want users to be able to use indentation for other stuff like making their div blocks clearer). Currently I capture the result of
Markdown.parse
, and hack them back into a paragraph block so that they're not converted. How could I do something like it in with CommonMark.jl?The key small function is https://github.com/tlienart/Franklin.jl/blob/772dde56872b8834bf52ec73c05b4760a1844ddb/src/converter/markdown/utils.jl#L9-L30
which, at a high level, takes the level 0 stuff ("plain markdown") and converts it to HTML. Note that there are two tricks:
{{...}}
are converted by the Markdown module into html entities, I don't want that to happen, is that something that would happen with CommonMark.jl ?Assuming I could do both of these things easily with CommonMark.jl, it should be mostly straightforward to make Franklin use it (and I'd be very happy about it, finally proper lists etc...!)
Thanks and apologies for the long winded message