MichaelHatherly / CommonMark.jl

A CommonMark-compliant Markdown parser for Julia.

HTML injection: block vs inline #11

Closed tlienart closed 3 years ago

tlienart commented 3 years ago

Hello!

I'm finally about to do some actual implementation work to replace Franklin's use of Julia's Markdown module with CommonMark.jl. In doing so, I'm also considering refactoring a bit how Franklin produces stuff (details here but probably not relevant for my question). Overall the gist would be to have the following workflow:

input (md) --> Franklin --> inter1 (md) --> CommonMark --> inter2 (html) --> Franklin --> output (html)

the relevant part for the question is the first one: taking "Franklin"-Markdown and converting it into CommonMark Markdown (note that, in case you're wondering, I don't think the whole lot can be done in CommonMark as some parsing patterns are quite different). Franklin could basically read what it considers "special-blocks" from the input, convert those into HTML itself, then form a CommonMark text with those woven in, so basically inter1 (md) would look like:

Some standard markdown 

```{=html}
a special block converted into html by Franklin
```

Some more markdown

which CommonMark.jl could then parse and generate HTML for with the `RawContentRule()`

## Question

(apologies for the long intro)

> in CM.jl there's a distinction between **inline**-html and **block**-html; and, as far as I can tell you can only use inline HTML in nested environments, things like list items or table cells; on the other hand inline-html cannot have `\n` in it

this is problematic for me because I cannot generally assume that the resolution of a "special block" by Franklin will be single or multi-line. 
Could there be a way to not have this distinction? (possibly build a specific rule?) I think what I want is to be able to just inject blocks of raw html anywhere (including in list items, table cells, ...) without assuming that there will be a single or multiple lines.

Thanks!

MichaelHatherly commented 3 years ago

> Could there be a way to not have this distinction? (possibly build a specific rule?) I think what I want is to be able to just inject blocks of raw html anywhere (including in list items, table cells, ...) without assuming that there will be a single or multiple lines.

Are you needing a feature that allows inserting a new block element inside of a series of inline elements? Once you're writing inlines I'd always expect the resulting output to be inline, rather than expanding to a new block that might impact surrounding blocks?

If you've got a concrete example of where you use this in Franklin syntax, that would help me understand this better.

tlienart commented 3 years ago

Ok sure, here are two toy examples of "special blocks"

  1. `@@name ...@@` inserts a div with class `name`
  2. `\style{color:red}{hello}` inserts a span with the given css

If we just reason about those two, the first one would be a block and the second one an in-line element.

However, in general there are constructs in Franklin which could be one or the other depending on how the user defines them; this is why I'd hope for a single way to plug in raw html. I understand that in some cases that might be inappropriate (e.g. a div inside a li), but forcing the user to specify whether the use is inline or block would be cumbersome in Franklin (though if it's the only way, I'll try to figure that out within Franklin).

tlienart commented 3 years ago

The behaviour of the block html in CM.jl is almost what I want; I just noticed that it doesn't work properly inside a list item, for instance:

* A
* B ```{=html}C```
* D

MichaelHatherly commented 3 years ago

OK, that makes sense. I had a play around with fd2html:

julia> fd2html("One @@two three@@ four.")
"<p>One </p>\n<div class=\"two\">three</div>\n<p>four.</p>\n"

So @@s will interrupt the parent block and split it if it introduces a block level element inside of another.
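To make the splitting behaviour concrete, here is a minimal string-level sketch that reproduces the output above for this one input. It is only an illustration: `split_paragraph` is a hypothetical helper, it is not how Franklin or CommonMark.jl work internally, and it assumes `@@name ...@@` blocks are not nested.

```julia
# Hypothetical toy: split one paragraph of text at every `@@name ...@@`
# block, emitting the surrounding text as separate <p> elements.
function split_paragraph(text::AbstractString)
    io = IOBuffer()
    pos = 1
    for m in eachmatch(r"@@(\w+)\s+(.*?)@@"s, text)
        before = text[pos:prevind(text, m.offset)]
        # Close the paragraph that was open before the block, if it has content.
        isempty(strip(before)) || print(io, "<p>", lstrip(before), "</p>\n")
        # The block itself becomes a sibling <div>, not a child of the <p>.
        print(io, "<div class=\"", m.captures[1], "\">", m.captures[2], "</div>\n")
        pos = m.offset + ncodeunits(m.match)
    end
    rest = text[pos:end]
    # Whatever follows the last block starts a new paragraph.
    isempty(strip(rest)) || print(io, "<p>", lstrip(rest), "</p>\n")
    return String(take!(io))
end

print(split_paragraph("One @@two three@@ four."))
```

A real implementation would have to do this on the parsed AST rather than on raw text, since the paragraph may already contain other inline elements.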

It may be possible to convince an inline_rule and inline_modifier into acting this way. There are currently no rules that match that behaviour, but it may be worth trying to get that working. I'm fine with doing some minor architecture changes if need be, but the main parsing algorithm needs to remain as close to the reference as possible for the sake of maintainability.

tlienart commented 3 years ago

Yes exactly. Thanks a lot, let me know if it's doable, otherwise I'll have to try to go another route :)

MichaelHatherly commented 3 years ago

Yeah, it seems like it's doable to expand blocks from inlines, though it will probably be a little tricky to get it working 100%. There don't seem to be any internal changes needed to allow it.

tlienart commented 3 years ago

Hey Mike, if I can help with this somehow, let me know, I'll be happy to contribute a PR if I can as this would likely improve the HTML generation in Franklin in the long run :)

MichaelHatherly commented 3 years ago

I'm not likely to be able to carve out enough time to work on this myself at the moment, so if you do have the time to implement something that would be great. I'm fine with it living in this package rather than externally. Happy to review PRs.

tlienart commented 3 years ago

Could you give me some pointers on which part of the code to edit? I'll try to figure the rest out, thanks!

MichaelHatherly commented 3 years ago

extensions/tables.jl (specifically the inline_modifier definition) may be the most useful example of doing the kind of transformation this would need, though it'll be a bit different to it.

My high-level overview of how the parsing might work is: during inline parsing identify the start/end syntax (@@name or whatever), then during the inline_modifier (which runs after each block is parsed for inline elements) go over the captured @@s and re-organise the block's AST, possibly splitting the block into several different blocks depending on the "name" associated with @@. I don't think there should be any need to modify the basic parsing architecture to do this.

tlienart commented 3 years ago

Hmm I think AST stuff will not be flexible enough for what I want to do (Franklin is probably too permissive and CM too restrictive, though I understand why). Note the @@ stuff is just an example, there's quite a bit more to it.

I was hoping to be able to have something like ```{=rawhtml}abc``` with the understanding that CM.jl would not try to parse abc and just leave it as is without doing anything even if it leads to improper HTML; for instance:

Some text ```{=rawhtml}<span style="color:red">foo</span>``` etc

* A
* B ```{=rawhtml}<div class="foo">hello</div>```
* C

-->

<p>Some text <span style="color:red">foo</span> etc
<ul>
<li> A </li>
<li> B <div class="foo">hello</div> </li>
<li> C </li>
</ul>

and even worse

Some text ```{=rawhtml}</p>```

-->
<p>Some text </p></p>

So in short I'd like 3 modes of HTML input instead of two:

If this is impossible in CM.jl without significant changes (or unmaintainable changes), I'll probably have to write this stuff myself.

Thanks and sorry if I wasn't clear in the first place!

Note: actually I think I can do this irrespective of whether CM supports it or not by just plugging in HTML at a later stage...
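One way the "plug in HTML at a later stage" idea could look is a placeholder round trip: swap each special block for an opaque token before the Markdown pass, then substitute the stored HTML back into the generated output. The sketch below is hypothetical throughout: `stash_blocks`, `restore_blocks` and the token format are made up for illustration, only `@@name ...@@` is handled, and the token is assumed never to occur in real input.

```julia
# Hypothetical helpers; neither Franklin nor CommonMark.jl provides them.

# Swap every `@@name ...@@` block for an opaque token, remembering its HTML.
function stash_blocks(md::AbstractString)
    stash = Dict{String,String}()
    io = IOBuffer()
    pos = 1
    for (i, m) in enumerate(eachmatch(r"@@(\w+)\s+(.*?)@@"s, md))
        print(io, md[pos:prevind(md, m.offset)])   # text before the block
        token = "FRANKLINBLOCK$(i)X"               # assumed absent from the input
        stash[token] = "<div class=\"$(m.captures[1])\">$(m.captures[2])</div>"
        print(io, token)
        pos = m.offset + ncodeunits(m.match)
    end
    print(io, md[pos:end])                         # text after the last block
    return String(take!(io)), stash
end

# Put the stored HTML back into the output of the Markdown pass.
function restore_blocks(html::AbstractString, stash::Dict{String,String})
    for (token, raw) in stash
        html = replace(html, token => raw)
    end
    return html
end

# `render` stands in for any Markdown-to-HTML function.
render(md) = "<p>" * strip(md) * "</p>\n"

inter1, stash = stash_blocks("One @@two three@@ four.")
print(restore_blocks(render(inter1), stash))
```

Because the Markdown parser only ever sees the token, it does not matter whether the stored HTML is single-line or multi-line; the price is that the result can be invalid HTML, such as a div nested inside a p here.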

MichaelHatherly commented 3 years ago

> I was hoping to be able to have something like ```{=rawhtml}abc``` with the understanding that CM.jl would not try to parse abc and just leave it as is without doing anything even if it leads to improper HTML; for instance:

If there's no need to do anything to the contents, inline raw elements might be suitable:

julia> text =
       """
       Some text `<span style="color:red">foo</span>`{=html} etc

       * A
       * B `<div class="foo">hello</div>`{=html}
       * C
       """;

julia> p = enable!(Parser(), RawContentRule());

julia> html(stdout, p(text))
<p>Some text <span style="color:red">foo</span> etc</p>
<ul>
<li>A</li>
<li>B <div class="foo">hello</div></li>
<li>C</li>
</ul>
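
On the Franklin side, producing that raw inline syntax could be as simple as a text substitution. The following is a rough sketch under stated assumptions: `raw_inline` and `style_to_raw` are hypothetical names, only the `\style{css}{text}` form from earlier in the thread is handled, and fragments containing backticks or newlines are rejected rather than escaped.

```julia
# Hypothetical helper: wrap an HTML fragment in the raw inline syntax
# used in the example above (a code span followed by `{=html}`).
function raw_inline(html::AbstractString)
    # Assumption: fragments are single-line and contain no backticks.
    occursin('`', html) && throw(ArgumentError("fragment contains a backtick"))
    occursin('\n', html) && throw(ArgumentError("fragment spans several lines"))
    return string('`', html, "`{=html}")
end

# Toy translation of `\style{css}{text}` into a raw inline span.
function style_to_raw(md::AbstractString)
    pattern = r"\\style\{([^}]*)\}\{([^}]*)\}"
    return replace(md, pattern => function (matched)
        # `replace` passes the matched substring, so match again for captures.
        m = match(pattern, matched)
        raw_inline("<span style=\"$(m.captures[1])\">$(m.captures[2])</span>")
    end)
end

println(style_to_raw(raw"Some text \style{color:red}{foo} etc"))
```

The resulting string can then be handed to a parser that has `RawContentRule()` enabled, as in the session above.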
tlienart commented 3 years ago

Cool! Thanks Mike, I'll mull this over and try to piece something together that makes sense most of the time 😁

MichaelHatherly commented 3 years ago

No worries, give me a shout if you need to bounce any ideas around.