Closed cldershem closed 6 years ago
It should be fixed in 0.2.2 that was released today but I guess the package managers are not updated yet
I updated to the current release and it still doesn't seem to work as expected. Underscore in the file/shortcode name makes things wonky:
# ./templates/shortcodes/contact_info.html.
{{ label }}
{{ info }}
# ./content/page/contact.md
{{ contact_info(label="this is a label", info="this is the info") }}
{{ contact_info(label="this is another label", info="this is other info") }}
# outputs
{{ contact
{{ contact
this is a label: this is the info
this is another label: this is other info
It's similar with the brackets in the string.
{{ contact_info(label="this is another label", info="this is [other] info") }}
will render:
{{ contact(label="this is another label", info="this is
this is another label: this is [other] info
I might be able to write at least a test case if not give a little more info to troubleshoot with if you point me in the right direction.
That's just the markdown renderer splitting on the markdown characters, there's a test for it but obviously it wasn't enough :(
We'll be addressing this and adding more tests as part of the work being done in #174.
I don't think it's possible to fix this by just changing how events generated by pulldown_cmark::Parser
are handled, because parsing content with pulldown_cmark::Parser
loses information. You can demonstrate this with the following:
extern crate pulldown_cmark;
use pulldown_cmark as cmark;
fn main() {
let markdown1 = r#"{{ foo(bar="`baz`") }}"#;
let parser1 = cmark::Parser::new(markdown1);
let markdown2 = r#"{{ foo(bar="``baz``") }}"#;
let parser2 = cmark::Parser::new(markdown2);
for (ev1, ev2) in parser1.zip(parser2) {
println!("{:>25} == {}", format!("{:?}", ev1), format!("{:?}", ev2));
}
}
which prints
Start(Paragraph) == Start(Paragraph)
Text("{{ foo(bar=\"") == Text("{{ foo(bar=\"")
Start(Code) == Start(Code)
Text("baz") == Text("baz")
End(Code) == End(Code)
Text("\") }}") == Text("\") }}")
End(Paragraph) == End(Paragraph)
Note that the events are exactly the same, even though the bar
argument in the shortcode in markdown1
is the string literal "`baz`"
, while the bar
argument in the shortcode in markdown2
is the string literal "``baz``"
.
I see three ways to correctly handle shortcodes:
If the expression in the shortcode tag was enclosed in a Markdown code span, such as {{` foo(bar="baz") `}}
, this would keep the Markdown parser from parsing the tag, which would make it possible to handle the tag correctly. (Note that if an argument contained a `
character, the shortcode expression would have to be enclosed in extra `
s, e.g. {{`` foo(bar="`baz`") ``}}
.)
For shortcodes with a body, the body would also need to be enclosed in a Markdown code block to prevent it from being parsed by the Markdown parser, e.g.
{%` foo(bar="baz") `%}
body
{%` end `%}
I'm fairly sure this strategy will work, although getting it correct might be difficult, and it's not very pretty.
This strategy is to convert the content to HTML in two stages: content -> parse/render shortcodes -> parse/render Markdown
. This completely bypasses all the issues with trying to handle shortcodes after the content has already been parsed with the Markdown parser. It's fairly straightforward to implement, and there aren't weird edge cases to handle. I have a first-pass implementation here that uses a custom Pest parser to parse shortcodes.
Unlike with the previous approach, if a Markdown code block contains text that looks like a shortcode tag and you don't want it to be rendered as a shortcode, you have to escape it (since the Markdown isn't parsed until after the shortcodes are rendered). To do this, it would be possible to add {% raw %} ... {% endraw %}
blocks like in Tera. An additional way to escape text is to enclose it in a Rust-style raw string. For example, you could write:
{{ r###"
```jinja
{{ this_shortcode_is_escaped() }}
{{ this_shortcode_is_also_escaped() }}
"### }}
which would become
{{ this_shortcode_is_escaped() }}
{{ this_shortcode_is_also_escaped() }}
after rendering shortcodes. (I provided this method in my implementation.)
### 3. Treat the content as a Tera template
This is the same as the previous strategy, except instead of implementing a custom shortcode parser/renderer, you can just use Tera. This approach is nice because it's the simplest to implement, you know that shortcodes (i.e. Tera macros) will behave the same in the content and in templates, and the user has the full power of Tera templates at their disposal when writing their content. Similar to the previous approach, you can escape text that looks like a Tera tag by using `{% raw %} ... {% endraw %}` or Rust-style raw strings (which need to be added to Tera).
Note that currently Tera doesn't have macros that take a body. It would be possible to add macros with a body to Tera, or you could achieve the same functionality with string arguments. For example, you could have the macro
{% macro quote(body, author) %}
{{ body }}
-- {{ author}}
{% endmacro quote %}
and then call it like
{{ macros::quote(author="Vincent", body=r#"This is some text to be placed in a blockquote. It even has "quotation marks" in the body."# }}
What do you think? Personally, my preference is strategy 3.
I like strategy 3 the best as well. But it seems to me like adding body arguments to Tera would also be a nice thing to do to keep the syntax the same.
@reillysiemens and I are currently working on rewriting the markdown parser to clean it up. We can also switch the code to treating the body as a Tera template, which will simplify the reimplementation a lot!
@jturner314 if you have time and would like to, maybe you could implement the new body syntax in Tera? Or we'll add it to our list in our current yak shave :laughing:
But it seems to me like adding body arguments to Tera would also be a nice thing to do to keep the syntax the same.
I agree. I also realized that handling nested shortcodes/macros works a lot better if macros can have a body. (Ideally, shortcode parsing/rendering would be applied to the body of a "shortcode with a body" but not to any string literals passed as arguments.)
if you have time and would like to, maybe you could implement the new body syntax in Tera?
Sure, I can work on that. I'll also add support for Rust-style string literals.
I'm not sure macros with a body make sense in Tera itself, I don't see a usage for it. Not sure about Rust strings as well although I can see the need when you want to have "
in your string but that's a fairly niche case I think. I would be tempted to go with the JS approach of using backticks rather than r#"
though:
{% set message = `Hello "world"` %}
{% set message = r#"Hello "world""# %}
I find the first one more readable and the 2nd is only understandable by Rust devs.
I'm a bit tired but isn't there solution 4?
{% end %}
, render the macro and append raw html to bufferThis way is a bit slower as you go line by line but I don't think it will make a huge difference and you can keep the current syntax.
I'm not sure macros with a body make sense in Tera itself, I don't see a usage for it.
I agree that it's somewhat unusual for most use cases, although Gutenberg shortcodes are one example where this would be useful. Additionally, it's worth noting that Jinja has this feature. (See the first example in the Call section of the docs. The render_dialog
macro is called with the call
block, and it accesses the body of the caller with {{ caller() }}
.)
… I can see the need when you want to have
"
in your string but that's a fairly niche case I think.
Yeah, this is probably somewhat unusual. I came across this when I tried to write a figure
shortcode that generates <figure>
and <img>
tags (with some additional functionality). I'd like to pass the value of the alt
attribute as a literal string argument, and this can contain arbitrary text (including quotation marks).
I would be tempted to go with the JS approach of using backticks rather than
r#"
though … I find the first one more readable and the 2nd is only understandable by Rust devs.
I think the combination of Rust-style regular string literals and raw string literals is really convenient, although I understand the concern, and Javascript-style string literals with backslash escapes and '
/ "
/ `
delimiters are fine too.
Now that I think about it, what's your opinion on Python-style literals ('
/ "
/ '''
/ """
for regular string literals and r'
/ r"
/ r'''
/ r"""
for raw string literals)? This gives the flexibility of multiple delimiter options (like Javascript) but also provides the raw string literal option (to avoid interpreting backslash escapes). Go-style literals are another option ("
for regular string literals and `
for raw string literals).
… isn't there solution 4? …
If I understand correctly, that's effectively the same as option 2 (custom shortcode parser, with content -> parse/render shortcodes -> parse/render Markdown
). Yes, that does work. You can use a regex for parsing the shortcodes (btw, the current regex is somewhat broken, but that's fixable), or you can use a Pest parser. Note that this approach will render shortcodes in Markdown code spans and code blocks (because shortcodes are processed before parsing Markdown), so you need some way to escape shortcodes. For example, you could add {% raw %} ... {% endraw %}
for this purpose.
The primary reason why I suggested using Tera (option 3) instead of using a custom shortcode parser (option 2) for rendering shortcodes before parsing Markdown is to avoid having to effectively reimplement portions of the Tera parser in Gutenberg with regexes/Pest. However, it's certainly possible to use a custom shortcode parser separate from Tera.
I'm willing to work on the implementation however you'd like it done.
I'm aware of the call
in Jinja2 macros but I find it very hard to understand/explain and since the usecase is niche, probably not worth adding.
Yeah, this is probably somewhat unusual. I came across this when I tried to write a figure shortcode that generates
I would say that points to solving that in Gutenberg and not Tera. We don't need the shortcode calls to be limited to what's possible in Tera, and we only really want bool/numbers/string as values there.
Now that I think about it, what's your opinion on Python-style literals (' / " / ''' / """ for regular string literals and r' / r" / r''' / r""" for raw string literals)? This gives the flexibility of multiple delimiter options (like Javascript) but also provides the raw string literal option (to avoid interpreting backslash escapes). Go-style literals are another option (" for regular string literals and ` for raw string literals).
I think adding all kinds of quotes/ticks as kind of synonyms make sense. Someone is working on string concat (https://github.com/Keats/tera/issues/222) so if you want to avoid escaping you could do {% set description = "hey 'bob'" ~ '"here as well"' %}
. Not super pretty but again I don't think that's something that happens very often in the template itself.
For the parsing, I would definitely go for solution 2 to get the best UX as an editor. To avoid the line-looking-like-shortcode issues, a simple hack could be to check if we are in a code block while iterating and do nothing if it's the case. Code blocks are the only moment where I believe you could have a line looking like a shortcode in a markdown document, unless you're writing weird content. I'm pretty sure we can get around with regex without needing the parser: the current code to parse a shortcode is not working 100% correctly (I blame regex) but does roughly the job already.
For the parsing, I would definitely go for solution 2 to get the best UX as an editor.
Okay, that works for me.
I think adding all kinds of quotes/ticks as kind of synonyms make sense. Someone is working on string concat…
That's a reasonable approach.
To avoid the line-looking-like-shortcode issues, a simple hack could be to check if we are in a code block while iterating and do nothing if it's the case.
Reliably determining whether the shortcode parser (i.e. the iterator over lines) is currently in a code block is surprisingly complex. For example, the parser needs to know the difference between block and inline HTML tags. Consider the following Markdown document:
<div>
~~~
{{ foo() }}
which results in the HTML:
<div>
~~~
<p>{{ foo() }}</p>
In this case, <div>
is a block HTML tag. If you replace <div>
with <span>
:
<span>
~~~
{{ foo() }}
the resulting HTML is:
<p><span></p>
<pre><code>
{{ foo() }}
</code></pre>
Whether or not {{ foo() }}
is in a code block depends on whether the HTML tag is a block tag (e.g. <div>
) or an inline tag (e.g. <span>
). [Edit: It turns out that handling HTML blocks in Markdown is more complex than just distinguishing between block and inline tags. For example, if the HTML block starts inside a block quote, the end of the block quote can terminate the HTML block.] This is just one example; there are other complexities with Markdown too.
Maybe we could fork pulldown-cmark
to provide sufficient information to determine which portions of a document are in a code block? Otherwise, as far as I can tell, allowing the user to escape things with {% raw %}...{% endraw %}
is much simpler to implement than trying to detect when shortcodes are in code blocks.
I expect that text that looks like a shortcode is pretty rare anyway, so the user won't need to use {% raw %}
very often.
Another consideration is that if Gutenberg adds support for markup languages other than Markdown, it's a lot easier to deal with {% raw %}
than to determine when the parser is in a code block for each supported markup language.
I'm pretty sure we can get around with regex without needing the parser: the current code to parse a shortcode is not working 100% correctly (I blame regex) but does roughly the job already.
Yes, I think it's possible to use regexes instead of something like Pest.
Hmm we still need to check if we are in a code block, unless we force users to wrap raw/endraw
each code block that has something that look macros.
I think the similar-to-macro tag/markdown code symbols issue in HTML would be a really edge case and am wondering if it's worth adding raw/endraw
immediately or wait till someone has the issue.
If it ends up being added, I would call them ignore/endignore
to avoid confusion with the Tera raw
tag but it should still do a naive code block check to ensure they are almost never needed.
unless we force users to wrap
raw/endraw
each code block that has something that look macros
Yes, that's what I'm proposing.
… naive code block check …
I assume you mean something like this?
let mut in_code_block = None;
for line in lines {
in_code_block = match in_code_block {
None => {
if the line matches regex "^ {0,3}(?P<fence>`{3,}|~{3,}).*$" {
Some(captures.fence)
}
}
Some(fence) => {
if the line matches regex format!("^ {{0,3}}{}{}* *$", fence, fence[0]) {
None
}
}
};
}
These are the costs/benefits to the two approaches:
Approach | Advantages | Disadvantages |
---|---|---|
naive check for code blocks | * don't have to escape text that looks like a shortcode in a code block (assuming the naive check works) | * inconsistent/incorrect on some edge cases * shortcodes must be on their own line with no leading whitespace and surrounded by empty lines |
require `{% raw %}`/`{% ignore %}` to escape, including in code blocks | * always consistent * easier to extend Gutenberg to other markup languages * possible to allow shortcodes inside Markdown blockquotes, lists, code blocks, paragraphs, etc. | * if text in a code block looks like a shortcode, it needs to be escaped with `{% raw %}`/`{% ignore %}` |
I think the similar-to-macro tag/markdown code symbols issue in HTML would be a really edge case …
That's true. Another (edge) case where a naive check will fail is this:
1. foo
~~~
{{ foo() }}
In this case, the code block started by ~~~
ends before {{ foo() }}
. If the Markdown source is this:
A. foo
~~~
{{ foo() }}
then {{ foo() }}
is in the code block started by ~~~
.
The difference is that 1.
starts a list item, while A.
does not.
… it should still do a naive code block check …
Fair enough. I guess I just weigh the costs/benefits differently. I think as long as you require shortcodes to:
a naive check will be correct unless
~~~
or ```
appears in an HTML block or~~~
or ```
appears without a closing fence in a list itembut there may be some additional edge cases that I haven't considered.
Edit: One example where inline shortcodes would be useful is for mathematics. For example, you could write inline math {{ math(tex="\sum_{i=1}^N \sqrt{x_i^2 + 5}") }} in some text
. (This doesn't work if shortcodes need to be surrounded by blank lines.)
be on their own line, be surrounded by blank lines, and have with no leading whitespace
That's pretty much what's required right now for shortcodes to work.
I guess I prefer keeping the happy path as clean as possible with the risk of being a bit wrong sometimes and force the user to use the raw/ignore
escape hatch. That's still a minor issue in the end.
Thoughts on that @RadicalZephyr @reillysiemens ?
I think I'm more inclined towards the conceptual simplicity of not checking whether a shortcode is inside a code block, since it's another piece of mutable state to track on top of whether we're inside raw/ignore tags. I think it will also make explaining to users when they need to use the raw/ignore
escape hatch more difficult.
i.e. without any checking for context of shortcodes, the rule is:
If it looks like a shortcode, but it's not, then surround it with the
raw/ignore
tags.
Versus with the code check:
If it looks like a shortcode, it's fine as long as it's inside a code block (probably). But if it doesn't work, then surround it with the
raw/ignore
tags.
And as a user, I don't think I would be annoyed by having to escape code that looks like a shortcode. The only time I write code in a blog post that looks like that is when I'm writing about a templating language, or possibly writing a post about using Gutenberg itself, which for the "typical" user is probably fairly rare.
I also suspect that the confusion stemming from when raw/ignore
is required will lead to more support request issues being filed, not sure if that's a concern for you or not.
Also, once we introduce a naive code block check, overtime it will probably become very tempting to make it less naive, and the end result of that is rewriting the markdown parser. This is definitely a bit of an ad absurdum argument, but that's where I see it heading and it sounds like a headache to support it to me. But if you really want the code block check, it's not a dealbreaker for me :grin:
Also, possibly the "rightest" way to solve this is to patch the parser in pulldown-cmark so that we don't lose the information we need/want for parsing shortcodes, or make their parsing extensible so we can modify it to actually emit shortcode events. But that's a whole other yak to shave.
Majority rules! I'll take PRs for that and we can bikeshed on the raw/ignore
name on them
@RadicalZephyr @jturner314
Are you going to collaborate on some branch? I'm guessing this would simplify the markdown rendering tremendously.
Sorry about the delay; Thanksgiving holiday was last week in the US.
The shortcode parsing is fairly independent of the Markdown parsing. (The shortcode parsing/rendering is a &str -> Cow<str>
(or maybe &str -> String
) transformation before parsing/rendering Markdown.) I have a good start on the shortcode parsing. Once I'm done, I can submit a PR to @RadicalZephyr's branch in #174.
In the meantime, @RadicalZephyr can use this stub in short_code.rs
while refactoring the Markdown parsing:
use std::borrow::Cow;
use context::Context;
use errors::Result;
pub fn render_shortcodes<'a>(content: &'a str, context: &Context) -> Result<Cow<'a, str>> {
Ok(Cow::from(content))
}
Then, markdown_to_html
is something like
use context::Context;
use errors::Result;
use short_code::render_shortcodes;
use table_of_contents::Header;
pub fn markdown_to_html(content: &str, context: &Context) -> Result<(String, Vec<Header>)> {
let after_shortcodes = render_shortcodes(content, context)?;
// parse `after_shortcodes` instead of `content`
// ...
}
Does that sound like a good plan?
Sounds perfect to me!
I'll release 0.3.0 this week since there are quite a few bugfixes already accumulated and hoping/working to have that in the major version after that
Hello. I came here from #225 where I ran into problems with inadequate flexibility of shortcodes. I briefly read the discussion above. My conclusion so far is none of the proposals so far help me to achieve what I'd like. From where I'm standing, there are two problems with shortcodes:
I've given the problem a bit of thought and I've come with the following idea. The parsing problem would be solved by dropping shortcodes and replacing them with "Tera blocks" parsed before Markdown. The context problem would be solved by evaluating the Tera blocks later in the pipeline.
This is what a Tera block looks like in my mind:
<?tera ...arbitrary Tera code goes here... ?>
This syntax will probably be weird/ugly to you at first, but there are two reasons for using it:
The parsing and evaluation would go as follows:
<?tera token-0 ?>
for the first Tera block, <?tera token-1 ?>
for the next one, and so on ... or something like this)Because of the late evaluation, Tera blocks would have to be forbidden in page summary. Otherwise it seems to me like this offers the greatest deal of flexibility while not getting in the way of Markdown parsing (to my best knowledge).
I know this is a radical change not likely to be met with much of acceptance, but I think I'm just going to implement this, because I need a static site generator with support for picture galleries soon-ish and other static site generators are generally horrible...
Context - shortcodes have no access to site & page metadata
I believe passing the site config would be easy to add, the page as well probably. It's not done currently because I wanted to have the markdown rendering have as few dependencies on the rest of Gutenberg as possible but that's doable.
The parsing problem would be solved by dropping shortcodes and replacing them with "Tera blocks" parsed before Markdown. The context problem would be solved by evaluating the Tera blocks later in the pipeline.
That doesn't work for re-usability. Let's say I want to have a photography
shortcode, I don't want to have to copy paste it every single time I use, or update all the pages if I want to change a css class for example. Same if I want to ship shortcodes with a themes.
I believe passing the site config would be easy to add, the page as well probably.
Hm. I'm not sure this would work... For example, I think assets are listed later than Markdown parsing.
That doesn't work for re-usability. Let's say I want to have a photography shortcode, I don't want to have to copy paste it every single time I use, or update all the pages if I want to change a css class for example. Same if I want to ship shortcodes with a themes.
My idea was to use Tera macros. Ie. all the macros that are available to templates should also be available to Tera blocks. Which should also enable shipping them with themes. Ie. for example {{ youtube(id="dQw4w9WgXcQ") }}
would become <?tera youtube(id="dQw4w9WgXcQ") ?>
... I should've made this clearer in previous post, sorry...
Hm. I'm not sure this would work... For example, I think assets are listed later than Markdown parsing.
Nope, Markdown rendering is done once all pages are loaded as we need to have all pages/sections permalinks to resolve internal links.
My idea was to use Tera macros. Ie. all the macros that are available to templates should also be available to Tera blocks
Macros have to be imported and namespaced to avoid clashing: a theme might have a youtube
macro but what if I want my own?
What would your photo gallery macro/shortcode call/implementation look like?
Nope, Markdown rendering is done once all pages are loaded as we need to have all pages/sections permalinks to resolve internal links.
Ah, okay.
Macros have to be imported and namespaced to avoid clashing: a theme might have a
youtube
macro but what if I want my own?
I forgot about the import bit :-( (But perhaps that could be solved somehow.) As for clashes, use namespaces? Ie. the theme might use its namespace and so could you...
What would your photo gallery macro/shortcode call/implementation look like?
I uploaded an example of my WIP blogpost in the picture thread. Presumably, in Gutenberg / Tera, the macro/shortcode could be something like this:
{% for thumb in img_scale(imgs=page.assets, w=..., h=...) %}
{# ... render elements in a way similar to the example... #}
{% endfor %}
(I haven't considered the "screen" resize yet I use in my Pelican setup ...)
Ok, regarding the context, I guess this is just a matter of adding page metadata to the context, which should be straightforward. Thanks for the clarification on that.
Regarding macro import, this could be done with auto-importing macros at the start of each tera code, kind of like Rust inserts the std prelude automatically. That is, built-in stuff like youtube
could be auto-imported under an std
namespace. For example, std::youtube(id="...")
et al. Additionally, there could be a macros.html
file under the templates directory with additional theme- or user-defined macros that would be auto-imported as macros
.
... I'm just throwing ideas around, basically this is just me needing a working prototype. I'll try to implement something like this over this weekend hopefully and then given time we'll see if these ideas are worthwhile or not ...
I have to say I don't really see the advantages of macros vs shortcodes. Macros can call other macros (same namespace or another one) so that can become kind of mess to deal with without handling namespaces yourself.
Okay, I suppose that's a good point.
Looking at the shortcode parsing code made me realize it won't work when a string argument contains a comma (,
). I tested this with current master, and sure enough, it fails. This is sort of why I'm not too fond of shortcodes - to fix all these parsing issues properly I think you'd basically have to duplicate lot of the parsing that is done in Tera.
Then again, I think you are right about macros not being very suitable as a replacement. So here's another idea: Perhaps guteberg could load all the shortcodes from the shortcodes directory and make them available as global functions, at least for the 'Tera blocks' context. Each function would simply pass all args passed to it to the template. Edit: I forgot global fns are associated with the whole Tera instance rather than a context. So they'd have to be avialable to all tera code. Not sure if that's ok.
I finally started working on the rewrite, and so far it's looking like it's going to be something like:
shortcodes are allowed inline
ignoring a shortcode is done in the shortcode itself: {{/* hello() */}}
will output {{ hello() }}
while {{ hello () }}
will render a shortcode. Same for the shortcodes with a body. This is the approach taken by Hugo and is kind of minimalist. It might be annoying if you have lots of things looking like a shortcode like a in code block. Having to do [[ raw ]]{{ hello() }}[[ end ]]
(or whatever the syntax would be) could be a bit annoying inline. Any feedback? Another reason to not use a special block name like raw
is that that's something more to consider when parsing which might or might not be part of the content of the document
no more regex issues and support for string/bool/int/float/arrays as arguments
In the end I'm doing a pest parser like @jturner314 did :bowing_man:
It appears as though shortcode file names cannot have an underscore in them and that the value that argument values cannot contain special characters.
To reproduce the filename issue:
./templates/shortcodes/contact_info.html
.To reproduce the argument issue:
./templates/shortcodes/contact.html
.This is exacerbated because it appears the dev server will fall back to a previous state when it cannot build correctly...and then sometimes gets stuck and needs to be manually restarted.