azu / gitbook-plugin-include-codeblock

GitBook plugin for including file

Cannot input non-word character as the language #50

Closed lwchkg closed 7 years ago

lwchkg commented 7 years ago

If I write [include, lang:"javascript+theme:gitbook+lineNumbers:false"](snippet.js), include-codeblock cannot parse the value of lang because it does not match ([-\\w\\s]*). I know this is unusual, but the lang attribute is the only logical way to attach instructions to my highlighter.

I recommend matching against "((?:[^\"\\\\]|\\\\.)*)" instead. After parsing, you'll need to unescape the string.

Testing of the regex is done here: http://rextester.com/EAHN61753
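
To make the difference concrete, here is a minimal sketch comparing the two value patterns quoted above. The helper matchLang and the surrounding lang:"..." matcher are assumptions made for illustration, not the plugin's actual parsing code.

```javascript
// Value patterns quoted in this issue, written as JavaScript string literals.
const currentValEx = "([-\\w\\s]*)";
const proposedValEx = "((?:[^\"\\\\]|\\\\.)*)";

// Hypothetical helper (not the plugin's code): extract the raw lang value
// from a link label using the given value pattern.
function matchLang(label, valEx) {
  const match = new RegExp('lang\\s*:\\s*"' + valEx + '"').exec(label);
  return match ? match[1] : null;
}

const label = 'include, lang:"javascript+theme:gitbook+lineNumbers:false"';

// The current pattern stops at "+", so the closing quote never matches.
console.log(matchLang(label, currentValEx));
// The proposed pattern accepts any character except an unescaped quote.
console.log(matchLang(label, proposedValEx));
```

The proposed pattern returns the raw text between the quotes, including any backslashes, which is why a separate unescape step is still needed afterwards.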

azu commented 7 years ago

> the lang attribute is the only logical way to attach instructions to my highlighter.

Maybe a custom template would resolve it?

lwchkg commented 7 years ago

Doesn't seem to help. I need the same syntax to work both with and without include-codeblock. Consider the snippet below:

````
```[language]
/* code here */
```
````

(Remove the brackets.)

There are only 2 logical ways to include formatting instructions:

Note: setting template variables doesn't work because code highlighting is done after the template pass and also the markdown/asciidoc pass.

So I opted for the first solution, i.e. the one described in this issue.

BTW, do you mind if I make a PR for parseVariablesFromLabel? (I haven't coded it yet, though.)

lwchkg commented 7 years ago

BTW, if yes (for the PR), which way do you prefer the string to be unescaped: C-style or HTML entities?

azu commented 7 years ago

Thanks for the clarification. Originally, gitbook-plugin-include-codeblock just expands the link syntax ([include](code.js)) into a code block. We want to avoid introducing a complex DSL as far as possible.
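
As a rough illustration of what expanding the link syntax into a code block means, here is a minimal sketch. The function expandIncludes, its readFile parameter, and the fallback to the file extension as the language are all hypothetical, not the plugin's implementation.

```javascript
// Hypothetical sketch, not the plugin's implementation.
const FENCE = "`".repeat(3);

// Replace [include](path) and [include, lang:"..."](path) links with a
// fenced code block. readFile is an injected function that returns the
// file's text, so the sketch stays independent of the file system.
function expandIncludes(markdown, readFile) {
  const includeLink = /\[(include\b[^\]]*)\]\(([^)]+)\)/g;
  return markdown.replace(includeLink, (whole, label, path) => {
    const langMatch = /lang\s*:\s*"([-\w\s]*)"/.exec(label);
    // Assumption: fall back to the file extension when lang is absent.
    const lang = langMatch ? langMatch[1] : path.split(".").pop();
    return FENCE + lang + "\n" + readFile(path) + "\n" + FENCE;
  });
}

console.log(expandIncludes("[include](code.js)", () => "let x = 1;"));
```

If the plugin is disabled, the same source line is still an ordinary markdown link, which is what keeps the syntax deliberately simple.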

If my understanding is correct, this PR will just change let valEx = "([-\\w\\s]*)"; to "((?:[^\"\\\\]|\\\\.)*)". If the change includes anything complex or unusual, unfortunately, we can't accept it.

> which way do you prefer the string to be unescaped: C-style or HTML entities?

I don't understand this. Actual code would help me understand.

Please send a PR.

lwchkg commented 7 years ago

Okay... I still need more clarification before actual coding:

  1. Where does the syntax [include](code.js) come from? If it is just inspired by markdown, then you can define it by yourself. But if there is a more specific reference, then I need to look at it and make something consistent.

  2. We do need a proper format (still undecided) to encode arbitrary strings in your [include...] instruction. The most important implications are for the "file" and "title" fields rather than the "lang" field, though.

  3. There are a few standards for string literals. The most common types are the following:

C-style: e.g. "First line\nSecond line" becomes two lines. Here is a reference: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/String#Long_literal_strings

HTML-style: e.g. "a&lt;b" becomes "a<b". Reference: https://dev.w3.org/html5/html-author/charref

So before coding, we need to decide which type of string literal to use.

The actual unescaping can most likely be handed off to an external library (if not, I'll write one).
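
To show how the two styles differ in practice, here is a minimal sketch of each. Both decoders are hypothetical and deliberately incomplete: they cover only a handful of escapes and entities, and a real implementation would more likely rely on an existing library.

```javascript
// Hypothetical, deliberately incomplete decoders for illustration only.
const C_ESCAPES = new Map([["n", "\n"], ["t", "\t"], ['"', '"'], ["\\", "\\"]]);

// C-style: a backslash plus one character; unknown escapes keep the character.
function unescapeCStyle(raw) {
  return raw.replace(/\\(.)/g, (whole, ch) =>
    C_ESCAPES.has(ch) ? C_ESCAPES.get(ch) : ch
  );
}

const HTML_ENTITIES = new Map([["lt", "<"], ["gt", ">"], ["amp", "&"], ["quot", '"']]);

// HTML-style: named character references; unknown names are left untouched.
function unescapeHtmlStyle(raw) {
  return raw.replace(/&(\w+);/g, (whole, name) =>
    HTML_ENTITIES.has(name) ? HTML_ENTITIES.get(name) : whole
  );
}

console.log(unescapeCStyle("First line\\nSecond line")); // two lines
console.log(unescapeHtmlStyle("a&lt;b")); // a<b
```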

lwchkg commented 7 years ago

How these values are relayed to the output is another issue. If we're careless, things will break. (In a first attempt, we can just strip the characters that cause problems in markdown/asciidoc.)
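
A first-attempt version of that stripping could look like the sketch below. It assumes the value ends up in the info string of a backtick-fenced markdown code block, where backticks and line breaks are the characters that would break the fence. The function name sanitizeInfoString is hypothetical, and asciidoc output would need its own rules.

```javascript
// Hypothetical helper: remove characters that cannot safely appear in the
// info string of a backtick-fenced code block (backticks and line breaks).
function sanitizeInfoString(lang) {
  return lang.replace(/[`\r\n]/g, "").trim();
}

console.log(sanitizeInfoString("javascript+theme:gitbook"));
```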

azu commented 7 years ago

> Where does the syntax [include](code.js) come from?

It is markdown. If this plugin is disabled, the document should still render as markdown. The link syntax works as a fallback.

It depends on GitBook's markdown renderer, but it should be compatible with the CommonMark or GitHub Flavored Markdown spec.

> There are a few standards for string literals. The most common types are the following:

It should be valid in Markdown's link label context.

lwchkg commented 7 years ago

I see. So I'll follow Section 6.1 of GFM/CommonMark (https://github.github.com/gfm/#backslash-escapes) for escapes in input. I'm not sure whether the Markdown of GitBook follows this, but we're free to make a logical specification here because there's no such specification in GitBook.
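
As a sketch of what following that section could mean for input: the spec allows any ASCII punctuation character to be backslash-escaped, and a backslash before any other character stays a literal backslash. The function name unescapeMarkdown is hypothetical; this is not the code from the eventual PR.

```javascript
// ASCII punctuation characters that GFM/CommonMark allow to be
// backslash-escaped. A backslash before anything else is kept literally.
const ESCAPABLE = /\\([!"#$%&'()*+,\-.\/:;<=>?@\[\\\]^_`{|}~])/g;

// Hypothetical helper: drop the backslash in front of escapable characters.
function unescapeMarkdown(raw) {
  return raw.replace(ESCAPABLE, "$1");
}

console.log(unescapeMarkdown('a\\"b')); // a"b
console.log(unescapeMarkdown("C:\\dir")); // backslash kept: "d" is not punctuation
```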

For output, I haven't yet determined how (if at all) to escape the strings. But my understanding is that the existing named templates (default, full, ace, acefull) should work without modification. (If a full implementation turns out to be too complex, I may cover only some of the cases.)

Hopefully I can send a PR on Monday.

azu commented 7 years ago

Closed by #52