Add Code Block Component

ItsMeBrianD commented 1 year ago

This addresses #408; with a starting point for code blocks.

To address the ideas presented in the issue:

HTML Raw Syntax is a viable alternative, but would be more verbose, and less in-line with other utilities provided.
Altering the markdown triple-tick is inhibited by mdsvex, which seems to do it's own parsing of the markdown file and prevents recategorization of blocks that are currently passed to the query view.
Syntax & Row highlighting is currently handled by Prism.js, wrapped into the CodeBlock component.

This component was taken from svelte-prismjs, which is also licensed under an MIT license. The typescript has been converted to javascript with type annotations; along with loading language support from a CDN dynamically (not all languages are loaded by default, and loading every supported language adds a large load to every page).

changeset-bot[bot] commented 1 year ago

🦋 Changeset detected

Latest commit: 2f4f259fa1ce8b1731a5ea663605ec443dc6f653

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 4 packages

| Name | Type | | ------------------------- | ----- | | @evidence-dev/preprocess | Minor | | @evidence-dev/components | Patch | | @evidence-dev/evidence | Patch | | evidence-test-environment | Patch |

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

vercel[bot] commented 1 year ago

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Updated
evidence-docs	✅ Ready (Inspect)	Visit Preview	Nov 16, 2022 at 10:30PM (UTC)

archiewood commented 1 year ago

Thanks for the PR!

This looks great, definitely something we need to support. Really like the way it implements arbitrary language support too.

Two questions on implementation:

Is it possible to support a <slot/> syntax, instead of, or in addition to passing a js string?
- This would make it feel a bit more "markdown + components" and less "Javascript-y" which is the experience we're trying to give users, most of whom are novice web developers.
- I'm imagining something like:
```
<CodeBlock language=python highlight=true>
def hello_world():
    print("Hello World!")
</CodeBlock>
```
Could we use a darker theme?
- Not a huge fan of the default Prism theme. Perhaps Okaida or Tomorrow Night?

ItsMeBrianD commented 1 year ago

Is it possible to support a syntax, instead of, or in addition to passing a js string?

This was my initial thought, and it may be possible; the hurdle here is when using brackets in a function it is attempting to parse it as javascript. e.g.

<CodeBlock>
function helloWorld() {
    console.log("Hello World");
}
</CodeBlock>

gets rendered as

function helloWorld() undefined

A deeper dive into the processing steps may yield some options, but mdsvex doesn't give much room to adjust the way it is parsing.

Could we use a darker theme?

Absolutely; this is a pretty simple change, wasn't sure what the preference was here. I'll go ahead and add this

archiewood commented 1 year ago

Okay, makes sense. If you are putting anything in {} brackets then I assume svelte is trying to parse it as inline js. Does this help at all.

EDIT: wrong link, sorry

ItsMeBrianD commented 1 year ago

I'll do a deeper dive on this tonight; my thoughts so far:

It may be possible to escape {} automagically in the parsing process; but this would create a tie to the <CodeBlock token that would be used as a start/stop for this escaping, also depends on mdsvex being opinionated on their pipeline.

Another alternative is to use an escape or template string as practice, but it's going to create another roadbump for users, and it would be much harder to find a solution for potentially.

There may be some svelte magic to be done here but there isn't a great way to access slots before they are parsed (which makes sense, messing with the framework internals as part of the code is a footgun).

archiewood commented 1 year ago

It may be possible to escape {} automagically in the parsing process; but this would create a tie to the <CodeBlock token that would be used as a start/stop for this escaping, also depends on mdsvex being opinionated on their pipeline.

Agreed. These may help?

What's the issue with the tie to the <CodeBlock/> token? That if we change the syntax later (e.g. to <Code/>) then it would break?

ItsMeBrianD commented 1 year ago

What's the issue with the tie to the <CodeBlock/> token? That if we change the syntax later (e.g. to <Code/>) then it would break?

Correct; here's where my exploration of the 4 tick marks led me:

/**
 * @type {import("unified").Plugin}
 * @this {import("unified").Processor}
 */
const extractCodeBlocks = function() {
    const Parser = this.Parser
    const block_tokenizers = Parser.prototype.blockTokenizers;
    const block_methods = Parser.prototype.blockMethods;
    // <<<< Notice the regular expressions here >>>>
    const codeBlockStart = /^\s*````/
    const codeBlock = /(`{4}(?:.|\s)*?`{4})/
    /**
     * This is responsible for escaping our code blocks properly; preventing other pre-processing from applying
     * @param eat {import("remark-parser").remarkParse.Eat}
     * @param value {string}
     * @param silent {boolean}
     */
    block_tokenizers.evidenceCodeBlock = function(eat, value, silent) {
        const start = codeBlockStart.exec(value);
        if (start) {
            const fullBlock = codeBlock.exec(value)

            if (fullBlock) {
                // start of value to end of regex match (i.e. any newlines that appear before the start of the block)
                const valueToEnd = value.substr(0,fullBlock.index + fullBlock[1].length)
                eat(valueToEnd)({
                    type: "evidenceCodeBlock",
                    value: `${fullBlock[1]}`
                })
            }
        }
        return false;
    }

    // Prepend our new block type to ensure that it runs first (otherwise it is skipped by other applicable blocked)
    Parser.prototype.blockMethods = ["evidenceCodeBlock", ...block_methods]
}

This correctly identifies and distinguishes code blocks with 4 ticks instead of 3, to split them out from queries. This plugin would need to be inserted into mdsvex as well in order for it to function, which there doesn't seem to be an easy way of doing.

To mangle / escape the code inside a codeblock, it would likely need to take place somewhere in this pipeline; but that runs us into mdsvex again.

Using the highlight function would be very helpful; but it would require modifying this to indicate that the content of the <CodeBlock/> tags should be interpreted as code, and then correctly escaping them where the QueryViewer insertion is happening.

Another potential pitfall of attempting to do escapes in this way is that you need to do tag recognition; which I haven't played with yet, but could be another issue, as it is more complex to find a valid opening tag compared to a wall of ticks.

As an alternate approach;

mdsvex already escapes brackets that appear inside valid code: https://www.npmjs.com/package/mdsvex/v/0.3.0#escaped-curlywurlies

If restrictions are placed on valid query ids (i.e. you can't name your query a language), then we may be able to just hinge the highlight function that exists now (and ensure that we are properly preventing queryIds from being generated elsewhere).

This has the upside of being as close to a native experience as possible; with the downside of (minor) restrictions placed on other parts of the app. Also may be a less invasive change than directly modifying the parsing process like this.

archiewood commented 1 year ago

If restrictions are placed on valid query ids (i.e. you can't name your query a language), then we may be able to just hinge the highlight function that exists now (and ensure that we are properly preventing queryIds from being generated elsewhere).

This is a nice idea. I think that's all handled in preprocess, specifically here

I don't hate the idea of disallowing a small set of query names e.g. python, r, js, html, css, json, sql. Advantages

It's markdown-y which is nice, and very similar to how other markdown tools do codeblocks, so rendering evidence .md pages in other tools would give good results
there is almost no boilerplate

Concerns

Would this be confusing for users?
- I think we'd want to give people the option to do sql code blocks, and so especially could imagine people writing
```
```sql
  select * from my_table
```
```
  and expecting it to run a query, but getting a code block
```
We would want to keep this list relatively short. What if someone wants to use a language outside this list? Do we need a catch all "other language" label also

Alternative

I guess an alternative to similar approach to the above would be a syntax with a single reserved keyword. Eg 'code' or 'codeblock'

```codeblock-python
print("Hello world!")

or
````markdown
```codeblock(python)
print("Hello world!")


**Advantages**
- Less likelihood of name collisions (only one keyword)
- Arbitrary language support possible

**Concerns:** 
- this is not how other markdown languages do it, so if you rendered the .md file you wouldn't get the right highlighting (though you'd probably still get the code).

ItsMeBrianD commented 1 year ago

Actually implementing this is actually pretty straightforward by working off of the existing highlight function; and when a language is detected, rendering the CodeBlock component, rather than rendering the QueryViewer. Wrapped with codeblock or not shouldn't have a huge impact, especially because we can just sanitize it away before passing it through.

For the SQL Block confusion; I could see it going either way, but if the best practice of "hey name your query" were opinionated, that would make a good case for both having sql behave the same as others, while also pushing users towards (probably, although I'll concede this may be wishful thinking) better organization.

We would want to keep this list relatively short. What if someone wants to use a language outside this list? Do we need a catch all "other language" label also

This behaves the same as the original component, and as a result only imports languages that are needed at runtime, so supporting all the available languages shouldn't cause too large of a concern. It does somewhat constrain the available names for queries, which could present an edge case.

Arbitrary language support possible

I think this is possible regardless of which direction we go; because we have the highlight function to intercept any custom logic that should be applied.

I would lean towards staying within the markdown spec, because diverting from it (as mentioned) could lead to other editors not knowing 100% what to do with it. It will also be more familiar for users that have written markdown before and keep a natural writing experience as opposed to having to learn to prefix things.

archiewood commented 1 year ago

Which syntax

I would lean towards staying within the markdown spec, because diverting from it (as mentioned) could lead to other editors not knowing 100% what to do with it. It will also be more familiar for users that have written markdown before and keep a natural writing experience as opposed to having to learn to prefix things.

That's good enough logic for me. I'm in agreement.

Which languages

It does somewhat constrain the available names for queries

This is more my concern. PrismJS supports 299 languages, and whilst most of them seem unlikely to conflict with user query names, some of the more obscure languages might: properties, ignore, processing, reason etc.

A name collision would be particularly confusing here, because it wouldn't cause an error, just do something unexpected.

I'd therefore probably start with a (relatively) short list of popular, common languages, with a bias towards those more used by analysts, and then if developers need others we can very easily add them.

I think it would also be sensible to have a name for any-other-languages-which-we-dont-support-highlighting-but-can-render-as-a-code-block. Perhaps codeblock, code, unhiglighted, other-language? Suggestions welcome as I'm not sure any of these are good.

@0c370t Make sense?

ItsMeBrianD commented 1 year ago

Yeah this tracks; makes sense to avoid unexpected behaviors (honestly didn't realize that those were languages / syntaxes).

I've got a working prototype of this, the biggest remaining question is handling the un-syntaxed code. Personally I would just use the 3 ticks, but I'm not sure how that interacts with the expectations set by Evidence. Just having code seems like a reasonable alternative; that again should be pretty straightforward to implement.

With regards to the supported languages; it might be worthwhile to just grab off the Stackoverflow 2022 language study for everything above say, 5% (while also adding in languages more likely to appear, such as R, or MATLAB), as I'm not sure where to come up with a good list of popular languages in the space. If you've got thoughts on sourcing a list I'd love comments on that.

archiewood commented 1 year ago

honestly didn't realize that those were languages

Neither did I until today

With regards to the supported languages; it might be worthwhile to just grab off the Stackoverflow 2022 language study for everything above say, 5%

Yep, great shout as an initial list.

I would add the following languages / filetypes

R
Matlab
DAX
json
yaml

Personally I would just use the 3 ticks, but I'm not sure how that interacts with the expectations set by Evidence.

I think it's helpful to have 3 ticks throw an error for users about needing to name their sql query. (Which is what happens today). As a heavy user of core Evidence I find it's pretty easy to forget to name your queries (which is vital).

So let's go with code

EDIT: Added DAX, which people using MS toolchain / PowerBI use a lot

ItsMeBrianD commented 1 year ago

Alright; latest push should sync up with what we had discussed;

Code blocks are now functioning the same as in markdown proper; with a subset of languages supported.

Adding additional languages should be as easy as putting them in that array; in the future it might be possible to expose an api to manually enable support for other languages by mutating that array.

Let me know if there are any other changes you see @archiewood

archiewood commented 1 year ago

I'm a bit unsure that names in that array like HTML/CSS and Bash/Shell will work as expected. (Expect prism will need things in format matching https://prismjs.com/download.html). Also not sure if cases are an issue for prism.

Can you add a snippet of each of the supported code types to that markdown file? (Don't care really the snippet does, maybe this will help for unfamiliar languages: helloworldcollection.de)

ItsMeBrianD commented 1 year ago

I'm a bit unsure that names in that array like HTML/CSS and Bash/Shell will work as expected. (Expect prism will need things in format matching https://prismjs.com/download.html).

Yeah that's a side effect of writing code first thing in the morning; I've corrected this

Also not sure if cases are an issue for prism.

Probably not for prism, but we do need them to be consistent for our own matching. I've added changes for this as well

Can you add a snippet of each of the supported code types to that markdown file?

Done. The DAX example could use some sanity checking though

archiewood commented 1 year ago

I've added a feature I'm always reaching for in code blocks: Line highlighting

archiewood commented 1 year ago

Just been doing some more thorough testing, and have two more things I'd like to address before merging

Styling not isolated

i. The CodeBlock styling bleeds into QueryViewer component

ii. QueryViewer styling bleeds into CodeBlock (note extra space after code finishes)

Vite warning

I'm getting a warning from vite in the terminal


/Users/archie/Projects/forks-evidence/0c370t/sites/example-project/src/components/viz/CodeBlock.svelte
279|                    await Promise.all(Array.from(requiredLanguages).map(async rl => // Dynamically load all required languages from the cdn
280|                    // This cannot be loaded from node_modules because it is interpolated.
281|                    import(`https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-${rl}.min.js`)));
   |            ^
282|            }
283|    }
The above dynamic import cannot be analyzed by vite.
See https://github.com/rollup/plugins/tree/master/packages/dynamic-import-vars#limitations for supported dynamic import formats. If this is intended to be left as-is, you can use the /* @vite-ignore */ comment inside the import() call to suppress this warning.

Is this is problem? And if not can we suppress it?

ItsMeBrianD commented 1 year ago

I'll have to dig in on the styling; I know that for the prism stuff that might be more involved because they are imports from .css files; instead of being handled by the svelte style tab.

The vite warning is pretty straightforward if I recall properly; should just be an @vite-ignore.

archiewood commented 1 year ago

Fair. I started reading this which I thought might help but didn't get very far: https://prismjs.com/plugins/custom-class/

ItsMeBrianD commented 1 year ago

The vite warning is pretty straightforward if I recall properly; should just be an @vite-ignore.

I was wrong; this does not seem to be that easy. I thought I had seen it in another context with that note, but this seems to be a deeper issue. (Vite is using rollup under the hood, so no easy way to touch those errors, from what I can see. Svelte's onwarn hook doesn't even catch it). There is the possibility of pulling in a more up to date plugin (the one throwing the warning is 2 years since last publish), but I was hesitant to bring in any new deps.

I'll have to dig in on the styling; I know that for the prism stuff that might be more involved because they are imports from .css files; instead of being handled by the svelte style tab.

This has so far also proven to be a pain. Using svelte-preprocess we can use some prism-theme=dark custom attribute, and successfully load that and inject it into the style block; but it gets purged because the selectors don't match any children that Svelte controls. Prism custom class tweaks also didn't work, but that's because prism is operating off of window.Prism, instead of a more well scoped variable.

This problem is also fairly solvable - but it requires bringing in a CSS preprocessor, which merits its own discussion as there are many to choose from. (I'm partial to PostCSS, mostly because I am a big fan of tailwind, SCSS / LESS would also solve this problem).

Once that's done it would be a matter of just adding an import that is controlled by the preprocessor, which can be combined with a :global to both scope and prevent purging of the styles in question.

archiewood commented 1 year ago

I was wrong; this does not seem to be that easy.

I'm less worried about this. Whilst it's slight junk in the terminal, it would only be when users were using codeblocks.

This problem is also fairly solvable - but it requires bringing in a CSS preprocessor, which merits its own discussion as there are many to choose from.

As you say this is a bit of a broader issue. We're looking to bring in some kind of CSS framework, but haven't decided which we'll go for. Tailwind is definitely a contender.

I think this is a blocking issue for this PR as it stands. The UX compromise on our core functionality seems like too much of a regression.

I had a play around with another approach, which involves scoping the css manually as per https://github.com/PrismJS/prism/issues/807#issuecomment-148472542 (We could include a local copy of prism-okaidia.css).

This does seem to work to some extent. However, I haven't managed to get it to work perfectly. I think the js scripts are still competing somehow and those are much less easy to edit.

mcrascal commented 1 year ago

Putting this on hold for the moment, pending a decision re: css preprocessor or alternatives here.

evidence-dev / evidence