MesterMan03 / Bedlessbot

This is an epic bot (by an epic person) that handles role applications and level system for Bedless Nation.
https://discord.gg/bedless-nation-691898152277508146
Apache License 2.0
2 stars 1 forks source link

Replace marked with in-house Markdown parser. #11

Closed jonahlikescookies closed 1 day ago

jonahlikescookies commented 1 week ago

I have figured out a way to disable certain markdown features in marked(see https://github.com/jonahlikescookies/Bedlessbot/tree/marked/test, line 55 of src/dashboard/public/scripts/packs.ts), and @MesterMan03 mentioned that there are certain features in markedjs that are to be disabled. By using a modified renderer, you can disable certain features, e.g., masked links. Here is the code:

import { marked } from "marked";
// ...
const renderer = new marked.Renderer();
renderer.link = (href: string) => `<a href="${href}">${href}</a>`; // disable masked links
marked.use({ renderer });
// ...
MesterMan03 commented 1 week ago

Nice, I'll definitely be using that.

With this information, I think it's safe to allow Markdown formatting even for comments. We can use two different renderers for pack descriptions and comments to enable different sets of features.

Comments should probably only have very basic formatting enabled — bold, italic, strikethrough, and maybe lists. We can discard everything else, like tables, links, etc.

Although all comments go through verification, so if I'm being completely honest, this might not even be crucial. If someone tries to be malicious, staff will just block the comment before it becomes public.

noClaps commented 1 week ago

Comments should probably only have very basic formatting enabled — bold, italic, strikethrough, and maybe lists. We can discard everything else, like tables, links, etc.

In that case we could probably write our own Markdown parser. Lists are the only ones that are somewhat complicated to do, everything else is a pretty simple search and replace. We could also limit the formatting to **text** for bold, _text_ for italics and ~strikethrough~ to make it even simpler. If you still want to have full Markdown for descriptions then that can be confined to the server, and since its coming from a trusted source there's no need to sanitise it.

This removes the need for dompurify since we can just Bun.escapeHTML and that should(?) render it as HTML without actually running the code. Any formatting replacements can be done after escaping. Will need to test if escapeHTML does actually work for this.

jonahlikescookies commented 1 week ago

When I try using Bun.escapeHTML it causes a 404 error for some reason. I'm not sure why. Here is the code:

import { escapeHTML } from "bun";
// ...
const markdownOutput = escapeHTML(marked.parse(markdownInput, { async: false }) as string);
// ...
jonahlikescookies commented 1 week ago

From the DOMPurify README,

The resulting HTML can be written into a DOM element using innerHTML or the DOM using document.write(). That is fully up to you. Note that by default, we permit HTML, SVG and MathML. If you only need HTML, which might be a very common use-case, you can easily set that up as well: const clean = DOMPurify.sanitize(dirty, { USE_PROFILES: { html: true } });

Would this improve performance at all?

noClaps commented 1 week ago

404 error

You get a HTTP error from a locally running function..?

jonahlikescookies commented 1 week ago

image It seems that the script doesn't build when I use escapeHTML. I tried setting target to "browser" for the build, and it still didn't work.

MesterMan03 commented 1 week ago

The problem is likely that Bun doesn't support bundling its own APIs yet, and it throws an error while trying to build packs.ts, which means packs.js doesn't get created - thus generating a 404.

From the Bun docs:

In the future, this will automatically polyfill the Bun global and other built-in bun:* modules, though this is not yet implemented.

This might not even be a problem though, since as I mentioned all comments go through human verification, which means there is probably no need to sanitise user content anyway, if humans have already done that.

MesterMan03 commented 1 week ago

Comments should probably only have very basic formatting enabled — bold, italic, strikethrough, and maybe lists. We can discard everything else, like tables, links, etc.

In that case we could probably write our own Markdown parser. Lists are the only ones that are somewhat complicated to do, everything else is a pretty simple search and replace. We could also limit the formatting to **text** for bold, _text_ for italics and ~strikethrough~ to make it even simpler. If you still want to have full Markdown for descriptions then that can be confined to the server, and since its coming from a trusted source there's no need to sanitise it.

This removes the need for dompurify since we can just Bun.escapeHTML and that should(?) render it as HTML without actually running the code. Any formatting replacements can be done after escaping. Will need to test if escapeHTML does actually work for this.

Honestly yeah that might be the best idea. I was thinking of using basic regex patterns, which won't be the problem because the max comment length is only 1024 characters.

Actually, an alternative route is that we keep marked but on the server side, so instead of increasing bundle size and running the markdown parsing on the client, we do it once and just serve the final html to the clients. (Honestly, I much prefer this)

noClaps commented 1 week ago

If we're already using marked for other things, like post descriptions, then yeah we can just use it for comments too.

MesterMan03 commented 1 day ago

Now that marked is loaded via CDN and it doesn't really have any sort of performance impact (the minified script is only 11 kb), I honestly don't think it matters, let's just use the easy solution and not reinvent the wheel.

As for comments, we can probably use Markdown even without DOMPurify since technically the content is considered safe - it is checked by staff.

That's my final conclusion - use marked for both descriptions and comments (I think that feature needs to be added yet).