Implement our own MDX parser

NathanLovato commented 1 month ago

This task is about replacing remark, the MDX parser we currently use on GDSchool, along with the plugins we maintain for it, with our own MDX parser.

The MDX parser should take an MDX document, ideally valid, extract content such as imports and YAML front matter, and output a TSX file that exports the metadata as constants along with an exported React component written in JSX:

export const title = "..."
export const index = 2
export const previous_lesson = {
  title: "Module Overview",
  slug: "module_overview"
} 
export const next_lesson = {...}
export const module_title = "Top Down Movement"

export const Content = () => {
  return <>
    <h1 className="main-title">Character Controller</h1>
    ... 
  </>
}
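
As a rough illustration of the metadata half of that output, here is a minimal TypeScript sketch. The `emitMetadataExports` function and the `Metadata` type are hypothetical names, and the sketch assumes every metadata key is already a valid JavaScript identifier.

```typescript
// Hypothetical metadata shape; the real front matter may carry other fields.
type Metadata = { [key: string]: unknown };

// Emits one `export const` line per metadata field.
// JSON.stringify yields valid JavaScript literals for strings, numbers,
// booleans, arrays, and plain objects, so quotes inside titles are escaped.
// Assumption: every key is a valid identifier (no dashes or spaces).
function emitMetadataExports(meta: Metadata): string {
  return Object.keys(meta)
    .map((key) => `export const ${key} = ${JSON.stringify(meta[key])}`)
    .join("\n");
}

const exportsSource = emitMetadataExports({
  title: "Character Controller",
  index: 2,
  previous_lesson: { title: "Module Overview", slug: "module_overview" },
});
```

Going through `JSON.stringify` rather than manual string concatenation keeps titles that contain quotes or backslashes from producing invalid TSX.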

Stretch goal: output JavaScript instead, using the React.createElement API, to skip extra parsing steps in the build process.

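import React from "react"

export const Content = () => {
  // React.Fragment stands in for the <>...</> wrapper; children are passed as separate arguments
  return React.createElement(React.Fragment, null,
    React.createElement('h1', {className: 'main-title'}, "Character Controller")
  )
}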
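
To give an idea of what generating this output could involve, here is a small TypeScript sketch that turns a node tree into `React.createElement` source text. The `OutNode` type and `emitCreateElement` function are hypothetical, and the sketch only handles plain HTML tags; components such as Practice would need to be emitted as identifiers rather than strings.

```typescript
// Hypothetical minimal node tree; a real parser would produce something richer.
type OutNode =
  | string
  | { tag: string; props: { [key: string]: string }; children: OutNode[] };

// Recursively emits JavaScript source that calls React.createElement.
// Children are emitted as separate arguments rather than as one array,
// which avoids React's missing-key warning for static content.
function emitCreateElement(node: OutNode): string {
  if (typeof node === "string") {
    return JSON.stringify(node);
  }
  const props =
    Object.keys(node.props).length > 0 ? JSON.stringify(node.props) : "null";
  const args = [JSON.stringify(node.tag), props].concat(
    node.children.map((child) => emitCreateElement(child))
  );
  return `React.createElement(${args.join(", ")})`;
}

const headingSource = emitCreateElement({
  tag: "h1",
  props: { className: "main-title" },
  children: ["Character Controller"],
});
```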

MDX processing needs

We maintain our own remark plugins to make some MDX components easier to write in the source documents. They apply the following transformations:

Sequences of Practice, Callout, and Searchable components are wrapped into a container. More types of components may use this mechanism in the future.
Child components of Practice, YourTurn, and Challenge components are turned into properties of the parent component. For example, all the hint elements are turned into an array of hints in the parent component.
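
The sequence-wrapping transformation could be sketched as follows in TypeScript. The `MdxNode` shape and the `<Name>Group` container name are assumptions made for illustration, as are two behaviors the description leaves open: runs are grouped per component name, and a run of length one is wrapped too.

```typescript
// Hypothetical node shape for this sketch; not the real parser's AST.
type MdxNode = { name: string; children: MdxNode[] };

// Component names whose consecutive runs get wrapped.
const GROUPABLE = ["Practice", "Callout", "Searchable"];

// Wraps each run of consecutive siblings that share a groupable name into one
// container node. The "<Name>Group" container name is an assumption.
function wrapSequences(siblings: MdxNode[]): MdxNode[] {
  const result: MdxNode[] = [];
  let i = 0;
  while (i < siblings.length) {
    const current = siblings[i];
    if (GROUPABLE.indexOf(current.name) === -1) {
      result.push(current);
      i += 1;
      continue;
    }
    // Find the end of the run of same-named siblings starting at i.
    let end = i;
    while (end < siblings.length && siblings[end].name === current.name) {
      end += 1;
    }
    result.push({ name: current.name + "Group", children: siblings.slice(i, end) });
    i = end;
  }
  return result;
}

const leaf = (name: string): MdxNode => ({ name: name, children: [] });
const wrapped = wrapSequences([
  leaf("p"), leaf("Callout"), leaf("Callout"), leaf("p"), leaf("Practice"),
]);
```

Keeping the groupable names in one list means new component types can opt into the mechanism by adding an entry.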

We need to replicate this behavior in our MDX parser.
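
For the child-to-property transformation, a minimal TypeScript sketch might look like this. The `MdxElement` shape, the `Hint` component casing, and the `hints` property name are assumptions; only the idea of collecting hint elements into an array on the parent comes from the description above.

```typescript
// Hypothetical element shape for this sketch; not the real parser's AST.
type MdxElement = {
  name: string;
  props: { [key: string]: unknown };
  children: Array<MdxElement | string>;
};

// Parents whose children are hoisted into properties.
const PARENTS = ["Practice", "YourTurn", "Challenge"];

// Assumed mapping from child component name to the parent's array property.
const CHILD_TO_PROP: { [name: string]: string } = { Hint: "hints" };

function hoistChildren(el: MdxElement): MdxElement {
  if (PARENTS.indexOf(el.name) === -1) {
    return el;
  }
  const props: { [key: string]: unknown } = { ...el.props };
  const remaining: Array<MdxElement | string> = [];
  for (const child of el.children) {
    const propName = typeof child === "string" ? undefined : CHILD_TO_PROP[child.name];
    if (typeof child === "string" || propName === undefined) {
      remaining.push(child); // Not a hoistable child: keep it in place.
      continue;
    }
    // Append this child's content as one entry of the parent's array property.
    const existing = (props[propName] as unknown[] | undefined) || [];
    props[propName] = existing.concat([child.children]);
  }
  return { name: el.name, props: props, children: remaining };
}

const practice = hoistChildren({
  name: "Practice",
  props: { title: "Move the character" },
  children: [
    { name: "Hint", props: {}, children: ["Check the input map."] },
    { name: "Hint", props: {}, children: ["Use move_and_slide()."] },
    "Write the movement code.",
  ],
});
```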

Markdown code block parsing needs

We need to turn Markdown code fences into a specific HTML structure. We need to parse and highlight the GDScript code. Options include using a PEG grammar with Nim's npeg library, writing our own specialized GDScript parser for highlighting, or passing the code to an external program like prism.js and injecting the result back. The existing build system uses prism.js within the Next.js build system.

Code fences should be turned into this pre and code structure:

<pre className="gdquest-code-container"><code className="gdquest-code">
// code here
</code></pre>

If the code block has the diff attribute (for example, if the language is diff-gdscript), we need to add a class to every line that starts with a plus or a minus sign.
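
Here is one way the fence rendering could be sketched in TypeScript, leaving highlighting aside. The `diff-add` and `diff-remove` class names are placeholders (the real class names are not given in this issue), and detecting diff mode from a `diff-` language prefix is an assumption. Each line is emitted as a JSX string expression because plain JSX text collapses newlines and treats `{` as the start of an expression, both of which would corrupt GDScript code.

```typescript
// Renders a fence body into the pre/code structure, one JSX string expression
// per line. Highlighting is out of scope for this sketch.
function renderCodeFence(language: string, code: string): string {
  // Assumption: a "diff-" language prefix marks a diff block.
  const isDiff = language.indexOf("diff-") === 0;
  const lines = code.split("\n").map((line) => {
    // JSON.stringify escapes quotes and backslashes and keeps the newline as \n.
    const text = `{${JSON.stringify(line + "\n")}}`;
    const sign = isDiff ? line.charAt(0) : "";
    // Placeholder class names; the real ones are not specified.
    if (sign === "+") return `<span className="diff-add">${text}</span>`;
    if (sign === "-") return `<span className="diff-remove">${text}</span>`;
    return text;
  });
  return (
    '<pre className="gdquest-code-container"><code className="gdquest-code">' +
    lines.join("") +
    "</code></pre>"
  );
}

const diffJsx = renderCodeFence(
  "diff-gdscript",
  "var speed = 100\n-speed = 200\n+speed = 300"
);
```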

Markdown headings parsing

We need to extract the H1 heading to use it as a title fallback if a title is not specified in the YAML front matter of the document. We may also need to read the H2 headings to create a table of contents.
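
A line-based TypeScript sketch of that extraction, assuming ATX-style headings only (`#` and `##`) and using the hypothetical names `Headings` and `extractHeadings`:

```typescript
type Headings = { title: string | null; sections: string[] };

// Scans "# ..." and "## ..." headings line by line.
// Lines inside code fences are skipped so that GDScript comments starting
// with "#" are not mistaken for headings.
function extractHeadings(markdown: string): Headings {
  const result: Headings = { title: null, sections: [] };
  let inFence = false;
  for (const line of markdown.split("\n")) {
    // A line starting with three backticks or three tildes toggles fence state.
    if (/^(`{3}|~{3})/.test(line)) {
      inFence = !inFence;
      continue;
    }
    if (inFence) continue;
    const match = /^(#{1,2})\s+(.+?)\s*$/.exec(line);
    if (match === null) continue;
    if (match[1] === "#" && result.title === null) {
      result.title = match[2]; // Only the first H1 is kept as the fallback title.
    } else if (match[1] === "##") {
      result.sections.push(match[2]);
    }
  }
  return result;
}

const found = extractHeadings(
  [
    "# Character Controller",
    "",
    "## Setup",
    "~~~gdscript",
    "# not a heading",
    "~~~",
    "## Input",
  ].join("\n")
);
```

If the parser ends up producing a token tree, the same information can be read from the heading tokens instead, which avoids tracking fence state by hand.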

Front matter parsing

We use the YAML format for the front matter. We just need to parse it using a YAML parser and inject optional fields or metadata if they are missing. The two main pieces of metadata are title and unlocked; unlocked should default to false when it is not specified.
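
The defaulting step could look like the TypeScript sketch below. It starts from the object a YAML parser would return (the parsing itself is not shown), and the names `FrontMatter` and `applyDefaults` are hypothetical. Throwing when neither a title nor an H1 exists is an assumption; the issue does not say what should happen in that case.

```typescript
type FrontMatter = { title: string; unlocked: boolean; [key: string]: unknown };

// Takes the object produced by a YAML parser and fills in the two main fields.
// `h1Title` is the document's first H1 heading, or null if there is none.
function applyDefaults(
  parsed: { [key: string]: unknown },
  h1Title: string | null
): FrontMatter {
  const rawTitle = parsed["title"];
  const title = typeof rawTitle === "string" ? rawTitle : h1Title;
  if (title === null) {
    // Assumption: a document with no title at all is treated as an error.
    throw new Error("No title in the front matter and no H1 heading to fall back on");
  }
  return {
    ...parsed,
    title: title,
    // `unlocked` is false unless the front matter explicitly sets it to true.
    unlocked: parsed["unlocked"] === true,
  };
}

const lessonMeta = applyDefaults({ index: 2 }, "Character Controller");
```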

Development

To approach this project, I would:

Look into reusing an existing Markdown parser for Nim, such as nim-markdown, as it is implemented in Nim and produces a token tree that we can traverse to generate the output we need. We have to see if it's usable as-is or if we need to fork it to support MDX-specific syntax like imports and exports.
Collect pairs of input MDX files and output TSX files to guide development and to test the parser against, ensuring it produces the expected output.
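
The input/output pairs could drive a very small test runner along these lines. This TypeScript sketch keeps the pairs in memory; in practice they would be read from files. `GoldenCase`, `runGoldenCases`, and the `convert` callback (which stands in for the parser under development) are hypothetical names.

```typescript
// One input/expected-output pair.
type GoldenCase = { name: string; mdx: string; expectedTsx: string };

// Runs the converter over every case and returns the names of the cases whose
// output differs from the expected TSX. Surrounding whitespace is ignored.
function runGoldenCases(
  cases: GoldenCase[],
  convert: (mdx: string) => string
): string[] {
  return cases
    .filter((c) => convert(c.mdx).trim() !== c.expectedTsx.trim())
    .map((c) => c.name);
}

// Toy converter used only to exercise the runner; it is not an MDX parser.
const toyConvert = (mdx: string): string => "<p>" + mdx + "</p>";

const failures = runGoldenCases(
  [
    { name: "paragraph", mdx: "Hello", expectedTsx: "<p>Hello</p>\n" },
    { name: "heading", mdx: "# Title", expectedTsx: "<h1>Title</h1>" },
  ],
  toyConvert
);
```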

NathanLovato commented 2 weeks ago

Shortlist of Markdown features to support: headings, links, images, lists (unordered and ordered), blockquotes, and basic formatting (bold, italics).

NathanLovato commented 17 hours ago

Simplifications we can make compared to MDX, for our needs (to review):

We may not need MDX imports: we could reference file paths directly wherever an import would otherwise be used, and check those paths instead
We may not need to support parsing JSX expressions, or at least not in detail

GDQuest / product-packager