contentlayerdev / contentlayer

Contentlayer turns your content into data - making it super easy to import MD(X) and CMS content in your app
https://www.contentlayer.dev
MIT License

Provide a way to populate field values from Markdown/MDX processing pipeline #216

Open motleydev opened 2 years ago

motleydev commented 2 years ago

Feature: I'd like to use an MDX plugin to create programmatic meta (variable declarations) that can then be exposed in the final object contentlayer provides.

Use case: An example of where this would be useful is generating a TOC, creating a list of unique keywords, etc.

Workaround: Use computed fields (which don't expose the raw AST)

Related issues: https://github.com/kentcdodds/mdx-bundler/issues/169

Consider the following config structure

export default makeSource({
  contentDirPath: "content",
  documentTypes: [Posts, Things, People],
  mdx: {
    rehypePlugins: [rehypeSlug, rehypeAutolinkHeadings, searchMeta],
  },
});

Where searchMeta looks at paragraph nodes of the hast tree, grabs a list of unique words, and adds them to the metadata as searchMeta.

A markdown file with the structure of:


---
title: Hello World
slug: hello-world
---
Hello World! Please say Hello!

Would generate a final object of:

{
"title": "Hello World",
"slug": "hello-world",
"searchMeta": ["hello", "world", "please", "say"],
"code": "().....",
"_raw": "..."
}
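
The word extraction described above can be sketched as a plain function with no dependencies. This is only an illustration: `collectParagraphWords` is a hypothetical name, the tree is a hand-written stand-in for parsed hast, and the sketch assumes no minimum word length so that short words such as "say" are kept.

```javascript
// Hypothetical helper: gather unique, normalized words from every <p> element
// in a hast-like tree (plain objects with `tagName`, `children`, `value`).
function collectParagraphWords(tree) {
  const words = new Set();

  function walk(node, insideParagraph) {
    const inParagraph = insideParagraph || node.tagName === "p";
    if (inParagraph && typeof node.value === "string") {
      for (const token of node.value.split(/\s+/)) {
        // Lowercase and strip punctuation; skip tokens that end up empty.
        const word = token.toLowerCase().replace(/[^a-z0-9]/g, "");
        if (word) words.add(word);
      }
    }
    for (const child of node.children ?? []) walk(child, inParagraph);
  }

  walk(tree, false);
  return [...words];
}

// Hand-written tree standing in for the parsed example document.
const tree = {
  type: "root",
  children: [
    {
      type: "element",
      tagName: "p",
      children: [{ type: "text", value: "Hello World! Please say Hello!" }],
    },
  ],
};

console.log(collectParagraphWords(tree));
```

For the example document this yields the same four words as the searchMeta array above, in first-seen order.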

For the sake of completeness, here's a working (if ugly) example of the plugin that adds searchMeta to the data attribute of the vFile in the rehype plugin chain.


import { visit } from "unist-util-visit";

export default function searchMeta() {
  return (tree, file) => {
    // Collect words across every paragraph, not just the last one visited.
    const words = new Set();

    visit(tree, { tagName: "p" }, (node) => {
      for (const child of node.children) {
        // Only direct text children carry a string `value`.
        if (typeof child.value !== "string") continue;

        child.value
          .split(" ")
          .filter((word) => !word.includes(":"))
          .map((word) => word.toLowerCase().replace(/[^a-z0-9]/gi, ""))
          .filter((word) => word.length > 3)
          .forEach((word) => words.add(word));
      }
    });

    // Expose the unique words on the vFile for later steps.
    file.data.searchMeta = [...words];
  };
}

timlrx commented 2 years ago

As a temporary workaround, one could consider defining a computedField that parses the raw output from contentlayer. Here's an example of extracting the table of contents of a markdown file and making it available as a toc property in contentlayer:

import { remark } from 'remark'

// Assume a remark plugin (`remarkTocHeadings`) that stores the information in `vfile.data.toc`
export async function extractTocHeadings(markdown) {
  const vfile = await remark().use(remarkTocHeadings).process(markdown)
  return vfile.data.toc
}

const computedFields: ComputedFields = {
  toc: { type: 'string', resolve: (doc) => extractTocHeadings(doc.body.raw) },
  ...
}

schickling commented 2 years ago

@motleydev would having access to the vFile.data property from within computedFields be a good solution to your described problem?

Something along those lines

const computedFields: ComputedFields = {
  toc: { type: 'string', resolve: (_doc, { vfile }) => vfile.data.toc },
  ...
}

motleydev commented 2 years ago

When do computed fields get executed? At run time or at compilation? At the end of the day, what I'm trying to get is the data added to static output.

schickling commented 2 years ago

computedFields are executed together with all other fields - therefore are part of your static output. (Just opened a docs issue to clarify this).

motleydev commented 2 years ago

In that case, that would probably work just fine! It would still be nice to do the work during the original transform process to avoid revisiting each file, but for a static output process, that's probably shaving the yak a bit too close.

motleydev commented 2 years ago

The more I think about it, accessing vfile.data from computed fields would totally solve my use-case. It'd still be nice to be able to do all the work "in" the handler, but being able to do visit work during the initial parsing and then passing that along with the payload would be more than sufficient. What do you think a reasonable timeline on that would be?

essential-randomness commented 1 year ago

Any update on this? I'd be willing to try my hand at a PR to pass vfile as an additional argument of resolve in computedFields. I need the same thing!

essential-randomness commented 1 year ago

I've spent the evening trying to work on a solution myself (for MDX files), and reached the same conclusions as @stefanprobst in https://github.com/contentlayerdev/contentlayer/pull/236#issuecomment-1167848789. To summarize: there is no way to access vfile.data when using mdx-bundler or @mdx-js/esbuild, and the best way to surface that data is as named exports, as done here.

At this point, I think the way to resolve this would be

  1. Create utilities for (or document a way) to map vfile.data fields to named exports.
  2. Support surfacing MDX exports as document fields (https://github.com/contentlayerdev/contentlayer/issues/64).
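
As a rough illustration of step 1, here is a minimal sketch of mapping collected data to named exports. `toNamedExports` is a hypothetical helper, not part of contentlayer, mdx-bundler, or MDX. It only builds export statements as text; a real plugin would more likely add them to the MDX syntax tree. It assumes the values are JSON-serializable and does not guard against reserved words such as `default`.

```javascript
// Hypothetical helper: turn plain data (e.g. what a plugin stored on
// `vfile.data`) into ESM named-export statements.
const IDENTIFIER = /^[A-Za-z_$][A-Za-z0-9_$]*$/;

function toNamedExports(data) {
  return Object.entries(data)
    .map(([name, value]) => [name, JSON.stringify(value)])
    // Skip keys that are not plain identifiers and values JSON cannot represent.
    .filter(([name, json]) => IDENTIFIER.test(name) && json !== undefined)
    .map(([name, json]) => `export const ${name} = ${json};`)
    .join("\n");
}

const source = toNamedExports({
  toc: [{ id: "intro", depth: 2 }],
  searchMeta: ["hello", "world"],
  "not-an-identifier": true, // dropped: cannot be used as an export name
});

console.log(source);
```

Step 2 would then be about reading exports like `toc` and `searchMeta` back out of the compiled MDX module and exposing them as document fields.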

I'm still willing to try and help further progress on this issue. Currently carrying around a lot of hacks in my code ;)

schickling commented 1 year ago

Thanks for your comment @essential-randomness. Very helpful. I hope I'll get some capacity soon to take a stab at this!

donaldxdonald commented 1 year ago

Need this~

cpatti97100 commented 7 months ago

I tried the code above to no avail... did someone manage to read the mdx content and add data to frontmatter using a custom remark plugin in this context? thanks!

cpatti97100 commented 7 months ago

hope it helps someone, in the end I managed like this

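import { defineDocumentType } from 'contentlayer/source-files'
import { remark } from 'remark'
import remarkMdx from 'remark-mdx'
import { visit } from 'unist-util-visit'

// this is a bit too custom maybe but you get the idea
// (`Toc` and `buildTreeFromHeadings` are defined elsewhere and not shown here)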
function extractHtmlHeadings(tree) {
  const headings = []

  visit(
    tree,
    (node) =>
      ['mdxJsxFlowElement', 'mdxJsxTextElement'].includes(node.type) &&
      /^h[23]$/.test(node.name ?? ''),
    (node) => {
      if (['mdxJsxFlowElement'].includes(node.type)) {
        headings.push({
          id: node.attributes[0].value,
          text: node.children[0].children[0].value,
          type: node.name === 'h2' ? 'heading2' : 'heading3',
        })

        return
      }

      headings.push({
        id: node.attributes[0].value,
        text: node.children[0].value,
        type: node.name === 'h2' ? 'heading2' : 'heading3',
      })
    }
  )

  return buildTreeFromHeadings(headings)
}

export const InstructionsForUse = defineDocumentType(() => ({
  contentType: 'mdx',
  computedFields: {
    toc: {
      type: 'nested',
      of: Toc,
      resolve(doc) {
        return remark()
          .use(remarkMdx)
          .use(function searchMeta() {
            return function transformer(tree, file) {
              const headings = extractHtmlHeadings(tree)

              file.data = headings
            }
          })
          .process(doc.body.raw)
          .then((vFile) => {
            return vFile.data
          })
      },
    },
  },
}))

felicio commented 1 month ago

Relates to https://github.com/contentlayerdev/contentlayer/issues/566