datopian / markdowndb

Turn markdown files into structured, queryable data with JS. Build markdown-powered docs, blogs, and sites quickly and reliably.
https://markdowndb.com
MIT License

[epic] MarkdownDB plugin system #2

Open rufuspollock opened 1 year ago

rufuspollock commented 1 year ago

We want a plugin system in MarkdownDB so people can easily extend the core functionality, for example to extract additional metadata. That way not all functionality has to live in core, and people can add new functionality quickly.

Sketch (April 2023)

https://link.excalidraw.com/l/9u8crB2ZmUo/9hkrQmVl9QX


Acceptance

Notes

MarkdownDB vs Contentlayer

Contentlayer supported:

What we need:

rufuspollock commented 11 months ago

I've been doing a bunch of research on remark and micromark for the parsing part of this. Could remark be our plugin system here? (Probably.)

Can you pass "data" along the chain of a plugin

This example (https://github.com/remarkjs/remark/issues/251) computes counts per node type, including words, but it only logs the result to the console:

```javascript
var unified = require('unified');
var parse = require('remark-parse');
var stringify = require('remark-stringify');
var english = require('retext-english');
var remark2retext = require('remark-retext');
var visit = require('unist-util-visit');

// Parse the markdown, bridge the tree to retext (natural-language nodes),
// and run the `count` plugin on the bridged tree.
unified()
  .use(parse)
  .use(remark2retext, unified().use(english).use(count))
  .use(stringify)
  .processSync('*This* and _that_. \n> And some more stuff.\n\nAnd another thing.');

// Plugin that tallies how many nodes of each type the tree contains.
function count() {
  return counter;
  function counter(tree) {
    var counts = {};
    visit(tree, visitor);
    console.log(counts); // the counts are only logged, not stored or passed on
    function visitor(node) {
      counts[node.type] = (counts[node.type] || 0) + 1;
    }
  }
}
```

Output:

```
{ RootNode: 1,
  ParagraphNode: 3,
  SentenceNode: 3,
  WordNode: 10,
  TextNode: 10,
  WhiteSpaceNode: 10,
  PunctuationNode: 3 }
```

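
On the question of passing "data" along the chain: in unified, a transformer is called with the syntax tree and the file being processed, and the file carries a `data` object that later plugins and the caller can read. Below is a minimal, dependency-free sketch of that pattern. It imitates the idea with plain objects and is not the real unified API; `runPlugins`, `countNodeTypes`, `summarize`, and the hand-built tree are hypothetical names used only for illustration.

```javascript
// Stand-in for a parsed document: a tiny unist-like tree built by hand.
const tree = {
  type: 'root',
  children: [
    { type: 'paragraph', children: [{ type: 'text', value: 'This and that.' }] },
    {
      type: 'blockquote',
      children: [
        { type: 'paragraph', children: [{ type: 'text', value: 'And some more stuff.' }] },
      ],
    },
  ],
};

// Depth-first walk that calls `visitor` on every node.
function visit(node, visitor) {
  visitor(node);
  for (const child of node.children || []) visit(child, visitor);
}

// Plugin 1 (hypothetical): count node types and store the result on file.data
// instead of logging it.
function countNodeTypes(tree, file) {
  const counts = {};
  visit(tree, (node) => {
    counts[node.type] = (counts[node.type] || 0) + 1;
  });
  file.data.counts = counts;
}

// Plugin 2 (hypothetical): reads what plugin 1 stored, so data flows along the chain.
function summarize(tree, file) {
  file.data.totalNodes = Object.values(file.data.counts).reduce((a, b) => a + b, 0);
}

// Hypothetical runner: every plugin receives the same tree and the same file object.
function runPlugins(tree, plugins) {
  const file = { data: {} };
  for (const plugin of plugins) plugin(tree, file);
  return file;
}

const file = runPlugins(tree, [countNodeTypes, summarize]);
console.log(file.data);
// → { counts: { root: 1, paragraph: 2, text: 2, blockquote: 1 }, totalNodes: 6 }
```

Neither plugin modifies the tree; each only adds fields to `file.data`, which the caller can then store wherever it likes.
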
mohamedsalem401 commented 11 months ago

The immediate question is how to store the output of running plugins. Consider a simple plugin, remark-a11y-emoji (https://github.com/florianeckerstorfer/remark-a11y-emoji), which wraps each emoji in a <span> tag and sets the emoji name as the aria-label.

Assuming we successfully run the markdown files through such plugins, the next question is where to store the newly generated markdown. Currently, the library only generates an SQL database from metadata and has no method for loading a file's content.

Possible solutions include:

  1. Add Content to Database/JSON: Store each file's body content in the generated database or local JSON files. This approach consolidates the parsed content along with metadata.

  2. Generate Separate Markdown Files: Create a designated folder, say .markdown, and start generating markdown files there after parsing. This process involves removing metadata from the files.

  3. Introduce a Loading Method: Implement a method like loadFile(file_path) to retrieve the content of a given file after running the plugins. A drawback of this approach is that users who generate the database/JSON files with the library but load the markdown content with another tool would not get the plugins' output.
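
As a rough illustration of option 1, the sketch below stores a file's body content next to its metadata in a single JSON-serializable record. It is not MarkdownDB's actual API or schema: `splitFrontMatter`, `toRecord`, and the record fields are hypothetical, and the front matter handling is deliberately naive (flat `key: value` lines only).

```javascript
// Hypothetical helper: split a markdown source into raw front matter and body.
// Only handles a leading block delimited by '---' lines.
function splitFrontMatter(source) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(source);
  if (!match) return { frontMatter: '', body: source };
  return { frontMatter: match[1], body: source.slice(match[0].length) };
}

// Hypothetical record shape: metadata and body content stored side by side.
function toRecord(filePath, source) {
  const { frontMatter, body } = splitFrontMatter(source);
  const metadata = {};
  for (const line of frontMatter.split(/\r?\n/)) {
    const idx = line.indexOf(':');
    // Naive parsing: everything before the first colon is the key.
    if (idx > 0) metadata[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { file_path: filePath, metadata, content: body.trim() };
}

const source = '---\ntitle: Hello\n---\n\n# Hello\n\nSome text.\n';
const record = toRecord('blog/hello.md', source);
console.log(JSON.stringify(record, null, 2));
```

A record like this could be written to a JSON file or a database row, so that content and metadata are read from one place.
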

rufuspollock commented 11 months ago

@mohamedsalem401 we aren't using plugins to transform markdown at all - we are using plugins to extract information from the markdown and then store that somewhere ...

See the section of my last comment on "Can you pass "data" along the chain of a plugin", because we just want to pass data along the chain. Or see the example above, which computes word counts and similar statistics.

To repeat: we are not using remark plugins to transform the content but rather to extract information from it ...