about-code / glossarify-md

A Term-to-Definition-Linker for Markdown. https://npmjs.com/package/glossarify-md
MIT License

Generate ref or relref for Hugo, not just file references #165

Closed christophcemper closed 2 years ago

christophcemper commented 3 years ago

What's your user story?

As a Hugo user I want to customize the output of the URLs so that the links work natively in Hugo and I can avoid regex/sed work afterwards.

Instead of a plain some/path/file.md, the output should be a Markdown link with a relref

e.g.

[link to the page via a *relref*]({{< relref some/path/file >}})

How would you like the system behave to satisfy your needs? How would you like it not?

  1. In the config I want to specify a "link template string" to customize the output
  2. Example config could be
    "linkTemplate": "[$title]({{< relref $filepath/$filenamewithoutextension >}})"

about-code commented 3 years ago

Hello Christoph.

Maybe we should discuss two things in your proposal separately:

  1. for one, you seem to need to tweak link paths, which currently reflect the input directory structure
  2. you seem to need a custom link syntax, which you would like to provide using a link template

Re: 1 The tool internally uses Node's path module and vFile for dealing with paths. Without making any commitment yet I might be able to evaluate some kind of a URL or path template where placeholder names reflect vFile properties
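As a rough illustration of what such a path template might look like, here is a minimal sketch. The function name `expandPathTemplate`, the `{name}` placeholder syntax and the sample object are assumptions made up for this example; only the property names `dirname`, `stem` and `extname` are taken from vFile. It is not an existing glossarify-md option.

```javascript
// Hypothetical placeholder expansion, for illustration only.
// `file` mimics a few vFile properties (dirname, stem, extname).
function expandPathTemplate(template, file) {
  return template.replace(/\{(\w+)\}/g, (match, name) =>
    // Unknown placeholders are left untouched instead of being dropped.
    Object.prototype.hasOwnProperty.call(file, name) ? file[name] : match
  );
}

const file = { dirname: "some/path", stem: "file", extname: ".md" };
console.log(expandPathTemplate("{dirname}/{stem}", file)); // some/path/file
```

Leaving unknown placeholders as they are makes typos in a template visible in the output rather than silently producing a shorter path.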

Re: 2 glossarify-md parses input markdown files into a Markdown Abstract Syntax Tree (mdAST) and operates on that AST before serializing it back to Markdown text. So internally a link isn't represented by some textual pattern but by link or link-reference AST node types. When serializing these back to Markdown text a CommonMark serializer will serialize them to [ ]( ) for a link and [ ][ ] for a link-reference. Other serializations require a Markdown Syntax Extension.

So bottom-line is: I may only be able to add some flexibility regarding URL / path construction but not regarding link syntax.
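To make the AST point more concrete, here is a small sketch of how a link and a link reference look as mdast nodes, together with a toy serializer. The node shapes are simplified and `toyToMarkdown` is a made-up helper for this example; it is not the CommonMark serializer glossarify-md actually uses.

```javascript
// Simplified mdast nodes; real trees carry more fields (e.g. position).
const link = {
  type: "link",
  url: "some/path/file.md",
  children: [{ type: "text", value: "Term" }],
};

const linkReference = {
  type: "linkReference",
  identifier: "term",
  referenceType: "full",
  children: [{ type: "text", value: "Term" }],
};

// Toy serializer for illustration only (hypothetical helper).
function toyToMarkdown(node) {
  const label = node.children.map((child) => child.value).join("");
  switch (node.type) {
    case "link":
      return `[${label}](${node.url})`;
    case "linkReference":
      return `[${label}][${node.identifier}]`;
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

console.log(toyToMarkdown(link)); // [Term](some/path/file.md)
console.log(toyToMarkdown(linkReference)); // [Term][term]
```

The serializer decides the surrounding syntax from the node type alone, which is why a textual link template has no natural place to hook in.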

christophcemper commented 3 years ago

Hello Andreas,

Thanks for the concise response and sorry for the delay. I thought about your response and also kept working with my SED hacks, before thinking about responding in detail.

I agree with the separation into 2 requirements and actually have more details.

Re: 1 Directory Structure

1.1 Path structure of generated links to content (pages)

Indeed the mapping between the source file and the target platform (web server) may be different.

While this may not be obvious at first, it is a case that I also fix via SED in a multi-project environment, where several projects are maintained separately and then "merged" for hosting. I noticed you already worked on this for 6.0, thanks so much.

1.2 Path structure of generated links to assets like images, SVGs, etc.

While it seems quite common to have assets like images in the current folder or e.g. in a filename.assets subfolder next to the current markdown file ("filename.md"), CMS like Hugo have special behaviors.

So given this original source structure

source
|-- chapter1
|    |-- myarticle.md
|    |-- myarticle.assets
|    |    |-- image.jpg
|    |-- my-other-article.md
|    |-- my-other-article.assets
|    |    |-- diagram.svg

1.2.1 Absolute references to a static folder with all images (which is uncool if you have 100s of pages and 1000s of images)

This would transform the above example into the following, with image links hardcoded to a static folder.

hugo-content-folder
|-- static
|    |-- image.jpg
|    |-- diagram.svg
|-- chapter1
|    |-- myarticle.md
|    |-- my-other-article.md

1.2.2. a "Page bundle" which is really a folder for the page, mimicking the behavior of Wordpress and other CMS with the images in it, and an index.md in it holding the actual markdown source.

Would transform above example to this "Hugo-style" page bundles

hugo-content-folder
|-- chapter1
|    |-- myarticle
|    |    |-- index.md
|    |    |-- image.jpg
|    |-- my-other-article
|    |    |-- index.md
|    |    |-- diagram.svg

Writing your content only in index.md files is not pleasant and your Git looks really weird, full of index.md files.

I first thought that this "platform-specific transformation" would be out of scope.

But I now think that the transformation of image links and the respective binary files to another location is in-scope with glossarify-md as I would like to use it also for an images-index, an svg-diagram-index, etc. and obviously then glossarify-md would need to generate the right URLs right away.

The SED-approach right now caused some funny regex hours, but apparently "doing this right" would make more sense.

The perfect solution for me would be to have a source folder that I maintain as in the original example, and glossarify-md generates the image links and related asset references to a folder structure to be defined in the config.
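A minimal sketch of the page-bundle mapping described in 1.2.2, assuming forward-slash paths and the "name.md" / "name.assets" convention from the example above. The helper `toPageBundlePath` is hypothetical and only illustrates the path rewriting; it is not a glossarify-md feature and does not move any files.

```javascript
// Hypothetical helper: maps a source path to a Hugo page-bundle path.
function toPageBundlePath(sourcePath) {
  const segments = sourcePath.split("/");
  const mapped = [];
  segments.forEach((segment, i) => {
    const isLast = i === segments.length - 1;
    if (isLast && segment.endsWith(".md")) {
      // chapter1/myarticle.md -> chapter1/myarticle/index.md
      mapped.push(segment.slice(0, -".md".length), "index.md");
    } else if (!isLast && segment.endsWith(".assets")) {
      // chapter1/myarticle.assets/image.jpg -> chapter1/myarticle/image.jpg
      mapped.push(segment.slice(0, -".assets".length));
    } else {
      mapped.push(segment);
    }
  });
  return mapped.join("/");
}

console.log(toPageBundlePath("chapter1/myarticle.md"));
console.log(toPageBundlePath("chapter1/myarticle.assets/image.jpg"));
```

Applied to both the Markdown files and the asset paths they reference, a mapping like this would keep the source tree as in the original example while emitting links for the bundle layout.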

Re: 2

I believe these kinds of path transformations for both content and images would still be possible while dropping the idea of textual patterns for generating the link.

about-code commented 3 years ago

I am not yet convinced I want to address path transformations in glossarify because the tool is consistent about how it handles paths. It is actually Hugo which applies these transforms, so it would be Hugo's responsibility to do all the necessary path rewriting to maintain link stability. Likely that's what it invented its ref/relref syntax for. But maybe we are lucky that CommonMark seems to be more tolerant about a link node's URL part than I expected!

What I may be able to help you with is writing a plug-in for glossarify which you could then maintain for yourself. Basically a plug-in is a function like the one below. It returns a callback that, when called, gets passed the abstract syntax tree. Once you have access to the tree you can do a lot of cool things. E.g. This one works (update: see comment):

import { visit } from "unist-util-visit";

/**
 * Plugin to wrap markdown links into Hugo link syntax
 *
 * @type {import('unified').Plugin<[Options?]|void[], Root>}
 */
export default function remarkHugoLink(options = {}) {
  // unified calls the returned transformer with the parsed Markdown AST.
  return (tree) => {
    // Visit every "link" node and wrap its URL in Hugo's relref shortcode.
    visit(tree, "link", (node) => {
      node.url = `{{< relref ${node.url} >}}`;
    });
  };
}

It'll take a link node's URL and wrap it into the syntax you desire. Note the options object which is how your plug-in would get passed its own config options. For other node types see mdast.
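To see what such a transformer does to a tree without installing anything, here is a dependency-free sketch. `visitType` is a hand-written stand-in for unist-util-visit, and the sample tree and the name `hugoLinkTransformer` are made up for this example. It wraps the URL in a `relref` shortcode as requested at the top of this thread, and it only shows the tree mutation, not what a serializer later emits for that URL.

```javascript
// Minimal stand-in for unist-util-visit: calls `visitor` for every node of `type`.
function visitType(node, type, visitor) {
  if (node.type === type) visitor(node);
  (node.children || []).forEach((child) => visitType(child, type, visitor));
}

// Transformer analogous to the plug-in above, written against the stand-in.
function hugoLinkTransformer(tree) {
  visitType(tree, "link", (node) => {
    node.url = `{{< relref ${node.url} >}}`;
  });
}

// Hypothetical sample tree: one paragraph containing text and a link.
const tree = {
  type: "root",
  children: [
    {
      type: "paragraph",
      children: [
        { type: "text", value: "See " },
        { type: "link", url: "some/path/file", children: [{ type: "text", value: "the page" }] },
      ],
    },
  ],
};

hugoLinkTransformer(tree);
const linkNode = tree.children[0].children[1];
console.log(linkNode.url); // {{< relref some/path/file >}}
```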

Here's what you could proceed with to create your first plug-in:

  1. Make a new directory remark-hugo-links and change into it
  2. npm init
  3. Open your package.json and add
    "type": "module",
    "exports": "./index.js",
  4. Copy above plug-in code into index.js
  5. npm install unist-util-visit

You're now set with your plug-in. Next let's link the sources into the node_modules folder of your hugo project:

  1. Still within remark-hugo-links, run npm link (creates a symlink in the global node_modules folder)
  2. cd into your Hugo project
  3. npm link remark-hugo-links (creates a symlink onto the global symlink)

You now "installed" your plug-in. What's left is configuring glossarify-md to use it:

  1. Add to your glossarify-md.conf.json (or read https://www.npmjs.com/package/glossarify-md#markdown-syntax-extensions)
    "unified": {
       "plugins": ["remark-hugo-links"]
    }
  2. Run glossarify and see whether link output changed to your needs.

If you succeeded, familiarize yourself with publishing your first node package if you haven't already.

PS: Note that "List of Figures" won't link to the figure itself but to the markdown file in which a figure was referenced, e.g. through ![my figure](./figure.png).

about-code commented 3 years ago

BTW: you can install glossarify-md@6.0.0-alpha.3 running npm install glossarify-md@next. But that shouldn't be a requirement to make above things work. Just in case you want to play with the new options.

about-code commented 2 years ago

But maybe we are lucky that CommonMark seems to be more tolerant about a link nodes URL-part than I expected! [...] E.g. This one works: [example...]

Unfortunately, the tree plug-in sample given above no longer works, because Hugo's ref and relref syntax is neither valid CommonMark syntax nor a valid subset of it. CommonMark v0.30 makes this easier to spot.

To recall: Hugo expects a link syntax

[Link Label]({{<ref destination>}})

Yet, the part between the parentheses () constitutes a Link Destination according to the CommonMark spec:

A link destination consists of either

  1. a sequence of zero or more characters between an opening < and a closing > that contains no line endings or unescaped < or > characters, or

  2. a nonempty sequence of characters that does not start with <, does not include ASCII control characters or space character, and includes parentheses only if [...]

The conflict: Hugo's syntax requires

  1. a link destination to begin with {{ and end with }} => not valid with respect to CommonMark requirement 1
  2. a space between ref and the actual destination => not valid with respect to CommonMark requirement 2
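The two quoted rules can be approximated in a few lines. This is a sketch only: `classifyLinkDestination` is a hypothetical helper, and the parenthesis rule that is elided in the quote above is deliberately not checked.

```javascript
// Sketch of the two quoted destination forms; the elided parenthesis rule is NOT checked.
function classifyLinkDestination(dest) {
  // Form 1: <...> with no line endings and no unescaped < or > inside.
  if (/^<(?:\\[<>]|[^<>\r\n])*>$/.test(dest)) return "angle-bracketed";
  // Form 2: nonempty, not starting with <, no ASCII control characters or spaces.
  if (dest.length > 0 && !dest.startsWith("<") && !/[\x00-\x20\x7f]/.test(dest)) return "bare";
  return "invalid";
}

console.log(classifyLinkDestination("some/path/file.md")); // bare
console.log(classifyLinkDestination("{{<ref destination>}}")); // invalid
console.log(classifyLinkDestination("<{{\\<ref destination\\>}}>")); // angle-bracketed
```

The Hugo form fails both checks: it does not start with <, so it is not form 1, and it contains a space, so it is not form 2.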

The space character restriction was already present in CommonMark v0.29 but is easy to overlook. mdast-util-to-markdown, the library that serializes Markdown AST nodes to text, recently implemented a fix that makes it stricter and more compliant with the spec.

With the fix in place, a link node's destination URL is now tested for ASCII control characters or a space. Once the serializer finds such a character it considers the link destination to be invalid with respect to CommonMark requirement 2. It then chooses to make it a valid destination according to requirement 1 by wrapping it in angle brackets and escaping any further occurrences of those in between. The result of the sample plug-in proposed earlier will then change to:

[Link label](<{{\<ref URL \>}}>)
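The wrapping and escaping described above can be sketched as follows. `serializeLinkDestination` is a hypothetical function written for this comment to mirror the described behavior; it is not the actual mdast-util-to-markdown source.

```javascript
// Sketch of the described behavior; NOT the actual mdast-util-to-markdown code.
function serializeLinkDestination(url) {
  // ASCII control characters (U+0000-U+001F, U+007F) or a space rule out the bare form.
  const needsAngleBrackets = /[\x00-\x20\x7f]/.test(url);
  if (!needsAngleBrackets) return url;
  // Escape angle brackets inside, then wrap the whole destination in < >.
  return `<${url.replace(/[<>]/g, "\\$&")}>`;
}

console.log(`[Link label](${serializeLinkDestination("{{<ref URL >}}")})`);
// [Link label](<{{\<ref URL \>}}>)
```

A URL without spaces or control characters passes through unchanged, which is why ordinary file links are unaffected.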


TL;DR: it is no longer sufficient to write a tree plug-in. Tackling custom Markdown syntaxes requires advanced programming skills. I would like to point the advanced reader to micromark here.

However, the larger situation is: many people and static site renderers are keen on putting their efforts into inventing arbitrary syntaxes and labeling them Markdown. Yet there is no standardization effort able to keep pace with these claimed Markdown extensions. Unless this changes, compatibility and interoperability across tools will remain limited to a hard core like CommonMark. Any non-standard syntax may be subject to the unwritten law of popularity and to an ecosystem of parsers and processors coming to agree on additional syntax.