Closed christophcemper closed 2 years ago
Hello Christoph.
maybe we discuss two things in your proposal separately:
Re: 1
The tool internally uses Node's path module and vFile for dealing with paths. Without making any commitment yet, I might be able to evaluate some kind of URL or path template where placeholder names reflect vFile properties.
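To make the template idea more tangible, here is a minimal sketch. The `{placeholder}` syntax and the function name `expandPathTemplate` are assumptions made up for illustration and not an existing glossarify-md feature; only the property names (path, dirname, basename, stem, extname) mirror vFile.

```javascript
// Hypothetical sketch: expand a path template whose placeholders
// are named after vFile properties. Not part of glossarify-md.
const path = require("node:path").posix;

function expandPathTemplate(template, filePath) {
  const parsed = path.parse(filePath);
  // Property names follow vFile: path, dirname, basename, stem, extname
  const props = {
    path: filePath,
    dirname: parsed.dir,
    basename: parsed.base,
    stem: parsed.name,
    extname: parsed.ext,
  };
  // Replace known {placeholders}; leave unknown ones untouched
  return template.replace(/\{(\w+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(props, name) ? props[name] : match
  );
}

console.log(expandPathTemplate("/docs/{dirname}/{stem}/", "chapter1/myarticle.md"));
// /docs/chapter1/myarticle/
```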
Re: 2
glossarify-md parses input markdown files into a Markdown Abstract Syntax Tree (mdAST) and operates on that AST before serializing it back to Markdown text. So internally a link isn't represented by some textual pattern but by link or link-reference AST node types. When serializing these back to Markdown text, a CommonMark serializer will serialize them to [ ]( ) for a link and [ ][ ] for a link-reference. Other serializations require a Markdown Syntax Extension.
So the bottom line is: I may only be able to add some flexibility regarding URL / path construction but not regarding link syntax.
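To illustrate why the link syntax is fixed by the serializer rather than by a textual pattern, here is a deliberately simplified stand-in for a CommonMark serializer. It is not the serializer glossarify-md actually uses; the node shapes follow mdast (`link` with a `url`, `linkReference` with an `identifier`), and the helper names are made up for this sketch.

```javascript
// Simplified stand-in for a CommonMark serializer (illustration only).
function textOf(node) {
  // Concatenate the text of all direct child nodes
  return (node.children || []).map((child) => child.value || "").join("");
}

function serialize(node) {
  switch (node.type) {
    case "link":
      return `[${textOf(node)}](${node.url})`;
    case "linkReference":
      return `[${textOf(node)}][${node.identifier}]`;
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

const link = {
  type: "link",
  url: "./glossary.md#term",
  children: [{ type: "text", value: "Term" }],
};
const linkReference = {
  type: "linkReference",
  identifier: "term",
  referenceType: "full",
  children: [{ type: "text", value: "Term" }],
};

console.log(serialize(link)); // [Term](./glossary.md#term)
console.log(serialize(linkReference)); // [Term][term]
```

A plug-in can change the data in a node (such as its url) but the surrounding brackets and parentheses are always produced by the serializer.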
Hello Andreas,
Thanks for the concise response and sorry for the delay. I thought about your response and also kept working with my SED hacks, before thinking about responding in detail.
I agree with the separation into 2 requirements and actually have more details.
Re: 1 Directory Structure
1.1 Path structure of generated links to content (pages)
Indeed the mapping between the source file and the target platform (web server) may be different.
While this may not be obvious at first, it is a case that I also fix via sed in a multi-project environment, where several projects are maintained separately and then "merged" for hosting. I noticed you already worked on this for 6.0, thanks so much.
1.2 Path structure of generated links to assets like images, SVGs, etc.
While it seems quite common to keep assets like images in the current folder or, e.g., in a filename.assets subfolder next to the current markdown file ("filename.md"), CMSs like Hugo have special behaviors.
So given this original source structure
source
|-- chapter1
| |-- myarticle.md
| |-- myarticle.assets
| | |-- image.jpg
| |-- my-other-article.md
| |-- my-other-article.assets
| | |-- diagram.svg
1.2.1 Absolute references to a static folder with all images (which is uncool if you have 100s of pages and 1000s of images)
This would transform the above example to the following, with image links being hardcoded to a static folder:
hugo-content-folder
|-- static
| |-- image.jpg
| |-- diagram.svg
|-- chapter1
| |-- myarticle.md
| |-- my-other-article.md
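As a rough sketch of the link rewriting this variant implies: Hugo serves files in its static folder from the site root, so each asset reference collapses to its file name. The function name `toStaticAssetUrl` and the `staticRoot` parameter are assumptions made up for illustration.

```javascript
// Hypothetical sketch: map a source asset path to its URL when all
// assets are moved into Hugo's static folder (served from the site root).
const path = require("node:path").posix;

function toStaticAssetUrl(assetPath, staticRoot = "/") {
  // Only the file name survives; the source folder structure is dropped
  return path.join(staticRoot, path.basename(assetPath));
}

console.log(toStaticAssetUrl("chapter1/myarticle.assets/image.jpg"));
// /image.jpg
console.log(toStaticAssetUrl("chapter1/my-other-article.assets/diagram.svg"));
// /diagram.svg
```

Because only the file name is kept, two assets with the same name in different source folders would collide, which is one reason this layout scales poorly.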
1.2.2. a "Page bundle" which is really a folder for the page, mimicking the behavior of Wordpress and other CMS with the images in it, and an index.md in it holding the actual markdown source.
This would transform the above example into these "Hugo-style" page bundles:
hugo-content-folder
|-- chapter1
| |-- myarticle
| | |-- index.md
| | |-- image.jpg
| |-- my-other-article
| | |-- index.md
| | |-- diagram.svg
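The mapping from the original source structure to page bundles can be sketched as a pure path transformation. This is only an illustration of the rule shown in the two trees; the function name `toPageBundlePath` and the ".assets" suffix handling are assumptions derived from the example, not an existing glossarify-md or Hugo feature.

```javascript
// Hypothetical sketch: map a source path to its location in a
// Hugo-style page bundle, following the folder trees above.
const path = require("node:path").posix;

function toPageBundlePath(sourcePath) {
  const { dir, name, ext, base } = path.parse(sourcePath);

  // chapter1/myarticle.md -> chapter1/myarticle/index.md
  if (ext === ".md") {
    return path.join(dir, name, "index.md");
  }

  // chapter1/myarticle.assets/image.jpg -> chapter1/myarticle/image.jpg
  const parent = path.basename(dir);
  if (parent.endsWith(".assets")) {
    const page = parent.slice(0, -".assets".length);
    return path.join(path.dirname(dir), page, base);
  }

  // Anything else stays where it is
  return sourcePath;
}

console.log(toPageBundlePath("chapter1/myarticle.md"));
// chapter1/myarticle/index.md
console.log(toPageBundlePath("chapter1/myarticle.assets/image.jpg"));
// chapter1/myarticle/image.jpg
```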
Writing your content only in index.md files is not pleasant and your Git looks really weird, full of index.md files.
I first thought, that this "platform-specific transformation" would be out-of-scope.
But I now think that the transformation of image links and the respective binary files to another location is in-scope with glossarify-md as I would like to use it also for an images-index, an svg-diagram-index, etc. and obviously then glossarify-md would need to generate the right URLs right away.
The SED-approach right now caused some funny regex hours, but apparently "doing this right" would make more sense.
The perfect solution for me would be to have a source folder that I maintain as in the original example, and glossarify-md generates the image links and related asset references to a folder structure to be defined in the config.
Re: 2
I believe these kinds of path transformations for both content and images would still be possible while setting aside the idea of textual patterns for generating the link.
I am not yet convinced I want to address path transformations in glossarify because the tool is consistent about how it handles paths. It is actually Hugo which applies these transforms, so it would be Hugo's responsibility to do all the necessary path rewriting to maintain link stability. Likely that's what it invented its ref/relref syntax for. But maybe we are lucky that CommonMark seems to be more tolerant about a link node's URL part than I expected!
What I may be able to help you with is writing a plug-in for glossarify which you could then maintain yourself. Basically, a plug-in is a function like the one below. It returns a callback that, when called, gets passed the abstract syntax tree. Once you have access to the tree you can do a lot of cool things. E.g. this one works (update: see comment):
import { visit } from "unist-util-visit";

/**
 * Plugin to wrap markdown links into Hugo link syntax
 *
 * @type {import('unified').Plugin<[Options?]|void[], Root>}
 */
export default function remarkHugoLink(options = {}) {
  return (tree) => {
    // Visit every "link" node and wrap its URL into Hugo's relref shortcode
    visit(tree, "link", (node) => {
      node.url = `{{< relref ${node.url} >}}`;
    });
  };
}
It'll take a link node's URL and wrap it into the syntax you desire. Note the options object which is how your plug-in would get passed its own config options. For other node types see mdast.
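To show how the options object could carry configuration into the plug-in, here is a dependency-free sketch. The `shortcode` option is hypothetical, and the small `walk` helper merely stands in for unist-util-visit so the snippet runs without installing anything. It only demonstrates the tree mutation, not how a serializer later renders the URL.

```javascript
// Minimal stand-in for unist-util-visit: call visitor on every node of a type
function walk(node, type, visitor) {
  if (node.type === type) visitor(node);
  for (const child of node.children || []) walk(child, type, visitor);
}

// Same idea as the plug-in above, with a hypothetical "shortcode" option
function remarkHugoLink(options = {}) {
  const shortcode = options.shortcode || "relref";
  return (tree) => {
    walk(tree, "link", (node) => {
      node.url = `{{< ${shortcode} ${node.url} >}}`;
    });
  };
}

const tree = {
  type: "root",
  children: [
    {
      type: "paragraph",
      children: [
        {
          type: "link",
          url: "some/path/file.md",
          children: [{ type: "text", value: "label" }],
        },
      ],
    },
  ],
};

const transform = remarkHugoLink({ shortcode: "ref" });
transform(tree);
console.log(tree.children[0].children[0].url);
// {{< ref some/path/file.md >}}
```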
Here's how you could proceed to create your first plug-in:
1. Create a folder remark-hugo-links and step into it.
2. Run npm init.
3. Open package.json and add
   "type": "module",
   "exports": "./index.js",
4. Create index.js with the plug-in code above.
5. Run npm install unist-util-visit.
You're now set with your plug-in. Next, let's link the sources into the node_modules folder of your Hugo project:
1. In the plug-in folder, run npm link (creates a symlink in the global node_modules folder).
2. In your Hugo project folder, run npm link remark-hugo-links (creates a symlink onto the global symlink).
You have now "installed" your plug-in. What's left is configuring glossarify-md to use it:
unified: {
"plugins": ["remark-hugo-links"]
}
If you succeeded, familiarize yourself with publishing your first node package if you haven't already.
PS: Note that "List of Figures" won't link to the figure itself but to the markdown file in which a reference to the figure, e.g. ![my figure](./figure.png), is found.
BTW: you can install glossarify-md@6.0.0-alpha.3 by running npm install glossarify-md@next. But that shouldn't be a requirement to make the above things work. Just in case you want to play with the new options.
But maybe we are lucky that CommonMark seems to be more tolerant about a link nodes URL-part than I expected! [...] E.g. This one works: [example...]
Unfortunately, the tree plug-in sample given above no longer works, because Hugo's ref and relref syntax is neither valid CommonMark syntax nor a valid subset of it. CommonMark v0.30 makes this easier to spot:
To recall: Hugo expects a link syntax
[Link Label]({{<ref destination>}})
Yet the part between the parentheses () constitutes a Link Destination according to the CommonMark spec:
A link destination consists of either
1. a sequence of zero or more characters between an opening < and a closing > that contains no line endings or unescaped < or > characters, or
2. a nonempty sequence of characters that does not start with <, does not include ASCII control characters or space character, and includes parentheses only if [...]
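The two alternatives quoted above can be approximated with a small checker. This is a simplified sketch and not the CommonMark reference implementation: the function names are made up, and the parentheses rule that is elided in the quote is deliberately not modeled.

```javascript
// Simplified sketch of the two link destination forms (illustration only).

// Form 1: <...> with no line endings and no unescaped < or > inside
function isBracketedDestination(dest) {
  return /^<(?:\\[^\r\n]|[^<>\\\r\n])*>$/.test(dest);
}

// Form 2: nonempty, does not start with <, no ASCII control characters
// or spaces. The parentheses rule is NOT checked here.
function isBareDestination(dest) {
  return /^[^<\u0000-\u0020\u007f][^\u0000-\u0020\u007f]*$/.test(dest);
}

const hugo = "{{<ref destination>}}";
console.log(isBracketedDestination(hugo)); // false (does not start with <)
console.log(isBareDestination(hugo)); // false (contains a space)
console.log(isBareDestination("some/path/file.md")); // true
```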
The conflict: Hugo's syntax requires
- the link destination to begin with {{ and end with }} => not valid with respect to CommonMark requirement 1
- a space between ref and the actual destination => not valid with respect to CommonMark requirement 2
The space character restriction was already present in CommonMark v0.29 but is easy to read over. mdast-util-to-markdown, the library that serializes Markdown AST nodes to text, recently implemented a fix and became stricter and more compliant with the spec.
With the fix in place, a link node's link destination URL will now be tested against the specified ASCII control character range. Once the serializer finds such a character, it considers the link destination to be invalid with respect to CommonMark requirement 2. It then chooses to make it a valid destination according to requirement 1 by wrapping it into angle brackets and escaping any further occurrences of those in between. The result of the sample plug-in proposed earlier will then change to:
[Link label](<{{\<ref URL \>}}>)
with {{ }} being wrapped into outer angle brackets < and >.
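The behavior described above can be imitated in a few lines. This is a simplified sketch of the decision as described in this comment, not the actual code of mdast-util-to-markdown; the function names are made up for illustration.

```javascript
// Simplified imitation of the serializer's decision (illustration only).
function serializeDestination(url) {
  // ASCII control characters or a space make a bare destination invalid
  const needsAngleBrackets = /[\u0000-\u0020\u007f]/.test(url);
  if (!needsAngleBrackets) return url;
  // Wrap into <...> and escape any < or > in between
  return "<" + url.replace(/[<>]/g, "\\$&") + ">";
}

function serializeLink(label, url) {
  return `[${label}](${serializeDestination(url)})`;
}

console.log(serializeLink("Link label", "{{<ref URL >}}"));
// [Link label](<{{\<ref URL \>}}>)
console.log(serializeLink("Link label", "some/path/file.md"));
// [Link label](some/path/file.md)
```

Hugo no longer recognizes the shortcode in the first output because its angle brackets have been escaped.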
TL;DR: it is no longer sufficient to write a tree plug-in. Tackling custom Markdown syntaxes requires advanced programming skills. I would like to point the advanced reader to micromark here.
However, the larger situation is: many people and static site renderers are keen on putting their efforts into inventing arbitrary syntaxes and labeling them Markdown. Yet there is no standardization effort able to keep pace with these claimed Markdown extensions. Unless this changes, compatibility and interoperability across tools will remain limited to a hard core like CommonMark. Any non-standard syntax may be subject to the unwritten law of popularity and an ecosystem of parsers and processors coming to agree on additional syntax.
What's your user story?
As a Hugo user, I want to customize the output of the URLs so that the links work natively in Hugo and I can avoid regex/sed work afterwards.
Instead of a plain
some/path/file.md
it should be a markdown link with a relref, e.g.
[link to the page via a *relref*]({{< relref some/path/file >}})
How would you like the system behave to satisfy your needs? How would you like it not?