Allow Hugo to parse "plain" Markdown

lucperkins commented 5 years ago

I work on a lot of OSS documentation projects, and I strive to use Hugo in all of them (I can point to 20+ projects that I've used Hugo for). One of the downsides of Hugo is this: many people want to be able to use a directory full of "bare" Markdown in conjunction with Hugo, but they can't, which means that they need to provide some kind of "bridging" solution that converts that Markdown into something consumable by Hugo.

By "bare" Markdown I mean Markdown with:

- No page metadata specifying the title, params, tags, etc.
- Links to /docs/foo.md rather than /docs/foo (because the links are to Markdown pages, not rendered HTML pages).

So what usually happens is that people create some kind of build pipeline using tools like Gulp.js that adds page metadata, converts links, etc. It would be fantastic if Hugo provided you the option to:

- Derive the document title from, say, the first <h1> or maybe the first header.
- Automatically convert /foo.md links to /foo (or even make the .md configurable).
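
A rough Go sketch of the link conversion described above, under stated assumptions: `convertLink` is a hypothetical helper and not part of Hugo, and it only rewrites destinations that have no host. It strips a configurable extension from the path and keeps any query string or fragment.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// convertLink is a hypothetical helper (not a Hugo API). It removes the
// given extension (for example "md") from the path of a link destination.
// Destinations with a host, or that fail to parse, are returned unchanged.
func convertLink(dest, ext string) string {
	u, err := url.Parse(dest)
	if err != nil || u.Host != "" {
		return dest
	}
	suffix := "." + ext
	if !strings.HasSuffix(u.Path, suffix) {
		return dest
	}
	u.Path = strings.TrimSuffix(u.Path, suffix)
	return u.String()
}

func main() {
	fmt.Println(convertLink("/docs/foo.md", "md"))             // /docs/foo
	fmt.Println(convertLink("/docs/foo.md#install", "md"))     // /docs/foo#install
	fmt.Println(convertLink("https://example.com/a.md", "md")) // unchanged
}
```

Parsing the destination as a URL, rather than trimming the raw string, is what keeps a trailing `#fragment` from hiding the extension.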

Imagine the ability to do this (just spitballing here):

hugo server --convertLinks "md" --inferTitle --source my-github-repo

As with issue #6095, I'd be happy to take on this work if others think it's a good idea. I'm not sure how useful this would be to others. I do know that it would be immensely useful to me and I can think of many projects that would benefit.

onedrawingperday commented 4 years ago

Basically this issue is about allowing markdown content without front matter and I also think that this would be a very useful feature in Hugo.

lucperkins commented 4 years ago

@onedrawingperday Yes, that's precisely it. I work on many documentation projects in which the front matter contains nothing but a title, which could be inferred according to a hierarchy like this:

1. Front matter
2. If not in front matter, use first header
3. If no headers, use filename
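
The fallback order above could be sketched in Go roughly as follows. The names `inferTitle` and `firstHeading` are hypothetical and nothing here is Hugo's own code. As a simplification, only ATX headings (`# Title`) are recognized, and lines inside fenced code blocks are not skipped.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// inferTitle is a hypothetical helper. It resolves a title in this order:
// front matter title, first heading in the body, file name without extension.
func inferTitle(frontMatterTitle, body, filename string) string {
	if t := strings.TrimSpace(frontMatterTitle); t != "" {
		return t
	}
	if h, ok := firstHeading(body); ok {
		return h
	}
	base := filepath.Base(filename)
	return strings.TrimSuffix(base, filepath.Ext(base))
}

// firstHeading returns the text of the first ATX heading of any level.
// Simplification: Setext headings and fenced code blocks are not handled.
func firstHeading(body string) (string, bool) {
	for _, raw := range strings.Split(body, "\n") {
		line := strings.TrimSpace(raw)
		text := strings.TrimLeft(line, "#")
		level := len(line) - len(text)
		// A heading needs 1 to 6 '#' characters followed by a space.
		if level == 0 || level > 6 || !strings.HasPrefix(text, " ") {
			continue
		}
		if t := strings.TrimSpace(text); t != "" {
			return t, true
		}
	}
	return "", false
}

func main() {
	fmt.Println(inferTitle("", "Intro text.\n\n## Setup\n", "docs/install.md")) // Setup
	fmt.Println(inferTitle("", "No headings here.", "docs/install.md"))         // install
}
```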

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. The resources of the Hugo team are limited, and so we are asking for your help. If this is a bug and you can still reproduce this error on the master branch, please reply with all of the information you have about it in order to keep the issue open. If this is a feature request, and you feel that it is still relevant and valuable, please tell us why. This issue will automatically be closed in the near future if no further activity occurs. Thank you for all your contributions.

DewofyourYouth commented 4 years ago

Sounds doable. I'll take a look at it.

bep commented 4 years ago

@DewofyourYouth I think we need to talk a little about the "what" here first.

This issue is a little bit old. Since @lucperkins created it, I think the core of the problem has already been solved; see https://github.com/bep/portable-hugo-links

What's left in the above is the document title.

> Derive the document title from, say, the first <h1> or maybe the first header.

Which would mean a repeated value, not really useful. Most Markdown docs start at h2.

lucperkins commented 4 years ago

@bep In my experience, most people use an h1 at the very top of the document to specify the title when they’re not specifying it via metadata.

DewofyourYouth commented 4 years ago

Yeah, I was going to ask about that, since the issue seemed to be mostly solved. OK, so if I understand correctly, what we might want to do is just check for the first h1 on docs without front matter, and if there is one, use it as the title?

lucperkins commented 4 years ago

@DewofyourYouth I’m more proposing a render hook that would enable you to potentially determine the title via something that isn’t front matter. So you could have logic like “if the title is set via front matter use that, otherwise use the first h1 in .Content, otherwise use a titleized version of the filename.”

DewofyourYouth commented 4 years ago

OK. Sounds like a plan. @bep does that work? @lucperkins could you share an example of one of these gulpfiles you were talking about so I can get a sense of what kind of edge cases I should be thinking of? (If not I'll just use my imagination or find one.)

lucperkins commented 4 years ago

@DewofyourYouth Sadly no, as I’ve put in a bajillion hours to successfully eliminate them 😀 But I think you fully grasp the problem already. Those scripts really only did two things, changing /foo.md to /foo and inserting front matter. The first has been solved via render hooks, and allowing for zero front matter would solve the second one.

So yeah, earlier in this thread I was envisioning something more hardcoded (“infer title via first header”) but later in the thread I realized that render hooks could enable you to infer the title however you like. And maybe even allow you to infer other front matter items (date, weight, etc) as well.

bep commented 4 years ago

> OK. Sounds like a plan. @bep does that work?

We currently parse the metadata early and the content lazily and only if it's used. I'm not giving up on that for this feature, so we need to think about it. I have my plate filled with other stuff, so this needs to wait.

ghost commented 4 years ago

I would absolutely appreciate this feature!

edrex commented 4 years ago

I would like to use hugo for rendering my plain markdown notes/journals. My current experience is that plain markdown pages are rendered (at least with my config) but don't show up in lists.

> We currently parse the metadata early and the content lazily and only if it's used.

Makes sense. How about just falling back to a title-cased version of the filename (e.g. "the-story-of-pie" => "The Story of Pie") for the moment? @bep
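
A minimal Go sketch of that filename fallback, under stated assumptions: `titleFromFilename` is a hypothetical name, and the `smallWords` list is only illustrative of the words a real implementation might keep lowercase.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// smallWords stay lowercase unless they are the first or last word.
// The list is an illustrative assumption, not an established style rule.
var smallWords = map[string]bool{
	"a": true, "an": true, "and": true, "in": true,
	"of": true, "on": true, "or": true, "the": true, "to": true,
}

// titleFromFilename is a hypothetical helper that turns a file name stem
// such as "the-story-of-pie" into a title such as "The Story of Pie".
func titleFromFilename(stem string) string {
	// Split on hyphens and underscores; FieldsFunc never yields empty words.
	words := strings.FieldsFunc(stem, func(r rune) bool {
		return r == '-' || r == '_'
	})
	for i, w := range words {
		lower := strings.ToLower(w)
		if i > 0 && i < len(words)-1 && smallWords[lower] {
			words[i] = lower
			continue
		}
		r := []rune(w)
		r[0] = unicode.ToUpper(r[0])
		words[i] = string(r)
	}
	return strings.Join(words, " ")
}

func main() {
	fmt.Println(titleFromFilename("the-story-of-pie")) // The Story of Pie
}
```

Because this works on the file name alone, it fits the constraint quoted above: no page content has to be read or parsed to produce a title.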

I'd be happy to have a go at an implementation if so.

hi @lucperkins :)

edrex commented 4 years ago

> don't show up in lists

Actually, they do; it's just that they:

a. have no title, so the <a> link doesn't render
b. are out of order, since they don't have a date

I was able to get entries without a title to show by replacing {{ .Title }} with {{ .Title | default .File.BaseFileName }} in my list.html.

The other barrier to this being useful is lack of date ordering. Not sure if this can be worked around in templates.

onedrawingperday commented 4 years ago

Apparently it is possible to make Hugo parse Markdown without front matter by using front matter cascade.

See this forum post and the one below for the details: https://discourse.gohugo.io/t/what-is-required-to-make-markdown-get-picked-up/24191/8

cc: @bep

kaihendry commented 4 years ago

I noticed in v0.71.1/extended that imported markdown without front matter had dates like January 1, 0001 despite enableGitInfo: true being set in Hugo's config.yaml. :(

rigtorp commented 4 years ago

> We currently parse the metadata early and the content lazily and only if it's used. I'm not giving up on that for this feature, so we need to think about it.

For speed, Hugo should load a multiple of the page size from the beginning of the file using a single syscall. Loading a partial page is inefficient, since the page cache would pull in the whole page anyway. On most systems the page size is 4096 bytes, so load 4096 bytes from the beginning of the file. If that contains a complete front matter, parse it; if the title is missing and the first 4096 bytes also contain a Markdown header, use the header as the title. Otherwise, go to the slow path of loading more data. This feature should therefore add only a very small CPU overhead and no I/O overhead.
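
The fast path described above might look roughly like this in Go. This is a sketch under several assumptions, not Hugo's implementation: `readHead` and `titleFromHead` are hypothetical names, only YAML front matter delimited by `---` is recognized, and the `title:` lookup is a naive line match rather than a real YAML parse. Only lines that end in a newline are trusted, so a heading cut off at the 4096-byte boundary falls through to the slow path.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

const pageSize = 4096 // assumed page size, as in the comment above

// readHead reads at most pageSize bytes from the start of the file
// with a single Read call.
func readHead(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	buf := make([]byte, pageSize)
	n, err := f.Read(buf)
	if err != nil && err != io.EOF {
		return nil, err
	}
	return buf[:n], nil
}

// titleFromHead returns a title found in head. ok is false when the head
// is not enough to decide and the caller should take the slow path.
func titleFromHead(head []byte) (title string, ok bool) {
	body := string(head)
	if strings.HasPrefix(body, "---\n") {
		rest := body[3:] // keep the newline that ends the opening delimiter
		end := strings.Index(rest, "\n---\n")
		if end < 0 {
			return "", false // front matter is not complete within head
		}
		for _, line := range strings.Split(rest[:end], "\n") {
			line = strings.TrimSpace(line)
			if strings.HasPrefix(line, "title:") {
				v := strings.TrimSpace(strings.TrimPrefix(line, "title:"))
				if t := strings.Trim(v, `"'`); t != "" {
					return t, true
				}
			}
		}
		body = rest[end+len("\n---\n"):]
	}
	for {
		nl := strings.IndexByte(body, '\n')
		if nl < 0 {
			return "", false // no complete heading line in head
		}
		line := strings.TrimSpace(body[:nl])
		body = body[nl+1:]
		text := strings.TrimLeft(line, "#")
		level := len(line) - len(text)
		if level >= 1 && level <= 6 && strings.HasPrefix(text, " ") {
			if t := strings.TrimSpace(text); t != "" {
				return t, true
			}
		}
	}
}

func main() {
	head := []byte("---\ndate: 2020-05-01\n---\n# Getting Started\n\nText.\n")
	if len(os.Args) > 1 {
		h, err := readHead(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		head = h
	}
	title, ok := titleFromHead(head)
	fmt.Println(title, ok)
}
```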

zyansheep commented 1 year ago

I'm thinking of making a plugin to render mdBook repositories, but for mdBook, the sidebar is defined by a SUMMARY.md file in the root of the book. However, I need to somehow use that summary to figure out what the order of pages is in order to implement the previous and next page buttons (and the search function). Would this feature allow for something like this?

city-dream-ua commented 1 year ago

Does anyone know where this is done in the code? I mean, where is the entry point for parsing a whole document? I'm looking into the Hugo codebase right now, but it seems hard for me to find the function that accepts a path to the file (or content as []byte) and returns something like a Hugo Page object with front matter and content.

gohugoio / hugo

Allow Hugo to parse "plain" Markdown #6098