gma / nesta

File Based CMS and Static Site Generator
http://nestacms.com
MIT License
902 stars 121 forks source link

Override / define page heading using metadata #104

Closed ashmckenzie closed 12 years ago

ashmckenzie commented 12 years ago

Hi,

I had a few pages where I really didn't want to define a %h1 in order to set the menu heading, so I thought it would be a good idea to support this using just metadata.

Thanks :)

Ash.

gma commented 12 years ago

Interesting, thanks.

What's the use case for a page that doesn't have a heading though? I'm not sure that I've ever made one…

ashmckenzie commented 12 years ago

I literally have a two page site with one being the home page and another being a 'Contact us' form so neither of them really need a h1 to summarise their content. I could just CSS display: none the h1 I guess, but that feels icky. Also, I know it's good SEO to have h1's.. it's just hard to justify for two pages!

gma commented 12 years ago

Okay, I see what you mean visually. I'm wondering about accessibility; what might a screen reader make of a page without a heading. Or rather, wouldn't it help accessibility, even if you hide it with CSS? I'm not sure I feel that it's icky; it might actually be a better approach…

ashmckenzie commented 12 years ago

Very good points, I cannot argue with that :) Happy to close the pull request..

gma commented 12 years ago

Okay, nice one. :-)

MicahChalmer commented 12 years ago

Doh! I was going to write up this exact suggestion myself...I'm sad to see it rejected here. I actually put a monkey patch almost exactly like this commit in my site's app.rb myself, intending to propose it as a change later the way @ashmckenzie just did, because I was using a div rather than a h1 for headings on the home page (as opposed to individual article pages.) Even if I made the divs into h1s, I'd still need to style them specially, at which point I'd run into issue #78.

I think having the heading in metadata is simpler and better than extracting it from the content. Look at all the issues that heading extraction is at least partially causing:

Of course, all existing sites rely on the current behavior, so you can't remove it. Instead, I suggest that you merge @ashmckenzie's commit here, and keep the current header extraction but declare that it's for haml, markdown and textile only. Then proceed to add other formats, such as erb, without any header extraction. People who want to use the additional formats can put the heading in metadata, and everyone's existing site won't break. You could put something along the lines of:

%h1= @page.heading

= @page.body(self)

in the default page template instead of @page.to_html(self) so that your heading would still be displayed in a h1 by default, whether it was extracted from the content or defined in metadata.

@gma, if you think you might be willing to merge it, I can take a shot at actually doing this myself. I don't want to try it if I'd just be creating a code bomb that has no chance of being accepted. Please let me know what you think.

gma commented 12 years ago

Where to start?! ;-)

Are you suggesting that every page (in erb, if it were supported) ought to specify the heading metadata, and that Nesta shouldn't infer what Page#heading should do based on the page content? I'm not keen on that, or on having different formats exhibit different behaviour.

I'll play devil's advocate on #78; we could fix that just by improving the regex, so I don't consider it to be a significant force on the design. If it was really a big deal, surely somebody would have sent in a pull request with a more capable regex in it by now? Fixing it in Haml would be relatively simple, and then fans of Textile and Markdown could both use it via Haml's filters.

84 is a little different; the Page#heading method needs to return raw text with no tags in, and I feel this ticket is more about adding API for getting a marked up version of the heading. If the parsing in #78 were improved, we'd need a new method to fix #84 (or an optional argument to Page#heading).

73 and #74 are more architectural; Sinatra's find_template method that Nathanael points out in #73 looks like the support Nesta needs to simplify the rendering helpers. I'm not seeing what it's got to do with this heading metadata stuff though. No matter how headings are specified, people would still need to be able to get a page's heading with markup, and without. If we asked Tilt to render stuff, we will get to choose which parts of a page's markup we hand to it, so I can't see the use of Tilt being a problem either way.

For #95, I've decided that I'm not going to encourage people to put erb files inside content/pages. That's not what content management is about. I look at Haml a bit differently because you can use filters to write most of a Haml file in Markdown or Textile, but I thought long and hard before adding it. That puts #95 squarely in the domain of being a plugin (which I'd be happy to see; I just don't want to bless the idea by putting it in core).

I do wonder if I've still not quite appreciated why you'd have a page without an h1 tag at the top of it. I don't want to make that easy to do; if you really don't want an h1 tag that's fine, but Nesta shouldn't let that happen accidentally…

MicahChalmer commented 12 years ago

Thanks for the response.

Before my reply, let me say one thing: I don't actually think this issue is that big of a deal! It certainly isn't interfering substantially with my use of Nesta. This is just one of those maddening things that sit very close to the line between aesthetics and functionality. If I don't write this reply it'll keep rattling around in my brain, even though I know it's not worth all the verbiage I'm throwing at it. So I thank you for your patience, apologize for the length of this reply, and promise that I will not belabor it any further if, as I anticipate, this fails to change your mind. :-)

I'll play devil's advocate on #78; we could fix that just by improving the regex...

Yes, the specific problem in #78 could be fixed in the regex. So could the multi-line header issue pointed out in a comment towards the bottom of #57. #53 is another example of the same kind of problem, and that one was fixed when it was noticed. But I think that all of these point to a general concept question: is the heading is really part of the markdown/textile/haml content on a page, or not?

When I started to use Nesta, I had a mental model of a page file in which the file has two sections: metadata and content. The metadata is parsed by Nesta, whereas the content is parsed with markdown/textile/haml. In this mental model, the heading is part of the content, just like the body is. From what I can tell, that's the mental model that you want people to have.

The trouble is, thinking with that mental model isn't sufficient to know how my Nesta site will behave. That model doesn't tell me that I can't use multi-line heading syntax for the first heading, or that I'm required to have the heading at all, or that it has to be plain text. Once I learn all this, my mental model changes. I now think of a Nesta page as having three sections:

  1. the metadata
  2. the heading, which is parsed by Nesta with a regex, but also run through the format parser. This is required to be present, and required to be in plain text. Its syntax is format-specific and to know how it works you really have to just look at the regex in Nesta's code.
  3. the real content, which is just run through the markdown parser. In here I can do anything allowed in markdown.

Having that middle section in my mental model of a page file is what I don't like. Each specific problem this creates is small, and generally easier to sidestep than to fix, thus nobody has sent you pull requests to fix them. But to me, it still makes it feel less pleasant to use (and I say that against a background of Nesta being very pleasant to use in the first place--that's why I'm using it.)

If you think of the heading as being part of the metadata instead, and allow (but don't require) it to be specified there, then this restores the simpler two-section model, with no edge cases to worry about. Having that simpler mental model just feels better to me, because I don't have to worry about a line of my file being in effectively two formats at once.

This is my real reason for wanting this change, and why I'm bothering to write about at length (against my better judgement--can't--stop---aaaaa!)

Are you suggesting that every page (in erb, if it were supported) ought to specify the heading metadata, and that Nesta shouldn't infer what Page#heading should do based on the page content?

I'm suggesting that every page should be able to specify its heading in metadata, with Nesta inferring it based on the content if it doesn't.

I'm not keen on that, or on having different formats exhibit different behaviour.

It's the heading extraction that necessarily has different behavior for different formats. Heading in the metadata is format-neutral.

I do wonder if I've still not quite appreciated why you'd have a page without an h1 tag at the top of it.

There are two reasons not to have an h1 mentioned above: Ash's reason (he doesn't want a header on his pages at all) and my reason (I want my "headers" to be surrounded with <div class="box-title"> instead of <h1>.) I'll grant that neither of these may be all that common, and I wouldn't blame you if you didn't want to bother to support them. For me, this isn't the core reason to put heading in metadata.

I don't want to make that easy to do; if you really don't want an h1 tag that's fine, but Nesta shouldn't let that happen accidentally…

If you don't want to make it easy to leave out the h1, all the more reason to put that tag right in the default page.haml template, instead of making Nesta demand that it be specifically placed in every page's content. Once you do that, a site would need to substitute its own page template in order to leave it out. Someone like me, who doesn't want it, is almost certainly using their own page template anyway.

84 is a little different; the Page#heading method needs to return raw text with no tags in, and I feel this ticket is more about adding API for getting a marked up version of the heading.

]The only reason I thought this relevant here is that, with the heading in the metadata, it could have HTML markup in it without having to do something like re-invoke the format parser. There's more detail around that, but such discussion would belong in a comment on #84 itself (and is moot unless heading in metadata is accepted.)

For #95, I've decided that I'm not going to encourage people to put erb files inside content/pages...That puts #95 squarely in the domain of being a plugin...

Fair enough. It occurs to me that even if you don't put heading in metadata into core, nothing would prevent a page format plugin from deciding that its way of determining the heading is to look for it in metadata, even if core doesn't. So #95 isn't really as relevant to this as I thought.

73 and #74 are more architectural...

Oops, my mistake. I thought, while writing my original comment, that these two issues were about finding pages, but they're really about finding templates and layouts. You're right, they have nothing to do with this.

gma commented 12 years ago

I thank you for your patience, apologize for the length of this reply, and promise that I will not belabor it any further if, as I anticipate, this fails to change your mind. :-)

Not at all - I really appreciate you putting it all down. For the most part I'm largely in the dark about how the docs and code come across to people using Nesta. I get most of my insight and feedback on what's easy, difficult or unexpected from friends who start using it, so this is really valuable.

When I started to use Nesta, I had a mental model of a page file in which the file has two sections: metadata and content. The metadata is parsed by Nesta, whereas the content is parsed with markdown/textile/haml. In this mental model, the heading is part of the content, just like the body is. From what I can tell, that's the mental model that you want people to have.

It's what I'd expect people to imagine. If a Ruby developer handed a site built in Nesta to a non-developer friend or colleague (perhaps one who likes iA Writer) they should be able to create pages on that site without doing anything other than writing Markdown files. I know we don't support the Markdown heading format with a row of '=' characters, but Nesta isn't alone there. And like you say, we could easily fix that.

Nesta would use the Markdown/Textile/Haml parser to parse the page contents and then extract the first h1 tag, were it not for the fact that it would introduce a performance overhead.

I now think of a Nesta page as having three sections: 1. the metadata, 2. the heading, which is parsed by Nesta with a regex... and 3. the real content, which is just run through the markdown parser

Having that middle section in my mental model of a page file is what I don't like.

I agree, I'd like it to behave exactly as you'd expect here too. It's less important than other factors to me though, that's all.

If you think of the heading as being part of the metadata instead, and allow (but don't require) it to be specified there, then this restores the simpler two-section model, with no edge cases to worry about. Having that simpler mental model just feels better to me, because I don't have to worry about a line of my file being in effectively two formats at once.

I agree with everything you've said so far; it makes bits of Nesta cleaner.

I'm suggesting that every page should be able to specify its heading in metadata, with Nesta inferring it based on the content if it doesn't.

Allowing people to put the heading in the metadata would result in pages like this:

Heading: The heading

Blah blah blah...

That page would have no h1 tag, and there's no good reason not to have an h1 tag (it's not a design issue). Nesta should help people out with stuff like that. I have friends who use Nesta who need help with things like that.

My problem with that previous example is that it suggests the user hasn't fully understood the best way forward, and they should really have written it like this:

# The heading

Blah blah blah...

That reason alone is enough to stop "Heading" going in as an optional metadata key.

I'm not a believer in there being multiple obvious ways to do things; it causes confusion for beginners, costs time/money and makes code more difficult to maintain. Users shouldn't have to read about specifying something as important as the page heading with metadata keys unless it's really needed, and I think we can code around the problem to produce a more consistent UI.

For me, decisions like this are primarily UI choices. If we put enough effort in we can make the code do pretty much anything, but its not easy to improve a bad UI.

When thinking about this kind of thing I try and put myself in the shoes of a non-technical user of Nesta, and wonder what they'd infer from the contents of a Markdown file. I think about whether or not I'd be able to explain it clearly in the docs, and when I can't that's usually a sign that something isn't quite right.

When thinking of it as UI, being able to write this seems very weird:

Heading: The heading (option 1)

# The heading (option 2)

Blah blah blah...

Which heading would appear on the page would depend on what's in your template; the one you proposed would use "option 2" and all existing themes (with new behaviour in Page#heading) would use "option 1".

As an end user, I'm not sure that I know what I'd expect Nesta to do under those circumstances; I'd need to know how it was implemented, and that means that three months after patching it, I'd have to keep opening the code to check.

Given that "heading" means "headline for the page" I wonder whether or not we're talking about the wrong thing. The heading method isn't used to render the current page at all; the whole thing is handed to the relevant parser.

Nesta only needs a plain text version of the heading so that it can link to the page with some meaningful link text (I should check on that, but I'm fairly sure that's what it does with it).

Adding "Link text" metadata and then making the heading parsing more intelligent appeals to me more. If Nesta's regex doesn't find a heading at the top of a page, then it could fall back to reading "Link text". A lack of "Link text" or regex-parseable heading would raise a meaningful exception.

Page#heading would need updating, but that might be all.

If you like the idea don't leap into a patch just yet; I'm now wondering how I feel about a method called "heading" returning the contents of metadata called "Link text"! Maybe Page#heading also has the wrong name, as it isn't used to display the heading when rendering a page (that job is handed off to the Markdown/Textile/etc. parser).

If you don't want to make it easy to leave out the h1, all the more reason to put that tag right in the default page.haml template, instead of making Nesta demand that it be specifically placed in every page's content.

The default templates are just like stabiliser wheels; as soon as you get going you'll take 'em off. The Nesta API should be responsible for encouraging/enforcing it.

Cheers for writing that message; my thinking has advanced significantly while writing this reply. Let me know what you think of the "Link text" idea. I think it's basically what you're arguing for with a different metadata key, and no requirement to change any templates.

MicahChalmer commented 12 years ago

Cheers for writing that message; my thinking has advanced significantly while writing this reply. Let me know what you think of the "Link text" idea. I think it's basically what you're arguing for with a different metadata key, and no requirement to change any templates.

I agree. Let me ask about a few specifics:

Adding "Link text" metadata and then making the heading parsing more intelligent appeals to me more. If Nesta's regex doesn't find a heading at the top of a page, then it could fall back to reading "Link text". A lack of "Link text" or regex-parseable heading would raise a meaningful exception.

I think the fallback should go the other way--explicit overrides implicit. Thus, given this page:

Link text: AAA

# CCC

BBB

I think it should show "AAA" on the links, not "CCC." On the page itself, "CCC" would be in the h1, and "AAA" would not appear at all.

I can also observe that despite your distaste for this, your proposal would allow pages without an h1 easily enough:

Link text: AAA

BBB

Nesta only needs a plain text version of the heading so that it can link to the page with some meaningful link text (I should check on that, but I'm fairly sure that's what it does with it)

The other issue is what Page#body does. I would expect it to return "BBB" in both of my example pages above.

I'm now wondering how I feel about a method called "heading" returning the contents of metadata called "Link text"! Maybe Page#heading also has the wrong name, as it isn't used to display the heading when rendering a page (that job is handed off to the Markdown/Textile/etc. parser).

How about this: add a Page#link_text method that returns the new "Link text" metadata if it's specified, or calls Page#heading if not. Page#heading continues to do the same thing it does with h1 tags today, and ignores "Link text" metadata. So calling Page#heading on my first example would return "CCC", and calling it on my second h1-less example would return nil. The menu renderer and the default template would change to use link_text where they now use heading. These changes would only make a difference for pages with link text metadata specified, so all existing pages and templates would continue to work as they do now (even if they use heading), but heading wouldn't have bizarre behavior given its name.

ms commented 12 years ago

Really late (didn't have time to read the walls of text you guys produce!) but I'd like to give my opinion, hopefully succinctly.

I don't really think we should make it easy not to have an h1 tag either. I feel all pages should have an h1, even if you end up hiding it, just so that the structure of the page looks good.

I think the ability to give an alternative string for links and bread crumbs makes a lot of sense. Sometimes, the title will be very long (multiline), have important non "pure ASCIIable" data (different fonts, different colors, images, italics, etc.) and we should let the user give an alternative title for pure text only places (menu, breadcrumbs, etc.) This is, BTW, similar to what LaTeX lets users do in \section[Short title]{Long title}, where the Long Title is printed on the page and the Short Title is used in the table of contents, page headers, etc.

But the most important issue here is the approach of trying to parse the HAML/MD/etc. content to find the title if we don't have another way to find a title (ie the status quo). It's simply impossible to parse arbitrary (x)HTML with a regex. I don't imagine it's possible to parse HAML either.

Just think about the crazy amount of attributes and tags the user can put in an <h1>! Just for fun: <h1 id="thetitle" class="awesome_tags" style="font-size: 13.42pt;">This is my <em>title</em>.<br />I love <a href="http://google.com/">G<strong>oo</strong>gle</a>!</h1>. How do we parse a monster like this? Ignoring the attributes is easy enough, I guess (except you can probably escape the > in the tag and we wouldn't detect that!) but the nested tags make it simply impossible. But thankfully we have XML parsers which know how to do that very well. I don't know what the performance hit that @gma mentioned is, but this is the only clean way I see.

I don't know that I have a good conclusion here. But it seems to me the only practical and somehow clean algorithm would be:

  1. Did the user define an (HTML free!?) Link Text? Forget the content, use that. [Fastest]
  2. Else if we can we parse the title from the content with a super simple regex that assumes no attributes or nested tags use that. [Pretty fast]
  3. Else, fallback to sending the HAML/erb/MD to the appropriate parser and use nokogiri/hpricot to parse the HTML we get back. If we still can't find the title (no h1 tag whatsoever), error. [Slowest]

Why is it clean?

  1. A new user writing a simple page will use case 2, which is the status quo, and there will be no surprise
  2. Somebody like @MicahChalmer who wants a different link text will have that, by adding an optional metadatum and using case 1.
  3. Finally somebody that writes complex pages but doesn't know he or she should be specifying a link text for performance reasons will still see correct output, if slower (but we need hard numbers to comment further) by using case 3.

Thoughts?

gma commented 12 years ago

Sorry I haven't replied for a week…

the fallback should go the other way--explicit overrides implicit

@MicahChalmer - agreed; I'm not sure how I managed to write the exact inverse of that, as I didn't mean it! I didn't mean to suggest that we'd allow pages not to have an h1 tag either.

Having said that, while investigating @jschoolcraft's menu problem this morning, I noticed that Nesta has allowed him to create lots of pages without a heading tag. This tallies with @ms's latest comment on #89, though I've no idea now why it silently allows it at the moment. I can't remember the exact circumstances in which Nesta complained if a page didn't have a heading set, but I think I'll make it raise an exception in the next release (which would clear up #89 too).

But the most important issue here is the approach of trying to parse the HAML/MD/etc. content to find the title if we don't have another way to find a title…

@ms - I like your 1, 2, 3 summary. Option 3 seems user friendly, but by allowing people to implement sites like that we'd be allowing them to implement very slow sites. By supporting it, I'd consider us to be giving it our blessing (which I don't). So just 1 and 2 would cover it. Anything that tries to call link_text or heading should raise LinkTextNotSet or HeadingNotSet exceptions, including the path of the page (relative to the content/pages folder) of the offending page in the exception's message.

If anybody fancies working on this, pull requests welcome. Perhaps it's time I wrote a short code style guide, but I'll put the main things in here now incase anybody fancies starting it…

Cheers – I think we're getting somewhere on this one…