dweinberger / newclues


Leverage Jekyll to store clue information only once #4

Closed benbalter closed 9 years ago

benbalter commented 9 years ago

Problem

Right now, the clues are stored simultaneously in seven different places (three times in HTML, twice in JSON, once in XML, and once in plain text). That means that if someone proposes a change, as @itraor did in https://github.com/dweinberger/newclues/pull/2, you can't easily see what was changed (the diff). Additionally, it makes it harder for casual contributors to propose changes and increases the risk that the various representations get out of sync. Not to mention, I have to imagine it's a lot of work to maintain.

Proposed solution

This still-work-in-progress pull request is a quick proof of concept to show how one might leverage Jekyll to automatically compile the various representations on each push, commit, or merge. Jekyll is an open source templating engine that can be run locally to generate the static HTML site, but it is also baked directly into GitHub and fires automatically whenever a change is made.

Under the hood

Storing the data

Because we want to represent the clues in machine-readable formats (e.g., XML, JSON), we're forced to store the data in a machine-readable format. Luckily, Jekyll speaks some pretty human-friendly machine-readable formats (http://jekyllrb.com/docs/datafiles/), including CSV and JSON. Here I used YAML (http://www.yaml.org/). You can see the chapters, subheads, and clues stored in this .yml file: https://github.com/benbalter/newclues/blob/gh-pages/_data/clues.yml. This would be the canonical copy of the clues. We could do the same with the preamble (https://github.com/benbalter/newclues/blob/gh-pages/_data/preamble.yml) as well. You'll notice YAML is simply key/value pairs separated by a colon, and arrays (lists) demarcated by a dash at the beginning of the line.
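Once Jekyll loads a data file, templates see it as ordinary nested lists and maps. Here is a minimal sketch that assumes `_data/clues.yml` is a list of chapters whose `subheads` each carry a `title` and a list of `clues`. Those are the key names the index template relies on; the top-level list shape is a hypothesis. The Python equivalent is below, together with a hypothetical `check_structure` helper of the sort a CI job could run. The sample text is placeholder, not real clues.

```python
# Hypothetical in-memory equivalent of _data/clues.yml after Jekyll loads it:
# a list of chapters, each with "subheads", each with a "title" and "clues".
# Placeholder text only; this is not the real clue data.
SAMPLE_CLUES = [
    {
        "subheads": [
            {"title": "First subhead", "clues": ["Clue one.", "Clue two."]},
            {"title": "Second subhead", "clues": ["Clue three."]},
        ]
    },
]


def check_structure(chapters):
    """Return a list of human-readable problems; an empty list means the shape is OK."""
    if not isinstance(chapters, list):
        return ["top level should be a list of chapters"]
    problems = []
    for c, chapter in enumerate(chapters, start=1):
        subheads = chapter.get("subheads") if isinstance(chapter, dict) else None
        if not isinstance(subheads, list):
            problems.append(f"chapter {c}: missing 'subheads' list")
            continue
        for s, subhead in enumerate(subheads, start=1):
            if not isinstance(subhead, dict) or not isinstance(subhead.get("title"), str):
                problems.append(f"chapter {c}, subhead {s}: missing 'title'")
                continue
            clues = subhead.get("clues")
            if not isinstance(clues, list):
                problems.append(f"chapter {c}, subhead {s}: missing 'clues' list")
                continue
            for n, clue in enumerate(clues, start=1):
                # An unquoted "- Note: text" in YAML loads as a mapping, not a string.
                if not isinstance(clue, str):
                    problems.append(f"chapter {c}, subhead {s}, clue {n}: not a plain string")
    return problems
```

An empty result means every clue is a plain string, which is what the templates expect.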

Publishing the clues

Once the data is read in by Jekyll, it's easy to publish it in any format we want using simple templates. Here's the index file template (https://github.com/benbalter/newclues/blob/gh-pages/newclues.html), for example. You'll see it simply loops through each clue:

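{% comment %} Assumes _data/clues.yml is a list of chapters, each with a list of subheads, each with a title and a list of clues. {% endcomment %}
{% assign cluenum = 1 %}
{% for chapter in site.data.clues %}
<ol class="subheads">
  {% for subhead in chapter.subheads %}
    <li class="subhead">{{ subhead.title }}
      <ol class="clues" start="{{ cluenum }}">
        {% for clue in subhead.clues %}
          <li class="clue">
            {{ clue | markdownify }}
          </li>
          {% assign cluenum = cluenum | plus: 1 %}
        {% endfor %}
      </ol><!-- clues -->
    </li><!-- subhead -->
  {% endfor %}
</ol><!-- subheads -->
{% endfor %}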

That same process could be used to generate JSON, XML, OPML, or even plain text.

Note that the output should be nearly identical to the current files. Again, the big win here is that we store the clues once (the YAML) and then publish them everywhere. Changing the YAML file would automatically update all the different formats.
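The "store once, publish everywhere" idea can also be shown outside Jekyll. This minimal Python sketch renders one nested data structure (the same hypothetical chapters, subheads, and clues shape) to both JSON and plain text, using a single running clue counter. The JSON layout (`number`, `subhead`, `text`) and the function names are illustrative assumptions, not the format of the PR's newclues.json.

```python
import json

# Placeholder data in the hypothetical chapters -> subheads -> clues shape.
SAMPLE = [
    {
        "subheads": [
            {"title": "A", "clues": ["One.", "Two."]},
            {"title": "B", "clues": ["Three."]},
        ]
    }
]


def numbered_clues(chapters):
    """Yield (number, subhead_title, clue) with one counter running across all subheads."""
    number = 0
    for chapter in chapters:
        for subhead in chapter["subheads"]:
            for clue in subhead["clues"]:
                number += 1
                yield number, subhead["title"], clue


def to_json(chapters):
    """One output format: a JSON array of numbered clues (hypothetical layout)."""
    records = [{"number": n, "subhead": t, "text": c} for n, t, c in numbered_clues(chapters)]
    return json.dumps(records, indent=2)


def to_text(chapters):
    """Another output format from the same source: one numbered clue per line."""
    return "\n".join(f"{n}. {c}" for n, _, c in numbered_clues(chapters)) + "\n"
```

Both outputs come from the same `SAMPLE` data, so editing a clue there changes every format at once. That is the property the YAML source gives the real site.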

Seeing it in action

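You can see it in action here:

  • Main page: http://ben.balter.com/newclues
  • XML: http://ben.balter.com/newclues/newclues.xml
  • JSON: http://ben.balter.com/newclues/newclues.json
  • Plain text: http://ben.balter.com/newclues/clues_.txt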

You'll notice that since we're leveraging Jekyll's native JSON and XML functionality, the text is automatically escaped to generate valid XML and JSON each time.
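In Jekyll templates, this escaping is typically done with Liquid filters such as `jsonify` and `xml_escape`; whether the PR uses exactly these isn't shown here. The minimal Python sketch below uses the standard-library equivalents. A placeholder clue containing `&`, `<`, and double quotes is escaped for JSON and XML, and both results parse back to the original string. Naive string concatenation, by contrast, does not produce well-formed XML.

```python
import json
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Placeholder clue text with characters that are special in JSON and/or XML.
CLUE = 'Fish & chips: "a <clue>"'

# JSON: json.dumps escapes the double quotes (and backslashes/control characters).
json_doc = json.dumps({"text": CLUE})

# XML: escape() replaces &, < and > with entities; double quotes are fine in element text.
xml_doc = f"<clues><clue>{escape(CLUE)}</clue></clues>"


def is_well_formed(xml_text):
    """True if xml_text parses as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```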

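A few other things changed

  • This implements the numbering via CSS (rather than javascript). Was there a reason to number things client side?
  • The markup is more semantic (e.g., headings are headings, unnumbered lists are unnumbered lists)
  • Pretty permalinks (e.g., /about/ versus /about.html)
  • Fixed the favicon, which was hardcoded to localhost
  • Line endings were converted from CR+LF (Windows) to LF (universal)

A few gotchas

This is just a rough first pass. A few things that I know I broke:

  • Some of the formatting, specifically the numbering, headers, and clue links
  • The javascript (because some of the IDs/classes changed)

This looks promising, but how'd we actually use it?

There are two options here. In either case, the GitHub repository becomes the canonical version of the clues. You can edit the YAML file via the web interface, via text-editing sites like Prose (http://prose.io), or on your desktop with either the Git command line or GitHub for Windows/Mac (Dropbox-like sync). The idea is that, since the document's open source, you want to make it easy for others to contribute and improve upon it.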

Option 1: Build and sync

The first option would be to simply commit any changes and download the resulting files. If you push to a specially named branch (gh-pages), GitHub will automatically publish the generated site, which you can then open in your web browser and download. You'd then take the rendered site and FTP it to the web server as you normally would. Alternatively, you could generate the rendered site locally by running the jekyll build command and then FTP the resulting build.
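If Option 1 were chosen, the FTP step could be scripted rather than done by hand. This is a minimal sketch using Python's standard ftplib; it is not part of the PR. It assumes the site has already been built into Jekyll's default `_site/` directory with `jekyll build`. `upload_plan` and `upload_site` are hypothetical helpers, and the host, credentials, and remote path in the usage comment are placeholders.

```python
import ftplib
import os


def upload_plan(site_dir):
    """Return (relative_dirs, relative_files) under a built site such as _site/."""
    dirs, files = [], []
    for root, dirnames, filenames in os.walk(site_dir):
        dirnames.sort()  # deterministic, parent-before-child walk order
        rel_root = os.path.relpath(root, site_dir)
        for d in dirnames:
            dirs.append(os.path.normpath(os.path.join(rel_root, d)).replace(os.sep, "/"))
        for f in sorted(filenames):
            files.append(os.path.normpath(os.path.join(rel_root, f)).replace(os.sep, "/"))
    return dirs, files


def upload_site(ftp, site_dir, remote_root):
    """Mirror site_dir onto an already-logged-in ftplib.FTP connection."""
    dirs, files = upload_plan(site_dir)
    for d in dirs:  # parents are listed before their children
        try:
            ftp.mkd(f"{remote_root}/{d}")
        except ftplib.error_perm:
            pass  # most likely the directory already exists
    for f in files:
        with open(os.path.join(site_dir, *f.split("/")), "rb") as fh:
            ftp.storbinary(f"STOR {remote_root}/{f}", fh)


# Usage (placeholder host, credentials, and path):
# with ftplib.FTP("ftp.example.com") as ftp:
#     ftp.login("user", "password")
#     upload_site(ftp, "_site", "/newclues")
```

Even scripted, this is still a manual step someone has to remember to run after each change. Option 2 below avoids that.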

Option 2: GitHub Pages

Each repository comes with a dedicated hosting service, GitHub Pages (https://pages.github.com). This service transparently hosts the generated Jekyll site and serves it via a CDN; it's what the above demo files are running on. While you can't have a Pages site as a subpath (e.g., http://cluetrain.com/newclues/) due to how DNS works, you can set up a redirect from http://cluetrain.com/newclues/ to newclues.cluetrain.com, which would be the hosted, auto-generated version of the clues.

There's a lot here, and I'm glad to go into more detail on any of the above, but I thought it'd be easier to talk concretely with a quick, live prototype rather than trying to explain an abstract concept over the internet.

Thoughts?

//cc @kcshearon

dweinberger commented 9 years ago

OMG. This is so awesome.

I'm at a conference for the next 3 days but will look at this more closely asap.

tfypos because mobile

(Sent by email in reply to Ben Balter's post of Jan 14, 2015, 12:19 PM.)

dottorblaster commented 9 years ago

Hi :-)

Pros

YAML is a super-mega-easily editable format, and you don't need to mess with curly brackets and quotes to get your content saved the way you want.

Cons

There's only one caveat: the YAML parser tends to be very strict about the ':' character, always treating a colon followed by a space as a key/value separator. This could be a problem. The ':' needs to be HTML-encoded, as I did for titles in my Jekyll weblog (I know what I'm talking about ;D).

If this doesn't scare you at all, it's a wonderful way to distribute content and a raw version of the clues. Congrats on the implementation!

benbalter commented 9 years ago

@dweinberger very valid point, and the biggest pain point with YAML, I'd say. The easy workaround is either to escape the special characters or, as I prefer, to simply wrap the string in quotes, as I did a few times.

If the project went this route, you could easily set up CI to provide feedback on pull requests before they're merged.
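Ben's workaround of wrapping risky strings in quotes is easy to automate. This is a minimal sketch that assumes each clue is written as one YAML list item. The hypothetical `needs_quotes` heuristic flags strings YAML could misread: a colon followed by a space, a leading indicator character, or a " #" comment marker. `yaml_list_item` then double-quotes only those strings. The sketch relies on the fact that a JSON string literal is also a valid YAML double-quoted scalar, so `json.dumps` handles the escaping. It does not cover every YAML edge case; for example, bare words such as yes or no are read as booleans by YAML 1.1 parsers. A CI job could run a check like `risky_clues` on each pull request, but that wiring is not shown.

```python
import json

# Characters with special meaning at the start of a plain (unquoted) YAML scalar.
LEADING_INDICATORS = set("-?:,[]{}#&*!|>'\"%@`")


def needs_quotes(text: str) -> bool:
    """Conservative heuristic: True if text is risky as an unquoted YAML scalar.

    Quoting a string that didn't strictly need it is harmless, so err on that side.
    """
    return (
        text == ""
        or text != text.strip()      # leading/trailing spaces would be dropped
        or text[0] in LEADING_INDICATORS
        or ": " in text              # would be read as a key/value pair
        or text.endswith(":")
        or " #" in text              # would start a comment
    )


def yaml_list_item(text: str) -> str:
    """Render one clue as a YAML list item, double-quoting it only when needed."""
    if needs_quotes(text):
        # A JSON string literal is also a valid YAML double-quoted scalar,
        # so json.dumps takes care of escaping quotes and backslashes.
        return "- " + json.dumps(text, ensure_ascii=False)
    return "- " + text


def risky_clues(clues):
    """Hypothetical CI-style check: 1-based positions of clues that need quoting."""
    return [i for i, clue in enumerate(clues, start=1) if needs_quotes(clue)]
```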

dottorblaster commented 9 years ago

@benbalter oh I didn't see the wrapping, maybe because here in Italy it's 00.43 am and I should rest a little :-P

Well, I'm a huge fan of CI too (and I love Jekyll) so kudos for this idea, very clever 👍

dweinberger commented 9 years ago

Colons in YAML do not scare me. Hey, I escaped the internal quotes in JSON like a boss, didn't I?

BTW, the clue numbers are done dynamically because we were editing them and manual renumbering sucks. I did them dynamically in JS because I didn't know I could use CSS. D'oh.

Unfortunately, the conference I'm at is pretty demanding on the attendees' time, so I don't know when I'll get to look at what you've done with the depth it demands. I have looked at the YAML file and the various formats Ben has derived from it, and I'm awestruck. It's magic. Except I know it took actual work, which I appreciate.

I'm sold on having the YAML be the source and master of the New Clues. What is the next thing I should do, other than thank you again?

David

MY EMAIL ADDRESS HAS CHANGED: self@evident.com will cease to work soon. My new email address is david@weinberger.org.

(Sent by email in reply to Alessio's comment of Wed, Jan 14, 2015 at 6:44 PM.)


dottorblaster commented 9 years ago

Oh, thanks for the feedback; mine was only a suggestion :-)

I'm sold on having the YAML be the source and master of the New Clues. What is the next thing I should do, other than thank you again?

I'd say: merge this as soon as possible!

benbalter commented 9 years ago

What is the next thing I should do, other than thank you again?

@dweinberger It looks like some of the changes to the master branch have caused a merge conflict. I will resolve the conflict first chance I get so that this branch can be merged automatically.

Beyond that, as I noted, there are some minor formatting changes, and I believe the javascript might need to be modified slightly. I can take a pass at that, or you can if you are comfortable doing so.

Last, you'll just need to resolve the question above regarding hosting/deployment.

dweinberger commented 9 years ago

Great.

As for hosting, we'd definitely want cluetrain.com/newclues/index.html to remain the public site people go to and link to. I don't care if that's a redirect. All things being equal, I'd rather not have to sync via ftp because that is a manual process that I will therefore fuck up at some point.

Beyond that, I don't know how to decide among the choices. All of them result in a stable, complete build of index.html that people can get to through the current newclues URL, with no dependency (except for updates) on the Github site, right?

Does this give you enough to advise me on which option I'd prefer? If you have ideas on what I should prefer, I am all ears (well, and about 30% nose).

David

(Sent by email in reply to Ben Balter's comment of Wed, Jan 21, 2015 at 9:17 AM.)


benbalter commented 9 years ago

@dweinberger I updated the branch:

  1. It can now be automatically merged
  2. Authorship is now randomized again (and the authors are displayed even if the user doesn't have javascript)
  3. Rules now have anchors

Two notes:

  1. For the rule links, rather than requiring two clicks (one hidden click to expose the URL, another to set it), a subtle link icon now appears to the left of the rule number when you hover over the rule (now only requiring one click).
  2. As for hosting, here'd be my suggestion, if I were you:
    • I can walk you through creating a gh-pages branch, which will create a preview of the code at dweinberger.github.io/newclues
    • Once the site is where you want it, create a CNAME record pointing newclues.cluetrain.com to dweinberger.github.io
    • Add a CNAME file to this repository with the contents newclues.cluetrain.com
    • Verify newclues.cluetrain.com looks like you want it
    • Redirect cluetrain.com/newclues to newclues.cluetrain.com

It sounds like a lot because I went step by step, but once it's set up, any changes made to this repository would be automatically reflected (and again, you'd only have to make a change once to have it reflected in multiple formats).

benbalter commented 9 years ago

Oh, and to note, as before, you can preview it at http://ben.balter.com/newclues/.

dweinberger commented 9 years ago

Ben,

You're too much. And have done too much.

Some questions/responses:

  1. I liked my way of exposing the clue URLs better, although yours is more elegant. And my CSS is ugly. What I like better is: (a) I want to show the url incredibly explicitly, instead of having the url only in the browser address bar; (b) I like the gesture of providing some copy-and-paste html code. Is there some non-aesthetic reason why we should go with yours instead of mine?
  2. I would like to understand how you're doing the auto-numbering and other programmatic actions so that if and when I need to make a change, I don't have to come crying back to you.
  3. Don't the clues have anchors in my version? A URL like http://www.cluetrain.com/newclues/#43 takes you to that clue; I'm using an id attribute as the anchor. You're wrapping the whole clue in an A with an href that's serving as the anchor? Is that preferable? (I'm just curious. I'm also curious about what the heck "::before" means. I've never seen the double colon before. I have so much to learn.)
  4. Any time you have maybe 20 mins to walk me through the setup and to answer a couple of questions, I'm ready to do it.
  5. Thank you soooo much!

David

(Sent by email in reply to Ben Balter's comment of Sat, Jan 24, 2015 at 6:08 PM.)


dweinberger commented 9 years ago

I'd merge the pull request, but I don't really understand what the consequences will be. So, I'll wait until we can talk, unless you tell me to just go ahead and do it already.

benbalter commented 9 years ago

Is there some non-aesthetic reason why we should go with yours instead of mine?

Nope. It was just preference. I implemented the original embed style via 72756c1 (https://github.com/dweinberger/newclues/commit/72756c1c54daea1845a9938a0c211474b99a4820). You'll notice the text is always in the DOM (not injected via javascript), and simply hidden/shown on click.

I would like to understand how you're doing the auto-numbering

The primary secret sauce is going to be in https://github.com/benbalter/newclues/blob/gh-pages/newclues.html#L28-L49. The for X in Y pattern is simply a loop. Within the loop, it creates an HTML <ol> with the subheads and clues. For the top-level OL, we simply define the numbering style as letters via CSS (https://github.com/benbalter/newclues/blob/gh-pages/css/style.scss#L364-L366). For the inner OL, we need to tell it to continue numbering from the previous OL by providing a start="" attribute (https://github.com/benbalter/newclues/blob/gh-pages/newclues.html#L31). The last bit is that we want to know the clue number (so that, e.g., we can generate links), which is accomplished with a simple counter within the loop (https://github.com/benbalter/newclues/blob/gh-pages/newclues.html#L43).
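The counter-plus-start technique can be sketched outside Liquid as well. This minimal Python version assumes the same hypothetical chapters, subheads, and clues shape. It renders one chapter's subheads as nested lists and sets `start` on each inner list so numbering continues across subheads. It also returns the next number, so a caller can carry the count into the following chapter. The numeric `id` on each clue is an assumption modeled on URLs like #43, and the comment about lettering the outer list with `list-style-type: upper-alpha` is an example only; the real template and stylesheet may differ.

```python
from html import escape


def render_subheads(chapter, first_number):
    """Render one chapter's subheads as nested <ol>s; return (html, next_number)."""
    number = first_number
    # The outer list holds subheads; CSS (e.g. list-style-type: upper-alpha) can letter it.
    lines = ['<ol class="subheads">']
    for subhead in chapter["subheads"]:
        lines.append(f'  <li class="subhead">{escape(subhead["title"])}')
        # start= makes this inner list continue numbering where the previous one stopped.
        lines.append(f'    <ol class="clues" start="{number}">')
        for clue in subhead["clues"]:
            # Hypothetical numeric id so a URL like #3 can link straight to the clue.
            lines.append(f'      <li class="clue" id="{number}">{escape(clue)}</li>')
            number += 1
        lines.append("    </ol>")
        lines.append("  </li>")
    lines.append("</ol>")
    return "\n".join(lines), number
```

Usage: call `render_subheads(chapter, 1)` for the first chapter, then pass the returned number as `first_number` for the next chapter so the count never restarts.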

Don't the clues have anchors in my version?

It looks like the live version sets the anchors via javascript once the page loads. In this version, each list item <li> has an explicit ID in the HTML. The additional link was part of the other link format, now removed.

I'm also curious about what the heck "::before" means. I've never seen the double colon before. I have so much to learn.

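::before is what's called a CSS pseudo-element (not a pseudo-class): it creates a pseudo-element that is the first child of the matched element. The double colon is the CSS3 syntax that distinguishes pseudo-elements from pseudo-classes such as :hover. See https://developer.mozilla.org/en-US/docs/Web/CSS/::before.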

I'd merge the pull request, but I don't really understand what the consequences will be. So, I'll wait until we can talk, unless you tell me to just go ahead and do it already.... Any time you have maybe 20 mins to walk me through the setup and to answer a couple of questions, I'm ready to do it.

Merging will simply update the code on the master branch of this repository. It won't change anything visible, but it could make the live site harder to maintain if there's a long lag between merging and flipping the switch (master would then be out of sync with the live site). Beyond that, you can merge without consequence, or I'm glad to walk you through the process, perhaps Monday, if you'd like?

dweinberger commented 9 years ago

Thanks!

Monday works for me except 10-11 and 2-3 EST.

dweinberger@skype dweinberger.com@googleHangout 617 738 8323

David

(Sent by email in reply to Ben Balter's comment of Sun, Jan 25, 2015 at 11:05 AM.)
