The Great Content Re-alignment

jaredcwhite commented 3 years ago

January 2021 Update: work has begun on this in earnest. Using the term "resource" and not "content". (Thanks @andrewmcodes!) Development of Bridgetown::Resource::Base and supporting classes is now underway! Check out the "diary"…

Looking ahead to what I hope to accomplish for an official Bridgetown 1.0 release and beyond, I think we need to take a hard and painful look at the distinctions between Page & Document and also between the posts collection and other collections.

➡️ Bridgetown's heritage comes from Jekyll and Jekyll comes from the idea that you have blog posts and you have standalone pages (home page, about page, etc.), and anything else is a "static file". Then the concept of collections emerged, with posts being a kind of builtin collection, but posts behave differently in some respects compared to other collections and are read off the filesystem completely differently.

➡️ There's also confusion around how permalinks work and how to configure them, because the top-level permalink config value affects both pages and blog posts, but there's also a permalink config possible at the collection level, plus you can add permalink configs via front-matter defaults which could potentially affect anything, so it's clear as mud.

➡️ At the code level, I've done what I can using mixins/concerns to get the Page and Document classes to act more alike and work similarly in various respects using duck typing, but it can still be frustrating. Layout is yet another similar-but-not-really sort of enigma.

➡️ There's also the question of when to use data files and when to use collections, and in fact you can add YAML files in a collection folder and they are treated as documents with front matter and a blank content field! 🤪 Also wacky, until a recent bug fix, static files saved within collection folders were processed as "collection documents" even though they were the StaticFile class and were missing from the site's overall static files array.

➡️ Another problem is that categories and tags are currently post-specific. If you add categories and/or tags to other collections, or pages for that matter, they're invisible to any typical searching/filtering of categories/tags.

➡️ Yet another obscure problem is you currently don't have any control over the order in which content is processed on a file-by-file basis, so you can occasionally run into issues where File A is trying to display content from File B, C, D, etc. but the content for those files hasn't actually been processed, so File A shows the raw markup/template string instead of the processed content. Oops! It's a non-trivial problem, because you could potentially run into circular dependencies. File A displays content from File B, but File B wants to display content from File A. Yikes. That happens virtually never in a typical site design, but you never know.
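One way to picture the ordering problem is as a topological sort over a dependency graph. The following is only a minimal sketch using Ruby's standard `TSort` module; the `RenderGraph` class, its `render_order` method, and the file names are hypothetical and not part of Bridgetown.

```ruby
require "tsort"

# Hypothetical dependency graph: each key is a file, each value lists the
# files whose processed content it needs. None of this is Bridgetown API.
class RenderGraph
  include TSort

  def initialize(dependencies)
    @dependencies = dependencies
  end

  # TSort calls this to enumerate every node in the graph.
  def tsort_each_node(&block)
    @dependencies.each_key(&block)
  end

  # TSort calls this to enumerate the dependencies of a single node.
  # Files that are not listed as keys are treated as having no dependencies.
  def tsort_each_child(node, &block)
    @dependencies.fetch(node, []).each(&block)
  end

  # Returns files ordered so that dependencies come before their dependents.
  # Raises TSort::Cyclic when files depend on each other in a loop.
  def render_order
    tsort
  end
end

graph = RenderGraph.new({
  "a.md" => ["b.md", "c.md"],
  "b.md" => ["c.md"],
  "c.md" => []
})
puts graph.render_order.inspect

begin
  RenderGraph.new({ "a.md" => ["b.md"], "b.md" => ["a.md"] }).render_order
rescue TSort::Cyclic => e
  puts "circular dependency: #{e.message}"
end
```

With this shape, the File A / File B loop surfaces as an explicit error at sort time rather than as raw template strings in the output.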

➡️ But wait, there's more! Right now there's no concrete way to determine the "source" of a particular file/piece of content if it came from an API/headless CMS—you only know if it came from an actual file on the filesystem, otherwise it's just "virtual". In addition, after it gets rendered at a particular URL, you can't backtrack—in other words, you can't determine that /a/b/c corresponds to this one object and, say, re-render that particular object.

❓ (There's also the outstanding question of how all this relates to ActiveModel objects that can be used to load/validate/save content in a Rails CMS-context—a project I have underway—but I think I'll save that for a future issue.)

❤️ All that to say…I'll always love Jekyll to pieces, but its content modeling situation is kind of screwball and it's time for us to fix this in Bridgetown once and for all so we have a sane platform to build on for the next ten years.

So, how do we fix this? 😂

I propose creating a new namespace under Bridgetown called Bridgetown::Content. Inside we'd define several classes:

Bridgetown::Content::Base — this represents a single piece of content. This is any kind of content that isn't simply a "static file" like an image or PDF. So that means page, blog post, collection document, YAML/JSON/CSV/etc. data file, whatever.
Bridgetown::Content::Source — this is attached to the content object and represents where the content came from…filesystem, third-party API, generator, etc.
Bridgetown::Content::Destination — this is attached to the content object and represents the URL/filepath where the content will be generated.
Bridgetown::Content::Transformer — this is an auxiliary object that is responsible for transforming the object data from raw input to final converted output.
Bridgetown::Content::Dependencies — this would determine the dependencies required for each piece of content and use that to facilitate both the correct order of processing and also to cache in the future so a piece of content could be quickly rerendered along with just its dependencies. There'd be some default heuristics along these lines but you could manually specify dependencies on a per-object basis. (Like a product template could specifically require "products" to be a dependency and maybe just the products in its own category.)
Bridgetown::Content::Taxonomy — this would represent a particular way to classify a content item. A category would be a Taxonomy of type "category", a tag would be a Taxonomy of type "tag", etc. Site owners could easily configure any sort of Taxonomy. Looking at Hugo for example, it comes out of the box configured like so:
```
taxonomies:
 category: categories
 tag: tags
```
but you could adjust that however you like.
Bridgetown::Content::Relations — this is how a content object could be thought of as "related" to another type of object…parent-child relationships, belongs-to/has-many, etc. So you could have author: janedoe in a post's frontmatter and then maybe post.relations.author would automatically resolve to the content object for janedoe. The relations themselves would probably be defined in the yml where collections are currently configured.
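
To make the list above a little more concrete, here is a rough sketch of how a content object with a source, a destination, taxonomies, and relations might fit together. Every class, field, and method name in it (`Resource`, `Site`, `resource_for`, `tagged`, `related`, `demo_site`) is an assumption made for illustration, as is the example CMS URL; the class names in this proposal are themselves still theoretical.

```ruby
# Illustrative stand-ins only: every name below is an assumption made for
# this sketch, not Bridgetown's actual API.
Source = Struct.new(:kind, :location)  # where it came from, e.g. :filesystem or :api
Destination = Struct.new(:url)         # where it will be generated

Resource = Struct.new(:id, :collection, :data, :source, :destination,
                      keyword_init: true)

class Site
  def initialize(resources, taxonomies:, relations:)
    @resources = resources
    @taxonomies = taxonomies  # taxonomy type => front matter key
    @relations = relations    # collection => { relation name => target collection }
  end

  # Backtrack from a rendered URL to the object that produced it.
  def resource_for(url)
    @resources.find { |resource| resource.destination.url == url }
  end

  # All resources, from any collection, that carry a given taxonomy term.
  def tagged(type, term)
    key = @taxonomies.fetch(type)
    @resources.select { |resource| Array(resource.data[key]).include?(term) }
  end

  # Resolve a front matter value such as `author: janedoe` to a resource.
  def related(resource, name)
    target = @relations.dig(resource.collection, name)
    return nil unless target

    @resources.find { |r| r.collection == target && r.id == resource.data[name] }
  end
end

def demo_site
  jane = Resource.new(
    id: "janedoe", collection: "authors", data: { "name" => "Jane Doe" },
    source: Source.new(:api, "https://cms.example.com/authors/janedoe"),
    destination: Destination.new("/authors/janedoe/")
  )
  post = Resource.new(
    id: "hello", collection: "posts",
    data: { "author" => "janedoe", "tags" => ["ruby"] },
    source: Source.new(:filesystem, "_posts/hello.md"),
    destination: Destination.new("/blog/hello/")
  )
  about = Resource.new(
    id: "about", collection: "pages", data: { "tags" => ["ruby", "meta"] },
    source: Source.new(:filesystem, "about.md"),
    destination: Destination.new("/about/")
  )
  Site.new(
    [jane, post, about],
    taxonomies: { "category" => "categories", "tag" => "tags" },
    relations: { "posts" => { "author" => "authors" } }
  )
end

site = demo_site
post = site.resource_for("/blog/hello/")
puts site.related(post, "author").data["name"]
puts site.tagged("tag", "ruby").map(&:id).inspect
```

Because taxonomy lookup runs over every resource rather than just posts, a tagged page shows up alongside a tagged post, and the URL lookup gives a path back from /a/b/c to the object that produced it.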

After doing all this, we'd refactor Bridgetown::Page and Bridgetown::Document so they're just subclasses of Bridgetown::Content::Base, and we'd probably add Bridgetown::StructuredData as well to represent a YAML/JSON/etc. data structure. In addition, we'd get rid of the separate file readers for pages, collections, and posts, and unify everything into a single file reader. I also like the idea of letting front matter itself override directory locations, so you could potentially have everything all in a top-level folder and just add collection: posts, collection: recipes, etc. That would be dumb, but it would also be immensely flexible and eliminate any hard requirements for folders like _posts, _recipes, etc.
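
As a rough illustration of front matter overriding directory locations, a single reader could decide the collection per file. This is only a sketch under assumed rules: the helper names (`parse_front_matter`, `collection_for`, `classify`) and the fallback order (explicit `collection:` key, then a leading underscore folder, then "pages") are hypothetical, not how Bridgetown resolves collections.

```ruby
require "yaml"

# Matches a leading YAML front matter block delimited by lines of three dashes.
FRONT_MATTER = /\A---\s*\n(.*?)\n---\s*\n?(.*)\z/m

# Split raw file text into [front matter hash, body].
# Files without front matter get an empty hash and keep their full text.
def parse_front_matter(text)
  match = FRONT_MATTER.match(text)
  return [{}, text] unless match

  [YAML.safe_load(match[1]) || {}, match[2]]
end

# Hypothetical rule: an explicit `collection:` key wins; otherwise fall back
# to a leading underscore folder (_posts -> posts); otherwise use "pages".
def collection_for(path, front_matter)
  return front_matter["collection"] if front_matter["collection"]

  folder, rest = path.split("/", 2)
  return folder.delete_prefix("_") if rest && folder.start_with?("_")

  "pages"
end

# Convenience wrapper: classify a file from its path and raw text.
def classify(path, text)
  front_matter, _body = parse_front_matter(text)
  collection_for(path, front_matter)
end

puts classify("content/soup.md", "---\ncollection: recipes\ntitle: Soup\n---\nSimmer.\n")
puts classify("_posts/2021-01-01-hello.md", "---\ntitle: Hello\n---\nHi!\n")
puts classify("about.md", "Just some text.\n")
```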

The special behavior of posts would be basic configuration options of a collection, so any collection could potentially behave in that manner if configured. Pages would just be collection-less documents, essentially—or alternatively, create a pages or default or unfiled collection and use that.

I'd also like to make sure we get good-quality content graphs out of all this so menus, breadcrumbs, etc. would be a piece of cake once the collections/taxonomies/relations are properly configured. (Again, Hugo leads the way on this stuff!)

In terms of ecosystem impact, my hope is that after doing all this, most existing sites would work "as is" from the user's perspective, and any external Bridgetown plugins would only need slight tweaks to work with the new Page/Document classes…not entirely backwards-compatible unfortunately, but since we're still pre-1.0, the time for breaking changes is really now if ever. Once we do this, we break free from Jekyll's gravitational pull and get to define the future of Bridgetown on our terms. Very exciting!

Please note all the above class names are purely theoretical at this point and subject to deliberation and further brainstorming, so please let me know what you think and if I'm missing any important aspects of quality content modeling. We shouldn't shy away from looking at how other CMSes and site generators do this stuff and aim for providing as much power and flexibility as we can right out-of-the-box.

jaredcwhite commented 3 years ago

(Not specifically stated, but I think we should consider making the source/reader object for a collection really smart, i.e., it could actually pull data using Ruby libraries, caches, whatever…even going so far as to connect to ActiveRecord so you could load content right out of a DB using Rails. I know that sounds kind of nuts, but I'm thinking big here!)

jaredcwhite commented 3 years ago

One other thing we need to be mindful of…we have unique performance requirements compared to dynamic request/response frameworks because a teeny, tiny change in memory/CPU load could make the difference between a few seconds and a few minutes for a really large site build. So I'd hate to redo everything here and then find out Bridgetown is suddenly way behind Jekyll/Eleventy/etc. Probably the best way to think about it is to identify the "happy path" — aka a typical site configuration — and streamline that as much as possible. If we have a good benchmark suite ahead of time, we can do A-B between the current system and the new one to identify pain points.
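
A very small A-B timing harness along these lines can be put together with Ruby's standard `Benchmark` module. The sketch below is an assumption about what such a comparison could look like: `compare_builds` is a hypothetical helper, and the two lambdas are placeholder workloads standing in for real site builds.

```ruby
require "benchmark"

# Run each candidate several times and keep its best wall-clock time in
# seconds. The callables are placeholders for "build the site with engine X".
def compare_builds(candidates, runs: 3)
  candidates.transform_values do |build|
    Array.new(runs) { Benchmark.realtime { build.call } }.min
  end
end

results = compare_builds(
  {
    "current" => -> { 20_000.times.map(&:to_s).join },
    "new" => -> { (0...20_000).each_with_object(+"") { |i, out| out << i.to_s } }
  },
  runs: 5
)

results.each do |name, seconds|
  puts format("%-8s %.4fs", name, seconds)
end
```

Taking the minimum of several runs is one common way to reduce noise; a real suite would also want to track memory, since the comment above calls out both memory and CPU load.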

jaredcwhite commented 3 years ago

Last comment for now (I swear!) — this is also a good opportunity to rely more heavily on ActiveSupport and potentially other gems so we get the benefit of their hard work and optimizations and don't have to write so much from scratch ourselves.

andrewmcodes commented 3 years ago

@jaredcwhite This all sounds super great and I hate to bring out the ol 🖌 (for some bike shedding - emoji options for paint aren't great) but atm I really dislike the term "content".

I need to noodle on alternatives and I may throw out a few (some will be awful) suggestions. Totally fine if that's the best we have but it feels wrong at this moment.

Alternatives (edited) naming is hard - please wr'k the bad ones 😛

resources: when speaking about the umbrella of items you reference above, it's not necessarily "content". Document was a great description, but these are also all resources that have their own entry points, history, relationships, etc.
Components controversial: I am just throwing it out but I know the arguments against it. EOD these are all little components that are stitched together. If you look into Next.js' source for ssr, they refer to them as components. But this conflicts with other components.
Source
Chunks
tbd

jaredcwhite commented 3 years ago

@andrewmcodes Yeah I'll admit I'm not crazy about the over-utilization of the term content either…resource perhaps captures more nuance around what we're actually talking about. Probably would shy away from the other options. Still TBD.

jaredcwhite commented 3 years ago

FYI: https://github.com/bridgetownrb/bridgetown/issues/194#issuecomment-767031773

jaredcwhite commented 3 years ago

FYI: I'm keeping a diary of sorts in the #upcoming channel of the Bridgetown Discord as I work on this. Follow along there if you dare! 😉

jaredcwhite commented 3 years ago

Interesting… 🤔 https://craftcms.com/features/all#section-types

jaredcwhite commented 3 years ago

> In terms of ecosystem impact, my hope is that after doing all this, most existing sites would work "as is" from the user's perspective, and any external Bridgetown plugins would only need slight tweaks to work with the new Page/Document classes

Well, that ended up not being the case. I've made a lot of breaking changes — not to existing sites using the legacy engine (any major breakage there would be considered a bug), but when switching to the new resource engine a lot of Liquid/ERB syntax will need to change and plugins will need to be updated. It's painful, but this is the only time we can get away with it. A year or two from now and such a shift would be extremely upsetting. I'm not fond of moving farther away from Jekyll compatibility, but on the other hand we're not really competing with Jekyll. We're competing with Gatsby. We're competing with Eleventy. We're competing with Hugo. We need to be fabulously good in order to be a viable contender. Just being slightly better than Jekyll, and using Ruby, isn't enough. Anyway, I'll be writing all this up more succinctly in a blog post shortly!

jaredcwhite commented 3 years ago

There's more to do after the release of Bridgetown 0.20, but I'll file those items as separate issues. Closing! 🎉

bridgetownrb / bridgetown

The Great Content Re-alignment #187