Closed (beechnut closed 2 months ago)
This is a fun one.
The biggest issue here is not the plugin code itself, which I found easy to understand. The issue here is architectural: figuring out where this code should reside.
I tried splitting out the calculations (`related_scores`, which needs to traverse all posts and all users separately) and storing the result somewhere as a global object, so that the shortcode becomes vastly simpler: essentially, the shortcode would pull from a cached, pre-calculated `related_scores` data structure to do its work. But now I'm reconsidering this approach. Maybe it is simpler after all to write an 11ty plugin straight up, as that keeps all the related code in one place (you would need to pass the users and posts to the plugin in `.eleventy.js`). Either way, the code is fairly easy to translate to JavaScript; it's just a matter of figuring out the approach we want to take here.
I did not have time to finish this work so I will unassign myself.
To capture our notes from yesterday's call:
`_includes/layouts/post.html` L47 should read something like:
{% assign related_posts = page | findRelatedPosts %}
Then, `findRelatedPosts` should be a function that takes the `page` variable (which contains data from the current post) and, using `postsCollection` (aka `collections.posts`), finds the 3 related posts to be shown in the footer.
We don't need to exactly re-implement the Jekyll code — prefer simplicity to replicating that behavior.
My only input is that if posts share both authors and some tags, that weight should be substantially (more than 2x) greater than just sharing an author or just sharing some tags.
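One way to encode that weighting, as a rough sketch rather than the final algorithm: count shared tags and shared authors, then multiply the combined score when both kinds of overlap are present. The field names (`tags`, `authors`), the weights, and the multiplier are all assumptions for illustration.

```javascript
// Hypothetical weights; these would need tuning against real posts.
const TAG_WEIGHT = 1;
const AUTHOR_WEIGHT = 1;
const BOTH_MULTIPLIER = 3; // more than 2x, so tag + author matches clearly win

// Count the items of `a` that also appear in `b`.
function countShared(a = [], b = []) {
  const lookup = new Set(b);
  return a.filter((item) => lookup.has(item)).length;
}

// Assumes each post looks like { tags: string[], authors: string[] }.
function relatednessScore(post, other) {
  const sharedTags = countShared(post.tags, other.tags);
  const sharedAuthors = countShared(post.authors, other.authors);
  const base = sharedTags * TAG_WEIGHT + sharedAuthors * AUTHOR_WEIGHT;
  // Boost the score only when the posts overlap on both tags and authors.
  return sharedTags > 0 && sharedAuthors > 0 ? base * BOTH_MULTIPLIER : base;
}
```

With these numbers, one shared tag plus one shared author scores 6, while one shared tag alone scores 1. Whether the boost should also beat a post that shares many tags but no author is a design choice this sketch leaves open.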
Once the basic algorithm is done, we want to consider caching the related posts. The only time we really need to recalculate related posts is when a post is added/removed, or when post tags have changed — recalculating related posts on every build will just be time added to the build.
A shortcut might be to re-cache any time anything in the posts collection (`content/posts/*.md`) is added/removed/modified. We can use the differ classes in `lib/` to list changed files, and then we just `detect` (in Ruby parlance) against the posts collection pattern.
The cached data can be stored in `.cache/` as `related-posts-{timestamp? hash?}.(json|csv)`, and we can add the cached related posts as a collection in `config/collections.js`.
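A minimal sketch of that check, assuming the differ classes in `lib/` can hand back an array of repo-relative paths (their real interface may differ). `postsChanged` and `cacheFileName` are hypothetical names, and the hash-based filename is just one of the two options floated above.

```javascript
const crypto = require("crypto");

// Regex equivalent of the posts collection pattern content/posts/*.md
const POSTS_PATTERN = /^content\/posts\/[^/]+\.md$/;

// `changedFiles` is assumed to be an array of repo-relative path strings.
function postsChanged(changedFiles) {
  // Array.prototype.some plays the role of Ruby's `detect` here:
  // it stops at the first path that matches the pattern.
  return changedFiles.some((file) => POSTS_PATTERN.test(file));
}

// Hypothetical naming scheme: a short content hash instead of a timestamp.
function cacheFileName(cacheData) {
  const hash = crypto
    .createHash("sha256")
    .update(JSON.stringify(cacheData))
    .digest("hex")
    .slice(0, 12);
  return `.cache/related-posts-${hash}.json`;
}
```

If `postsChanged(...)` returns `false`, the build could skip recalculation and reuse the existing cache file.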
Some more thoughts on caching:
.cache/related-posts-{timestamp}.json
{
"/2022/07/20/senior-executives-pt1/": [
{
"url": "/2022/08/25/senior-executives-pt5/",
"title": "Senior executives part 5: Use stories as leading indicators",
"excerpt": "Executives often rely on productivity metrics to measure success, but these measures can obscure whether the software is actually working for users. Stories are a better resource to build a strategy between a senior executive and a product team. This is part five in a series on how senior executive and tech teams can be better allies."
},
{ "url": "...", "title": "...", "excerpt": "..." },
{ "url": "...", "title": "...", "excerpt": "..." }
]
}
The cache is just the data needed for presentation: title, url, and excerpt. The key for each is a post url, the value being the three related posts' essential linking data.
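Producing that structure could look roughly like the following. This is a sketch under assumptions: `posts` is an array of objects with `url`, `title`, and `excerpt`, and `findRelated` is a hypothetical function that returns a post's related posts, best match first.

```javascript
// Keep only the fields the template needs for linking.
function toLink(post) {
  return { url: post.url, title: post.title, excerpt: post.excerpt };
}

// Build { [post.url]: [{ url, title, excerpt }, ...] } for every post.
function buildRelatedPostsCache(posts, findRelated, limit = 3) {
  const cache = {};
  for (const post of posts) {
    cache[post.url] = findRelated(post, posts).slice(0, limit).map(toLink);
  }
  return cache;
}
```

The result can be written out with `JSON.stringify(cache, null, 2)` to get a file shaped like the example above.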
Design goals: read the JSON file once, keep in memory during build — probably just in .eleventy.js to start.
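const fs = require("fs");
const latestCacheFile = "TODO"; // TODO: path of the latest cache file in .cache/
// Read and parse the cache once, then keep it in memory for the whole build
const cache = JSON.parse(fs.readFileSync(latestCacheFile, "utf8"));
// Look up the cached related posts for the current page by its URL
const relatedPosts = (page) => cache[page.url];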
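For the TODO, one possible approach is sketched below. It assumes the `{timestamp}` part of the filename is numeric and that a missing cache should fall back to an empty object; `findLatestCacheFile`, `loadCache`, and `configureRelatedPosts` are hypothetical names. `eleventyConfig.addFilter` is the standard Eleventy call for registering a filter.

```javascript
const fs = require("fs");
const path = require("path");

// Assumed naming scheme: related-posts-{numeric timestamp}.json
const CACHE_FILE = /^related-posts-(\d+)\.json$/;

// Return the path of the newest cache file in `dir`, or null if there is none.
function findLatestCacheFile(dir) {
  if (!fs.existsSync(dir)) return null;
  const candidates = fs
    .readdirSync(dir)
    .map((name) => ({ name, match: CACHE_FILE.exec(name) }))
    .filter(({ match }) => match !== null)
    .sort((a, b) => Number(b.match[1]) - Number(a.match[1])); // newest first
  return candidates.length > 0 ? path.join(dir, candidates[0].name) : null;
}

// Read and parse the cache once; an absent cache yields an empty object.
function loadCache(dir = ".cache") {
  const file = findLatestCacheFile(dir);
  return file ? JSON.parse(fs.readFileSync(file, "utf8")) : {};
}

// In .eleventy.js:
function configureRelatedPosts(eleventyConfig) {
  const cache = loadCache();
  // Pages missing from the cache get an empty list, so the template loop renders nothing.
  eleventyConfig.addFilter("relatedPosts", (page) => cache[page.url] || []);
}

module.exports = configureRelatedPosts;
```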
Usage:
{% comment %}
The HTML in the actual site is different, but the filter is used the same way.
{% endcomment %}
{% assign related_posts = page | relatedPosts %}
{% for post in related_posts %}
<a href="{{ post.url | url }}">{{ post.title }}</a>
<p>{{ post.excerpt }}</p>
{% endfor %}
We have a branch where things are generally working. To wrap this up, we need to:
npm run precommit
I think we're another day or so from completion, but we also just got staffed on projects, so, TBD.
The 18F website recommends "related posts" based on posts with similar tags and by the same author.
The Jekyll site used a generator to add "related posts" to a page based on categories/tags/authors — basically, a post was more related if it shared tags, authors, and categories.
It's not a requirement that the generator code is translated exactly from the Jekyll site, but it should serve as a good starting point for an algorithm. I don't recommend trying to understand it as-written — rather, refactor it to understand it.
The scope of work for this ticket is to write a good-enough shortcode or filter that takes a post and optional limit, and returns the most related posts by tag and author (we don't use categories). If several posts have equal or equivalent relatedness scores, sort by newest-first.
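To make that scope concrete, here is a rough sketch of the ranking step only. The scoring is passed in as `scoreFn`, a hypothetical function where a higher number means more related, and posts are assumed to carry `url` and `date` (a `Date`), as Eleventy collection items do.

```javascript
// Return up to `limit` posts most related to `post`, newest first on ties.
function findRelatedPosts(post, posts, scoreFn, limit = 3) {
  return posts
    .filter((other) => other.url !== post.url) // never relate a post to itself
    .map((other) => ({ other, score: scoreFn(post, other) }))
    .filter(({ score }) => score > 0) // drop posts with nothing in common
    .sort((a, b) => b.score - a.score || b.other.date - a.other.date)
    .slice(0, limit)
    .map(({ other }) => other);
}
```

Dropping zero-score posts means a post with no overlaps gets fewer than `limit` results; falling back to the most recent posts instead would be an alternative.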