Closed jmglov closed 2 years ago
@jmglov I think this is a good start, but this PR has a few things that I'd like to change.
bb watch: the logic now re-parses the metadata of every post's markdown file. For my taste this is way too much I/O on every keystroke. We could cache the metadata of every file in a cache directory and only do the I/O whenever the cached file is older than the post itself.
Alright, I had a night's sleep and looked again at:
Title: All the ways to shell out in babashka
Date: 2021-11-04
Tags: clojure
And it does look pretty clean, so let's keep it.
As for the I/O:
I'm still convinced that this needs some extra work, only processing files when necessary. What do you think?
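To make the "only when the cached file is older than the post" idea concrete, here is a minimal sketch of a staleness check. The function name stale? and the idea of one cache file per post are assumptions for illustration, not quickblog's actual code.

```clojure
(require '[clojure.java.io :as io])

(defn stale?
  "True when cache-file is missing or older than src-file.
  Hypothetical helper; both arguments may be paths or java.io.File objects."
  [src-file cache-file]
  (let [src   (io/file src-file)
        cache (io/file cache-file)]
    (or (not (.exists cache))
        ;; lastModified returns milliseconds since the epoch
        (< (.lastModified cache) (.lastModified src)))))
```

A post would then be re-parsed only when (stale? post cached-metadata-file) is true, and the cached metadata would be used otherwise.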
[The metadata] does look pretty clean, so let's keep it.
Sounds good. The metadata format is what's specified by MultiMarkdown and supported by markdown-clj. I totally agree that EDN would be more friendly with Clojure tooling. I think there may be a way to have our cake and eat it too by allowing user-specified metadata transformers like the one I have for Tags. Let's think about this a bit and possibly open an issue to implement it.
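A rough sketch of what user-specified metadata transformers could look like. Everything here is hypothetical: the names default-transformers and transform-metadata, and the assumption that raw metadata arrives as a map of keyword to string, are for illustration only and are not the PR's implementation.

```clojure
(require '[clojure.string :as str])

;; Hypothetical default: turn a comma-separated Tags string into a set.
(def default-transformers
  {:tags #(set (str/split % #"\s*,\s*"))})

(defn transform-metadata
  "Apply each transformer to the matching metadata key, if that key is present.
  Keys without a transformer are left untouched."
  [transformers metadata]
  (reduce-kv (fn [m k f]
               (if (contains? m k) (update m k f) m))
             metadata
             transformers))

;; Usage: a user could merge their own transformers over the defaults.
(transform-metadata default-transformers
                    {:title "All the ways to shell out in babashka"
                     :date  "2021-11-04"
                     :tags  "clojure"})
```

With this shape, the on-disk format stays MultiMarkdown while the in-memory metadata can hold richer Clojure values.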
As for the I/O: I'm still convinced that this needs some extra work, only processing files when necessary. What do you think?
I'm definitely open to improving this, but I'm not sure I know how to do it. The re-processing of the metadata happens on every save, not every keystroke, right? I don't know how to cache the metadata separately from the post. Can you explain a little bit more what you had in mind?
@jmglov Since this is all parsed by markdown-clj, I think an EDN header is already supported automatically?
I'm definitely open to improving this, but I'm not sure I know how to do it.
This is exactly the reason I went with posts.edn :)
not every keystroke, right?
It depends, if I leave my emacs buffer, it saves automatically. With 100s of posts, I think the I/O is getting a bit out of hand here.
Can you explain a little bit more what you had in mind?
We can produce a separate .edn file for every post in the cache and use that instead.
Or ... produce a posts.edn inferred from the metadata. This is what I tried to do before porting my previous blog code to this library, but gave up since I didn't think it was worth it.
It depends, if I leave my emacs buffer, it saves automatically. With 100s of posts, I think the I/O is getting a bit out of hand here.
Ah, so it's re-reading all of the posts when you modify one? That's definitely not what we want. I'm sure I can improve that.
Yes, it's reading all posts.
If I'm not mistaken, the fs-watcher callbacks receive information on what changed exactly that is currently not used. In combination with a per-post .edn cache file and an atom that stores the equivalent of posts.edn in memory it should be possible to only reload the post that has changed and merge the metadata into the atom.
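A minimal sketch of that idea, assuming the watcher hands the callback an event map with :type and :path keys. Only :create appears in this thread; the :write and :remove event types, the handle-event! name, and the load-metadata argument are assumptions for illustration.

```clojure
;; In-memory equivalent of posts.edn: path -> metadata map.
(def posts (atom {}))

(defn handle-event!
  "Update only the entry for the file named in the event.
  load-metadata is a hypothetical function: path -> metadata map."
  [posts-atom load-metadata {:keys [type path]}]
  (let [path (str path)]
    (case type
      ;; assumed event types for a new or changed file
      (:create :write) (swap! posts-atom assoc path (load-metadata path))
      ;; assumed event type for a deleted file
      :remove          (swap! posts-atom dissoc path)
      ;; anything else leaves the atom untouched
      nil)))
```

The watcher callback would then be something like (fn [event] (handle-event! posts load-metadata event)), so a single changed post costs one read instead of one per post.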
True!
If I'm not mistaken, the fs-watcher callbacks receive information on what changed exactly that is currently not used. In combination with a per-post .edn cache file and an atom that stores the equivalent of posts.edn in memory it should be possible to only reload the post that has changed and merge the metadata into the atom.
This is excellent! I had gotten as far as storing the post metadata in an atom, but didn't know about the fs-watcher piece. Thanks for the info. :)
Aha! This explains why bb watch re-renders when you start changing a post but before you even save it:
Re-rendering {:type :create, :path posts/.#figwheel-keep-om-turning.md}
Emacs's backup file!
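One way to avoid re-rendering on editor temp files like the one in the log line above is to filter watcher events by file name before doing any work. A small sketch, where post-file? is a hypothetical helper and the rule (markdown files whose names do not start with a dot) is an assumption:

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn post-file?
  "True for markdown files whose name does not start with a dot,
  which excludes files such as .#figwheel-keep-om-turning.md."
  [path]
  (let [file-name (.getName (io/file (str path)))]
    (and (str/ends-with? file-name ".md")
         (not (str/starts-with? file-name ".")))))
```

The watcher callback can then return early when (post-file? (:path event)) is false.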
@borkdude I fixed the things you pointed to. Thanks again for the great idea, @mknoszlig!
Trying now, migrating using the new -x option, see this blog post
bb -x quickblog.api/migrate
When I touch the .md file, I see that it uses a bunch of cached versions (one for each post). And then:
Writing tags page public/tags/index.html
Writing page: public/tags/index.html
Writing tag page: public/tags/clojure.html
Writing page: public/tags/clojure.html
Writing page: public/archive.html
Reading file from cache: .work/oss-updates-may-jun-2022.md.pre-template.html
Reading file from cache: .work/babashka-cli.md.pre-template.html
Writing page: public/index.html
Writing Clojure feed public/planetclojure.xml
Writing feed public/atom.xml
I don't think we have to re-emit the tags, tags index, archive, and feed every time we edit the post, if the header hasn't changed.
I may have more feedback, but I'll try to give you one thing to do at a time ;).
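The "only re-emit when the header changed" rule can be sketched as a comparison of the previously cached metadata with the freshly parsed metadata. The function name and the output keywords below are hypothetical labels, not quickblog's internals:

```clojure
(defn pages-to-rebuild
  "Given a post's previously cached metadata and its freshly parsed metadata,
  return the set of outputs that need regenerating (hypothetical labels)."
  [old-meta new-meta]
  (if (= old-meta new-meta)
    ;; body-only edit: the post page and the index that embeds it
    #{:post :index}
    ;; header changed (or post is new): everything derived from metadata too
    #{:post :index :archive :tags :feeds}))
```

Because Clojure maps compare by value, a plain = is enough to detect a title, date, or tag change.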
I don't think we have to re-emit the tags, tags index, archive, and feed every time we edit the post, if the header hasn't changed.
Good point. Will fix.
@borkdude I fixed the things you pointed to. Thanks again for the great idea, @mknoszlig!
glad it helped, thanks for implementing it right away! :)
@borkdude I reworked the caching and made it much simpler (I think) and reduced the I/O greatly in watch mode. For some reason, the archive page is still being re-rendered in watch mode even when the metadata hasn't changed, but I figured it was worth getting the PR in front of your eyes since the archive fix should be quite a small one and easy to review on its own.
@borkdude OK, fixed all the remaining bugs (said the coder naively). This is ready for a final review.
My next mission is a branch that contains regression tests. ;)
@jmglov Would you mind not force-pushing and just making incremental commits instead? I'll squash the branch anyway and this will prevent issues like this locally:
@jmglov
When I change a single post, I see "reading metadata" for every post, then it re-generates my blog article and then it reads the metadata of every post again.
I thought we agreed that we don't have to read the entire article into memory just to get the metadata out; we could cache this data in a separate file. Or if the article isn't newer than the .work/cache.edn I don't think you have to read the metadata at all?
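As an illustration of getting the metadata out without reading the whole article, here is a sketch that stops consuming lines at the first blank line. It assumes a header of single-line "Key: value" pairs like the example earlier in this thread and ignores multi-line values; read-header-lines and parse-header are hypothetical names, and this is not how markdown-clj parses metadata.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn read-header-lines
  "Return the lines before the first blank line of file."
  [file]
  (with-open [rdr (io/reader file)]
    (->> (line-seq rdr)
         (take-while (complement str/blank?))
         (into []))))   ; realize the lines before the reader is closed

(defn parse-header
  "Parse `Key: value` lines into a map with lower-cased keyword keys.
  Lines that do not match are skipped."
  [lines]
  (into {}
        (for [line lines
              :let [[_ k v] (re-matches #"([^:]+):\s*(.*)" line)]
              :when k]
          [(keyword (str/lower-case (str/trim k))) (str/trim v)])))
```

Combined with a timestamp check against the cache, even this partial read would only happen for posts that actually changed.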
I noticed the sorting in the archive is weird:
Hope you're not getting tired yet, I appreciate the work you're doing on this PR :)
No worries, your comments are helpful in shaking the rust off my Clojure.
Sorry about the rewriting of history. It's been a while since anyone other than myself was consuming my remote branches. I'll stop doing that.
The metadata re-reading is an oversight. I was so focused on watch mode that I didn't notice that. The fix is quite straightforward, as is the sorting one. Will fix both issues today.
@borkdude Everything you noted is now fixed. You may also want to look at a PR on your blog itself where I put back the blog description that got lost somewhere along the way: https://github.com/borkdude/blog/pull/32
Please answer the following questions and leave the below in as part of your PR.
[x] This PR corresponds to an issue with a clear problem statement: #13
[x] I have updated the CHANGELOG.md file with a description of the addressed issue.