Move metadata - Githubissues

jmglov commented 2 years ago

Please answer the following questions and leave the below in as part of your PR.

[x] This PR corresponds to an issue with a clear problem statement: #13
[x] I have updated the CHANGELOG.md file with a description of the addressed issue.

borkdude commented 2 years ago

@jmglov I think this is a good start, but this PR has a few things that I'd like to change.

When I run bb watch the logic now re-parses every post's markdown's metadata: for my taste this is way too much I/O on every keystroke. We could cache the metadata of every file in a cache directory and only do the I/O whenever the cached file is older than the post itself.
Should we maybe choose an EDN map as the metadata format? This would increase interop with some other tooling as well, if for whatever reason you wanted to build something else out of this data. Also the format chosen right now isn't familiar to me: what syntax is this? Nor YAML, nor EDN. At least I'm familiar with EDN and I don't have to guess what to do. Up for discussion, since the current syntax does look a bit nicer in the context of a blogpost. So I'm prepared that you will change my mind.

borkdude commented 2 years ago

Alright, I had a night's sleep and looked again at:

Title: All the ways to shell out in babashka
Date: 2021-11-04
Tags: clojure

And it does look pretty clean, so let's keep it.

As for the I/O:

I'm still convinced that this needs some extra work, only processing files when necessary. What do you think?

jmglov commented 2 years ago

[The metadata] does look pretty clean, so let's keep it.

Sounds good. The metadata format is what's specified by MultiMarkdown and supported by markdown-clj. I totally agree that EDN would be more friendly with Clojure tooling. I think there may be a way to have our cake and eat it too by allowing user-specified metadata transformers like the one I have for Tags. Let's think about this a bit and possibly open an issue to implement it.

As for the I/O: I'm still convinced that this needs some extra work, only processing files when necessary. What do you think?

I'm definitely open to improving this, but I'm not sure I know how to do it. The re-processing of the metadata happens on every save, not every keystroke, right? I don't know how to cache the metadata separately from the post. Can you explain a little bit more what you had in mind?

borkdude commented 2 years ago

@jmglov Since this is all parsed by markdown-clj, I think an EDN header is already supported automatically?

I'm definitely open to improving this, but I'm not sure I know how to do it.

This is exactly the reason I went with posts.edn :)

not every keystroke, right?

It depends, if I leave my emacs buffer, it saves automatically. With 100s of posts, I think the I/O is getting a bit out of hand here.

Can you explain a little bit more what you had in mind?

We can produce a separate .edn file for every post in the cache and use that instead. Or ... 👿 ... produce a posts.edn inferred from the metadata 👿 . This is what I tried to do before porting my previous blog code to this library, but gave up since I didn't think it was worth it.

jmglov commented 2 years ago

It depends, if I leave my emacs buffer, it saves automatically. With 100s of posts, I think the I/O is getting a bit out of hand here.

Ah, so it's re-reading all of the posts when you modify one? That's definitely what we want. I'm sure I can improve that.

borkdude commented 2 years ago

Yes, it's reading all posts.

mknoszlig commented 2 years ago

If I'm not mistaken, the fs-watcher callbacks receive information on what changed exactly that is currently not used. In combination with a per-post .edn cache file and an atom that stores the equivalent of posts.edn in memory it should be possible to only reload the post that has changed and merge the metadata into the atom.

borkdude commented 2 years ago

True!

jmglov commented 2 years ago

If I'm not mistaken, the fs-watcher callbacks receive information on what changed exactly that is currently not used. In combination with a per-post .edn cache file and an atom that stores the equivalent of posts.edn in memory it should be possible to only reload the post that has changed and merge the metadata into the atom.

This is excellent! I had gotten as far as storing the post metadata in an atom, but didn't know about the fs-watcher piece. Thanks for the info. :)

jmglov commented 2 years ago

Aha! This explains why bb watch re-renders when you start changing a post but before you even save it:

Re-rendering {:type :create, :path posts/.#figwheel-keep-om-turning.md}

Emacs's backup file! 😅

jmglov commented 2 years ago

@borkdude I fixed the things you pointed to. Thanks again for the great idea, @mknoszlig! 🙂

borkdude commented 2 years ago

Trying now, migrating using the new -x option, see this blog post

bb -x quickblog.api/migrate

borkdude commented 2 years ago

When I touch the .md file, I see that it uses a bunch of cached versions (one for each post). And then:

Writing tags page public/tags/index.html
Writing page: public/tags/index.html
Writing tag page: public/tags/clojure.html
Writing page: public/tags/clojure.html
Writing page: public/archive.html
Reading file from cache: .work/oss-updates-may-jun-2022.md.pre-template.html
Reading file from cache: .work/babashka-cli.md.pre-template.html
Writing page: public/index.html
Writing Clojure feed public/planetclojure.xml
Writing feed public/atom.xml

I don't think we have to re-emit the tags, tags index, archive, and feed every time we edit the post, if the header hasn't changed.

I may have more feedback, but I'll try give you one thing to do at a time ;).

jmglov commented 2 years ago

I don't think we have to re-emit the tags, tags index, archive, and feed every time we edit the post, if the header hasn't changed.

Good point. Will fix. 🙂

mknoszlig commented 2 years ago

@borkdude I fixed the things you pointed to. Thanks again for the great idea, @mknoszlig! 🙂

glad it helped, thanks for implementing it right away! :)

jmglov commented 2 years ago

@borkdude I reworked the caching and made it much simpler (I think) and reduced the I/O greatly in watch mode. The archive page is still being re-rendered in watch mode when metadata hasn't been changed for some reason, but I figured it was worth getting the PR in front of your eyes since the archive fix should be quite a small one and easy to review on its own.

jmglov commented 2 years ago

@borkdude OK, fixed all the remaining bugs (said the coder naively). This is ready for a final review.

My next mission is a branch that contains regression tests. ;)

borkdude commented 2 years ago

@jmglov Would you mind not force-pushing but just make incremental commits? I'll squash the branch anyway and this will prevent issues like this locally:

borkdude commented 2 years ago

@jmglov

When I change a single post, I see "reading metadata" for every post, then it re-generates my blog article and then it reads the metadata of every post again.

I thought we agreed that we don't have to read the entire article into memory just go get the metadata out, we could cache this data in a separate file. Or if the article isn't newer than the .work/cache.edn I don't think you have to read the metadata at all?

borkdude commented 2 years ago

I noticed the sorting in the archive is weird:

borkdude commented 2 years ago

Hope you're not getting tired yet, I appreciate the work you're doing on this PR :)

jmglov commented 2 years ago

No worries, your comments are helpful in shaking the rust off my Clojure. 😉

Sorry about the rewriting of history. It’s been awhile since anyone other than myself was consuming my remote branches. I’ll stop doing that.

The metadata re-reading is an oversight. I was so focused on watch mode that I didn’t notice that. The fix is quite straightforward, as is the sorting one. Will fix both issues today. 🙂

jmglov commented 2 years ago

@borkdude Everything you noted is now fixed. You may also want to look at a PR on your blog itself where I put back the blog description that got lost somewhere along the way: https://github.com/borkdude/blog/pull/32

borkdude / quickblog

Move metadata #15