Closed simon-brooke closed 10 months ago
Yeah, I think that'd be a sensible thing to do, and I prefer the second solution as well. I don't expect performance should be a big issue in practice, and avoiding editing the original file seems like it would be best. I generally prefer not mixing user edited and generated content.
Would it be acceptable to you if, as part of this solution, I move `cryogen-core.compiler/parse-post-date` to `cryogen-core.util`? It seems better to me that:
This requires that this function should be in a common namespace accessible to both `cryogen-core.compiler` and my proposed `cryogen-core.infer-meta`, and `cryogen-core.util` seems appropriate.
Yeah, that makes sense to me. 👍
The reasonably reliable way of detecting the mime types of image files, needed for Open Graph meta tags, is to use Apache Tika via its clj-tika wrapper. However, this drags in an enormous and heavyweight stack of other libraries.
Similarly, the way I'm used to of detecting image sizes, again used in Open Graph meta tags, is by using Mike Anderson's mikera/imagez library, but this too is not lightweight.
We do not have to generate rich Open Graph data, but (for my own purposes) I'd like to. What are your feelings about this?
H'mmm... Pantomime seems to now be preferred over clj-tika, but it doesn't change the argument: these are heavyweight libraries to be including for what is a marginal gain. Should I do this?
Progress report: it's doing everything I want except inferring the author's real name. I have found (different) hacks for doing this on Linux, MacOS and Windows, and could write a little wrapper around all three; but given that we already have a `:author` key in the standard `config.edn`, this may be a bridge too far.
Thoughts?
I think pulling author info from the config would make sense.
Just to do a progress report: this now works, except for a couple of minor issues:
- The `h1` line used to infer the title remains in the document, so the title is shown twice in the output (I can fix this without modifying `compiler.clj`, but it would require a modification to all themes);
- The `**Tags:` line used to infer the tags remains in the document, but does not have the requisite links; and I can't fix that without filtering the line out in `content-dom->html` -- which I can do, but only if I pass page meta-data in via `params`.
I'm currently adding `:inferred-meta true` to the meta-data of all pages which don't contain embedded meta-data.
I would suggest that it might be worth memoising `page-content`, since it is called multiple times on the same page during the compilation process and has some compute cost.
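As a minimal sketch of what memoising could look like, using `clojure.core/memoize`: the `read-page` function and the `read-count` atom below are hypothetical stand-ins for illustration, not Cryogen's actual `page-content`.

```clojure
;; Hypothetical counter, only here to show how often the real work runs.
(def read-count (atom 0))

(defn read-page
  "Hypothetical stand-in for an expensive page read; counts its calls."
  [path]
  (swap! read-count inc)
  (str "content of " path))

;; memoize caches results keyed on the argument list.
(def read-page-memo (memoize read-page))

(read-page-memo "posts/a.md")
(read-page-memo "posts/a.md")

@read-count ; => 1, the second call was served from the cache
```

One thing to weigh: `memoize` keeps its cache for the life of the process, so if a file changes while a long-running process is still alive, the cached value would be stale unless the memoised function is rebuilt.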
In summary: I still intend to proceed with this for my own purposes, but it's becoming less of a small, tactical fix than I had hoped. Would a pull request still be welcome? Work in progress is here.
I think it might be better to modify the compiler to allow existing themes to work; that would make it easier for people to upgrade. And I agree with memoising: there's no point reading the info over and over, since it's used repeatedly. I think a PR would still be welcome; it looks like most changes are in a new namespace, and it's an opt-in feature.
@lacarmen thoughts?
Currently, Cryogen depends on an EDN formatted map as being the first textual item in a file which is otherwise a markdown file. This gives the peculiarity that Cryogen posts are awkward in normal Markdown editors.
If this map is not present, `cryogen-core.compiler/parse-post` throws an exception at line 177, which could trivially be caught.
It seems to me that it would be straightforward, at least on UN*X platforms, to write a function which could extract key metadata directly from a plain Markdown file, and automagically create the map. For example,
- `author` can be derived from the 'real name' field in `/etc/passwd` for the currently logged in user;
- `title` can be derived from the first top level heading (i.e. first line beginning '# ') in the file;
- `date` can be derived by matching against `^\(dddd-dd-dd\)` and, if matched, using that; or
- `tags` could be derived by finding the first instance in the file of a line matching `^\*\*Tags:\*\*\(.*\)` and treating the remainder of that line as a comma-separated line of tags.
More interestingly and more in line with what I'm working on now is that it could derive `description` from the first non-header paragraph in the file, and a map describing an image by examination of the first line matching `^![\([^]]*\)\](\([^)]*\))`, and the file indicated by the second match in that pattern.
This then allows you to add a snippet to `base.html` in any theme, and another to the `post.html` and `page.html` of any theme, and thus generate valid OpenGraph meta-tags as seen here.
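The inference rules listed above might be sketched roughly as follows. This is an illustration only, not the code from the work in progress: the namespace, the function names and the `:alt`/`:url` keys are all hypothetical, the grep-style patterns quoted above are translated into Java regex syntax, and matching the date against the file name is an assumption, since the text does not say what the pattern is applied to.

```clojure
(ns example.infer-meta
  (:require [clojure.string :as str]))

(defn infer-title
  "Text of the first line beginning '# ', or nil if there is none."
  [markdown]
  (some->> (str/split-lines markdown)
           (some #(second (re-matches #"#\s+(.*)" %)))
           str/trim))

(defn infer-tags
  "Tags from the first line matching **Tags:**, split on commas; nil if absent."
  [markdown]
  (when-let [[_ tags] (some #(re-matches #"\*\*Tags:\*\*(.*)" %)
                            (str/split-lines markdown))]
    (->> (str/split tags #",")
         (map str/trim)
         (remove str/blank?)
         vec)))

(defn infer-date
  "Leading dddd-dd-dd of `s` as a string, or nil.
  Assumption: `s` is the file name."
  [s]
  (second (re-find #"^(\d{4}-\d{2}-\d{2})" s)))

(defn infer-description
  "First paragraph that is not a heading, or nil."
  [markdown]
  (->> (str/split markdown #"\n\s*\n")
       (map str/trim)
       (remove #(or (str/blank? %) (str/starts-with? % "#")))
       first))

(defn infer-image
  "Alt text and path of the first line that starts with a Markdown image."
  [markdown]
  (when-let [[_ alt url] (some #(re-find #"^!\[([^\]]*)\]\(([^)]*)\)" %)
                               (str/split-lines markdown))]
    {:alt alt :url url}))

(def sample
  (str "# My first post\n\n"
       "An opening paragraph.\n\n"
       "**Tags:** clojure, blogging\n\n"
       "![A cat](img/cat.png)\n"))

(infer-title sample) ; => "My first post"
(infer-tags sample)  ; => ["clojure" "blogging"]
```

The sketch returns the date as a plain string and leaves parsing to the caller; image MIME type and size detection, discussed elsewhere in this thread, are deliberately left out.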
I'm almost certainly going to do this for my own use anyway. Would a pull request with this as an enhancement be accepted?
It could be implemented in one of two ways:
- `cryogen-core.compiler/parse-post` could automagically create the map on the fly.
The second solution would have the advantage that the markdown file would not be altered, and thus would render nicely in a markdown editor; but it would obviously be substantially slower.
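A rough sketch of the on-the-fly approach, assuming the embedded map can be detected by trying to read the first EDN form of the file. The names `embedded-meta` and `page-meta` are hypothetical and this is not how `parse-post` is actually written; the `:inferred-meta true` marker is the one mentioned in the progress report in this thread.

```clojure
(ns example.parse-fallback
  (:require [clojure.edn :as edn]))

(defn embedded-meta
  "The EDN map at the head of `text`, or nil if the text does not start with one."
  [text]
  (try
    (let [form (edn/read-string text)] ; reads only the first form
      (when (map? form) form))
    (catch Exception _ nil)))          ; e.g. text starting with '# '

(defn page-meta
  "Embedded metadata if present; otherwise the result of `infer-fn`
  (a hypothetical inference function) marked as inferred."
  [text infer-fn]
  (or (embedded-meta text)
      (assoc (infer-fn text) :inferred-meta true)))

(page-meta "{:title \"Explicit\"}\n\nBody" (constantly {}))
;; => {:title "Explicit"}
```

Because nothing is written back, the source file stays plain Markdown; the cost is that inference runs on every compile unless the result is cached.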