adamghill / coltrane

A minimal app framework for content sites.
https://coltrane.readthedocs.io/en/stable/
MIT License

Support tags, keywords #29

Open adamghill opened 2 years ago

adamghill commented 2 years ago

Look through https://gohugo.io/content-management/front-matter/ and see what makes sense (and is easy) to support.

Some of these might not need explicit support, but could just be added to documentation.

After skimming the list:

Tobi-De commented 2 years ago

I want to work on this, especially the tags and publishDate options. In the code you already parse the date keyword. I assume that publishDate will work the same way.

Do you think it would be possible to have a lastUpdatedDate that updates automatically when the markdown file changes? It seems you already have something like that for the static mode.

I'm building a personal blog (the most common basic dev project ever 😆) and I keep changing my mind about how it will work lol. At first I used a Post model that held most of the data (title, tags, etc.), with coltrane only rendering the markdown content. With that setup it was easy to have a last_updated field. Then I decided that if I was going to use coltrane I should go all the way and keep all the data for a post in one place, the markdown file. I'm not sure that's the best approach, but at least it gives me the opportunity to contribute to coltrane itself, and it's a personal project, nobody cares.

But now I need to figure out a way to know when a file was last modified 🤔. I could just update the lastUpdated date myself, but that will be my last resort.

For tags, I was thinking of adding some basic utilities (maybe template tags) for it:

Tobi-De commented 2 years ago

I tried and you can already have something like this in the frontmatter

---
tags:
 - javascript
 - django
 - python
---

and it will be parsed as a Python list, so there is not much left to do. I didn't know about frontmatter before I started using coltrane. Just to be sure: frontmatter parsing is handled by the python-markdown2 metadata extra, right?
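To illustrate the shape of the parsed result, here is a minimal stdlib-only sketch. It is not how python-markdown2 implements its metadata extra; `parse_front_matter` is a hypothetical helper that only handles the simple `key:` plus `- item` case shown above.

```python
def parse_front_matter(text: str) -> tuple[dict, str]:
    """Split a document into (metadata, body) for simple front matter."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text

    metadata: dict = {}
    current_key = None
    for index, line in enumerate(lines[1:], start=1):
        stripped = line.strip()
        if stripped == "---":
            # Closing delimiter: everything after it is the body.
            return metadata, "\n".join(lines[index + 1:])
        if stripped.startswith("- ") and isinstance(metadata.get(current_key), list):
            # A list entry belonging to the most recent "key:" line.
            metadata[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            # An empty value means a list follows on the next lines.
            metadata[current_key] = value.strip() or []

    # No closing delimiter was found, so treat the text as having no metadata.
    return {}, text
```

With the front matter above, the `tags` key ends up as a plain Python list of strings, which is what template tags or utilities would then consume.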

adamghill commented 2 years ago

Well, I'll take all the help I can get! 😄

Yep, python-markdown2 handles the frontmatter already, but I think it would be useful to provide documentation about standard defaults and then some ways to get a list of tags -- your utilities/templatetags ideas make sense to me. I do wonder if there is an efficient way to collect all the tags so it doesn't take forever with a lot of markdown files? Maybe something with the manifest file I use when re-generating static files? Although, that doesn't get used in integrated mode currently I don't think. We could just use the Django cache system perhaps?

We could handle publish_date the same as date. Although now that I'm thinking about it... I can't think of a time where I'd want both a date and a publish_date, so maybe it should just be publish_date since it's more explicit?

Do you think using the markdown file's mtime would be sufficient as a proxy for last_updated? I'm unclear if that would work once you deploy the code to production, i.e. would the markdown file's modified datetime be from the original system or from when it was modified on the prod system?
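For reference, reading a file's mtime takes only a few lines with the standard library. This is a sketch, not coltrane code; `last_modified` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from pathlib import Path


def last_modified(path: Path) -> datetime:
    """Return the file's modification time as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
```

On the deployment question: git does not store modification times, so a fresh checkout on the production machine would typically report when the file was written there rather than when the post was last edited. Whether other deployment methods preserve mtime depends on the tool used to copy the files.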

Tobi-De commented 2 years ago

> We could handle publish_date the same as date. Although now that I'm thinking about it... I can't think of a time where I'd want both a date and a publish_date, so maybe it should just be publish_date since it's more explicit?

You are right. In my case, for example, I just use publish_date, which makes more sense, so we could just look for publish_date in the frontmatter. A new file in the common folder of the docs would be useful to document the standard keywords that coltrane supports.

> Yep, python-markdown2 handles the frontmatter already, but I think it would be useful to provide documentation about standard defaults and then some ways to get a list of tags -- your utilities/templatetags ideas make sense to me. I do wonder if there is an efficient way to collect all the tags so it doesn't take forever with a lot of markdown files? Maybe something with the manifest file I use when re-generating static files? Although, that doesn't get used in integrated mode currently I don't think. We could just use the Django cache system perhaps?

From what I've seen, manifest files are not used in any mode other than static but yes, I think using them to store tags might be a good idea. If we choose this option, there should probably be a management command for users in integrated mode (I'm not sure if this applies to standalone mode) to generate these manifest files and documentation to explain that this is recommended to improve the performance of the provided tags utilities.

Small aside: using the manifest files just gave me the idea to use a similar file to store the indexes for the search feature with lunr.py.

The django caching system could be a great option since users could later add something like redis to get better performance, but I think starting with the manifest and implementing something cache-based later as the second option is better. My reasoning for this is that the in-memory cache that django uses by default is less reliable than disk files (manifest) in most cases, I think. Any other caching backend would require additional infrastructure, a database, redis, etc...

> Do you think using the markdown file's mtime would be sufficient as a proxy for last_updated? I'm unclear if that would work once you deploy the code to production, i.e. would the markdown file's modified datetime be from the original system or from when it was modified on the prod system?

🤔 mtime may not be the best way to get a last_updated date in most cases, but in some cases it may be close enough, so we should at least give the user the option to use it if they want. Perhaps a template tag? Or injecting the ManifestItem into the context? Injecting the ManifestItem seems easier to implement: if the manifest is not present, the value will simply not be in the context. This is another reason to add a generate-manifest-files management command for the integrated mode.

Tobi-De commented 2 years ago

> The django caching system could be a great option since users could later add something like redis to get better performance, but I think starting with the manifest and implementing something cache-based later as the second option is better. My reasoning for this is that the in-memory cache that django uses by default is less reliable than disk files (manifest) in most cases, I think. Any other caching system would require additional infrastructure, a database, redis, etc...

On second thought, something based on Django's caching system would be easier to implement and there is a file-based backend that static sites could use instead of the default locmem if my concerns about it are valid.
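For context, switching Django from the default local-memory cache to the file-based backend is a settings change. The fragment below is a sketch: the backend path is Django's documented one, but the `LOCATION` directory is a made-up example, and how coltrane's own configuration would map onto it is not shown in this thread.

```python
# settings.py (fragment)
CACHES = {
    "default": {
        # Django's built-in file-based cache backend.
        "BACKEND": "django.core.cache.backends.filebased.FileBasedCache",
        # Hypothetical location; any directory writable by the app works.
        "LOCATION": "/var/tmp/coltrane_cache",
    }
}
```

Cached entries then survive process restarts, which addresses the reliability concern about the in-memory default without requiring extra infrastructure such as redis.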

adamghill commented 2 years ago

I originally thought about the filesystem cache instead of manifest.json, but I liked the idea of something that was easily readable. But, I ended up writing a lot of code to create/load/parse that file which was pretty annoying. It might not be worth the hassle.

> A new file in the common folder of the docs would be useful to document the standard keywords that coltrane supports

I have this, but 1) it doesn't even have date and 2) it might be more useful split out into its own document to make it more clear anyway.

Tobi-De commented 2 years ago

> I originally thought about the filesystem cache instead of manifest.json, but I liked the idea of something that was easily readable. But, I ended up writing a lot of code to create/load/parse that file which was pretty annoying. It might not be worth the hassle.

I'll try to implement something for tags based on the cache system and see how it goes. Maybe we could migrate the current manifest system to cache later to simplify the code.

> I have this, but 1) it doesn't even have date and 2) it might be more useful split out into its own document to make it more clear anyway.

I'm going to make a PR to rename date to publish_date and add a new section to the common docs. What should it be called? TemplateContext or maybe just Context?

adamghill commented 2 years ago

Context works for me, thanks! Do you want a separate issue for the publish_date stuff?

Tobi-De commented 2 years ago

> Context works for me, thanks! Do you want a separate issue for the publish_date stuff?

Nope, it is not necessary.

Tobi-De commented 2 years ago

Here is a draft implementation of what the tag utilities might look like

from dataclasses import dataclass
from itertools import chain
from typing import Iterable

from coltrane.config.cache import Cache
from coltrane.retriever import get_content_items, ContentItem

@dataclass
class DataCache(Cache):
    def __init__(self):
        super().__init__("CONTENT_ITEMS_CACHE")

def get_content_items_with_tags():
    # cache here
    return [
        item
        for item in get_content_items(skip_draft=False)
        if item.metadata.get("tags") and not str(item.path).endswith("index.md")
    ]

def all_unique_tags() -> set[str]:
    # Flatten every item's tag list, then normalize case and whitespace.
    tags = chain.from_iterable(
        item.metadata.get("tags") for item in get_content_items_with_tags()
    )
    return {tag.strip().lower() for tag in tags}

# add an exclude parameter
# lru_cache maybe
def filter_by_tags(
        tags: list[str], include_all_tags: bool = False
) -> Iterable[ContentItem]:
    checker = all if include_all_tags else any
    return [
        item
        for item in get_content_items_with_tags()
        if checker(tag in item.metadata.get("tags") for tag in tags)
    ]

This is obviously far from complete; it's just to give you an idea. I will turn them into template tags. I'll stop bothering you for now @adamghill 😄 Have a nice weekend!
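To make the any/all behaviour of the draft concrete, here is a self-contained sketch that also adds the `exclude` parameter mentioned in the draft's comment. `Item` is a hypothetical stand-in for coltrane's ContentItem, and passing the items in explicitly (instead of calling get_content_items_with_tags) is an assumption made so the example runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical stand-in for coltrane's ContentItem."""

    path: str
    metadata: dict = field(default_factory=dict)


def filter_by_tags(items, tags, include_all_tags=False, exclude=()):
    """Keep items matching any (or all) of `tags` and none of `exclude`."""
    checker = all if include_all_tags else any
    excluded = set(exclude)
    result = []
    for item in items:
        item_tags = set(item.metadata.get("tags") or [])
        if item_tags & excluded:
            # At least one excluded tag is present, so skip this item.
            continue
        if checker(tag in item_tags for tag in tags):
            result.append(item)
    return result


items = [
    Item("a.md", {"tags": ["python", "django"]}),
    Item("b.md", {"tags": ["python"]}),
    Item("c.md", {"tags": ["javascript"]}),
]
```

One edge case to be aware of: with an empty `tags` list, `any` matches nothing while `all` matches every item that is not excluded, so callers may want to handle that case explicitly.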

adamghill commented 2 years ago

Nice! Looking forward to seeing the end result. 👍 Do you think it would be useful to add a tags field to ContentItem? That might encapsulate the couple of item.metadata.get("tags") in your code above.

One other thing that might be useful for all_unique_tags (or another templatetag, maybe?) is the count of content that has a particular tag. Not sure if that would return a tuple or another dataclass or something else.
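One possible shape for the per-tag count is a `collections.Counter`, sketched below. The function name `tag_counts` and the use of plain metadata dicts as input are assumptions for illustration, not part of coltrane.

```python
from collections import Counter


def tag_counts(metadata_list) -> Counter:
    """Count how many content items carry each (normalized) tag."""
    counts: Counter = Counter()
    for metadata in metadata_list:
        # Use a set so a tag repeated within one file is only counted once.
        counts.update({tag.strip().lower() for tag in metadata.get("tags") or []})
    return counts
```

`Counter.most_common()` then yields `(tag, count)` tuples sorted by count, which is convenient for rendering a tag cloud or a sidebar list.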

Tobi-De commented 2 years ago

> Do you think it would be useful to add a tags field to ContentItem? That might encapsulate the couple of item.metadata.get("tags") in your code above.

Yes, I was thinking of adding a tags property to ContentItem and later title and description for text based search.
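A `tags` property along these lines could look like the sketch below. `ContentItemSketch` is a hypothetical stand-in (the real ContentItem's fields are not shown in this thread), and treating a plain string as comma-separated is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ContentItemSketch:
    """Hypothetical stand-in for coltrane's ContentItem."""

    path: str
    metadata: dict = field(default_factory=dict)

    @property
    def tags(self) -> list[str]:
        raw = self.metadata.get("tags") or []
        if isinstance(raw, str):
            # Assumption: a plain string is treated as comma-separated.
            raw = raw.split(",")
        cleaned = (str(tag).strip().lower() for tag in raw)
        # dict.fromkeys drops duplicates while keeping first-seen order.
        return list(dict.fromkeys(tag for tag in cleaned if tag))
```

Normalizing in one place means callers no longer need to repeat `item.metadata.get("tags")` or worry about missing keys, casing, or stray whitespace.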

> One other thing that might be useful for all_unique_tags (or another templatetag, maybe?) is the count of content that has a particular tag. Not sure if that would return a tuple or another dataclass or something else.

I thought about it, and getting the count could be accomplished with the Django length filter combined with filter_by_tags. This would be explained in the docs of course, but I see no reason for a dedicated template tag at the moment.

Tobi-De commented 2 years ago

def get_content_items_with_tags():
    # cache here
    return [
        item
        for item in get_content_items(skip_draft=False)
        if item.metadata.get("tags") and not str(item.path).endswith("index.md")
    ]

I was also wondering: would it be better to cache only the items with tags and leave get_content_items_with_tags as is, or should we just cache all ContentItem objects when get_content_items is called? That way the cache could be useful for other use cases, for example the search feature or custom template tags that rely on get_content_items.

adamghill commented 2 years ago

> maybe we should just cache all ContentItem when get_content_items is called

I think this makes sense if we can bust the cache intelligently.
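One way to bust such a cache is to fingerprint the content directory and reload only when the fingerprint changes. The sketch below is stdlib-only and makes several assumptions: the names `content_fingerprint` and `get_items_cached` are hypothetical, the cache is a simple module-level dict rather than Django's cache, and `loader` stands in for whatever actually builds the content items.

```python
import hashlib
from pathlib import Path

# Module-level cache holding the last fingerprint and the items loaded for it.
_cache: dict = {"fingerprint": None, "items": None}


def content_fingerprint(content_dir: Path) -> str:
    """Hash the relative path, mtime, and size of every markdown file."""
    digest = hashlib.sha256()
    for path in sorted(content_dir.rglob("*.md")):
        stat = path.stat()
        entry = f"{path.relative_to(content_dir)}:{stat.st_mtime_ns}:{stat.st_size}\n"
        digest.update(entry.encode("utf-8"))
    return digest.hexdigest()


def get_items_cached(content_dir: Path, loader):
    """Call `loader` only when files were added, removed, or changed."""
    fingerprint = content_fingerprint(content_dir)
    if _cache["fingerprint"] != fingerprint:
        _cache["items"] = loader(content_dir)
        _cache["fingerprint"] = fingerprint
    return _cache["items"]
```

The trade-off is that every call still walks the directory and stats each file, but that is usually far cheaper than re-reading and re-parsing every markdown file.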

jimmybutton commented 1 year ago

Not sure if this is the right thread for my question. Does Coltrane support generating a separate page for each used category/tag (e.g. with categories or tags defined in the frontmatter) in 'static site' mode? What would be required to achieve this? Many thanks!

Tobi-De commented 1 year ago

> Not sure if this is the right thread for my question. Does Coltrane support generating a separate page for each used category/tag (e.g. with categories or tags defined in the frontmatter) in 'static site' mode? What would be required to achieve this? Many thanks!

Hi @jimmybutton , as far as I know, no, Coltrane does not support this at the moment. There are only two ways I can see this working:

1. Create a category page in advance for each category / tag you have, and then use the directory_contents template tag to filter your content. When your site is built, it will include a page for each category with links to related content. This sounds really tedious though; maybe a command for this is a feature worth considering, @adamghill.

2. Write some js to do the filtering in real time and ship that js code with your site. Lunr.js could perhaps help.
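The tedious part of the first option, creating one page per tag, could be scripted. The sketch below is not a coltrane feature: `write_tag_pages`, the output layout, and the generated front matter keys are all hypothetical. It only writes stub files; the body of each page would still need the directory_contents template tag, whose exact usage is not shown in this thread.

```python
from pathlib import Path


def write_tag_pages(tags, output_dir: Path) -> list[Path]:
    """Create one stub markdown page per tag, skipping pages that exist."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for tag in sorted(set(tags)):
        slug = tag.strip().lower().replace(" ", "-")
        page = output_dir / f"{slug}.md"
        if page.exists():
            # Never overwrite a page that may have been edited by hand.
            continue
        front_matter = f"---\ntitle: Posts tagged {tag}\ntag: {tag}\n---\n\n"
        page.write_text(front_matter, encoding="utf-8")
        written.append(page)
    return written
```

Running this before the static build, with the tag list collected from the content files, would leave only the page body to be filled in once per tag.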

I've been busy lately, but I had already planned to build a tag filtering and search feature for Coltrane. What I have in mind will only work in integrated and standalone mode, though; I don't see an easy way to make it work in static mode right now.

I'm sorry I could not be of more help.

jimmybutton commented 1 year ago

@Tobi-De Thanks for your reply and great ideas 👍! I think I'll have a go at the first option you described and see if I can get it working.