eyeseast / python-frontmatter

Parse and manage posts with YAML (or other) frontmatter
http://python-frontmatter.rtfd.io
MIT License

Allow loading and parsing multiple posts #22

Open gaconnet opened 8 years ago

gaconnet commented 8 years ago

Hi. It's nice to see a frontmatter library written in python. Thanks for writing it!

How do you feel about supporting a way to load & parse multiple posts in one go, or perhaps even in a streaming fashion via an iterator or coroutine?

As motivation, consider a single markdown file that you would like to transform into a sequence of <section> tags to insert into reveal.js, and you want your transformation pipeline to transform metadata attributes into html data attributes for things like custom slide transitions:

# python-frontmatter
an introduction

---
transition: zoom

---
load and parse files (or just text) with YAML front matter.

---
transition: concave
background: linear-gradient(45deg, #f06, yellow)

---
now with streams of documents!

I built a little standalone parser for this on my own, but I thought it might be nice if this cool library did it.

eyeseast commented 8 years ago

I think the way I'd do this is with multiple files, or by splitting text and parsing strings with frontmatter.loads. We do this a lot for @frontlinepbs projects, usually with metalsmyth (which needs a better name) and Tarbell.
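A minimal sketch of the "split first, then parse" approach, assuming posts are separated by bare `---` lines exactly as in the example earlier in this thread. `split_posts` and `_render` are hypothetical helpers, not part of python-frontmatter. Each chunk they yield is a single-post string that could then be handed to `frontmatter.loads`. Body text that itself contains a bare `---` line would confuse this splitter.

```python
from typing import Iterable, Iterator, List


def _render(meta: List[str], body: List[str], has_meta: bool, delimiter: str) -> str:
    """Reassemble one post as a standalone frontmatter-style string."""
    body_text = "\n".join(body).strip("\n")
    if not has_meta:
        # Leading content with no metadata block is returned as plain text.
        return body_text
    meta_text = "\n".join(meta).strip("\n")
    return f"{delimiter}\n{meta_text}\n{delimiter}\n{body_text}"


def split_posts(lines: Iterable[str], delimiter: str = "---") -> Iterator[str]:
    """Yield one single-post string per post found in a stream of lines.

    Assumption: delimiter lines strictly alternate between "open metadata"
    and "close metadata", as in the example in this thread.
    """
    meta: List[str] = []
    body: List[str] = []
    in_meta = False   # currently between an opening and a closing delimiter
    has_meta = False  # the current post started with a metadata block

    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.strip() == delimiter:
            if in_meta:
                in_meta = False  # closing delimiter: the body starts next
            else:
                # Opening delimiter of the next post: flush the current one.
                if has_meta or any(item.strip() for item in body):
                    yield _render(meta, body, has_meta, delimiter)
                meta, body = [], []
                in_meta = has_meta = True
            continue
        (meta if in_meta else body).append(line)

    # Flush whatever is left once the stream ends.
    if has_meta or any(item.strip() for item in body):
        yield _render(meta, body, has_meta, delimiter)
```

With the library installed, something like `[frontmatter.loads(chunk) for chunk in split_posts(text.splitlines())]` should give one post per slide. Because `split_posts` is a generator, it also works on an open file handle without reading the whole file first.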

gaconnet commented 8 years ago

I agree that splitting it before it comes into frontmatter would be a fine way to go, but it seems unfortunate that such a splitter would need to duplicate some of the parsing work of frontmatter and would also need to be configured separately if either frontmatter or the splitter were to ever support custom delimiters (such as in gray-matter).

I think that having a simple interface to parse a stream of posts opens many interesting opportunities. For example, a non-programmer uses prose to edit a single file that goes into GitHub or a Gist and then a post-commit hook transforms the single file into a multi-page slideshow. In addition to parsing a single file as a stream, a streaming parser enables diverse command-line invocations such as parse-frontmatter prologue.md - epilogue.md and collect-interesting-files | parse-frontmatter (both examples fictional; assume that the fictional binaries both do something interesting).
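As a rough illustration of the command-line idea, the standard library's `fileinput` module already handles the "files, or `-` for stdin" convention, so a tool like the fictional `parse-frontmatter` could get its line stream roughly like this. `stream_lines` is a hypothetical name, and this sketch only covers reading the input, not parsing it.

```python
import fileinput
from typing import Iterable, Iterator


def stream_lines(paths: Iterable[str]) -> Iterator[str]:
    """Yield lines from each path in order, without trailing newlines.

    fileinput treats the name "-" as standard input, and an empty list of
    paths also falls back to standard input.
    """
    # FileInput (rather than fileinput.input) avoids the module's global state.
    with fileinput.FileInput(files=list(paths)) as stream:
        for line in stream:
            yield line.rstrip("\n")
```

An entry point could then call `stream_lines(sys.argv[1:])` and feed the result to whatever splits and parses the posts, so `prologue.md - epilogue.md` and piped input go through the same code path.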

I'm happy to have the splitter be a separate tool though. I just wanted to point out these opportunities here. I'll also give metalsmyth a try.

If you're still not sold on the idea then feel free to close this issue whenever you feel the time is right. :)

eyeseast commented 8 years ago

Just ran into a situation that matches exactly this approach, so I'm going to reopen and reconsider.

brainstorm commented 6 years ago

I'm in a similar situation (not online though) where I want to migrate from jekyll to blogdown and I want to change a couple of metadata attributes for all posts:

#!/usr/bin/env python

import os
from pathlib import Path
import datetime
import frontmatter

posts_root = os.environ['HOME'] / Path('dev/brainblog/content/post')

for post in posts_root.iterdir():
    fname_date = post.name[0:10] # capture the "2018-02-08" "timestamp" from the post filename
    tstamp = datetime.datetime.strptime(fname_date, "%Y-%m-%d").timestamp()
    utc_time = datetime.datetime.utcfromtimestamp(tstamp)
    utc_string = utc_time.strftime("%Y-%m-%dT%H:%M:%S.%f+00:00 (UTC)")
    with post.open() as f:
        post = frontmatter.load(f)
        if post.get('date') is not None:
            post.__setitem__('date', utc_string)
            post.__setitem__('modified', utc_string)
            frontmatter.dump(post, f)
            #print(post.metadata)

But apparently I cannot frontmatter.dump against the same post/filehandle:

Traceback (most recent call last):
  File "/Users/romanvg/bin/markdown_datetime.py", line 20, in <module>
    frontmatter.dump(post, f)
  File "/Users/romanvg/.miniconda/lib/python3.5/site-packages/frontmatter/__init__.py", line 155, in dump
    fd.write(content.encode(encoding))
TypeError: write() argument must be str, not bytes

How would you change such metadata attributes and serialize them "in-place"?
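The traceback shows `dump` calling `fd.write(content.encode(encoding))`, so in the installed version it hands encoded bytes to whatever handle it receives. The handle above comes from `post.open()`, which defaults to text mode and read-only. A minimal standard-library sketch of the same mismatch, with no frontmatter calls involved and a made-up payload:

```python
import os
import tempfile

# Made-up post content standing in for what a dump would serialize.
payload = "---\ndate: 2018-02-08\n---\nbody\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "post.md")

    # Text-mode handle + bytes: raises the same TypeError as the traceback.
    with open(path, "w", encoding="utf-8") as f:
        try:
            f.write(payload.encode("utf-8"))
            text_mode_error = None
        except TypeError as exc:
            text_mode_error = str(exc)

    # Binary-mode handle accepts the encoded bytes.
    with open(path, "wb") as f:
        f.write(payload.encode("utf-8"))

    # Read the file back to confirm the binary write round-trips.
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()
```

Given that, passing `dump` a handle opened with `'wb'` looks like the matching mode, while writing the string returned by `dumps` calls for a text-mode handle.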

brainstorm commented 6 years ago

Never mind, just opened/closed the object with different modes. Thanks for your lib! ;)

    # Read in text mode; the with block closes the handle before we reopen it.
    with post.open('r') as f:
        post_fm = frontmatter.load(f)

    if not post_fm.get('date'):
        post_fm['date'] = utc_string
        post_fm['modified'] = utc_string
        # dumps() returns a string, so a text-mode handle is fine for writing.
        with post.open('w') as f:
            f.write(frontmatter.dumps(post_fm))

lekhnath commented 9 months ago

> Just ran into a situation that matches exactly this approach, so I'm going to reopen and reconsider.

It's been around 8 years since you've commented. I also ran into a situation where I need to support this. Any help will be appreciated.