eyeseast / python-frontmatter

Parse and manage posts with YAML (or other) frontmatter
http://python-frontmatter.rtfd.io
MIT License

Multiline strings in metadata #82

Closed JonEllis closed 3 years ago

JonEllis commented 3 years ago

I am attempting to add a multiline string to a document, but the YAML that gets written uses syntax that breaks my pandoc processing. Is there a way to save multiline strings using YAML block (literal) syntax?


Test case

import frontmatter

doc = frontmatter.loads(
    '\n'.join([
        '---',
        'title: multiline test',
        '---',
        '# multiline test',
    ])
)

doc['multiline'] = '\n'.join([
    'this is a',
    'multiline',
    'string',
])

print(frontmatter.dumps(doc))

Expected (desired) output

---
multiline: |
  this is a
  multiline
  string
title: multiline test
---

# multiline test

Actual output

---
multiline: 'this is a

  multiline

  string'
title: multiline test
---

# multiline test
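The quoted "actual output" is still valid YAML. Inside a single-quoted scalar, a single line break folds into a space and each blank line becomes a newline, so it loads back to the original string. The small check below assumes only that PyYAML is installed. It compares that quoted form with the two block forms. Note that `|` keeps one trailing newline and `|-` strips it, so `|-` is the exact match for a string like the one in the test case, which has no trailing newline.

```python
import yaml

# The string stored in doc['multiline'] in the test case above.
expected = "\n".join(["this is a", "multiline", "string"])

# The quoted YAML from the "Actual output" above.
quoted = "multiline: 'this is a\n\n  multiline\n\n  string'\n"

# Literal block forms: "|" keeps one trailing newline, "|-" strips it.
literal_clip = "multiline: |\n  this is a\n  multiline\n  string\n"
literal_strip = "multiline: |-\n  this is a\n  multiline\n  string\n"

quoted_value = yaml.safe_load(quoted)["multiline"]
clip_value = yaml.safe_load(literal_clip)["multiline"]
strip_value = yaml.safe_load(literal_strip)["multiline"]

print(repr(quoted_value))
print(repr(clip_value))
print(repr(strip_value))
```

In other words, the two outputs differ only in presentation, apart from the trailing newline that `|` adds.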
eyeseast commented 3 years ago

That's weird. It definitely looks like a PyYAML issue. What happens if you try yaml.dump(doc.metadata)?

JonEllis commented 3 years ago

You're correct, I get this:

{multiline: 'this is a

    multiline

    string', title: multiline test}

I can "fix" this for PyYAML with a custom representer:

import frontmatter
import yaml

class literal_str(str): pass

def literal_str_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='|')

yaml.add_representer(literal_str, literal_str_representer)

doc = frontmatter.loads(
    '\n'.join([
        '---',
        'title: multiline test',
        '---',
        '# multiline test',
    ])
)

doc['multiline'] = literal_str('\n'.join([
    'this is a',
    'multiline',
    'string',
]))

print(yaml.dump(doc.metadata))

Which gives me

multiline: |-
  this is a
  multiline
  string
title: multiline test

However, when I route the document through frontmatter.dumps, it fails with a "cannot represent an object" error.

I'm just throwing something together at the moment, so I'll try to return to this another day.
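The "cannot represent an object" error can be reproduced with PyYAML alone. This is a minimal sketch that assumes PyYAML 5.1 or later. Called without a `Dumper` argument, `yaml.add_representer` registers the representer on `yaml.Dumper`, which `yaml.dump` uses by default. A dumper built on `yaml.SafeDumper` has no entry for the `literal_str` subclass, so it falls through to its "undefined" representer and raises.

```python
import yaml
from yaml.representer import RepresenterError


class literal_str(str):
    """Marker type: a str that should be dumped in literal block style."""


def literal_str_representer(dumper, data):
    # Emit the value as a YAML string scalar in literal ("|") style.
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")


# With no Dumper argument, add_representer registers on yaml.Dumper only.
yaml.add_representer(literal_str, literal_str_representer)

metadata = {"multiline": literal_str("this is a\nmultiline\nstring")}

# yaml.dump defaults to yaml.Dumper, so the representer is found.
block = yaml.dump(metadata, default_flow_style=False)
print(block)

# SafeDumper never saw the registration, so literal_str is unknown to it.
safe_error = None
try:
    yaml.dump(metadata, Dumper=yaml.SafeDumper)
except RepresenterError as exc:
    safe_error = exc
print("SafeDumper failed:", safe_error)
```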

eyeseast commented 3 years ago

Any **kwargs you pass when you run frontmatter.dumps will get passed through to yaml.dump, so you might try that. Here's the relevant code: https://github.com/eyeseast/python-frontmatter/blob/master/frontmatter/default_handlers.py#L240-L249

It uses PyYAML's SafeDumper class by default, which might be the issue. But if you can make it work with yaml.dump, you should be able to get it working with frontmatter.dumps.
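Building on that suggestion, you can avoid patching the global `yaml.SafeDumper` by using a dedicated subclass. It inherits all of the safe representers and adds one more. `add_representer` is a classmethod that copies the inherited table before modifying it, so `SafeDumper` itself is left unchanged. The sketch uses PyYAML only, and `LiteralSafeDumper` is a hypothetical name.

```python
import yaml


class literal_str(str):
    """Marker type: a str that should be dumped in literal block style."""


def literal_str_representer(dumper, data):
    # Emit the value as a YAML string scalar in literal ("|") style.
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")


class LiteralSafeDumper(yaml.SafeDumper):
    """Hypothetical SafeDumper subclass that also knows about literal_str."""


# Registering on the subclass copies its representer table first,
# so yaml.SafeDumper keeps its original representers.
LiteralSafeDumper.add_representer(literal_str, literal_str_representer)

metadata = {
    "title": "multiline test",
    "multiline": literal_str("this is a\nmultiline\nstring"),
}

text = yaml.dump(metadata, Dumper=LiteralSafeDumper, default_flow_style=False)
print(text)
```

frontmatter.dumps forwards extra keyword arguments to yaml.dump, so `frontmatter.dumps(doc, Dumper=LiteralSafeDumper)` should produce the same front matter while leaving the shared SafeDumper untouched.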

JonEllis commented 3 years ago

Yes, it is indeed the SafeDumper. Excellent suggestion. That makes sense, since I'm now wrapping the string in a custom str subclass that SafeDumper has no representer for. It seems that if I register the new representer on SafeDumper itself, I can continue to dump this document safely.

So this is the working version:

import frontmatter
import yaml

class literal_str(str): pass

def literal_str_representer(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')

yaml.SafeDumper.add_representer(literal_str, literal_str_representer)

doc = frontmatter.loads(
    '\n'.join([
        '---',
        'title: multiline test',
        '---',
        '# multiline test',
    ])
)

doc['multiline'] = literal_str('\n'.join([
    'this is a',
    'multiline',
    'string',
]))

print(frontmatter.dumps(doc))

Which outputs:

---
multiline: |-
  this is a
  multiline
  string
title: multiline test
---

# multiline test
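If wrapping every value in `literal_str` is inconvenient, there is another option. It is a sketch and not a feature frontmatter provides: override the representer for plain `str` on a SafeDumper subclass, so that any string containing a newline is emitted in literal style automatically. `BlockStyleSafeDumper` and `represent_multiline_str` are hypothetical names.

```python
import yaml


class BlockStyleSafeDumper(yaml.SafeDumper):
    """Hypothetical SafeDumper that prefers literal style for multiline str."""


def represent_multiline_str(dumper, data):
    # Request literal block style only when the string spans several lines;
    # style=None lets the emitter choose plain or quoted style as usual.
    style = "|" if "\n" in data else None
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style=style)


# Replaces the str representer on the subclass only, not on yaml.SafeDumper.
BlockStyleSafeDumper.add_representer(str, represent_multiline_str)

metadata = {
    "title": "multiline test",
    "multiline": "\n".join(["this is a", "multiline", "string"]),
}

text = yaml.dump(metadata, Dumper=BlockStyleSafeDumper, default_flow_style=False)
print(text)
```

The emitter honours the `|` request only when block style is legal for that value. For example, a space just before a line break makes PyYAML fall back to a quoted scalar. Because frontmatter.dumps forwards keyword arguments to yaml.dump, passing `Dumper=BlockStyleSafeDumper` there should have the same effect.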
eyeseast commented 3 years ago

Great. Happy that worked. I'm going to close this, but feel free to add more comments if needed.