alfredbaudisch / pardall_markdown

Reactive publishing framework, filesystem-based with support for Markdown, nested hierarchies, and instant content rebuilding. Written in Elixir.
Apache License 2.0

Add a YAML FrontMatter metadata parser #57

Open rockchalkwushock opened 2 years ago

rockchalkwushock commented 2 years ago

What does this PR do?

Closes #41

rockchalkwushock commented 2 years ago

@alfredbaudisch so far things are going well, but I am running into some problems I don't fully understand:

  1. I assume I should not have to prefix the contents arg in do_parse.("\n---\n#{contents}"). I get a nastygram from ElixirMap when it tries to parse the data after parsing the YAML. It makes sense because I have already called :binary.split/2 once with a similar pattern.
  2. I am unsure how the lifecycle works with the case do_parse.("\n---\n#{contents}") block. I understand that this call uses the ElixirMap parser, but I am lost on how the YAML that has already been parsed to a map is passed through. Ultimately I end up in the other case and the value is seen as nil.
alfredbaudisch commented 2 years ago

@rockchalkwushock nice, thanks for starting the pull request!

  1. You are right, I don't think the prefix in do_parse.("\n---\n#{contents}") is needed, and it can be problematic for bigger files or for a large volume of files.
  2. I think this will be solved after (1) is investigated as well.

By the weekend I'll check it out.

rockchalkwushock commented 2 years ago

@alfredbaudisch I will play with it more tomorrow morning and see if I can get that figured out. I am close; there is just something I am missing.

rockchalkwushock commented 2 years ago

@alfredbaudisch I am curious what you think about this idea. It would require some refactoring of the ElixirMap parser:

  1. We update the pattern to match against in ElixirMap
- :binary.split(contents, ["\n---\n", "\r\n---\r\n"])
+ :binary.split(contents, ["<!--  -->"])

<!-- --> is the HTML comment syntax, which is also how you write comments in .md files.

In making this change we sidestep the issue of the overlapping patterns between the ElixirMap parser and the YAML parser (or any future parser).
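
To make the overlap concrete, here is a minimal sketch. The sample post is made up for illustration; the split call is the existing ElixirMap pattern from the diff above. It only shows how the delimiter behaves on YAML-style input, not what the PR code does around it.

```elixir
# Hypothetical post that uses YAML front matter fenced by `---` lines.
yaml_post = "---\ntitle: Hello\n---\nBody"

# The current ElixirMap delimiter also keys on a `---` line, so it fires on
# the closing YAML fence and leaves the opening fence inside the first part.
[attrs, body] = :binary.split(yaml_post, ["\n---\n", "\r\n---\r\n"])

IO.inspect(attrs) # "---\ntitle: Hello"
IO.inspect(body)  # "Body"
```

With a separator that neither format uses for its own syntax, the split no longer depends on which metadata format sits above it.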

From what I am seeing in the Erlang docs for :binary.split/2, and from my own hacking around with it, the default is to split only on the first instance of the pattern. So we would not get into any trouble if the user writes comments throughout the rest of the Markdown file. We would only need to reach for :binary.split/3 and the [:global] option if we wanted to match on all instances of the given pattern.
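
A quick sketch of that difference, using a made-up post in which the same empty comment appears a second time in the body:

```elixir
# Hypothetical post: the separator comes first, and the author later uses the
# same empty comment somewhere in the body.
contents = "title: Hello\n<!--  -->\nIntro\n<!--  -->\nMore text"

# :binary.split/2 splits only at the first occurrence of the pattern.
[attrs, body] = :binary.split(contents, ["<!--  -->"])

# :binary.split/3 with [:global] splits at every occurrence.
parts = :binary.split(contents, ["<!--  -->"], [:global])

IO.inspect(attrs)         # "title: Hello\n"
IO.inspect(body)          # "\nIntro\n<!--  -->\nMore text"
IO.inspect(length(parts)) # 3
```

The second comment stays inside body in the default case, which is the behavior we want here.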

Elixir Map Format (default)

%{
  author: "Turd Ferguson",
  date: "2021-11-20",
  title: "That's a funny name"
}
---

<!--  -->

Post content...

YAML Format

---
 author: Turd Ferguson
 date: 2021-11-20
 title: That's a funny name
---

<!--  -->

Post content...

Joplin Format

I am not familiar with Joplin so this could be incorrect.

That's a funny name

<!--  -->

Post content...

<!-- --> can act as the separator between attrs and body in every case (hopefully?). The metadata parsers can then be applied solely to attrs, while we continue to just pass the body through until we are ready to parse that data.
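
A rough, self-contained sketch of that flow. Everything named here (SeparatorSketch, split/1, parse/2, toy_parser) is hypothetical and not part of PardallMarkdown; the toy parser only stands in for ElixirMap, YAML, or any other metadata parser. Treating a file without the separator as body only is also an assumption of this sketch.

```elixir
defmodule SeparatorSketch do
  @separator "<!--  -->"

  # Splits a file into {attrs, body}. A file without the separator is treated
  # as body only (an assumption made for this sketch).
  def split(contents) do
    case :binary.split(contents, [@separator]) do
      [body] -> {nil, body}
      [attrs, body] -> {String.trim(attrs), String.trim_leading(body)}
    end
  end

  # `parse_attrs` stands in for whichever metadata parser is configured.
  # Only attrs is handed to it; the body passes through untouched.
  def parse(contents, parse_attrs) when is_function(parse_attrs, 1) do
    case split(contents) do
      {nil, body} ->
        {:ok, %{}, body}

      {attrs, body} ->
        with {:ok, metadata} <- parse_attrs.(attrs) do
          {:ok, metadata, body}
        end
    end
  end
end

# Toy "key: value" parser, used only to exercise the sketch.
toy_parser = fn attrs ->
  metadata =
    attrs
    |> String.split("\n", trim: true)
    |> Map.new(fn line ->
      [key, value] = String.split(line, ":", parts: 2)
      {String.trim(key), String.trim(value)}
    end)

  {:ok, metadata}
end

result = SeparatorSketch.parse("title: Hello\n\n<!--  -->\n\nPost content...", toy_parser)
IO.inspect(result)
```

The return shape mirrors the {:ok, frontmatter, body} tuple used in the thread, so swapping in a different metadata parser would not change the caller.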

Crude POC

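# base_parser.ex

# default metadata parser is ElixirMap

def parse(path, contents, opts) do
  ...
  # do_parse is an anonymous function, so it is defined inside parse/3, where
  # path and opts are in scope, and it is called with the dot syntax.
  # `parser` is assumed to be resolved in the elided code above.
  do_parse = fn split_contents ->
    apply(parser, :parse, [path, split_contents, opts])
  end

  case :binary.split(contents, ["<!--  -->"]) do
    # No separator found: parse the file as a whole.
    [_] ->
      do_parse.(contents)

    # Separator found: attrs is the metadata, contents is everything after it.
    [attrs, contents] ->
      # process `attrs` with corresponding parser
      case do_parse.(contents) do
        {:ok, frontmatter, body} ->
          {:ok, frontmatter, body}

        other ->
          other
      end
  end
end
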
alfredbaudisch commented 2 years ago

@rockchalkwushock I'm really bad at keeping track of my 10,000 personal projects. I'm very sorry for keeping you waiting on this one. I'll try to come back to it soon.

rockchalkwushock commented 2 years ago

@alfredbaudisch no worries, I have been pretty busy as well. I will circle back this evening and give this another look. Perhaps I can get it over the hump.