[IDEA] Store markdown tiddler with YAML metadata instead of .meta file

linonetwo commented 1 year ago

Is your feature request related to a problem? Please describe.

For TiddlyWiki to be listed as a "Markdown editor" in various product lists, it needs better Markdown support and interoperability.

Most Markdown renderers seem to support YAML-formatted metadata at the top of the file:

---
layout: post
published-on: 1 January 2000
title: Blogging Like a Boss
---

Content goes here.

https://stackoverflow.com/questions/44215896/markdown-metadata-format

So if TiddlyWiki on Node.js can read and write .md files in this way, it can load and save many workspaces created by other Markdown editors.
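A minimal sketch of the reading side, assuming only flat `name: value` lines between the two `---` delimiters. `parseFrontMatter` is a hypothetical helper, not an existing TiddlyWiki function, and it does not attempt any other YAML syntax.

```javascript
// Hypothetical helper: split a Markdown file into metadata fields and body text.
// Only flat `name: value` lines are understood; other lines are ignored.
function parseFrontMatter(source) {
  const lines = source.split(/\r?\n/);
  // No opening delimiter on the first line: the whole file is body text.
  if (lines[0].trim() !== "---") {
    return { fields: {}, text: source };
  }
  // Find the closing delimiter after the first line.
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  // Unterminated block: also treat everything as body text.
  if (end === -1) {
    return { fields: {}, text: source };
  }
  const fields = {};
  for (const line of lines.slice(1, end)) {
    const colon = line.indexOf(":");
    if (colon > 0) {
      fields[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
    }
  }
  // Drop the blank line(s) that usually separate front matter from the body.
  const text = lines.slice(end + 1).join("\n").replace(/^\n+/, "");
  return { fields, text };
}

const example = [
  "---",
  "layout: post",
  "published-on: 1 January 2000",
  "title: Blogging Like a Boss",
  "---",
  "",
  "Content goes here."
].join("\n");

console.log(parseFrontMatter(example));
```

A file without front matter passes through unchanged as body text, so existing .md tiddlers would not be affected by a reader like this.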

Describe the solution you'd like

Allow the tiddlywiki/markdown plugin to inject custom formatting logic into the saveTiddlerToFile method. This requires saveTiddlerToFile to accept a formatter.

This would also need to change:

https://github.com/Jermolene/TiddlyWiki5/blob/cc383e6d1e7c3a2dde5ab91dcf5b6fa7aca53121/boot/boot.js#LL1882C3-L1883C48

Describe alternatives you've considered

Alternatively, add an extra if-else branch to:

https://github.com/Jermolene/TiddlyWiki5/blob/cc383e6d1e7c3a2dde5ab91dcf5b6fa7aca53121/core/modules/utils/filesystem.js#L436-L450

That applies if we want first-class support for md. I think tiddler.getFieldStringBlock works basically the same way for tid and md; we would just add the extra --- lines.

$tw.wiki.deserializeTiddlers and $tw.loadMetadataForFile would also need to change.
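A rough sketch of the writing side, assuming the field block only needs to be wrapped in `---` lines. `toMarkdownWithFrontMatter` is a hypothetical standalone function that takes a plain object of string fields; it does not call tiddler.getFieldStringBlock or any other TiddlyWiki API.

```javascript
// Hypothetical serializer: emit a `---` delimited block of `name: value`
// lines followed by the Markdown body. `fields` is a plain object of strings
// standing in for a tiddler's fields; `text` is kept out of the metadata.
function toMarkdownWithFrontMatter(fields) {
  const names = Object.keys(fields).filter((name) => name !== "text").sort();
  const block = names.map((name) => {
    const value = fields[name];
    // Single-line values only: a newline would break this simple syntax.
    if (/[\r\n]/.test(value)) {
      throw new Error("Field '" + name + "' is multiline and cannot be written as name: value");
    }
    return name + ": " + value;
  });
  return ["---", ...block, "---", "", fields.text || ""].join("\n");
}

console.log(toMarkdownWithFrontMatter({
  title: "Blogging Like a Boss",
  layout: "post",
  text: "Content goes here."
}));
```

Refusing multiline values keeps the output readable by any reader that only understands flat `name: value` pairs.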

Additional context

After today's discussion about why TiddlyWiki did not choose Markdown syntax, I started wondering about first-class md support that still allows syntax like [[]] and [img[]]. That might be a mixed experience of tid and md, but the files could basically be opened by other apps.

But this is low priority, because if the WYSIWYG editor is good enough, a user won't need to know what Markdown or wikitext is (most "normal" people don't even use md!)

Jermolene commented 1 year ago

Thanks @linonetwo. I think it might be feasible to support optional metadata fields at the top of .md files; as you note, the implementation would be quite intricate.

I note from the Jekyll docs that front matter can be in any YAML format, not just the familiar name:value pairs. YAML parsers are big and so I don't want to include one in the core so we would only support the simple syntax.

A further potential semantic issue is that the Jekyll docs actually interpret front matter as variable definitions which can then be referenced within the MD text, which is not the same as our approach to storing metadata in fields. I'm not sure if that difference would actually matter in practice.

linonetwo commented 1 year ago

I'm searching for a minimal yaml2json2yaml library. We only need minimal support for YAML syntax: covering 50% of the syntax may already cover 99% of daily note-taking use cases and enable 99% of interoperability between TiddlyWiki and Jekyll/Obsidian/Typora.

Also, a YAML serializer might make multiline fields work in .meta files without needing to turn them into JSON. https://talk.tiddlywiki.org/t/problem-with-node-js-configuration-multiline-fields-and-meta-file/7180

pmario commented 1 year ago

I have to say that I have changed my mind about the YAML format. I think we should implement it in the core.

The reason is that I really hate the tiddler.json format when it has to be edited with an external text editor.

I had a close look at the YAML spec: https://yaml.org/spec/1.2.2/
I tested every example from the page with: https://nodeca.github.io/js-yaml/
It's the playground for the https://github.com/nodeca/js-yaml/tree/master/dist library, which is 38k minified.

As linonetwo posted, YAML would solve the "multiline field problem" for .tid files in a standardized way.

---
title: test tiddler
tags:
  - a-tag
  - tag with spaces
field-name: test value
multi-line-hard-linebreak: |
 This is a test
 Line 2
multi-line-folded: >
 line 1
 line 2
 line 3

 forced line break above

gives us this JSON:

{ "title": "test tiddler",
  "tags": [ "a-tag", "tag with spaces" ],
  "field-name": "test value",
  "multi-line-hard-linebreak": "This is a test\nLine 2\n",
  "multi-line-folded": "line 1 line 2 line 3\nforced line break above\n"
}
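Since TiddlyWiki stores tags as a single string, the `tags` array above would need flattening before it could become a field. A small sketch, assuming the usual convention of wrapping items that contain whitespace in double square brackets; `stringifyTagList` is a hypothetical standalone helper that only approximates what TiddlyWiki's own list utilities do.

```javascript
// Hypothetical helper: flatten an array of tags into a TiddlyWiki-style list
// string. Items containing whitespace are wrapped in [[ ]].
function stringifyTagList(tags) {
  return tags
    .map((tag) => (/\s/.test(tag) ? "[[" + tag + "]]" : tag))
    .join(" ");
}

console.log(stringifyTagList(["a-tag", "tag with spaces"]));
// → a-tag [[tag with spaces]]
```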

The "plain text with newlines" part of the spec is especially complex, but it offers a lot of possibilities. We would not need to explain every possibility to our users. But it would be good to have all the possibilities available in a standardized way that other developers can read and write.

@Jermolene I actually would like to have it in the core.

Jermolene commented 1 year ago

> I have to say that I have changed my mind about the YAML format. I think we should implement it in the core.
>
> @Jermolene I actually would like to have it in the core.

Hi @pmario the trouble is that all the full YAML parsers I've seen are very large. I could imagine us doing our own implementation of some of the syntax elements as an alternative, which is another way of saying that we could extend the .tid file format to support multiline fields.

In any case, I think the idea of supporting multi-line fields is orthogonal to @linonetwo's original post, which is asking about metadata within .md files.

pmario commented 1 year ago

As I posted with my third link, the library that I played with is about 38 kB in size, it fully supports the YAML 1.2 spec, and it seems to have 64 million npm downloads per week. So it seems to be battle-tested.

> In any case, I think the idea of supporting multi-line fields is orthogonal to @linonetwo's original post, which is asking about metadata within .md files.

Yes, but the "front matter" metadata in .md files is in YAML format.

Jermolene commented 1 year ago

Thanks @pmario, I had missed that. But even 38KB seems excessive just to get multiline fields. YAML is complex and error-prone, with many gotchas. Do we have evidence that Markdown front matter is generally parsed with a full YAML parser?

linonetwo commented 1 year ago

This is not only a parser but also a serializer, and it includes some kind of JSONSchema validator, which we can delete if we can ensure JSON safety in TiddlyWiki ourselves. (This assumes that we are going to copy the MIT-licensed code into this repo.)

I asked this question: https://github.com/nodeca/js-yaml/issues/708

Full YAML syntax support also makes it possible to store any JSON inside a field (for example, storing a JSONSchema in the schema field in https://github.com/tiddly-gittly/super-tag/ ), so it might still be worth it.

Jermolene commented 1 year ago

> This is not only a parser but also a serializer, and it includes some kind of JSONSchema validator, which we can delete if we can ensure JSON safety in TiddlyWiki ourselves. (This assumes that we are going to copy the MIT-licensed code into this repo.)

If we were to adopt YAML then we would also need the serializer component. It would be used when saving tiddlers in YAML format.

linonetwo commented 1 year ago

I mean deleting the JSONSchema validator; see my issue there. I don't know why those schemas take up 40 kB. Maybe they are part of the AST builder, or the validator is mixed in with the AST builder.

Jermolene commented 1 year ago

I mentioned that the size of YAML libraries was a concern, but I do have other concerns about introducing YAML.

YAML is complicated, unpredictable and requires fairly deep knowledge to avoid the pitfalls. The classic example is the way that "no" is turned into Boolean false. YAML is popular with programmers, but not with end users. There appear to be quite a few sites dedicated to making the case against YAML; for example, https://noyaml.com/ .

There's a subtle specific problem with YAML in TiddlyWiki, which is the mismatch between YAML supporting arbitrary datatypes and TiddlyWiki fields being defined as strings. For example, most of the time YAML allows double quotes around values to be omitted, and will automatically treat them as strings. But some unquoted strings are handled specially (e.g. "no", "&" and "*"), which means that one either has to learn what those magic values are, or one has to be defensive and always wrap values with double quotes. That suddenly makes everything much worse than our existing .tid file format.
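To illustrate what "being defensive" could look like on the writing side, here is a hedged sketch that quotes any value a YAML parser might read as something other than a plain string. `needsQuoting` and `yamlScalar` are hypothetical names, the list of special values is not exhaustive, and the sketch assumes that a JSON string literal is acceptable as a YAML double-quoted scalar.

```javascript
// Scalars that some YAML parsers (notably YAML 1.1 ones) read as booleans or null.
const AMBIGUOUS = /^(y|n|yes|no|on|off|true|false|null|~)$/i;
// Values that look like plain decimal numbers would come back as numbers.
const NUMERIC = /^[-+]?(\d+\.?\d*|\.\d+)(e[-+]?\d+)?$/i;

// Hypothetical check: could this string be misread if written without quotes?
function needsQuoting(value) {
  return (
    value === "" ||
    value !== value.trim() ||
    AMBIGUOUS.test(value) ||
    NUMERIC.test(value) ||
    // Leading indicator characters such as & and * have special meaning.
    /^[&*!|>'"%@`#\-?:\[\]{},]/.test(value) ||
    // A colon-space, trailing colon, space-hash, or newline changes the structure.
    /: |:$| #|[\r\n]/.test(value)
  );
}

// Hypothetical writer: quote only when needed, using JSON string syntax.
function yamlScalar(value) {
  return needsQuoting(value) ? JSON.stringify(value) : value;
}

for (const value of ["test value", "no", "*", "42"]) {
  console.log("field: " + yamlScalar(value));
}
```

A writer like this keeps common values unquoted, but anyone editing the file by hand still has to know the same rules.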

A related problem is what do we do if the value of a field is given as a YAML data type other than a string? One possibility would be to store the JSON equivalent of the data in the field, but that's not really consistent with how we treat string values (if we stringified field values that would mean that string values would be wrapped in double quotes).
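The inconsistency can be seen in a small sketch of that possibility. `toFieldString` is a hypothetical function, and the policy it implements (strings pass through, everything else becomes JSON text) is only one assumed option, not a decided design.

```javascript
// Hypothetical coercion of a parsed YAML value into a string field value.
function toFieldString(value) {
  // Strings are stored as-is, without surrounding quotes.
  if (typeof value === "string") return value;
  // Missing values become an empty field.
  if (value === null || value === undefined) return "";
  // Numbers and booleans become their usual text form.
  if (typeof value === "number" || typeof value === "boolean") return String(value);
  // Arrays and objects are stored as their JSON equivalent.
  return JSON.stringify(value);
}

console.log(toFieldString("no"));        // a string stays a string
console.log(toFieldString(false));       // a boolean becomes text
console.log(toFieldString(["a", "b"]));  // an array becomes JSON text
```

Under this policy the boolean false and the string "false" end up as the same field value, so the original YAML type cannot be recovered when the tiddler is saved again.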

So, to be clear, I do not support including YAML in the core, primarily for usability reasons. Instead, I think we should add support for multiline fields to .tid files, perhaps taking inspiration from YAML to do so.
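As a purely illustrative sketch of what a YAML-inspired extension might look like, the parser below accepts a header line of the form `name: |` followed by indented lines. The syntax, the `parseTidWithBlocks` name, and the choice not to append a trailing newline (unlike YAML's default) are all assumptions made for this example and not a proposal from this thread.

```javascript
// Hypothetical parser for a .tid-like file where `name: |` starts an
// indented multiline value. A blank line always ends the header, so empty
// lines inside a block are not supported by this sketch.
function parseTidWithBlocks(source) {
  const lines = source.split(/\r?\n/);
  const fields = {};
  let i = 0;
  while (i < lines.length && lines[i] !== "") {
    const line = lines[i++];
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip lines that are not name: value
    const name = line.slice(0, colon).trim();
    const value = line.slice(colon + 1).trim();
    if (value !== "|") {
      fields[name] = value;
      continue;
    }
    // Collect the indented lines that follow the `|` marker.
    const block = [];
    while (i < lines.length && /^[ \t]/.test(lines[i])) {
      block.push(lines[i++]);
    }
    // Strip the indentation of the first block line from every line.
    const indent = block.length ? block[0].match(/^[ \t]*/)[0].length : 0;
    fields[name] = block.map((l) => l.slice(indent)).join("\n");
  }
  // Everything after the blank separator line is the tiddler text.
  fields.text = lines.slice(i + 1).join("\n");
  return fields;
}

const tid = [
  "title: test tiddler",
  "multi-line: |",
  " This is a test",
  " Line 2",
  "field-name: test value",
  "",
  "Body text"
].join("\n");

console.log(parseTidWithBlocks(tid));
```

Existing single-line headers parse the same way as before, except that a field whose value is literally `|` would be read as the start of a block.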

linonetwo commented 1 year ago

Considering these cases, I agree that we should add only the syntax we need to the core.

Do you want to reuse the wikitext parser set to build a parser for the multiline meta file syntax? Also, although we have parsers, TiddlyWiki doesn't have a serializer mechanism. I even need to write my own serializer for wikitext.

It would be better if we could introduce a serializer mechanism for the meta file syntax this time, so that it is easier to extend.

pmario commented 1 year ago

> [...] I think we should add support for multiline fields to .tid files, perhaps taking inspiration from YAML to do so.

I would be OK with that. I'll post a suggestion at GH discussions. Link will follow soon.

linonetwo commented 6 months ago

Nowadays some online documentation uses Obsidian's format, for example https://github.com/PKM-er/Pkmer-Docs/tree/main/12-TiddyWiki

I want to contribute to that repo without using Obsidian, using TidGi-Desktop or the Node.js wiki server instead, so reading and writing YAML metadata is a problem here.

This feels like a war over digital territory, a battle of standards.

Jermolene commented 6 months ago

Hi @linonetwo I am very keen to improve the .tid file format. I think we should adopt the smallest possible subset of YAML that interoperates with Obsidian, and avoids us having to integrate a full YAML parser in the core. I'd welcome proposals.

TiddlyWiki / TiddlyWiki5

[IDEA] Store markdown tiddler with YAML metadata instead of .meta file #7534