gamburg / margin

Lightweight markup designed for an open mind

https://margin.love

MIT License

Add capability for multi-line items #2

Closed: gamburg closed this issue 4 years ago

gamburg commented 5 years ago

Sometimes you want an item to be a block of text. Using a special character like \n doesn't follow Margin's rules of plain text readability.

Maybe something like a pipe | character that enforces strict text blocks:

Favorite Quotes

    | Be thankful for what you have; you’ll end up having more. 
    | If you concentrate on what you don’t have, you will never, ever have enough. 
    |
    | - Oprah Winfrey

    | You can discover more about a person in an hour of play than in a year of conversation.
    |
    | - Plato

tallforasmurf commented 4 years ago

How about a single leading apostrophe, with the text paragraph ended by a blank line?

Favorite Quotes

   Mark Twain

'Keep away from people who try to belittle your ambitions. Small people always do that,
but the really great make you feel that you, too, can become great.

'The difference between the _almost_ right word and the right word is
really a large matter. ’tis the difference between the lightning
bug and the lightning.

    Epicurus [341-270 BCE]

'Is God willing to prevent evil, but not able? Then he is not omnipotent.
Is he able, but not willing? Then he is malevolent. Is he both able and willing?
Then whence cometh evil? Is he neither able nor willing? Then why call him God?

dzfranklin commented 4 years ago

To serve both automatically generated text that wants to look maximally "pretty" and users jotting down notes who want to be as efficient as possible, what if the following two forms were equivalent?

    | Be thankful for what you have; you’ll end up having more. 
    | If you concentrate on what you don’t have, you will never, ever have enough. 
    |
    | - Oprah Winfrey

    | You can discover more about a person in an hour of play than in a year of conversation.
    |
    | - Plato

    |
     Be thankful for what you have; you’ll end up having more. 
     If you concentrate on what you don’t have, you will never, ever have enough. 

     - Oprah Winfrey
    |

    |
    You can discover more about a person in an hour of play than in a year of conversation.

    - Plato
    |

Another thought, which would be incompatible with the existing syntax, is that a number of prior systems have used > to indicate this.

One potential problem with ' is that the parser would have to disambiguate it from ordinary quotes that start at the beginning of a line. The issue could come up if users rely on significant whitespace to separate, for example, different sections.

For example, would this parse as a multi-line block?

'foo bar baq' <- thoughts
'foo bar' <- thoughts2

"another section" <- thoughts3

Accidental single-line multiline blocks might be less of an issue if all a multiline block indicates is that the text spans multiple lines.

xkortex commented 4 years ago

A single line break shouldn't cause a line to split into two items. But I think the simpler approach is just to look for a double line break or a delimiter, like Markdown does.

this is a line
that continues on the next line
___
this is an item

this is a second item
___
- lists are easier to parse
- since they always have a prefix

burlesona commented 4 years ago

I disagree with this for two reasons:

First, for the human, I would very much like to be able to make simple lists like...

Shoe Brands:
  Nike
  Adidas
  Puma

... without being required to leave extra empty lines or delimiters between things to indicate they are separate items.

Second, from the point of view of trying to balance this being simple for both humans and computers, I would argue that the spec should instead be strict, that an item exists as one line only, terminated by a \n. Nearly every program that displays text can (and does) word wrap, and many allow you to define a custom word wrapping. I don't think the added complexity and ambiguity is worth the benefit of supporting manual line breaks without splitting items.
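
A minimal sketch of that strict rule in Python, to show how little a parser needs under it. It assumes indentation is made of spaces and leaves ornamentation (such as the trailing colon) untouched; `parse_items` is a hypothetical helper, not the reference Margin parser.

```python
def parse_items(text):
    """Parse indentation-based text into a nested tree, one item per line."""
    root = {"value": None, "children": []}
    stack = [(-1, root)]  # (indent width, node) for every open ancestor
    for line in text.split("\n"):
        if not line.strip():
            continue  # blank lines carry no meaning under the strict rule
        indent = len(line) - len(line.lstrip(" "))
        node = {"value": line.strip(), "children": []}
        # Close every open item that is not shallower than this line.
        while stack[-1][0] >= indent:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((indent, node))
    return root["children"]


shoes = parse_items("Shoe Brands:\n  Nike\n  Adidas\n  Puma\n")
print([child["value"] for child in shoes[0]["children"]])
```

Every line becomes exactly one item, so the only state the parser carries is the stack of open ancestors.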

gamburg commented 4 years ago

These are all good suggestions.

I want to spend some time thinking about what @burlesona suggested here. Specifically: is multiline support already present in the current Margin spec to the extent that it needs to be? Instead of adding this to the spec, might it actually be up to the interpreting application how to relate (or not relate) sibling items?

The concern with specifying multiline in the spec:

Let's say we have a long form writing application. The lowest-level items are likely to be interpreted as paragraphs:

Pride and Prejudice
    Chapter 1
        It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.
        However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered the rightful property of some one or other of their daughters.
        “My dear Mr. Bennet,” said his lady to him one day, “have you heard that Netherfield Park is let at last?”

My concern is that simply allowing for a special multiline character like ' or | will cause people to think they have to use it, even in situations where a simple line break would do. If we were to add | to the spec, I could easily see a writing application wanting (for style points alone) to store the above as:

Pride and Prejudice
    Chapter 1
        | It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.
        | However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered the rightful property of some one or other of their daughters.
        | “My dear Mr. Bennet,” said his lady to him one day, “have you heard that Netherfield Park is let at last?”

That would be bad, no? It would reinforce a syntactic requirement that doesn't exist. And those bad habits would be likely to spread.

The concern with not specifying multiline in the spec:

If multiline items became a common need, we wouldn't want multiple application-dependent standards to form.

I could see the same long form writing application really needing a way to capture block quotes:

A Thesis on "Pride and Prejudice"
    Section 1
        Austen's famous novel begins simply:
            | It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.
            | However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered the rightful property of some one or other of their daughters.
            | “My dear Mr. Bennet,” said his lady to him one day, “have you heard that Netherfield Park is let at last?”

But, in the above example, couldn't the writing application simply choose to interpret fourth-order items as block quotes? Thus ridding us of the need of the special | token in the first place, and moving this multiline question back outside the scope of the spec?
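
A rough sketch of what "interpret fourth-order items as block quotes" could mean on the application side. The `{"value", "children"}` tree shape, the `label_roles` helper, and the `quote_depth` parameter are assumptions made for illustration; nothing here is part of the Margin spec.

```python
def label_roles(items, depth=0, quote_depth=3):
    """Yield (role, value) pairs; items at quote_depth or deeper are quotes."""
    for item in items:
        role = "blockquote" if depth >= quote_depth else "text"
        yield role, item["value"]
        yield from label_roles(item.get("children", []), depth + 1, quote_depth)


thesis = [{
    "value": 'A Thesis on "Pride and Prejudice"',
    "children": [{
        "value": "Section 1",
        "children": [{
            "value": "Austen's famous novel begins simply:",
            "children": [
                {"value": "It is a truth universally acknowledged, that a single "
                          "man in possession of a good fortune, must be in want "
                          "of a wife."},
            ],
        }],
    }],
}]

roles = list(label_roles(thesis))
for role, value in roles:
    print(f"{role:10} {value[:40]}")
```

The decision lives entirely in the application: the same tree could be rendered with a different `quote_depth` without touching the stored text.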

benjaminwil commented 4 years ago

Hello,

Over the last few days, I've also been thinking about multi-line items in Margin. Then, I began to think about multi-line and multi-paragraph items, which I think is what the given examples on this issue's original comment are.

Denoting the beginning and end of an item when a decoration is used

What I thought the most about, and what I think is the most interesting problem (if multi-line and multi-paragraph items should exist), is how they can be represented alongside shorter-form items.

In Markdown, I would format a multi-paragraph <li> item like this:

- A single-line item.
- A multi-line item that also happens
  to have multiple paragraphs:

  Just continue indenting until the next
  list item, until the next sub-list item, or
  until regular body text resumes.

Body text. The list does not continue.

I think that Margin could solve this in the same way:

Visually, the decoration marks the beginning of the item.
The next decoration marks the beginning of the next item, and thus, the end of the previous item.
There's no special syntactical characters required. Just proper indentation.

If the next item is a parent or a child – Margin already knows what to do.

To make this work, though, there's one assumption that would need to be made: all of the multi-paragraph item's peers should start with a decoration (although it wouldn't matter which one).

This could be great in a personal wiki or notes application, where short-form and long-form items might be stored as peers:

Lisa Robertson quotations
  - The biggest problem with melancholy is that it is more detailed than the world.
  > You are insistent about the uncovering of this potential indifference.

    *Carnations and peat moss and a collapsing wall.*

    Who are you in relation to this woman?
  - You might go so far as to falter.

To-do list
  [x] Get more tofu.
  [ ] Eat the tofu.

A long form writing application example

For an application strictly concerned with storing long-form text (like in the above comment), I do not see a need for additional syntax. Like @gamburg says, a paragraph could simply be the common lowest-level unit.

Alternatively, the application could store multiple paragraphs as single items with a > decoration. This also allows applications to be more opinionated and, say, hard-wrap lines when they reach 80 characters (which is my preference).

Or, as in the example below, if you are recording poetry, you can clearly denote when items should be single-line items or multi-line items:

Haiku [author: Bashō]
  Japanese:
    > 五月雨を　あつめてはやし　最上川
    > Samidare wo/ Atsumete Hayashi/ Mogamigawa
  English:
    > The rains of summer join together.

      How swift it is

      Mogami River.

Sidenote: Poetry opens a whole other can of worms, where "stanza/paragraph breaks" are different than "line breaks", and it's always gonna be weird to represent those differences.

To use the example from the comment above, and address the question of "needing a way to capture block quotes" in a long-form writing application, Margin could, again, use an item decoration as a way to keep track of where an item should start and end:

A Thesis on "Pride and Prejudice"
  Section 1
    Austen's famous novel begins simply:
      > It is a truth universally acknowledged, that a single man in
        possession of a good fortune, must be in want of a wife.

        However little known the feelings or views of such a man may be on his
        first entering a neighbourhood, this truth is so well fixed in the
        minds of the surrounding families, that he is considered the rightful
        property of some one or other of their daughters.

       “My dear Mr. Bennet,” said his lady to him one day, “have you
        heard that Netherfield Park is let at last?”
    In this essay, I intend to deconstruct blah blah blah...

Thanks for reading my long-winded idea.

mtsknn commented 4 years ago

Could multiline items be annotated with, well, annotations? Here's a quick idea, though I don't know how comfortable this would be in practice:

A paragraph:

[multiline]
  Bacon ipsum,
  et cetera.

My poems:
  - [multiline]
    To be,
    or not to be.
  - [multiline]
    Lorem ipsum,
    dolor
    sit amet.
  - [multiline]
    Ham, spam and eggs

Then it would be an application-level concern to join the children of each item that is annotated with [multiline].

[multiline] could as well be [multi-line], or [ml], or [p] (as in paragraph), or anything else; that would also be an application-level concern.

I'm not sure if the above snippets are valid Margin, especially the paragraph example, but the current parser seems to interpret them just like I do in my head. The second point in #11 also seems relevant.
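
A minimal sketch of that application-level join. It assumes the parser hands over `{"value", "annotations", "children"}` dictionaries and that a bare `[multiline]` annotation shows up as a `"multiline"` key; both are assumptions, and `join_multiline` is a hypothetical helper.

```python
def join_multiline(item, separator="\n"):
    """Return a copy of item with the children of [multiline] items folded in."""
    children = [join_multiline(child, separator) for child in item.get("children", [])]
    result = {**item, "children": children}
    if "multiline" in item.get("annotations", {}):
        lines = [child["value"] for child in children]
        if item.get("value"):
            lines.insert(0, item["value"])
        result["value"] = separator.join(lines)
        result["children"] = []  # the children now live inside the value
    return result


poems = {
    "value": "My poems",
    "children": [
        {"value": "", "annotations": {"multiline": True},
         "children": [{"value": "To be,"}, {"value": "or not to be."}]},
        {"value": "", "annotations": {"multiline": True},
         "children": [{"value": "Ham, spam and eggs"}]},
    ],
}

joined = join_multiline(poems)
print(joined["children"][0]["value"])
```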

For a list of multiline items, another annotation could be used, e.g.:

My poems: [multiline-list]
  -
    To be,
    or not to be.
  -
    Lorem ipsum,
    dolor
    sit amet.
  -
    Ham, spam and eggs

To take this idea further, annotations could also be used to annotate e.g. code blocks:

Here's a cool function:

[code: js]
  const foo = () => {
    return 'bar'
  }

Joining the children would be more difficult since there are nested children, but it should be doable.
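
A sketch of that nested join, assuming the parser leaves each value untouched and that re-indenting by two spaces per level is acceptable (the original indent width and any blank lines are not recoverable from the tree). `flatten_code` and the tree shape are hypothetical.

```python
def flatten_code(children, depth=0, indent="  "):
    """Rebuild source lines from nested items, re-indenting by depth."""
    lines = []
    for child in children:
        lines.append(indent * depth + child["value"])
        lines.extend(flatten_code(child.get("children", []), depth + 1, indent))
    return lines


code_item = {
    "value": "",
    "annotations": {"code": "js"},
    "children": [
        {"value": "const foo = () => {", "children": [{"value": "return 'bar'"}]},
        {"value": "}"},
    ],
}

source = "\n".join(flatten_code(code_item["children"]))
print(source)
```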

One more idea:

If ornaments were also captured, as discussed in #6, one could also use e.g. > to denote multiline items:

My poems:
  >
    To be,
    or not to be.
  >
    Lorem ipsum,
    dolor
    sit amet.
  >
    Ham, spam and eggs

This way, it would again be an application-level concern to join the children of each empty item that is ornamented/prefixed with >.

burlesona commented 4 years ago

Of all the suggestions for multi-line I think the simple designation of > as a control character that starts a multi-line block makes the most sense.

However, I question why it's needed. For example, consider the following very logical input:

My favorite haiku:
    Lorem ipsum,
    dolor
    sit amet.

This is perfectly valid Margin per the current spec. The item "My favorite haiku" has three lines, and each line is an "item." That's actually pretty logical.

In the same way, a long run of novel text would be valid Margin, and each item would represent a paragraph.

Chapter 1
  Our story begins in a dark alley on a foggy night...

  "Oh look, an eagle!"

  ...and with a devilish grin, the villain snuck off into the night.

This is just one item, "Chapter 1", containing 3 paragraphs.

These are very natural representations of structured content, nothing special needs to be done to them.

Looking at the spec for Margin so far, it has a lot of useful stuff in it, all of which is pretty obvious. One great thing it has going for it is that there isn't that much stuff in it, so you have a lot of freedom to just write text in a way that feels very natural while still capturing a lot of relationships among that text in a way the computer can parse.

Trying to layer on additional semantics, where the user needs to learn to remember to put special control characters, or follow rules where if one line is ornamented and the children are not then they aren't children but instead are multi-line content... it's just adding a lot of mental overhead for very questionable value.

Finally and perhaps most importantly I would consider what this is being used for. The indentation-based hierarchy is nice, but typing in ever-more-indented text is not actually a great experience for long-form writing. By comparison, Markdown is very nice for long-form writing, and has the advantage of already being ubiquitous.

I think Margin would have a better chance of carving a niche for itself if it focused on being a simple, elegant way to make structured data out of plain text, which is a natural complement to the elegance of long-form writing in Markdown, and did not try to act as a replacement or superset for the kind of long-form writing that Markdown excels at.

The names even go nicely together (Markdown and Margin, Margin and Markdown). So why not leave things simple and elegant, and let each tool shine for its own use case? :)

mtsknn commented 4 years ago

The examples given by @benjaminwil look nice, at least to my human eyes. (The hanging quotation mark in the last example looks problematic from a parsing point of view, but it might be accidental.)

How should children in the middle of a multiline item be handled? For example:

Lisa Robertson quotations
  - The biggest problem with melancholy is that it is more detailed than the world.
  > You are insistent about the uncovering
    of this potential indifference.

      *Carnations and peat moss and a collapsing wall.*

    Who are you in relation to this woman?
      You might
      go so far
      as to falter.

I can't think of a clean way of storing this, at least in the current data structure. Anyone?

I also wonder what this would look like in the spec... Currently the syntax rules of items are very simple. From the documentation:

Each line represents an item
Each item can have a single parent and multiple children. Indentation alone determines this hierarchy
Leading and trailing ornamentation as well as blank lines are ignored

If multiline items were implemented like this, the rules could be something like:

Each line represents an item, except multiline items are prefixed with ornamentation and continue until a sibling item (an item with the same level of indentation) or an item with smaller indentation is encountered
- (How about children in the middle?)
Each item can have a single parent and multiple children. Indentation alone (but actually also leading ornamentation) determines this hierarchy
Leading and trailing ornamentation is ignored (though they are used to identify multiline items). Blank lines are also ignored, except in multiline items, where they denote paragraph gaps

Hmm. 😅

mtsknn commented 4 years ago

Another uncertainty with multiline items: how should annotations be handled? Let's say I wanted to add a link (see #10) to my multiline item (I'm using the syntax from my earlier comment):

>
  Have you ever used
  DuckDuckGo [link: https://duckduckgo.com]
  ? I like it.

What would the JSON look like?

Though I just noticed that there would be a space before the question mark (given that the lines of a multiline item are concatenated and separated with spaces), so this wouldn't be good anyway...
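
The stray space is easy to reproduce. A small sketch, assuming annotations are removed from each line with a simple bracket pattern (a stand-in for whatever the real parser does) and the remaining lines are joined with single spaces:

```python
import re

# Assumed annotation shape: anything in square brackets, plus the space before it.
ANNOTATION = re.compile(r"\s*\[[^\]]*\]")

lines = [
    "Have you ever used",
    "DuckDuckGo [link: https://duckduckgo.com]",
    "? I like it.",
]

joined = " ".join(ANNOTATION.sub("", line) for line in lines)
print(joined)  # Have you ever used DuckDuckGo ? I like it.
```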

benjaminwil commented 4 years ago

Just to attempt to simplify how the rules might be written if decorations can be used to mark multi-line items:

Each line represents an item.
Each item can have a single parent and multiple children. Indentation alone determines this hierarchy.
Leading and trailing ornamentation is not parsed as part of the item's value.
Leading ornamentation can be used to split an item over multiple lines.

Is that too ambiguous? To me, it seems intuitive.

And to clarify how I think these multi-line and/or multi-paragraph items could be parsed:

Parent item
  - Child split
    over multiple lines.
    [with: annotation]

    And two paragraphs long.

Parsed as:

{
  "value": "Parent item",
  "children": [
    {
      "value": "Child split over multiple lines.\nAnd two paragraphs long.",
      "annotations": {
        "with": "annotation"
      }
    }
  ]
}

This would not benefit formatted text like poetry and code snippets, where each \n matters. But maybe that is okay. I mostly just dislike soft-wrapping in text editors, and would prefer to hard-wrap my lines at 80 or 100 characters without them being parsed differently.

The placement of an annotation would not be able to be represented. That makes sense to me: annotations should annotate an item, not part of an item. (I don't think in-line hyperlinks work as annotations for this reason.)

I also don't think there's the possibility to have a child item in the middle of a multi-line item, in the same way that you can't have a child item in the middle of a single-line item.
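
To make the proposal concrete, here is a sketch of a parser that follows these rules. It is only an illustration under several assumptions: indentation uses spaces, only `-` and `>` count as decorations, continuation lines align with the text after the decoration, annotations look like `[key: value]`, and children in the middle of a multi-line item are not supported. The names `parse`, `split_annotations`, `DECORATION`, and `ANNOTATION` are made up for this example.

```python
import re

DECORATION = re.compile(r"^(-|>)\s+(.*)$")            # assumed decorations
ANNOTATION = re.compile(r"\[([^\]:]+):\s*([^\]]*)\]")  # assumed [key: value]


def split_annotations(text):
    """Separate `[key: value]` annotations from the visible text."""
    found = {key.strip(): value.strip() for key, value in ANNOTATION.findall(text)}
    return ANNOTATION.sub("", text).strip(), found


def parse(text):
    root = {"value": None, "children": []}
    stack = [(-1, None, root)]  # (indent, continuation indent, node)
    paragraph_break = False
    for line in text.split("\n"):
        if not line.strip():
            paragraph_break = True  # only matters inside a multi-line item
            continue
        indent = len(line) - len(line.lstrip(" "))
        body = line.strip()
        match = DECORATION.match(body)
        _, cont_indent, current = stack[-1]
        if cont_indent == indent and not match:
            # Undecorated line aligned with a decorated item: a continuation.
            value, found = split_annotations(body)
            if value:
                sep = "\n" if paragraph_break else " "
                current["value"] = f'{current["value"]}{sep}{value}' if current["value"] else value
                paragraph_break = False
            if found:
                current.setdefault("annotations", {}).update(found)
            continue
        paragraph_break = False
        if match:
            body = match.group(2)
        value, found = split_annotations(body)
        node = {"value": value, "children": []}
        if found:
            node["annotations"] = found
        while stack[-1][0] >= indent:
            stack.pop()
        stack[-1][2]["children"].append(node)
        # Continuation lines must start where the decorated text starts.
        new_cont = indent + len(line.strip()) - len(body) if match else None
        stack.append((indent, new_cont, node))
    return root["children"]


example = """Parent item
  - Child split
    over multiple lines.
    [with: annotation]

    And two paragraphs long.
"""
tree = parse(example)
print(tree[0]["children"][0])
```

Under these assumptions the example yields one child whose value is "Child split over multiple lines.\nAnd two paragraphs long." with the annotation attached to the item as a whole.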

mtsknn commented 4 years ago

Is that too ambiguous? To me, it seems intuitive.

Definitely clearer than my attempt.

I mostly just dislike soft-wrapping in text editors, and would prefer to hard-wrap my lines at 80 or 100 characters without them being parsed differently.

That's what I prefer too and one reason why I think multiline items might be nice to have.

The placement of an annotation would not be able to be represented. That makes sense to me: annotations should annotate an item, not part of an item. (I don't think in-line hyperlinks work as annotations for this reason.)

Oh, right, of course. Good point. (Besides, annotating parts of an item is a separate issue: #10)

I also don't think there's the possibility to have a child item in the middle of a multi-line item, in the same way that you can't have a child item in the middle of a single-line item.

Another good point.

What is still unclear to me is how indented lines should be parsed. They must be parsed somehow. E.g. this:

Parent item
  - Child split
    over multiple lines.

      With an indented line in the middle

    I'm also split
    over several lines

      As well as at the end

I can think of these four possible outputs:

// Indented lines are trimmed and included in the value
{
  "value": "Parent item",
  "children": [
    {
      "value": "Child split over multiple lines.\nWith an indented line in the middle\nI'm also split over several lines\nAs well as at the end"
    }
  ]
}

// The first indented child is interpreted as a child
{
  "value": "Parent item",
  "children": [
    {
      "value": "Child split over multiple lines.",
      "children": [
        {
          "value": "With an indented line in the middle"
        }
      ]
    },
    // A sibling. Multiline even though not ornamented
    // (but the preceding sibling _is_ ornamented)
    {
      "value": "I'm also split over several lines",
      "children": [
        {
          "value": "As well as at the end"
        }
      ]
    }
  ]
}

// The first indented child is interpreted as a child
{
  "value": "Parent item",
  "children": [
    {
      "value": "Child split over multiple lines.",
      "children": [
        {
          "value": "With an indented line in the middle"
        }
      ]
    },
    // A sibling. Not multiline because not ornamented
    {
      "value": "I'm also split"
    },
    {
      "value": "over several lines",
      "children": [
        {
          "value": "As well as at the end"
        }
      ]
    }
  ]
}

// The last indented child is interpreted as a child. Probably the most difficult
// to parse (would need to look ahead to see if this is the last indented child)
{
  "value": "Parent item",
  "children": [
    {
      "value": "Child split over multiple lines.\nWith an indented line in the middle\nI'm also split over several lines",
      "children": [
        {
          "value": "As well as at the end"
        }
      ]
    }
  ]
}

So, while the rules might be intuitive to humans, maybe they are not that intuitive from a parsing point of view.

@benjaminwil, I'd be interested to hear your thoughts. :-)

gamburg commented 4 years ago

I agree with @burlesona that it's important to keep the scope of Margin in mind here:

Trying to layer on additional semantics, where the user needs to learn to remember to put special control characters, or follow rules where if one line is ornamented and the children are not then they aren't children but instead are multi-line content... it's just adding a lot of mental overhead for very questionable value.

Most of the plain text interpreting of Margin is going to be done by the thinker anyway, whether in TextEdit, Notepad, Stickies, etc. In that case, it's not all that important how they represent multi-line phrases. E.g. if the thinker wants to add pipes to @burlesona's example because it helps them think, they are free to do that:

My favorite haiku:
    | Lorem ipsum,
    | dolor
    | sit amet.

Side-note: perhaps pipes should be added to the list of ignored leading & trailing characters.

If a hypothetical future application has a specific need for multi-line text that isn't covered by simply incrementing the hierarchical level of the multi-line items, perhaps we should let those applications figure it out themselves (whether through ornamentation, annotations, etc). Then, if there seems to be a consensus that this is needed, we'd have a bunch of examples of how applications chose to implement it to choose from.

I know this sounds like the lazy approach, but for Margin I think lazy can be considered a virtue when it comes to the syntactic specifications.

benjaminwil commented 4 years ago

After exploring this some more, I think that I agree.

I think it would be reasonable for an application that is concerned with long-form content to find its own way to join multiple items post-Margin.

JSON arrays seem to preserve ordering, too, so an array of children would be reproducible over and over again in different applications. 💯
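
A small sketch of that post-Margin join, using the "Chapter 1" example from earlier in the thread. JSON arrays are ordered by definition, so the children survive a serialize/parse round trip in the same order; the tree shape and the `join_children` helper are assumptions for illustration.

```python
import json

chapter = {
    "value": "Chapter 1",
    "children": [
        {"value": "Our story begins in a dark alley on a foggy night..."},
        {"value": '"Oh look, an eagle!"'},
        {"value": "...and with a devilish grin, the villain snuck off into the night."},
    ],
}


def join_children(item, separator="\n\n"):
    """Join an item's direct children into one block of text, in order."""
    return separator.join(child["value"] for child in item.get("children", []))


# Serialize and parse again, as a second application reading the file would.
round_tripped = json.loads(json.dumps(chapter))
body = join_children(round_tripped)
print(body)
```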