KenKundert / nestedtext

Human readable and writable data interchange format
https://nestedtext.org
MIT License

Consider different approach to multiline strings? #34

Closed evanmoran closed 2 years ago

evanmoran commented 2 years ago

Hi there! I'm a big fan of nested text, but as I went through the docs I was surprised by your choice for multiline strings:

address:
    > 138 Almond Street  
    > Topeka, Kansas 20697

Perhaps you'd consider an alternative way to specify this? I find that needing to add the > character on every line isn't ideal, especially for long pieces of text, html templates, etc!

My guess is you can use your indention rules pretty well to find the end of the string without a character on each line. I immediately was hoping for this syntax:

address:
    138 Almond Street  
    Topeka, Kansas 20697

But I think this is also clear:

address: >
    138 Almond Street  
    Topeka, Kansas 20697

Or even this if you prefer it as the first character

address: 
    > 138 Almond Street  
    Topeka, Kansas 20697

To my surprise YAML also supports all of these directly. Here's a playground to take a look (they have perhaps too many options): https://yaml-multiline.info

Additionally, here are a couple of examples where having this as an option would improve nested text significantly:

website:
    product page template: >
        <h4>Product Info: {{name}}</h4>
        <ul>
              <li>Product: {{name}}</li>
              <li>Color: {{color}}</li>
              <li>Price: ${{price}}</li>
        </ul>

In the current syntax it would have to be this, which is quite hard to type and pretty confusing!

website:
    product page template: 
        > <h4>Product Info: {{name}}</h4>
        > <ul>
        >      <li>Product: {{name}}</li>
        >      <li>Color: {{color}}</li>
        >      <li>Price: ${{price}}</li>
        > </ul>

The other situation is larger embedded text (readme, documentation), and again a character on each line would be very hard to read:

screenplay:
   name: Groundhog Day
   contents: >
      INT. BREAKFAST ROOM 

      Phil enters the old library of the house and finds everything
      exactly as it was the day before. Mrs. Lancaster spots Phil as
      she comes out of the kitchen with the fresh pot of coffee.

          MRS. LANCASTER
          Did you sleep well, Mr. Connors?

          PHIL
          (completely confused)
          Did I? I don't know--

          MRS. LANCASTER
          Would you like some coffee?

          PHIL
          Yes, thank you. I'm feeling a
          little strange.

          MRS. LANCASTER
          (as she pours)
          I wonder what the weather's going
          to be like for all the festivities.

          PHIL
          Did you ever have deja vu, Mrs. Lancaster?

As for parsing, obviously it's more tricky because multi-line strings could contain a colon. The key, I think, is either to require the > character at the start (easiest), or to look at the first line. If it doesn't contain a colon or start with a -, then it's a multi-line string, and subsequent lines with the same indentation are appended.
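A minimal sketch of the "look at the first line" rule described above, in Python. The function name `looks_like_multiline_start` is hypothetical, and the rule is taken literally (any colon counts), which is an assumption rather than a worked-out proposal:

```python
def looks_like_multiline_start(first_line: str) -> bool:
    """Apply the first-line rule literally (hypothetical helper).

    The line is treated as the start of a multi-line string when it
    contains no colon and does not start with "-".
    """
    body = first_line.strip()
    return ":" not in body and not body.startswith("-")


# The address example passes the rule.
print(looks_like_multiline_start("    138 Almond Street  "))

# A first line that contains a colon does not, which is the tricky
# case mentioned above.
print(looks_like_multiline_start("        <h4>Product Info: {{name}}</h4>"))
```

Under this literal reading, text whose first line contains a colon would need the explicit > marker instead.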

Thanks for considering. Very cool project!

KenKundert commented 2 years ago

Thanks for the suggestions. I'm afraid we considered all of your ideas and all of the various ways YAML supports multiline strings before we settled on the current approach. There are a boatload of issues that must be considered. For example, where does the string begin, where does it end, and given that the lines in a multiline string may begin with spaces, where does the indentation end and where do the spaces begin? I suspect it is possible to come up with a scheme that might work, but it would require fixing the indentation level and taking everything within the indented block literally, which would exclude comments and the blank lines that would normally be ignored.

One fundamental test of a format like JSON, YAML, or NestedText is "can it contain values that are data encoded in its own format?". Specifically, can a NestedText document contain another NestedText document as a string. Here the > that leads every line is an asset because it clearly distinguishes the string from surrounding data, and within the string you can include any character or syntax. Thus a NestedText document can contain another NestedText document without conflict, confusion, or a blizzard of quoting and escaping.
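To illustrate the embedding property, here is a small Python sketch that does not use the NestedText library. The helpers `embed` and `extract` are hypothetical names, and a fixed four-space indent is assumed. It wraps one document as a multiline string inside another by prefixing every line with >, then recovers it unchanged:

```python
def embed(key: str, document: str, indent: int = 4) -> str:
    """Wrap `document` as a multiline string under `key` (hypothetical helper)."""
    pad = " " * indent
    lines = [f"{key}:"]
    for line in document.split("\n"):
        # Empty lines become a bare ">", all others get a "> " prefix.
        lines.append(pad + (f"> {line}" if line else ">"))
    return "\n".join(lines)


def extract(block: str, indent: int = 4) -> str:
    """Recover the string embedded by `embed` (assumes the same indent)."""
    recovered = []
    for line in block.split("\n")[1:]:   # skip the "key:" line
        body = line[indent:]             # drop the indentation
        recovered.append(body[2:])       # drop "> " (or a bare ">")
    return "\n".join(recovered)


inner = "address:\n    > 138 Almond Street\n    > Topeka, Kansas 20697"
outer = embed("example", inner)
print(outer)
print(extract(outer) == inner)
```

No quoting or escaping is needed: the inner document's own > markers and leading spaces are carried along as ordinary text.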

Another nice benefit of the current approach from an implementation perspective is that the current format is very easy to parse. The reader program can look at each line individually and determine its type. This is not really a concern for the end user, but it will encourage more implementations, which should aid in adoption.
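A rough Python sketch of what "look at each line individually" can mean. This is a simplified illustration, not the reference implementation: it only covers blank lines, comments, string items, list items, and dictionary items, and the names `Line` and `classify_line` are made up for the example.

```python
from typing import NamedTuple, Optional


class Line(NamedTuple):
    kind: str            # "blank", "comment", "string", "list", "dict", "unrecognized"
    depth: int           # number of leading spaces
    key: Optional[str]   # only set for dictionary items
    value: str


def classify_line(line: str) -> Line:
    """Classify one line without looking at its neighbors (simplified sketch)."""
    text = line.rstrip("\n")
    body = text.lstrip(" ")
    depth = len(text) - len(body)
    if body.strip() == "":
        return Line("blank", 0, None, "")
    if body.startswith("#"):
        return Line("comment", depth, None, body[1:].strip())
    if body == ">" or body.startswith("> "):
        # Everything after "> " is literal, including leading spaces and colons.
        return Line("string", depth, None, body[2:])
    if body == "-" or body.startswith("- "):
        return Line("list", depth, None, body[2:])
    key, sep, value = body.partition(": ")
    if sep:
        return Line("dict", depth, key, value)
    if body.endswith(":"):
        return Line("dict", depth, body[:-1], "")
    return Line("unrecognized", depth, None, body)


for sample in ["address:", "    > 138 Almond Street", "    >   <li>Price: ${{price}}</li>"]:
    print(classify_line(sample))
```

Because the leading > is checked before anything else, a colon inside a string line never causes it to be mistaken for a key.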

When we were considering using the > tag at the beginning of every line of a multiline string, I was worried that it would be cumbersome to enter. However, I was pleasantly surprised when my editor (Vim) automatically entered them for me. Vim thinks of > at the beginning of a line as a continuation character because it is used that way in email. Perhaps you can look into configuring > as a continuation character in your editor.

A design goal with NestedText was to make the format close to what people might use themselves to enter structured data if there were no constraints. For example, how would you enter an address list in a text editor if your only goals were to hold all the relevant information in a clear and consistent way, with the assumption that only people would read the file? I think with NestedText we came close to that ideal. The biggest exception is the use of > to lead each line of a multiline string. My only defense is that all the alternatives we considered were either ambiguous or required a complicated set of rules that were inconsistent with those used elsewhere in the document, making the format more difficult to learn.

evanmoran commented 2 years ago

Thank you. I really appreciate the response!

I had no doubt that you considered everything a ton, including yaml, because nested text is elegant in a way that's hard to achieve without taking the time needed :)

I also think you'll find that people don't want to type or even see the prefix for anything but trivial examples. In a markdown or news forum convention (where vi's feature is probably coming from) the > symbol is usually used to quote a single paragraph in-line. This style works really well visually because these quotes are short and it makes them distinct from the response that follows.

For longer text it feels really heavy, and in reading between the lines of your response, I think you know it too! The block of text looks much nicer without the > on every line. This is definitely the crux of it. Handling comments right, parsing the end and beginning, choosing how newlines are kept or not, these are the hard challenges with a different approach. But for a document model that only has strings, multiline strings need to be first class citizens. The extra effort is worth it!

KenKundert commented 2 years ago

Sorry I did not mean to close the issue when you had more to say.

I agree with you that it would be better to not have the leading > symbols on multi-line strings. I personally thought long and hard on how to accomplish that. In fact I probably spent at least 6 years on and off thinking about how to create a format like NestedText. Every attempt I made hit a dead end until the current approach occurred to me. I do believe it is possible to create such a format, but I was never comfortable with the trade-offs.

Regardless, this ship has sailed. NestedText is not as clean as I had hoped, but as it is, it is a nice advance over JSON and YAML. It has been released and others are building new versions in other languages. This is not the time to make fundamental changes to the base format. Instead we continue to develop NestedText as it is, and if someone takes inspiration from it and develops a new format that improves on it, much the way I took inspiration from YAML and improved on it, then so much the better.

evanmoran commented 2 years ago

All good. Apologies for the delay, I have a newborn and my time is not my own 😂

I know you are pretty settled so it's no problem to close. I only bring it up because nested text is so cool, but the multi-line stuff just isn't perfect and I had to see if you knew it too.

One idea would be to ask your community of porters if they'd be open to experimentation on improving multi-line support in v2.0. I would actually guess that adding indentation-based multi-line support would not break compatibility (> prefixes can still be parsed as is), and they may be interested in improving this as well.

Thanks again for the thoughtful replies!

LewisGaul commented 2 years ago

Personally I completely agree with NestedText's approach, where each line type is identifiable without surrounding context. This makes it easier for parser implementations, but also makes it easier to visually parse IMO.