dev-cycles / contextive

Get on the same page.
https://contextive.tech
MIT License
244 stars 6 forks source link

Multi-line yaml can result in additional underscores #47

Closed uniquelau closed 1 year ago

uniquelau commented 1 year ago

Step to reproduce:

In the definition yaml use > to signify multi-line text.

contexts:
  - name: eCommerce
    domainVisionStatement: >
      Allowing customers to discover and purchase products and services online
      I like line breaks, so text is easy to read when reviewing pull requests on the
      train or bus when I am travelling.

Current result/situation:

The text that is rendered is wrapped with the underscore character _. The title (vision) is within this wrap.

image

Expected result:

The text should not be prefixed, and should be italized.

Work around

Add the strip block chomping operator. This causes everything is display correctly.

e.g. >-

image

contexts:
  - name: eCommerce
    domainVisionStatement: >-
      Allowing customers to discover and purchase products and services online
      I like line breaks, so text is easy to read when reviewing pull requests on the
      train or bus when I am traveling.
chrissimon-au commented 1 year ago

Thanks so much for reporting this - and for providing a workaround. While it does work, you do lose the linebreaks in the original definition, which if they are helpful/desired, is a shame.

As you've probably worked out, the issue is the assumption that there are no linebreaks in the use of markdown features (like the italics _ boundaries). Markdown support for linebreaks within certain formatting features (e.g. italics) depends on the markdown renderer, and while some support it, others (such as vscode, clearly), don't.

For those fields, the fix will be to iterate the lines, put the markdown formatting syntax around each line, and ensure the markdown 'linebreak' feature (double space at the end of the line) is used.

However, since you've pointed this out, I realise that the issue could affect any of the fields, including term names and context names and I suspect the appropriate handling depends on the field - i.e. some should respect line breaks, and some should strip them.

I'm going to suggest:

contexts:
  - name: <strip linebreaks>
    domainVisionStatement: <support linebreaks>
    paths:
    - <strip linebreaks>
    terms:
    - name: <strip linebreaks>
      definition: <support linebreaks>
      examples:
        - <support linebreaks>
      aliases:
        - <strip linebreaks>

Thoughts?

chrissimon-au commented 1 year ago

So, having started work on this, my initial assumption about the causes of the problem were not quite accurate. To explain why contextive does what it does, I need to dig into the yml block scalar header definition a bit:

Block Scalar Headers

According to https://yaml-multiline.info/, > is a style indicator as part of a block scalar header.

There are two style indicators:

(So in your examples, the newlines would get ignored as you were using the folded style, not the literal style. It's not clear if that's intended or not, see below.)

But that is not the whole story - There is another part of the block scalar header - the block chomping indicator, which controls what to do with newlines at the end of the value (not within the value). Once a yml value becomes a 'block scalar' it always has at least one newline at the end to get to the next fieldname. The chomping indicator can be one of three values:

Explaining the Issue Report

So the reason the first example in the issue report didn't work is that the it uses "folded" and "clip". Which means the yml:

contexts:
  - name: eCommerce
    domainVisionStatement: >
      Allowing customers to discover and purchase products and services online
      I like line breaks, so text is easy to read when reviewing pull requests on the
      train or bus when I am travelling.

renders to markdown with:

_Vision: Allowing customers to discover and purchase products and services online I like line breaks, so text is easy to read when reviewing pull requests on the train or bus when I am travelling.
_

This is not valid markdown, as the closing _ of an emphasised section must not be preceded by unicode whitespace. In this case it is preceded by a newline, so it is not rendered as emphasised (italics).

The workaround provided uses the "folded" and "chomp" style, which means the yml:

contexts:
  - name: eCommerce
    domainVisionStatement: >-
      Allowing customers to discover and purchase products and services online
      I like line breaks, so text is easy to read when reviewing pull requests on the
      train or bus when I am travelling.

renders to markdown with:

_Vision: Allowing customers to discover and purchase products and services online I like line breaks, so text is easy to read when reviewing pull requests on the train or bus when I am travelling._

which is valid markdown and thus renders with emphasis. However because of the folded style, the newlines are not rendered into the hover panel.

This may or may not be what is desired, depending on if the newlines are solely for ease of reviewing the yml, or if there is a desire for the newlines to be reflected in the hover panel.

Getting the newlines in the Hover Panel:

From the above, it's clear that we at least need to either use the "literal" "chomp" style, or the "folded" "chomp" style and two newlines. (We always need the chomp style, as markdown can't cope with the trailing newline). However, even this isn't quite enough yet if we want to get the newlines in the hover panel:

contexts:
  - name: eCommerce
    domainVisionStatement: |-
      Allowing customers to discover and purchase products and services online
      I like line breaks, so text is easy to read when reviewing pull requests on the
      train or bus when I am travelling.

renders to markdown with:

_Vision: Allowing customers to discover and purchase products and services online
I like line breaks, so text is easy to read when reviewing pull requests on the
train or bus when I am travelling._

Which looks correct, but shows visible as a single line:

image

This is because we are rendering to markdown, in which single newlines are also ignored. There are two options with markdown:

So, putting it altogether:

contexts:
  - name: eCommerce
    domainVisionStatement: |-
      Allowing customers to discover and purchase products and services online  
      I like line breaks, so text is easy to read when reviewing pull requests on the  
      train or bus when I am travelling.

(note the two spaces at the end of the first two lines), you get markdown:

_Vision: Allowing customers to discover and purchase products and services online  
I like line breaks, so text is easy to read when reviewing pull requests on the  
train or bus when I am travelling._

which visually looks like:

image

Which in some cases might be what is desired.

Next Steps for Contextive

I think there are at least three things I should do, and I'll endeavour to get them all in the next release:

Documentation Updates

Need to update the extension readme to be clear that if want to have multi-line values in your yml, you need to use either:

Improve rendering resilience

To improve rendering resilience, contextive should automatically chomp trailing whitespace from any field value where contextive is adding markdown rendering elements (e.g. _) that would not work with trailing whitespace.

Simplify newlines

Although we do want to allow users to use markdown in their descriptions, in many cases, users will not expect that they are writing markdown, and will probably find the need to add two spaces at the end confusing.

To make life simpler/more intuitive, we could automatically add two spaces at the ends of single newlines in the parsed value. This would mean that the render panel would always look exactly like the raw text in terms of newlines.

This might make more sense to a lot of users, but is technically a variation from standard markdown, and it would preclude the option to allow for the yml to be split into multiline for PR review purposes only while keeping a single line in the hover panel.

@uniquelau what do you think would be the most intuitive approach here?

uniquelau commented 1 year ago

Thanks Chris, I will review and add some comments - https://yaml-multiline.info/ is a useful visualiser!

chrissimon-au commented 1 year ago

Hi @uniquelau, thanks for that. I'm about to release a version that includes the proposed fix for your original issue - any fields that adds markdown elements around the field value will automatically 'chomp' trailing whitespace to ensure compliance with the markdown requirements. (27f86c1) This seems like a no-brainer - as contextive is adding the markdown elements, it should ensure they work.

I've also updated the documentation with some examples of how to achieve various newline goals (linebreak, new para, multiline yaml with no break in hover). (89d4c8b)

Regarding auto-insert of the two spaces to ensure there's always a linebreak (or auto-insert of 2 newlines to ensure there's always a new para), I look forward to your thoughts on that.

On the one hand it will likely avoid some confusion, but on the other hand it does restrict the options available as it would treat all newline scenarios as the same, but as per the current documentation, there are at least 3 options folk might like, and may be used to if they are used to the options markdown offers.

uniquelau commented 1 year ago

Okay, so a couple of thoughts!

In principle, the current implementation is that the vision statement and title are wrapped with _. This means that a markdown value from the specification can cause the last _ to become orphaned, and cause the layout break identified.

Re: changes

The additional of trim lines makes sense, but to be more robust, we could go one step further and "trim and remove any trailing line breaks" this would ensure non-chomped value always render correctly, as the final _ would end up on the last line of text.

Secondly - and this is more a product decision! If supporting markdown, does it make sense to explicitly make the value passed in italic? Perhaps only the title _vision:_ should be italic - again product decision, highly speculative, but removing the wrapping _, would allow the markdown to just do its thing, warts and all!

Re: original intention

I was using > to add line breaks into the YAML, to make the file easier to read. However, it only really works because of the block chomp! E.g. when someone reviews a PR, it's often side by side, so limiting length of lines can improve review speed. However, I don't want to interfere with how their IDE / contextive displays the info.

chrissimon-au commented 1 year ago

Thanks for your thoughts, and for clarifying your original intention!

Good news is, the trimming applies to the whole field, not just per line, so it does removing trailing newlines from the value (effectively enforcing the block 'chomp' behaviour from the yml spec) - there are some tests to confirm this. With the latest release (1.9.3) your original attempt would have rendered like this, even if you add extra newlines at the end, whether or not you added the block chomp explicitly:

image

Since I wasn't sure about if your intent was for the linebreaks to appear in the rendered hover, I have improved the documentation (https://marketplace.visualstudio.com/items?itemName=devcycles.contextive#multiline-yaml) to explain how best to have multi-line yaml in order to achieve one of three outcomes in the hover panel: linebreak, new para, no impact, at least for term descriptions.

However, this has got me thinking generally that these fields are susceptible to 'markdown injection' which could break the rendering - e.g.:

contexts:
  - name: eCommerce
    domainVisionStatement: >
      Allowing customers to discover and purchase products and services online
      I like line breaks, so text is easy to read when reviewing pull requests on the
      train or bus when I am_ travelling.

will render like:

image

While not as risky as its nasty cousins (e.g. sql or html injection), obviously not ideal!

To re-evaluate my assumptions, the main value of permitting markdown in these fields is to allow including things like hyperlinks, emphasis, and perhaps bullet/numbered lists in term definitions, and I think all these have merit.

For the vision statement, as you say, one option is to not apply any formatting and let users format themselves. I've experimented with this and I find the default (non-italic) vision statement to be a distraction from the main information - the term definition. So I think I'd like to keep it. (Honestly, I'd prefer subscript or font size reduction for it, but none of those is possible in vscode markdown!)

Interestingly, this is the only field contextive adds formatting elements around, and from experimentation, since markdown is quite resilient, as a result it's the only field susceptible to markdown injection. There are a number of markdown constructs that would break the italics - new paragraphs, lists, headings, etc. Possibly the only markdown in this particular field that is useful is bold emphasis (**bolded**) and hyperlinks.

So one option is to use an allowlist or blocklist and escape elements that would break the italics and document that while most fields allow full markdown, this one only allows restricted markdown. I'm going to ponder this a bit further to consider if it's worth it. Or perhaps wait to see if any other users report issues with it!