dart-lang / yaml_edit

A library for YAML manipulation with comment and whitespace preservation.
https://pub.dev/packages/yaml_edit
BSD 3-Clause "New" or "Revised" License
25 stars 14 forks source link

Literal and folded strings cannot handle "\n \n" at the end of a string #89

Open jonasfj opened 1 week ago

jonasfj commented 1 week ago

Adding the test case 'whitespace between line breaks at the end\n \n' to test/string_test.dart causes tests to fail.

diff --git a/test/string_test.dart b/test/string_test.dart
index b92c20a..a18f57a 100644
--- a/test/string_test.dart
+++ b/test/string_test.dart
@@ -10,6 +10,7 @@ final _testStrings = [
   "this is a fairly' long string with\nline breaks",
   'whitespace\n after line breaks',
   'whitespace\n \nbetween line breaks',
+  'whitespace between line breaks at the end\n \n',
   '\n line break at the start',
   'word',
   'foo bar',

Tests failures:

$ dart test
00:01 +400 -1: loading test/golden_test.dart                                                                                                                                                                                           
Successfully tested 20 inputs against golden files, created 0 golden files
00:01 +400 -1: test/string_test.dart: Root FOLDED string (4) [E]                                                                                                                                                                       
  Assertion failed: (package:yaml_edit) Modification did not result in expected result.

  # YAML before edit:
  > 

  # YAML after edit:
  > >-
  >   whitespace between line breaks at the end
  >    
  >   

  Please file an issue at:
  https://github.com/dart-lang/yaml_edit/issues/new?labels=bug

  package:yaml_edit/src/editor.dart 579:7   YamlEditor._performEdit
  package:yaml_edit/src/editor.dart 249:14  YamlEditor.update
  test/string_test.dart 39:20               main.<fn>

To run this test again: dart test test/string_test.dart -p vm --plain-name 'Root FOLDED string (4)'
00:01 +415 -2: test/string_test.dart: Root LITERAL string (4) [E]                                                                                                                                                                      
  Assertion failed: (package:yaml_edit) Modification did not result in expected result.

  # YAML before edit:
  > 

  # YAML after edit:
  > |-
  >   whitespace between line breaks at the end
  >    
  >   

  Please file an issue at:
  https://github.com/dart-lang/yaml_edit/issues/new?labels=bug

  package:yaml_edit/src/editor.dart 579:7   YamlEditor._performEdit
  package:yaml_edit/src/editor.dart 249:14  YamlEditor.update
  test/string_test.dart 39:20               main.<fn>

To run this test again: dart test test/string_test.dart -p vm --plain-name 'Root LITERAL string (4)'
00:10 +624 -2: Some tests failed.                                                                                                                                                                                                      

Consider enabling the flag chain-stack-traces to receive more detailed exceptions.
For example, 'dart test --chain-stack-traces'.
jonasfj commented 1 week ago

@kekavc24 I'm not sure this is a new bug caused by https://github.com/dart-lang/yaml_edit/pull/87 but it's something we should fix.

At a minimum we should fallback to double quoted string encoded, if nothing else.

Indeed, I'd suggest the following steps:

  1. PR that makes _tryYamlEncodeLiteral and _tryYamlEncodeFolded for strings that end in \n \n and similar.
    • Figure out what strings aren't allowed at the end.
    • Maybe, just start with a safe thing, like if (string.trimRight() != string) return null
    • Add additional test cases that exercise all sorts of ways whitespace and linebreaks can be added at the end.
  2. Explore in what scenarios we can encode strings ending with whitespace / line breaks as:
    • Literal strings
    • Folded strings
  3. PRs and tests for literal and folded strings respectively.

I think (1) is the most important. if (string.trimRight() != string) return null is a big hammer, and maybe we can do better, but it's a good start, and it's better to safe than wrong :rofl:

jonasfj commented 1 week ago

Btw, I can be found on dart_community discord, I'm jonasfj, feel free to ping me.

kekavc24 commented 1 week ago

Adding the test case 'whitespace between line breaks at the end\n \n' to test/string_test.dart causes tests to fail.

diff --git a/test/string_test.dart b/test/string_test.dart
index b92c20a..a18f57a 100644
--- a/test/string_test.dart
+++ b/test/string_test.dart
@@ -10,6 +10,7 @@ final _testStrings = [
   "this is a fairly' long string with\nline breaks",
   'whitespace\n after line breaks',
   'whitespace\n \nbetween line breaks',
+  'whitespace between line breaks at the end\n \n',
   '\n line break at the start',
   'word',
   'foo bar',

Tests failures:

$ dart test
00:01 +400 -1: loading test/golden_test.dart                                                                                                                                                                                           
Successfully tested 20 inputs against golden files, created 0 golden files
00:01 +400 -1: test/string_test.dart: Root FOLDED string (4) [E]                                                                                                                                                                       
  Assertion failed: (package:yaml_edit) Modification did not result in expected result.

  # YAML before edit:
  > 

  # YAML after edit:
  > >-
  >   whitespace between line breaks at the end
  >    
  >   

  Please file an issue at:
  https://github.com/dart-lang/yaml_edit/issues/new?labels=bug

  package:yaml_edit/src/editor.dart 579:7   YamlEditor._performEdit
  package:yaml_edit/src/editor.dart 249:14  YamlEditor.update
  test/string_test.dart 39:20               main.<fn>

To run this test again: dart test test/string_test.dart -p vm --plain-name 'Root FOLDED string (4)'
00:01 +415 -2: test/string_test.dart: Root LITERAL string (4) [E]                                                                                                                                                                      
  Assertion failed: (package:yaml_edit) Modification did not result in expected result.

  # YAML before edit:
  > 

  # YAML after edit:
  > |-
  >   whitespace between line breaks at the end
  >    
  >   

  Please file an issue at:
  https://github.com/dart-lang/yaml_edit/issues/new?labels=bug

  package:yaml_edit/src/editor.dart 579:7   YamlEditor._performEdit
  package:yaml_edit/src/editor.dart 249:14  YamlEditor.update
  test/string_test.dart 39:20               main.<fn>

To run this test again: dart test test/string_test.dart -p vm --plain-name 'Root LITERAL string (4)'
00:10 +624 -2: Some tests failed.                                                                                                                                                                                                      

Consider enabling the flag chain-stack-traces to receive more detailed exceptions.
For example, 'dart test --chain-stack-traces'.

Sorry for this. It seems I overwrote it as I was resolving some merge conflicts I encountered while rebasing.

I wanted to drop the chomping indicator changes I made in between for #87 so that the squashed commits make sense when merged.

I’ll work on it 🫡

kekavc24 commented 1 week ago

I have a branch with the changes you suggested, I'll make a PR.

@jonasfj Additionally, I also managed to come up with a PoC that solves this issue permanently (hopefully) but comes with some breaking but solvable trade offs in another branch.


Issue One

After we wrongly tried using keep chomping indicator in #87, it got me wondering why it existed in YamlEditor in the first place. What issue was it trying to solve before my fix in #87?

The investigation led me back here 😅 , to this very issue we are trying to solve. The chomping indicator was meant to preserve the trailing \n in the initially buggy _tryYamlEncodeFolded and _tryYamlEncodeLiteral with hits and misses before my fix. And now it won't work. How come and why?

I investigated yamlEncodeBlockScalar which calls _tryYamlEncodeFolded and any other function that encodes block scalars. It looked okay. You helped me refactor it. The culprit ended up being yamlEncodeBlock which calls yamlEncodeBlockScalar.

If a YamlNode is a :

  1. YamlScalar, it calls yamlEncodeBlockScalar and returns the encoded string.
  2. YamlList or YamlMap, it recursively calls yamlEncodeBlock until all YamlScalars are encoded and joins then with a \n and returns the string.

No issue, right? Well, that's where the issue is.

It means our YamlScalar will never be given the chance to have an additional \n we need for _tryYamlEncodeFolded and _tryYamlEncodeLiteral to prevent issues.

Okay, issue found. Just apply the additional \n we need. I did this and quickly ended up having duplicate \n or even \nx where x is the number of YamlList or YamlMap are encountered before the next YamlScalar is found.

So I refactored that function to apply line-breaks correctly. After this 2 issues bubbled up:

  1. Dangling multiline comments messing with modified yaml!
  2. Dangling line breaks for YamlScalars that aren't folded/stripped. (I expected this but not 1 above 😅)

Issue Two: Comments

Once this occurred, I backed off to see if this was a known issue that needed fixing. That's when I found #40. In the issue, YOU outline ways we can/should handle this which ended up being what I had in mind. Thanks! (for the incredible foresight)

However, a few issues you overlooked that ended up being the trade-offs I mentioned:

  1. What happens to white-space after the comment(s)?
  2. Do we always look for comments greedily or lazily?
    • lazily - Should we just retain the white space we skipped after scanning ahead for comments and not finding any?
    • greedily - Discard any white space we find ahead irrespective of whether we found comments or not.

I ended up looking for and skipping comments greedily. At least to have a PoC that works. Armed with this, I continued my investigation started in issue one above.

I noticed that both _replaceInBlockMap and updateInList only look ahead up-to the first comment. While refactoring code here, I realised:


Dangling line-breaks

At this point, the issues I had were:

  1. How do I prune the additional line breaks without messing up folded and literal strings.
  2. How do I make sure folded/literal strings don't mess up the next element's indent. I ignored this here and instead solved it in issue_two.

I implemented the function that does that and also added some tricks when adding folded/literal strings.


It would be nice if you are able to review it and make suggestions for improvements. I can make a draft PR.

kekavc24 commented 1 week ago

@jonasfj sorry for the extremely long write up. I didn't want to create a PR without outlining the PoC itself since I make so many changes to the existing code without redefining its intention that may need your review.

jonasfj commented 1 week ago

Sorry for this. It seems I overwrote it as I was resolving some merge conflicts I encountered while rebasing.

Yeah, the PR was getting pretty complicated, we should probably have done fewer things. No worries, we can still fix bugs now :D

Also it's not like there weren't any bugs before this PR :see_no_evil: :rofl:

jonasfj commented 1 week ago

Okay, this might take a while to review.

Thinking about it, I do wonder, is it really a good idea to use folded or literal strings without chomping -?

Certainly the + modify is a but dangerous, in scenarios like:

Example 1

A: |+
  foo

B: true

Example 2

A: |+
  foo

I guess what I'm asking is? Instead of trying to handle cases like these correctly, would it be better if we just fallback to use "foo\n\n".

In practice I suspect it's rare that yaml_edit will be asked to deal with these things. So it might be better if we have a package that can do less, but do it correctly every time.

If we have to fallback to double quoted encoding, that's not nearly as bad as throwing a runtime exception because of an internal error, which is what will happen if we modify the YAML document incorrectly (_performEdit makes sure of that).

I'm just asking the question here:

kekavc24 commented 1 week ago

I guess what I'm asking is? Instead of trying to handle cases like these correctly, would it be better if we just fallback to use "foo\n\n".

True. That's why I still fielded #91 that should precede #90. #90 is just a PoC that I believe can get better once you take a look and help me nitpick the rough edges.


Certainly the + modify is a but dangerous, in scenarios like:

That's why I never attempt to use it at any point within the PoC. I discovered a nifty trick that I strongly highlighted and documented. I was surprised it passes most existing tests including random_test.dart since the PoC ensures no dangling comments are left behind.


In practice I suspect it's rare that yaml_edit will be asked to deal with these things. So it might be better if we have a package that can do less, but do it correctly every time.

Yeah, that makes sense