Closed brunoborges closed 3 years ago
Can you give me an example?
Oh, you mean hierarchy enforced by indentation? Yes. It is a baked-in design choice. So if that is your issue, you can close this issue now as it is not changing.
Perhaps @brunoborges is referring to interpreting blank values after the colon as the field value, as described in the documentation and copied below.
treasurer:
name: Fumiko Purvis
address:␣␣
> 3636 Buffalo Ave
> Topika, Kansas 20692
The fact that a second space is meaningful is counterintuitive and will certainly lead to problems. Empty values could be represented by:
I struggled with this issue for a while. Eventually I decided that anything beyond ':␣' is the value. That is the simplest and most flexible behavior. If that is not the case, I don't think you can specify a string value that contains only whitespace.
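To make the rule concrete, here is a minimal Python sketch of "anything beyond ':␣' is the value". It is only an illustration, not the NestedText implementation; the function name `split_dict_item` is made up for this example, and real NestedText key handling has more cases than this.

```python
def split_dict_item(line):
    """Split a 'key: value' line at the first ': ' and keep the rest verbatim."""
    key, sep, value = line.partition(": ")
    if not sep:
        # No ': ' found: accept a bare 'key:' as a key with no inline value.
        if line.endswith(":"):
            return line[:-1].strip(), None
        raise ValueError(f"not a dictionary item: {line!r}")
    # Everything after the first ': ' is the value, whitespace included.
    return key.strip(), value


print(split_dict_item("name: Fumiko Purvis"))  # ('name', 'Fumiko Purvis')
print(split_dict_item("address:  "))           # ('address', ' ')
print(split_dict_item("address:"))             # ('address', None)
```

Under this rule the second trailing space in `address:␣␣` becomes a one-character value, which is exactly the case being discussed.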
I don't think I can argue with that, but it would still be possible to have a string of blank characters using the syntax for multiline using a single line. It is certainly not simpler though.
Yes, you are correct. Either way there are trade-offs. My feeling is that most people will think that the way dictionaries, lists, and rest-of-line strings are specified is very natural, but the way multi-line strings are specified is artificial or arbitrary, or at least unfamiliar. So I am trying to avoid forcing people to use them except in cases where they are obviously required.
Having values that consist entirely of whitespace is quite unusual, so I don't see a problem with that special case using a slightly less intuitive syntax. It remains to be seen whether leading spaces will cause confusion in practice but I will note that "extra-whitespace problems" are some of the least intuitive problems to debug.
"Error: Cannot find Foobar named Xyzzy".
I also think that having to use space characters at the end of a line to identify a string of whitespace will likely cause issues when people have their editor automatically strip trailing whitespace.
Would it not be acceptable to request using quotes when a string must contain leading or trailing whitespace?
Except in keys, quotes are not treated specially. If you add quotes, they become part of the value. However, the application that incorporates a NestedText reader is free to strip off excess white space if it feels that is appropriate. In other words, NestedText is trying not to have any expectations of what a value should be other than it being a string of characters that continues to the newline.
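As an illustration of leaving this to the application, the sketch below strips surrounding whitespace from every string after the data has been loaded. It assumes the reader returns plain nested dicts, lists, and strings; the helper name `strip_values` and the sample data (including the `tags` key) are hypothetical.

```python
def strip_values(data):
    """Recursively strip surrounding whitespace from every string value."""
    if isinstance(data, dict):
        return {key: strip_values(value) for key, value in data.items()}
    if isinstance(data, list):
        return [strip_values(item) for item in data]
    if isinstance(data, str):
        return data.strip()
    return data


# Stand-in for data returned by a reader (made-up sample).
loaded = {"treasurer": {"name": "  Fumiko Purvis ", "tags": [" a", "b  "]}}
print(strip_values(loaded))
# {'treasurer': {'name': 'Fumiko Purvis', 'tags': ['a', 'b']}}
```

An application that does need whitespace-only values would simply skip this step.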
As it is now, NestedText uses a very simple solution: no exceptions to the rule that after ':␣' comes the value. There is great value to this.
I can see how this could lead to problems with unintended extra space, but I'd like to add that:
A better solution would need to be similarly simple. Perhaps there isn't one.
Would it not be possible to have another separator prefix to identify a quoted value? The key:value pair could, for example also be represented by a key="value"
I am not suggesting replacing the key:value pair syntax, but adding another possibility: key="value". So in the 99.9% of cases where whitespace is not significant, the simpler key:value syntax would be used. When someone wants to identify explicit whitespace, they could use key="value" and perhaps key= "value".
The overall syntax would be explicit, and you would retain the benefit of your original key:value pair syntax, at the expense of a little more code to handle the special case.
Depending on editors to handle syntactic concerns would kind of defeat and contradict the benefit of using your proposal, wouldn't it?
@pierre-rouleau Would your proposed syntax handle nested quotes?
@prescod It all depends how the new, additional, key="value" syntax would be treated.
Of course this would not look as clean as the original key:value syntax, which would still be available for the majority of the use cases. The new key="value" would just provide a mechanism to handle the more difficult cases, with an explicit syntax. That new key = "value" syntax would be a special case. The fact that it starts with the '=' character would just mean the interpretation of the remainder of the line differs.
@prescod BTW, allowing escaping inside the key = "value" line would mean that you could include things like a non-breaking space (the U+00A0 NO-BREAK SPACE Unicode character) as much as anything, all depending on the escaping mechanism that would be supported.
We don't see NestedText as the whole solution to the 'data storage that allows human interaction' problem. Rather, it is the piece that provides the ability to store structured data without interpreting that data in any way (other than to extract its structure). This is why we don't throw away characters (stripping whitespace), nor do we treat characters specially (escaping). In our mind, that is the job of the application that receives the data. Specifically, I don't see anything proposed here that could not be implemented in the end application if it is desired.
This is kind of a new concept, so we would like to give things time to play out before we start changing things.
I see this as a serious issue. Human readable and significant leading/trailing whitespace in a string do not go together.
I see two possible solutions:
less preferred: if the value already assigned is composed entirely of white space, and a nested value is encountered, replace the white space value with the nested value (this solution still has the serious problem that empty strings and whitespace strings look the same)
more preferred: strip off one pair of surrounding quotes
test: "" ==> an empty string
test: " " ==> a single space
test: "hello" ==> the string hello (without quotes)
test: ""hello"" ==> the string "hello" (with quotes)
test: """hello""" ==> the string ""hello"" (with double quotes)
test: "hel"lo" ==> the string hel"lo (with a single embedded quote)
Ah, I see @pierre-rouleau already suggested this.
Please make this change. I really like the NestedText idea, but will never use/support this format while it contains this flaw.
(Thank you @KenKundert for considering this.)
Human readable and significant leading/trailing whitespace in a string do not go together.
I find that statement too vague. Precisely what problem are you trying to solve?
All of your examples seem to be for end-of-line strings. How does your proposal affect multiline strings?
How would you propose to handle the following cases:
key:␣␣␣value␣␣␣
key:␣␣␣"value"␣␣␣
Presumably you would strip the leading and trailing white space in both cases and end up with {'key': 'value'}.
key:␣␣␣"value␣␣␣
key:␣␣␣value"␣␣␣
In these cases, would you get {'key': '"value'} and {'key': 'value"'}?
Human readable and significant leading/trailing whitespace in a string do not go together.
I find that statement too vague. Precisely what problem are you trying to solve?
The second example from the gotchas in the documentation: trailing spaces meant the value was a single space, causing an error when the intended value was read on the next line(s). The point being that a human looking at those lines cannot see the trailing spaces (and IDE support is not always available).
All of your examples seem to be for end-of-line strings. How does your proposal affect multiline strings?
The same: if there should be actual white space in a multiline string then it should be quoted:
address:
> line 1
>
> final line
would result in {'address': 'line 1\n\nfinal line'}
while
address:
> line 1
> " "
> final line
would result in {'address': 'line 1\n \nfinal line'}.
How would you propose to handle the following cases:
key:␣␣␣value␣␣␣
key:␣␣␣"value"␣␣␣
Presumably you would strip the leading and trailing white space in both cases and end up with {'key': 'value'}.
Correct.
key:␣␣␣"value␣␣␣
key:␣␣␣value"␣␣␣
In these cases, would you get {'key': '"value'} and {'key': 'value"'}?
Yes.
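Taken together, the "strip one pair of surrounding quotes" proposal and the answers above can be sketched in a few lines of Python. This is only an illustration of the proposal as stated in this thread, not NestedText behavior; `unquote_once` is a hypothetical name.

```python
def unquote_once(value):
    """Strip surrounding whitespace, then remove at most one pair of double quotes."""
    text = value.strip()
    # Quotes are removed only when both a leading and a trailing quote are present.
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return text[1:-1]
    return text


for raw in ['""', '" "', '"hello"', '""hello""', '"hel"lo"',
            '  value  ', '  "value"  ', '  "value  ', '  value"  ']:
    print(f"{raw!r:16} -> {unquote_once(raw)!r}")
```

A lone leading or trailing quote is left in place, matching the two "Yes." cases above.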
Thinking through this some more, I would suggest that lack of a leading/trailing double-quote means:
If a leading and a trailing double-quote is found, then:
\t for tab or \x20 for a space, etc.
I think this would give us the best of both worlds -- simplicity of reading/writing for the majority of cases while allowing finer control when necessary.
So with the proposal, the single space after a -, : or > becomes at least one space, and every indentation must be quoted? So for example, in the current version we allow:
>     Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
> tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
> quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
> consequat.
> 1. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum
>    dolore eu fugiat nulla pariatur.
> 2. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia
>    deserunt mollit anim id est laborum.
But that now becomes:
> "    Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod"
> tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
> quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
> consequat.
> 1. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum
> "   dolore eu fugiat nulla pariatur."
> 2. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia
> "   deserunt mollit anim id est laborum."
I am not crazy about that. Not only is it a pain to quote the indented lines, but it messes up the vertical alignment as you have an extra character at the front of some lines that will be deleted.
That's a good point. So the first space is trimmed, any subsequent unquoted leading space is retained -- after all, it can be seen.
One comment I'd like to make is that, in my own use cases, it's not uncommon to have values that are quoted (and that would have to be double quoted with the proposed syntax). To be specific, one of the things I used NestedText for is to store inputs for unit tests, and sometimes those inputs take the form of Python literals meant to be eval-ed inside the test case. String literals, of course, are quoted. It's also not too hard to imagine values representing some sort of dialog or quotation starting and ending with quotes. Since one of the central goals of this format is to avoid YAML-esque rules about when quotes have to be added or escaped, it makes me uneasy to use such a common syntax for delimiting strings.
An alternative syntax is to use a character (e.g. <) to mark the end of a string. The rule would basically be: strip all rightmost whitespace characters. If the rightmost remaining character is '<', strip it as well. The hope is that this syntax would be less likely to conflict with real-world values. Some examples:
Typical use:
indent:␣␣␣␣␣<
# {'indent': '␣␣␣␣'}
Preserve trailing '<':
indent:␣␣␣␣␣<<
# {'indent': '␣␣␣␣<'}
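The rule is small enough to sketch directly. The Python below is only an illustration of the end-marker idea as described here; the function name `strip_end_marker` is made up, and it operates on the value after the leading ':␣' has already been removed.

```python
def strip_end_marker(value):
    """Drop trailing whitespace, then drop one trailing '<' if present."""
    text = value.rstrip()
    if text.endswith("<"):
        # Only one '<' is removed, so '<<' preserves a literal trailing '<'.
        text = text[:-1]
    return text


print(repr(strip_end_marker("    <")))     # '    '
print(repr(strip_end_marker("    <<")))    # '    <'
print(repr(strip_end_marker("hello   ")))  # 'hello'
```

Because whitespace is stripped before the marker is removed, spaces to the left of '<' survive while invisible spaces to its right are discarded.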
I don't think it would be a good idea to apply any sort of per-line quoting to multiline strings. Multiline strings in general (e.g. in YAML, TOML, Python, etc.) don't strip internal whitespace and don't support syntax to explicitly indicate where internal lines end. Quotes also have the potential to cause headaches for multiline strings containing prose. It's not uncommon for prose to contain a lot of trailing whitespace, because editors can use the presence of this whitespace to determine where paragraphs start and end (see :set formatoptions+=w in vim). If such a paragraph were to be dumped to a NestedText file, every line would have to be quoted, which would add a lot of visual clutter and make the paragraph hard to edit.
I'm on the fence about the overall idea of adding quotes. Significant trailing whitespace is definitely a usability issue, but any solution will force users to keep in mind quoting/escaping rules that come up only rarely. It's not really clear to me which is worse.
Well, perhaps my "less preferred" suggestion is a better fit then: keep the trailing white space as the value, but if a nested value immediately shows up replace the white space value with the nested value.
treasurer:
name: Fumiko Purvis
address:␣␣
> 3636 Buffalo Ave
> Topika, Kansas 20692
address becomes """3636 Buffalo Ave\nTopika, Kansas 20692""". (I don't know if NestedText multiline strings include a trailing new-line, so I didn't add one in that example.)
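In code, the "less preferred" rule amounts to a single decision once a parser has both the inline value and any nested value in hand. The sketch below is a hypothetical illustration of that decision only; `resolve_value` is a made-up name and no real parser is assumed to be structured this way.

```python
def resolve_value(inline, nested):
    """Choose the final value for a key under the 'less preferred' rule."""
    if nested is None:
        # Nothing nested follows: keep the inline value, whitespace and all.
        return inline
    if inline is None or inline.strip() == "":
        # The inline value is missing or whitespace-only: the nested value wins.
        return nested
    raise ValueError("key already has a non-blank value; nested value not allowed")


print(repr(resolve_value("  ", "3636 Buffalo Ave\nTopika, Kansas 20692")))
print(repr(resolve_value("  ", None)))
```

The second call shows the cost noted earlier: without a nested value, the whitespace-only string is still kept and still looks like an empty field.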
I think the key question here is: do you expect people to write NestedText files? Because if not, then the problem is unlikely to occur unless there is a bug in the library code that is doing the writing -- of course, there will still be the occasional problem of why what looks like an empty field evaluates as True (an empty string is falsey, but a string with white space is truthy).
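The truthiness point is easy to check in plain Python. The variable names below are made up for the illustration.

```python
empty = ""    # what 'key:' or 'key: ' would be expected to give
blank = "  "  # what 'key:' followed by extra invisible spaces gives

print(bool(empty))          # False
print(bool(blank))          # True
print(bool(blank.strip()))  # False
```

So an application that tests `if value:` would need to strip first to treat the two cases alike.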
I don't understand why you think this particular example illustrates a serious issue. It should occur only rarely and the error message shows the problematic line with an EOL marker, so the problem is relatively easy to see from the error message.
The inconsistency is this:
Neither problem is terrible, but of the two, I would most prefer not to have the inconsistency. In the first case, there is a mistake in your input, you get an error message, you resolve it, and the problem is gone and you never have to think about it again. In the second case, the inconsistency is always there: it has to be documented, it raises questions, it has to be remembered or re-learned.
I find the hidden white space at the end of the line problem more compelling, and Kale's proposal is a good one for end-of-line strings, but as Kale himself points out, it is problematic for multi-line strings. However, I'm not in favor of it for two reasons.
At this point I am thinking the language should remain as it is. However, I could try to improve the error message. For example, I could replace leading and trailing spaces in the value of the displayed lines with ␣ symbols. That way, this particular error message becomes:
invalid indentation.
4 « address: ␣»
5 « > 3636 Buffalo Ave»
▲
Or perhaps I could distinguish this form of invalid indentation error from others and give a more specific message. Are these sufficient?
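For what it is worth, the marker substitution itself is simple. The Python below is a rough sketch of the idea only, not the code used by NestedText; `show_edge_spaces` is a hypothetical helper that takes the value portion of a line.

```python
def show_edge_spaces(value, marker="␣"):
    """Replace leading and trailing spaces of value with a visible marker."""
    without_lead = value.lstrip(" ")
    lead = len(value) - len(without_lead)
    core = without_lead.rstrip(" ")
    trail = len(without_lead) - len(core)
    # Interior spaces are left alone; only the edges are made visible.
    return marker * lead + core + marker * trail


print(f"« address: {show_edge_spaces(' ')}»")  # « address: ␣»
print(show_edge_spaces("  a b  "))             # ␣␣a b␣␣
```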
I think a specific error message would solve the problem -- maybe something like:
attempt to replace initial value of " " on line 4.
4 « address: ␣»
5 « > 3636 Buffalo Ave»
Okay, I have refined the error message.
error: test.nt, 6:
invalid indentation. An indent may only follow a dictionary or list
item that does not already have a value, which in this case consists
only of whitespace.
5 « address: »
6 « > 3636 Buffalo Ave»
▲
With that I believe this issue is closed. Thanks for your feedback.
Thank you for listening!
To this closed issue, I'll add my two cents.
The main problems for me here are that:
The best solution I can think of so far:
Add a new multiline string tag, which must always have a matching end-of-line tag. A single multiline string could use either tag for each line.
Let's say | is the start tag and < the end tag. Then the following YAML:
matches:
  - trigger: ":ifm"
    replace: "if __name__ == '__main__':\n "
  - trigger: ":p3"
    replace: "#!/usr/bin/env python3\n"
could be written as the following FutureNestedText:
matches:
    -
        trigger: :ifm
        replace:
            > if __name__ == '__main__':
            | <
    -
        trigger: :p3
        replace:
            > #!/usr/bin/env python3
            >
EDIT: It might be worth noting that this wouldn't do anything about the super edge case of trailing white space on a line of a multiline key. Though theoretically it could gain the same ability if you allow the | & < lines for those . . . or possibly with a : instead of the <:
: key 1
: the first key
| still the first key :
    > value 1
: key 2: the second key
    - value 2a
    - value 2b
This suggestion seems to add significant complexity to address what is fundamentally an editor issue. Personally I configure my editor to show trailing white spaces but not to automatically delete them on the off chance that I want them. In my experience, the only time I want them is when entering long lines where I am using a trailing space to indicate that the line should be joined with the one below it.
Perhaps you can consider disabling the delete trailing white space feature of your editor when editing NestedText files.
Good advice, but I'll note it doesn't address normal cat-ing or paging in the terminal and not seeing the spaces -- nor the problem of other folks opening and saving the file with their editor settings, without realizing the changes they've made to lines they may not have even looked at.
Yes, I acknowledge that, but I believe significant trailing white space are unusual and the extra clarity that this change would bring would be worth the increase in complexity in NestedText.
I must say I'm a little confused now. Is the solution to keep significant trailing space?
Whoops, I dropped a word.
Yes, I acknowledge that, but I believe significant trailing white space is unusual and the extra clarity that this change would bring would not be worth the increase in complexity in NestedText.
Thanks for clarifying, but to me it means that NestedText suffers from a hidden flag that's waiting to bite. There are already so many. The implementation might be simpler, but users might suffer. I see it as transferring the responsibility of ensuring reliability from the implementation to the user. I don't need another easy-to-forget detail. The problem is not editors, or VCS, or diff tool configurations. Too bad. I'll probably stay away.
Subject says it all.