cloudbase / powershell-yaml

PowerShell CmdLets for YAML format manipulation
Apache License 2.0

ConvertTo-Yaml: Value starting with double quote is being enclosed by single quotes. #120

Closed: EliorMachlev closed this issue 2 months ago

EliorMachlev commented 10 months ago

I have a YAML file used by a popular piece of software (NewRelic).

When converting with ConvertTo-Yaml, if a node value starts with a double quote ("), the cmdlet automatically wraps the value in single quotes.

Example:

Value: "Hey"
After conversion: '"Hey"'

Expected: an option to keep values as they are.

EliorMachlev commented 10 months ago

I understand there is a valid reason for adding the quotes. I believe it is related to this fix: https://github.com/cloudbase/powershell-yaml/issues/38

But we should have the option to opt out of this.

EliorMachlev commented 10 months ago

Current workaround for those who need it: put a marker string such as "TOBEREMOVED" before the leading double quote, and after conversion strip it out of the output (-replace 'TOBEREMOVED','').

Or as a one-liner:

$YourDataVariable = 'TOBEREMOVED"Hey'
Set-Content -Path $YamlPath -Value ((ConvertTo-Yaml -Data $YourDataVariable  -Force) -replace 'TOBEREMOVED','')

Result: "Hey

gabriel-samfira commented 10 months ago

Hi!

In PowerShell terms, the following is a literal string:

$myVal = '"Hi'

When converting to YAML, we need to enclose it in single quotes. The string contains a single double-quote character, and if we don't enclose it, we'll generate invalid YAML.

The following is also a literal string:

$myVal = '"hi"'

The quotes become part of the string. In this case, if we did not enclose it in single quotes, the resulting YAML would be valid but incorrect: a parser would read the outer quotes as YAML syntax and drop them. The variable in PowerShell contains a string which has quotes as part of the string. For example:

PS /home/gabriel> $a = '"Hei"'                             
PS /home/gabriel> $b = 'Hei'                                                                
PS /home/gabriel> $a -eq $b               
False

And converting that string to yaml would equate to:

PS /home/gabriel> $YourDataVariable = '"Hey"'       
PS /home/gabriel> ConvertTo-Yaml -Data $YourDataVariable 
'"Hey"'

The same thing happens in Python:

>>> import yaml
>>> data = '"hi"'
>>> print(yaml.dump(data))
'"hi"'
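
The round trip behind this behaviour can be sketched without any YAML library. The snippet below is a hand-rolled illustration, not the code that powershell-yaml or PyYAML actually use. The hypothetical helpers `to_single_quoted` and `from_single_quoted` apply the YAML single-quoted style rule (wrap the value in `'` and double any embedded `'`), and they assume single-line strings only.

```python
def to_single_quoted(value: str) -> str:
    """Encode a single-line string as a YAML single-quoted scalar.

    Hypothetical helper for illustration; real emitters handle many more cases.
    """
    # Inside single quotes the only escape is '' for a literal '.
    return "'" + value.replace("'", "''") + "'"


def from_single_quoted(scalar: str) -> str:
    """Decode a single-line YAML single-quoted scalar back to its value."""
    if len(scalar) < 2 or scalar[0] != "'" or scalar[-1] != "'":
        raise ValueError("not a single-quoted scalar")
    return scalar[1:-1].replace("''", "'")


original = '"hi"'                      # four characters: " h i "
encoded = to_single_quoted(original)   # the text written to the YAML file
decoded = from_single_quoted(encoded)  # what a parser hands back

print(encoded)
print(decoded == original)
```

The outer single quotes are YAML syntax and disappear on load; the inner double quotes are data and survive, which is why the wrapped form preserves the original string.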

If I use your example:

$YamlPath = "/tmp/test.yaml"
$YourDataVariable = 'TOBEREMOVED"Hey'
Set-Content -Path $YamlPath -Value ((ConvertTo-Yaml -Data $YourDataVariable  -Force) -replace 'TOBEREMOVED','')

I end up with a file containing invalid YAML, which cannot be loaded by any other parser:

>>> a = open("/tmp/test.yaml")
>>> yaml.safe_load(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/yaml/__init__.py", line 125, in safe_load
    return load(stream, SafeLoader)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/__init__.py", line 81, in load
    return loader.get_single_data()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/constructor.py", line 49, in get_single_data
    node = self.get_single_node()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 35, in get_single_node
    if not self.check_event(StreamEndEvent):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
                         ^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/parser.py", line 142, in parse_implicit_document_start
    if not self.check_token(DirectiveToken, DocumentStartToken,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/scanner.py", line 116, in check_token
    self.fetch_more_tokens()
  File "/usr/lib/python3/dist-packages/yaml/scanner.py", line 251, in fetch_more_tokens
    return self.fetch_double()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/scanner.py", line 655, in fetch_double
    self.fetch_flow_scalar(style='"')
  File "/usr/lib/python3/dist-packages/yaml/scanner.py", line 666, in fetch_flow_scalar
    self.tokens.append(self.scan_flow_scalar(style))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/scanner.py", line 1151, in scan_flow_scalar
    chunks.extend(self.scan_flow_scalar_spaces(double, start_mark))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/scanner.py", line 1238, in scan_flow_scalar_spaces
    raise ScannerError("while scanning a quoted scalar", start_mark,
yaml.scanner.ScannerError: while scanning a quoted scalar
  in "/tmp/test.yaml", line 1, column 1
found unexpected end of stream
  in "/tmp/test.yaml", line 3, column 1
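
The error follows from how YAML reads a leading `"`: it opens a double-quoted scalar, which must be closed by an unescaped `"`. The check below is a simplified sketch of that one rule, not a YAML parser, and the function name `has_unterminated_double_quote` is hypothetical. It assumes the input is a single scalar, as in the files above.

```python
def has_unterminated_double_quote(text: str) -> bool:
    """Return True if text opens a double-quoted scalar that never closes.

    Simplified illustration only; a real scanner does far more than this.
    """
    stripped = text.lstrip()
    if not stripped.startswith('"'):
        return False
    i = 1
    while i < len(stripped):
        if stripped[i] == "\\":
            i += 2          # skip the escaped character
            continue
        if stripped[i] == '"':
            return False    # found the closing quote
        i += 1
    return True


# File content left behind by the TOBEREMOVED workaround.
print(has_unterminated_double_quote('"Hey\n'))
# File content written by ConvertTo-Yaml for the same value.
print(has_unterminated_double_quote("'\"Hey'\n"))
```

The first call reports an unterminated scalar, matching the "found unexpected end of stream" message in the traceback; the second does not, because the leading `'` opens a single-quoted scalar in which `"` is ordinary content.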

If I just do:

$YamlPath = "/tmp/test.yaml"
$YourDataVariable = '"Hey'
ConvertTo-Yaml -Data $YourDataVariable -Force  -OutFile $YamlPath

This results in valid YAML:

gabriel@arrakis:~$ cat /tmp/test.yaml
'"Hey'

Which can be loaded in other parsers:

>>> a = open("/tmp/test.yaml")
>>> yaml.safe_load(a)
'"Hey'

But there is a chance that I have not understood the issue here. Would you mind adding a complete YAML sample and the code you used to convert it?

EliorMachlev commented 10 months ago

@gabriel-samfira Hi, yes, you understood the issue correctly. "Hey", "Hey, 'Hey and 'Hey' are all being wrapped in extra quotes, for example '"Hey"'.

I think we should have the option to opt out of validation and just save the value as is. Basically, that puts the responsibility on the user/developer.

With NewRelic it works like this:

'"Hey"' will be read as "Hey" (just like in PowerShell, it is read as a literal string).
"Hey" will be read as Hey (again, just like in PowerShell, the quotes are removed and only the content remains).

gabriel-samfira commented 10 months ago

I think we need to set some expectations.

If a string contains quotes as part of that string, no parser would (or should) ever strip them away before serializing to YAML. This is important because if the quotes exist within the string, they may have a purpose that a parser cannot make assumptions about.

For example:

$literalQuotes = '"Hey"'
PS /home/gabriel> $literalQuotes.Length                     
5

Is a very different string than:

$justQuotes = "Hei"
PS /home/gabriel> $justQuotes.Length 
3

And is different from:

# This one can't be serialized to YAML without single quotes,
# otherwise, the generated yaml will be invalid and cannot be
# imported into any yaml parser.
$aSingleQuote = '"Hey'
$aSingleQuote.Length
4

A YAML parser will never strip those quotes away. If it does, then you probably shouldn't use that parser. The better approach would be, as you suggested, to leave the option of stripping away those quotes to the author of the application. This can easily be done before you send the object to be serialized to YAML:

PS /home/gabriel> ConvertTo-Yaml $literalQuotes.TrimStart('"').TrimEnd('"')
Hey

But if you do this, you need to take into account that the generated YAML will contain a different value than the original string that you had. Your application, and whatever you integrate with, will need to accept the mutated data:

PS /home/gabriel> $imported = ConvertTo-Yaml $literalQuotes.TrimStart('"').TrimEnd('"') | ConvertFrom-yaml
PS /home/gabriel> $imported -eq $literalQuotes                                                            
False
PS /home/gabriel> $imported
Hey
PS /home/gabriel> $literalQuotes          
"Hey"

From a programming perspective, in the $literalQuotes example, the actual quote character " is no different than the alphanumeric characters.
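
The same point can be restated in Python, to match the earlier PyYAML snippets. This is only an illustrative sketch: the variable names mirror the PowerShell ones above, and nothing here depends on a YAML library.

```python
literal_quotes = '"Hey"'   # the quotes are part of the value
just_quotes = "Hey"        # the quotes are only Python syntax
a_single_quote = '"Hey'    # one leading quote character in the value

print(len(literal_quotes), len(just_quotes), len(a_single_quote))

# Stripping the quotes before serializing produces a different value.
stripped = literal_quotes.strip('"')
print(stripped == literal_quotes)
print(stripped == just_quotes)

# The quote is an ordinary character with its own code point.
print([hex(ord(ch)) for ch in literal_quotes])
```

Whether that mutation is acceptable is a decision for the application that owns the data, which is why it belongs before the call to the serializer rather than inside it.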