cloudbase / powershell-yaml

PowerShell CmdLets for YAML format manipulation
Apache License 2.0

Incorrectly read the version number #152

Open xrgzs opened 3 weeks ago

xrgzs commented 3 weeks ago

I'm trying to read a YAML configuration file that defines a version number, but I'm getting unexpected results.

Simplified as follows:

PS D:\> "version: 3.10" | ConvertFrom-Yaml

Name                           Value
----                           -----
version                        3.1

PS D:\> ("version: 3.10" | ConvertFrom-Yaml).version
3.1

The type is System.Double. This makes my script recognize the wrong version number.

PS D:\> ('version: 3.10' | convertFrom-Yaml).version.GetType()

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     Double                                   System.ValueType
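
For context on why the Double matters here, a minimal sketch that uses only built-in .NET types (no YAML module involved). Once the value is a Double the trailing zero is gone, while a string can still be parsed as a System.Version. The variable names are only for illustration.

```powershell
$inv = [cultureinfo]::InvariantCulture

# As a Double, 3.10 and 3.1 are the same number, so the trailing zero is lost.
$asDouble = [double]'3.10'
$asDouble.ToString($inv)          # 3.1
$asDouble -gt [double]'3.9'       # False

# As a string, the text survives and can be parsed as a version.
$asVersion = [version]'3.10'
$asVersion.Minor                  # 10
$asVersion -gt [version]'3.9'     # True
```

So the comparison only goes wrong after the parser has already picked Double; if the scalar arrives as a string, the version check behaves as expected.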

Then I tried converting a JSON string with the same data. ConvertFrom-Yaml returns a different result from ConvertFrom-Json.

PS D:\> '{"version": 3.10}' | ConvertFrom-Yaml

Name                           Value
----                           -----
version                        3.1

PS D:\> '{"version": 3.10}' | ConvertFrom-Json

version
-------
   3.10

This is my environment:

PS D:\> Get-Module -Name powershell-yaml

ModuleType Version    PreRelease Name                                ExportedCommands
---------- -------    ---------- ----                                ----------------
Script     0.4.7                 powershell-yaml                     {ConvertFrom-Yaml, ConvertTo-Yaml, cfy, cty}

PS D:\> $PSVersionTable

Name                           Value
----                           -----
PSVersion                      7.4.6
PSEdition                      Core
GitCommitId                    7.4.6
OS                             Microsoft Windows 10.0.22631
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

I think it would be better to provide a switch to treat numbers as strings.

Or can someone give me a solution? I don't want to use regex.😭

xrgzs commented 3 weeks ago

yq also has this issue when converting YAML to JSON, but reading is fine. 🤔

PS D:\> "version: 3.10" | yq .version
3.10
PS D:\> "version: 3.10" | yq -o yaml
version: 3.10
PS D:\> "version: 3.10" | yq -o json
{
  "version": 3.1
}

xrgzs commented 3 weeks ago

yq correctly converts YAML to XML.

PS D:\> "version: 3.10" | yq -o xml
<version>3.10</version>

PS D:\> [xml]("version: 3.10" | yq -o xml)

version
-------
3.10

In more complex cases, PowerShell cannot cast the output to [xml], because yq emits several top-level elements and an XML document allows only one root element.

PS D:\> [xml](@"
>> version: 3.10
>> name: "Python 3.10"
>> "@| yq -o xml)
InvalidArgument: Cannot convert value "System.Object[]" to type "System.Xml.XmlDocument". Error: "This document already has a 'DocumentElement' node."

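After manually wrapping the output in a root element, it works.

function ConvertFrom-Yaml {
    param (
        [parameter(Mandatory, ValueFromPipeline)]
        [string]
        $InputObject
    )
    # yq prints one line per element; join them into a single string first.
    $xml = ($InputObject | yq -o xml) -join "`n"
    # Wrap everything in a single root element so the [xml] cast succeeds.
    $xml = [xml] "<data>$xml</data>"
    return $xml.data
}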

$xml = @"
version: 3.10
name: "Python 3.10"
installer:
    amd64:
        url: "https://www.python.org/ftp/python/3.10.8/python-3.10.8-amd64.exe"
"@ | ConvertFrom-Yaml

PS D:\> $xml

version name        installer
------- ----        ---------
3.10    Python 3.10 installer

PS D:\>
PS D:\> $xml.installer.amd64

url
---
https://www.python.org/ftp/python/3.10.8/python-3.10.8-amd64.exe

However, this implementation may introduce XML-specific problems; for example, ConvertTo-Json doesn't work on the result.
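
One possible way around the ConvertTo-Json limitation, as a rough sketch: walk the XML and copy it into nested ordered dictionaries with string leaves. `ConvertFrom-XmlElement` is a hypothetical helper name (it is not part of powershell-yaml or yq), the URL is a placeholder, and the sketch assumes sibling element names are unique, so YAML lists would need extra handling.

```powershell
# Hypothetical helper: turns an XmlElement into nested ordered dictionaries.
function ConvertFrom-XmlElement {
    param (
        [Parameter(Mandatory)]
        [System.Xml.XmlElement]
        $Element
    )
    # get_*() calls bypass PowerShell's XML adapter, so a child element that
    # happens to be called "name" cannot shadow the node's own Name property.
    $children = @($Element.get_ChildNodes() |
        Where-Object { $_.get_NodeType() -eq [System.Xml.XmlNodeType]::Element })
    if ($children.Count -eq 0) {
        # Leaf element: keep the text exactly as written, e.g. "3.10".
        return $Element.get_InnerText()
    }
    $result = [ordered]@{}
    foreach ($child in $children) {
        # Assumption: sibling names are unique; repeated names would overwrite.
        $result[$child.get_Name()] = ConvertFrom-XmlElement -Element $child
    }
    return $result
}

# Stand-in for the wrapped yq output, so the sketch runs without yq installed.
$doc = [xml]'<data><version>3.10</version><name>Python 3.10</name><installer><amd64><url>https://example.invalid/setup.exe</url></amd64></installer></data>'
$json = ConvertFrom-XmlElement -Element $doc.get_DocumentElement() | ConvertTo-Json -Depth 5
$json
```

Every leaf stays a string, so the version comes out as "3.10" in the JSON as well.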

xrgzs commented 3 weeks ago

As yq supports format selection, it can be further modified:

PS D:\> yq | Select-String format

# yq tries to auto-detect the file format based off the extension, and defaults to YAML if it's unknown (or piping through STDIN)
# Use the '-p/--input-format' flag to specify a format type.
  -p, --input-format string           [auto|a|yaml|y|json|j|props|p|csv|c|tsv|t|xml|x|base64|uri|toml|lua|l] parse format for input. (default "auto")
  -o, --output-format string          [auto|a|yaml|y|json|j|props|p|csv|c|tsv|t|xml|x|base64|uri|toml|shell|s|lua|l] output format type. (default "auto")
  -V, --version                       Print version information and quit
Use "yq [command] --help" for more information about a command.

Now choose an intermediate format. The direct YAML-to-JSON conversion is known to drop the trailing zero, so don't use it.

flowchart TD
    A[Start] --> B[Receive YAML string from pipeline]
    B --> C[Convert YAML to XML using yq -- toString]
    C --> D[Convert XML to JSON using yq]
    D --> E[Convert JSON to PSCustomObject using ConvertFrom-Json]
    E --> F[Return the converted object]

function ConvertFrom-Yaml {
    param (
        [parameter(Mandatory, ValueFromPipeline)]
        [string]
        $InputObject
    )
    return $InputObject | yq -o xml | yq -p xml -o json | ConvertFrom-Json
}

With my payloads, both the YAML-XML-JSON-PSCustomObject and the YAML-Lua-JSON-PSCustomObject pipelines work.

gabriel-samfira commented 3 weeks ago

Is this a yaml that you craft or is it something you consume? If it's something you define, is it possible to add the !!str tag or just quote the value and see if that makes a difference?

"version: !!str 3.10" | ConvertFrom-Yaml

Or just quote the scalar:

'version: "3.10"' | ConvertFrom-Yaml

This happens because the bare scalar (unquoted) is converted to a float. This probably happens in most yaml parsers that automatically coerce types.
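
To make the coercion concrete, here is a toy resolver. It is only an illustration of how implicit typing tends to work, not powershell-yaml's actual logic; `Resolve-BareScalar` and its two regexes are made up for this sketch and cover far fewer cases than a real YAML schema.

```powershell
# Toy illustration only: guesses a type for an unquoted scalar.
function Resolve-BareScalar {
    param (
        [Parameter(Mandatory)]
        [string]
        $Text
    )
    $inv = [cultureinfo]::InvariantCulture
    # Plain digits look like an integer.
    if ($Text -match '^[-+]?[0-9]+$') {
        return [long]::Parse($Text, $inv)
    }
    # Digits, a dot, then digits look like a float. "3.10" lands here.
    if ($Text -match '^[-+]?[0-9]+\.[0-9]+$') {
        return [double]::Parse($Text, $inv)
    }
    # Anything else stays a string.
    return $Text
}

$guess = Resolve-BareScalar -Text '3.10'
$guess.GetType().Name                             # Double
$guess.ToString([cultureinfo]::InvariantCulture)  # 3.1
(Resolve-BareScalar -Text '3.10.8').GetType().Name  # String
```

A quoted or !!str-tagged scalar never goes through this kind of guessing, which is why those two suggestions keep the value intact.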

When serializing to yaml, parsers have a choice to use bare scalars (values without quotes) or quoted scalars. A bare scalar can be ambiguous in some cases, if there are no tags associated to hint at what the original type was.

An easy way to disambiguate strings is to just quote them. That's why in most parsers (python, go, powershell-yaml), when you convert a string that might be easily converted to any other type, it's quoted.

$aNumber = 100
$aString = "100"

ConvertTo-Yaml $aNumber
ConvertTo-Yaml $aString

xrgzs commented 3 weeks ago

It's something I consume. The test content above is simplified; the original content is obtained through Invoke-RestMethod and is much more complex.

Also, thank you for the analysis.

gabriel-samfira commented 3 weeks ago

Okay. There are 2 potential solutions to this: a simple one that will break round-tripping, or a more complicated one in the form of something like this:

The simple one means adding a switch that disables type coercion and returns all scalars as strings. Serializing the result back to yaml would then quote every scalar, which breaks round-tripping.

gabriel-samfira commented 3 weeks ago

I wish yaml authors would use tags or at least quote ambiguous scalars. Dynamically typed languages always end up guessing the type. It's a damned if you do, damned if you don't situation. If you don't coerce types, everyone needs to duplicate the coercion. If you do coerce types, ambiguous scalars can be misinterpreted.

The WiP branch above tries to add some modeling, but people don't seem to be too interested.