Open xrgzs opened 3 weeks ago
yq also has this issue when converting YAML to JSON, but reading is fine. 🤔
PS D:\> "version: 3.10" | yq .version
3.10
PS D:\> "version: 3.10" | yq -o yaml
version: 3.10
PS D:\> "version: 3.10" | yq -o json
{
"version": 3.1
}
yq correctly converts YAML to XML.
PS D:\> "version: 3.10" | yq -o xml
<version>3.10</version>
PS D:\> [xml]("version: 3.10" | yq -o xml)
version
-------
3.10
In more complex cases, with several top-level keys, PowerShell cannot cast the output to XML because it has no single root element.
PS D:\> [xml](@"
>> version: 3.10
>> name: "Python 3.10"
>> "@| yq -o xml)
InvalidArgument: Cannot convert value "System.Object[]" to type "System.Xml.XmlDocument". Error: "This document already has a 'DocumentElement' node."
After wrapping the output in a root element manually, it works.
function ConvertFrom-Yaml {
    param (
        [parameter(Mandatory, ValueFromPipeline)]
        [string]
        $InputObject
    )
    # yq emits one element per top-level key, so wrap them in a single root.
    $xml = $InputObject | yq -o xml
    $xml = [xml] "<data>$xml</data>"
    return $xml.data
}
$xml = @"
version: 3.10
name: "Python 3.10"
installer:
  amd64:
    url: "https://www.python.org/ftp/python/3.10.8/python-3.10.8-amd64.exe"
"@ | ConvertFrom-Yaml
PS D:\> $xml
version name        installer
------- ----        ---------
3.10    Python 3.10 installer
PS D:\>
PS D:\> $xml.installer.amd64
url
---
https://www.python.org/ftp/python/3.10.8/python-3.10.8-amd64.exe
However, this implementation returns XML objects, which may introduce some problems; for example, ConvertTo-Json doesn't work on them.
Since yq supports selecting the input and output formats, the function can be modified further:
PS D:\> yq | Select-String format
# yq tries to auto-detect the file format based off the extension, and defaults to YAML if it's unknown (or piping through STDIN)
# Use the '-p/--input-format' flag to specify a format type.
-p, --input-format string [auto|a|yaml|y|json|j|props|p|csv|c|tsv|t|xml|x|base64|uri|toml|lua|l] parse format for input. (default "auto")
-o, --output-format string [auto|a|yaml|y|json|j|props|p|csv|c|tsv|t|xml|x|base64|uri|toml|shell|s|lua|l] output format type. (default "auto")
-V, --version Print version information and quit
Use "yq [command] --help" for more information about a command.
Now choose an intermediate format. The direct YAML-to-JSON conversion is known to drop the trailing zero, so don't use it.
flowchart TD
A[Start] --> B[Receive YAML string from pipeline]
B --> C[Convert YAML to XML using yq -- toString]
C --> D[Convert XML to JSON using yq]
D --> E[Convert JSON to PSCustomObject using ConvertFrom-Json]
E --> F[Return the converted object]
function ConvertFrom-Yaml {
    param (
        [parameter(Mandatory, ValueFromPipeline)]
        [string]
        $InputObject
    )
    # YAML -> XML -> JSON -> PSCustomObject, avoiding the direct YAML-to-JSON step.
    return $InputObject | yq -o xml | yq -p xml -o json | ConvertFrom-Json
}
With my payloads, both the YAML-XML-JSON-PSCustomObject and the YAML-LUA-JSON-PSCustomObject routes work.
Is this a yaml that you craft or is it something you consume? If it's something you define, is it possible to add the !!str tag or just quote the value and see if that makes a difference?
"version: !!str 3.10" | ConvertFrom-Yaml
Or just quote the scalar:
'version: "3.10"' | ConvertFrom-Yaml
This happens because the bare scalar (unquoted) is converted to a float. This probably happens in most yaml parsers that automatically coerce types.
When serializing to yaml, parsers have a choice to use bare scalars (values without quotes) or quoted scalars. A bare scalar can be ambiguous in some cases, if there are no tags associated to hint at what the original type was.
An easy way to disambiguate strings is to just quote them. That's why in most parsers (python, go, powershell-yaml), when you convert a string that might be easily converted to any other type, it's quoted.
$aNumber = 100
$aString = "100"
ConvertTo-Yaml $aNumber
ConvertTo-Yaml $aString
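The reading side of the same ambiguity can be sketched with plain .NET parsing. This is only an illustration of float coercion in general, not of how powershell-yaml is implemented; the variable names are made up for the example.

```powershell
# A bare 3.10 that is read as a floating-point number loses its trailing zero,
# while the quoted form is kept as a string, character for character.
$invariant = [cultureinfo]::InvariantCulture

$asNumber = [double]::Parse('3.10', $invariant)   # what a coercing parser produces
$asString = '3.10'                                # what a quoted scalar produces

$asNumber.GetType().FullName      # System.Double
$asNumber.ToString($invariant)    # 3.1
$asString                         # 3.10
```

Once the value is a double, the original spelling is gone, so no later conversion step can recover the trailing zero.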
It must be something I consume. The above test content is just simplified. The original content is obtained through Invoke-RestMethod and is much more complex.
Also, thank you for the analysis.
Okay. There are 2 potential solutions to this: a simple one that will break round-tripping, or a more complicated one in the form of something like this:
The simple one implies adding a switch to disable type coercion and return all scalars as strings. Serializing the result back to yaml will then quote every scalar, which breaks round-tripping.
I wish yaml authors would use tags or at least quote ambiguous scalars. Dynamically typed languages always end up guessing the type. It's a damned if you do damned if you don't situation. If you don't coerce types, everyone needs to duplicate the coercion. If you do coerce types, ambiguous scalars can be misinterpreted.
The WiP branch above tries to add some modeling, but people don't seem to be too interested.
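If scalars came back as plain strings, as the simple option describes, each consumer would have to do its own coercion. A minimal sketch of what that could look like on the caller's side, assuming a hypothetical parse result in `$raw` (the keys `version` and `retries` are invented for illustration, and this is not an existing powershell-yaml switch):

```powershell
# Hypothetical result of parsing with type coercion disabled:
# every scalar is a string.
$raw = @{
    version = '3.10'
    retries = '5'
}

# The caller decides the type of each field explicitly.
$config = [pscustomobject]@{
    Version = [version]$raw.version   # keeps the minor version 10 intact
    Retries = [int]$raw.retries       # numeric only where a number is meant
}

$config.Version.Minor   # 10
$config.Retries + 1     # 6
```

The trade-off is the one described above: nothing is misread, but every script that consumes the data repeats this conversion.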
I'm trying to read a YAML configuration file that defines a version number, and I'm getting unexpected results.
Simplified as follows:
The type is System.Double. This makes my script recognize the wrong version number.
Then I try to convert a JSON string of the same data. It returns different results from ConvertFrom-Json.
This is my environment:
I think it would be better to provide a switch to treat number as string.
Or can someone give me a solution? I don't want to use regex. 😭
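To make the version-number problem above concrete, here is a minimal sketch of how a comparison goes wrong once the value has become a double. It uses only built-in PowerShell types; the variable names and the 3.9 baseline are assumptions for illustration.

```powershell
$parsed = 3.10      # what a coercing parser hands back: a System.Double equal to 3.1
$quoted = '3.10'    # what a quoted scalar would give: a string

# Compared as versions, 3.10 is newer than 3.9 (minor 10 > minor 9).
$asVersion = [version]$quoted -gt [version]'3.9'   # True

# Compared as numbers, 3.1 is smaller than 3.9.
$asDouble = $parsed -gt 3.9                        # False

$asVersion
$asDouble
```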