Phil-Factor / PSYaml

A Powershell module to convert YAML documents to and from PowerShell objects
MIT License

Strings containing numbers or dates are converted to numbers and dates #20

Open mrled opened 6 years ago

mrled commented 6 years ago

Summary

When converting a YAML document containing strings that could represent numbers or dates, ConvertFrom-YAML will always convert them to numbers or dates.

Environment

Steps to reproduce

  1. Create a valid YAML document
  2. Add a string value which contains a number or a date
  3. Convert the document via ConvertFrom-Yaml

Actual result

String values that look like numbers or dates are converted to number or date objects:

> $fromYaml = ConvertFrom-Yaml -YamlString @"
one: thing
two: "2"
three: "2014-04-01"
"@

> $fromYaml.two; $fromYaml.two.GetType().Name
2
Int32

> $fromYaml.three; $fromYaml.three.GetType().Name
Tuesday, April 1, 2014 12:00:00 AM
DateTime

Expected result

I expected that, just like with the native ConvertFrom-Json cmdlet, string values are preserved as strings, rather than converted to numbers or dates, as long as the values are wrapped in double quotes.

Here is similar output for the ConvertFrom-Json cmdlet, showing that strings are preserved:

> $fromJson = ConvertFrom-Json -InputObject '{"one": "thing", "two": "2", "three": "2014-04-01"}'

> $fromJson.two; $fromJson.two.GetType().Name
2
String

> $fromJson.three; $fromJson.three.GetType().Name
2014-04-01
String

For what it's worth, other YAML converters had my expected output, where strings are preserved. For instance, here is the PyYAML module for Python converting a YAML string into a Python dict:

>>> import yaml

>>> from_yaml = yaml.safe_load("""
... one: thing
... two: "2"
... three: "2014-04-01"
... """)

>>> from_yaml['two']; type(from_yaml['two'])
'2'
<class 'str'>

>>> from_yaml['three']; type(from_yaml['three'])
'2014-04-01'
<class 'str'>

Workaround

Happily, there is a workaround for anyone else who might run into this. PSYaml supports YAML tags, as documented in the readme. If you have control over the input YAML document, you can prepend the !!str tag to the value that should remain a string, like so:

> $fromTaggedYaml = ConvertFrom-Yaml -YamlString @"
one: thing
two: !!str "2"
three: !!str "2014-04-01"
"@

> $fromTaggedYaml.two; $fromTaggedYaml.two.GetType().Name
2
String

> $fromTaggedYaml.three; $fromTaggedYaml.three.GetType().Name
2014-04-01
String

nblpl1 commented 5 years ago

+1

This drove me crazy. It also applies to things which aren't dates or numbers but could somehow be interpreted that way. Such as the version number '12.1.0' or the string 'off'.

> ConvertFrom-Yaml -YamlString "'off'" -Verbose
VERBOSE: YamlDocument = YamlDotNet.RepresentationModel.YamlDocument
VERBOSE: Tag=, Style=, Anchor=
VERBOSE: YamlScalarNode = off
VERBOSE: Tag=, Style=SingleQuoted, Anchor=
False

> ConvertFrom-Yaml -YamlString "'12.1.0'" -Verbose
VERBOSE: YamlDocument = YamlDotNet.RepresentationModel.YamlDocument
VERBOSE: Tag=, Style=, Anchor=
VERBOSE: YamlScalarNode = 12.1.0
VERBOSE: Tag=, Style=SingleQuoted, Anchor=

Friday, December 1, 2000 12:00:00 AM

ConvertFrom-Yaml really shouldn't force conversion to other types when it knows the style is single quotes.

@mrled Thanks for the tip, I will give this a try.

Phil-Factor commented 5 years ago

I wonder if ConvertTo-YAML should have a strict mode that includes tags? As far as I'm aware, if YAML doesn't include a tag, it has to guess the type in a certain order. http://blogs.perl.org/users/tinita/2018/01/introduction-to-yaml-schemas-and-tags.html

eddycharly commented 5 years ago

Is a fix planned? Or is there a workaround?

Phil-Factor commented 5 years ago

YAML requires the use of tags if there is any ambiguity in a value. This issue is about what happens if you don't use tags, in terms of the precedence given to the conversion of the scalar value to a .NET datatype. My quandary is that I have to follow the YAML standard rather than 'best practice'. As far as I can see, the parser does not give the type of string delimiter any meaning. You can, after all, leave delimiters out altogether.

The assignment of scalars to values in PowerShell is done in PSYaml/PSYaml/Private/ConvertFrom-YAMLDocument.ps1 (not my categorization choice). You can alter this to whatever suits your application. I'm more than happy to change the version here if you can point me to the rule in the YAML 1.2 standard for the conversion of untagged ambiguous values. I need a rule to implement and test against.

I agree that it must be irritating for people to have IP addresses interpreted as dates and so on, but the tags exist in the standard to sort out precisely this problem. I haven't closed this because I still don't know whether it is a bug in my code or a bug in the YAML 1.2 standard.

frippe75 commented 5 years ago

Got stuck on this as well. Need to pass a time of day in a 24h format

Start: "00:00"

This got converted to a date and I had to use the !!str tag:

Start: !!str "12:00"

Would it be possible to allow a global preference variable to control this behaviour? For example:

[PSYaml.Preference]::StringsAsStrings = $True

And simply document it so that people can continue using the fabulous library? I use it a lot!

dlwyatt commented 3 years ago

@Phil-Factor I think you're looking for section 10.2.2 (Tag Resolution) of https://yaml.org/spec/1.2/spec.html. It states that those automatic conversions to number / bool / etc. only happen for "plain scalars" (which are defined earlier in the spec as things that are unquoted).

However, YAML 1.2 doesn't convert things like yes/no and on/off to boolean anymore (which is how I wound up here), so you may really be adhering to YAML 1.1 instead. In the 1.1 spec, you want the beginning of chapter 9, which states:

"YAML provides a rich set of scalar styles to choose from, depending upon the readability requirements: three scalar flow styles (the plain style and the two quoted styles: single-quoted and double-quoted), and two scalar block styles (the literal style and the folded style). Comments may precede or follow scalar content, but must not appear inside it. Scalar node style is a presentation detail and must not be used to convey content information, with the exception that untagged plain scalars are resolved in a distinct way."

Either way, the conversion applies to "plain scalars", meaning unquoted.
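
To make the rule concrete, here is a minimal Python sketch of "only untagged plain scalars are implicitly typed". It is not PSYaml's or PyYAML's code: the function name resolve_scalar, its parameters, and the style strings are all hypothetical, and the patterns are deliberately simplified rather than the spec's full regular expressions. Date matching is included only because this issue is about dates.

```python
import re
from datetime import date

# Hypothetical, simplified lookup tables; real schemas define more forms.
BOOLS_1_2 = {"true": True, "false": False}
BOOLS_1_1 = {**BOOLS_1_2, "yes": True, "no": False, "on": True, "off": False}
INT_RE = re.compile(r"[-+]?[0-9]+")
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def resolve_scalar(text, style="plain", tag=None, yaml_version="1.2"):
    """Return a typed value, converting only untagged plain scalars."""
    if tag is not None or style != "plain":
        # Quoted scalars stay strings. Explicit tags would be handled by
        # separate logic in a real converter; the sketch returns the text.
        return text
    bools = BOOLS_1_1 if yaml_version == "1.1" else BOOLS_1_2
    lowered = text.lower()
    if lowered in bools:
        return bools[lowered]
    if INT_RE.fullmatch(text):
        return int(text)
    match = DATE_RE.fullmatch(text)
    if match:
        try:
            return date(*map(int, match.groups()))
        except ValueError:
            return text  # looks like a date but is not a valid one
    return text
```

Under these assumptions, resolve_scalar("2") gives an int, while resolve_scalar("2", style="double-quoted") keeps the string "2". Likewise, a single-quoted 'off' stays a string, and a plain off only becomes a boolean when yaml_version is "1.1".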

dlwyatt commented 3 years ago

Assuming you agree with the spec comment I made, the fix seems pretty straightforward: https://github.com/Phil-Factor/PSYaml/blob/master/PSYaml/Private/ConvertFrom-YAMLDocument.ps1#L67 should become if (-not $tag -and $Style -eq 'Plain')