cloudbase / powershell-yaml

PowerShell CmdLets for YAML format manipulation
Apache License 2.0
422 stars 78 forks source link

Work with PSCustomObjects instead of hashtables #85

Open o-o00o-o opened 2 years ago

o-o00o-o commented 2 years ago

The standard ConvertFrom/To modules work with PSCustomObjects (with options to fallback to Hashtables if needed for some reason). It would be great to have parity with them.

marshal-nash commented 1 year ago

I would even add that ConvertFrom/To modules work with arrays instead of Lists, and that it would be grate to have that level of parity too.

simgar commented 1 year ago

Came her to say the same thing.

gabriel-samfira commented 1 year ago

Hi folks,

Changing the objects that get returned (by default) at this point, will probably break things for a lot of people (this module gets roughly 2 million downloads per month according to powershellgallery.com) . If this is to happen, we'll need some way to safely transition.

Will thin about a safe way to do this. I would like to return PSCustomObject as that would make it easier to enable round tripping.

gabriel-samfira commented 3 weeks ago

Hi folks,

I don't know if you still care about this, but I opened a PR here:

A brief history note on why the decision was made to use generic lists and hashtables. Bare with me, it's one of the few joys you have in your 40s (dad jokes and history notes).

The first iterations of the module used a really old version of YamlDotNet (this module has been around for 8 years) and my lack of time coupled with my limited C# knowledge, make serializing PScustomObjects an endeavor that I had no time to get right. Until a few commits ago, PSCustomObjects still recursed to infinity when passed to YamlDotNet to serialize.

In more recent versions of YamlDotNet, we have the option to add custom type converters, so that issue is solved. Adding this is no longer an issue, but given the fact that for the past 8 years we've had generic lists and hashtables, the defaults have remained the same.

You can override defaults by setting:

$PSDefaultParameterValues=@{
    "ConvertFrom-Yaml:AsHashtable"=$false
    "cfy:AsHashtable"=$false
}

anywhere in your script before calling the ConvertFrom-Yaml commandlet.

So if you're still interested in this functionality, give the above mentioned branch a go, and let me know if it works as expected.

iRon7 commented 3 weeks ago

Rather than enabling/disabling specific dictionary types as AsHashtable (and possibly array/list collections), I would considered to allow for any dictionary (and array/list) like type (see also: https://github.com/PowerShell/PowerShell/issues/19928#issuecomment-1789291394). Meaning something like:

$PSDefaultParameterValues=@{
    "ConvertFrom-Yaml:DictionaryType"="OrderedHashTable"
    "ConvertFrom-Yaml:ListType"="Array"
}

or like:

$PSDefaultParameterValues=@{
    "ConvertFrom-Yaml:DictionaryType"="PSCustomObject"
    "ConvertFrom-Yaml:ListType"="List[Object]"
}

A close "wishful thinking" example/workaround (based on Copy-ObjectGraph):

$Test = @'
- Test:
  - a: 1
    b: 2
  - c: 3
    d: 4
'@ | ConvertFrom-Yaml -Ordered | Copy-ObjectGraph -ListAs List[Object] -MapAs PSCustomObject

$Test

Test
----
{@{a=1; b=2}, @{c=3; d=4}}

$Test | ConvertTo-Expression -LanguageMode Full # https://github.com/iRon7/ObjectGraphTools/blob/main/Docs/ConvertTo-Expression.md
[PSCustomObject]@{
    Test = [System.Collections.Generic.List[System.Object]]@(
        [PSCustomObject]@{
            a = [int]1
            b = [int]2
        },
        [PSCustomObject]@{
            c = [int]3
            d = [int]4
        }
    )
}
gabriel-samfira commented 3 weeks ago

Hi @iRon7

That sounds like a specialized use case. The goal with the above PR is to attempt to align powershell-yaml to other ConvertTo/ConvertFrom commandlets out there. I am not sure that the ConvertFrom-Yaml commandlet should implement custom type conversions.

The Copy-ObjectGraph commandlet you linked to, looks like a great way to do any specialized conversion one might need and fits that use case perfectly.

iRon7 commented 3 weeks ago

Hello @gabriel-samfira,

I don't think this a specialized use case.

The goal with the above PR is to attempt to align powershell-yaml to other ConvertTo/ConvertFrom commandlets out there.

I agree with this general vision, but if you continue to base everything on a specific collection switches, it would likely mean that you eventually run into the same (unresolvable break change) issue as with the ConvertFrom-Json cmdlet: #19928 Case insensitive ConvertFrom-Json -AsHashtable.

PowerShell is case-insensitive by default but Yaml and Json are not:

$Yaml = '{"a":1,"A":2}' | ConvertFrom-Json -AsHashTable | ConvertTo-Yaml
$Yaml
a: 1
A: 2

Roundtrip:

$Yaml | ConvertFrom-Yaml

Name                           Value
----                           -----
a                              1
$Yaml | ConvertFrom-Yaml -Ordered

Name                           Value
----                           -----
A                              2

A Copy-ObjectGraph -MapAs OrderedHashTable conversion won't be of any help here, as the concerned entries are already gone/overwritten by the ConvertFrom-Yaml cmdlet. Anyways, just some food for thought, I leave it up to you which roadmap you chose.

gabriel-samfira commented 3 weeks ago

Interesting! Thanks for the example and context! Indeed this is something to fix properly.

gabriel-samfira commented 3 weeks ago

So in essence, using PSCustomObjects will break any YAML or JSON with case sensitive keys. Not sure it's even a good option in this case.

That being said, I think I will abandon the PSCustomObject PR, fix the way hashtables are instantiated so we don't lose keys along the way (thanks for that!) and look into possibly allowing custom types for casting mappings and sequences, later.

I have a feeling that won't be a quick fix, and my time working on this (this round at least) is almost up.

gabriel-samfira commented 3 weeks ago

The PR for the case sensitivity thing: https://github.com/cloudbase/powershell-yaml/pull/139

iRon7 commented 3 weeks ago

using PSCustomObjects will break any YAML or JSON with case sensitive keys

Correct, in fact the current ConvertFrom-Yaml doesn't work as expected for this specific example (regardless using PSCustomObject types).

The catch22 problem with a "fix the way hashtables are instantiated" (e.g. similar to how it is implemented in ConvertFrom-Json -which cheats with returning an extended OrderedHashTable type instead-) is that this will cause a break change for users that expect a PowerShell-like-dictionary where they could just overwrite case-insensitive keys using any case (#19928). Besides, the OrderedHashTable type only exists in recent PowerShell versions (which would be an issue for an external cmdlet which is likely expected to also work with older versions of PowerShell as apposed to a native ConvertFrom-Json cmdlet). Hence my suggestion to let the user fully control the output type (PowerShell-like, Yaml-like or somewhere in the middle).

gabriel-samfira commented 3 weeks ago

You are absolutely right, and I think the long term goal would be to allow the user to specify the map and sequence types they want to deserialize into. I think that potential future versions of this module should allow the user to specify a type as a target for the entire Yaml to deserialize into (Classes are supported in powershell 5.1 from what I can tell), allowing users to create more complex models.

But for now we need to consider a few aspects:

I am not sure how wide spread the expectation to override case insensitive keys are. At least via a yaml or a JSON which may have 2 distinct keys, each with a different type but with the key name in a different case (would be a bad way do do things, but it's currently possible....and if it's possible, someone is probably doing it).

Other parsers seem to be case sensitive as well:

gabriel@rossak:~$ python3
Python 3.12.3 (main, Jul 31 2024, 17:43:48) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import yaml
>>> yaml.safe_load('{"A": 1, "a":2}')
{'A': 1, 'a': 2}
>>> 

The fact that powershell does a case insensitive set of keys in powershell hashtables and we were overwriting keys was an accident. I never thought to check. I should have, considering the maze of spaghetti that Windows is when it comes to cases and filesystem path delimiters.

So for this version, I think we should just make sure that:

$yaml = '{"A": 1, "a": 2}'
$roundTrip = cfy -Ordered $yaml | cty -JsonCompatible

$roundTrip.TrimEnd() -eq $yaml.TrimEnd()

Future versions will add customization in terms of which types should be used for mappings ans sequences and potentially add the ability to deserialize an entire yaml/json to a custom type like a class.

What do you think?

gabriel-samfira commented 2 weeks ago

What do you folks think about something like this: