DSC Resource manifest schema should allow for setting properties as "keys"

JohnMcPMS commented 11 months ago

Summary of the new feature / enhancement

While we can use the schema to define the properties and their individual schemas, there is no way to specify that a particular property is a "key". This is distinct from required/optional. The set of all "key" property values given to a resource defines the identity of the data under management. Thus, passing the same set of key values, while varying other values, will result in a potential overwrite of the managed data.

The goal is to be able take two sets of properties that are intended for the same resource and determine if they are in conflict. As an example:

<config1 contains>
- name: A
  type: my/packages
  properties:
    id: foo
    version: [1.0-2.0)

<config2 contains>
- name: B
  type: my/packages
  properties:
    id: foo
    version: [3.0-]

With the resource my/packages having a property key set containing id, we can determine that config1:A and config2:B are potentially in conflict (they have matching key values and differing non-key values). This can be reported to the user so that they may determine how to proceed.

Proposed technical implementation details (optional)

No response

SteveL-MSFT commented 11 months ago

This is a valid point. Not sure how we want to expose this while maintaining ARM syntax compatibility. Would it make sense to have a _keys meta-property that is an array of key property names?

michaeltlombardi commented 11 months ago

This should be part of the JSON Schema that describes a resources' instances. We could define it as part of a JSON Schema dialect as an annotation keyword, like keyProperty:

type: my/package
schema:
  embedded:
    type: object
    properties:
      id:
        title:       Package ID
        description: The unique ID for the package.
        type:        string
        keyProperty: true
      version:
        title:       Package Version
        description: The exact version or version range for the package.
        type:        string

Annotation keywords are (relatively) easy to add to a JSON Schema dialect, and can be safely ignored by tools that don't need to care about them (that is, they don't change the processing of the instance itself, like the properties applicator keyword, they provide information about the instance).

I've been thinking about the dialect problem for a while and I think it more-likely-than-not that we'll eventually need a dialect for DSC Resource instance schemas - you need dialects for authoring meta schemas if you want to support authoring/processing for those schemas beyond the built-in keywords.

Aside on the value of a schema dialect for instance schemas

A schema dialect can also really improve the authoring experience by providing shorthand for otherwise complex requirements. For example, instead of defining an ensurable resource that uses the well-known `_ensure` property like this: ```yaml schema: embedded: type: object properties: _ensure: $ref: https://raw.githubusercontent.com/PowerShell/DSC/main/schemas/2023/08/resource/properties/ensure.json ``` You could do this: ```yaml schema: embedded: type: object ensurable: true ``` That's an applicator keyword rather than an annotation keyword, but extremely useful.

SteveL-MSFT commented 9 months ago

From this blog post (didn't find anything newer): https://json-schema.org/blog/posts/custom-annotations-will-continue. It seems that the right way is using a x- prefix for custom annotations. So we could go with x-keyProperty.

SteveL-MSFT commented 1 week ago

Thinking about this today the problem with x-keyProperty is the admin can't tell from reading a config what are keys without understanding the schema. An alternative proposal is to have a _keys property which contains key/value pairs that define key properties for the resource. This makes it easy to read a config to understand and also allow the engine to validate there aren't duplicated keys. @michaeltlombardi @mgreenegit

michaeltlombardi commented 1 week ago

If I understand correctly, @SteveL-MSFT, instead of using an extended vocabulary in the resource schema, we would add a new property to the resource instance object - but I'm not sure I see how that translates to the resource manifest advertising to users and integrating tools which properties are key properties - unless the idea is that the config author defines which properties they don't want to conflict?

I think enabling users to extend which properties to treat as keys could have some use cases, but without resources declaring which properties must not conflict in a given configuration, we shift the responsibility for figuring out and declaring key properties from the resource to the user and break the integration model for several higher order tools.

As for indicating which properties in the configuration are keys for resources, I think that might be better handled by the VS Code extension rather than altering the schema for how you define resource instances.

SteveL-MSFT commented 1 week ago

@michaeltlombardi v3 resources would explicitly have a property called _keys and define their keys as subproperties of this object. Expectation is that conflict only exists if ALL keys match the same values so the engine can validate that across instances of the same resource.

Here's a hypothetical example yaml:

$schema: https://raw.githubusercontent.com/PowerShell/DSC/main/schemas/2024/04/config/document.json
resources:
- name: registry example
  type: Microsoft.Windows/Registry
  properties:
    _keys:
      keyPath: HKCU\Test\Example
      valueName: myValueName
    valueData:
      string: hello

michaeltlombardi commented 1 week ago

That would make the proposed resource schema something like...

$schema:   https://json-schema.org/draft/2020-12/schema
type:      object
required: [_keys]
properties:
 _keys:
   type: object
   required: [keyPath]
   properties:
     keyPath:   { type: string }
     valueName: { type: [string, null] }
 _exist: { $ref: https://raw.githubusercontent.com/PowerShell/DSC/main/schemas/2024/04/resource/properties/exist.json }
 _metadata: { $ref: https://raw.githubusercontent.com/PowerShell/DSC/main/schemas/2024/04/resource/properties/metadata.json }
 valueData:
   type: object
   minProperties: 1
   maxProperties: 1
   properties:
     string:       { type: string }
     expandString: { type: string }
     binary:       { type: array, items: { type: integer } }
     dWord:        { type: integer }
     multiString:  { type: array, items: { type: string } }
     qWord:        { type: integer }

That could be fairly succinct (and is addressable in the manifest schema, we can require the properties definition for embedded schemas to define the _keys property as an object) - I'm not wholly opposed to it, but it is a large breaking change for prior resource design compared to using an annotation keyword.

PowerShell / DSC

DSC Resource manifest schema should allow for setting properties as "keys" #142

Summary of the new feature / enhancement

Proposed technical implementation details (optional)