cdevents / spec

A common specification for Continuous Delivery events
Apache License 2.0
125 stars 22 forks source link

remove confusing type `Enum or String` #215

Open davidB opened 2 months ago

davidB commented 2 months ago

With version 0.4, ticket's schema define type of some field (priority, resolution, ticketType) like:

"anyOf": [
  {
    "type": "string",
    "enum": [
      "low",
      "medium",
      "high"
    ]
  },
  {
    "type": "string"
  }
]

But this definition is "conflicting" as the second is a superset of the first. (if oneOf is used instead of anyOf a jsonschema linter will warm about this issue). I mean for any language implementation string will be used (or an automatic generator/serializer+deserializer will be confused).

What is the purpose/value of the enum if any value can be used? (eg if a system is uppercase like "LOW")

Suggestion: replace the anyOf by single"type": "string"

@sbrennan4 WDYT?

afrittoli commented 1 month ago

@davidB we added the anyOf(string, enum) on purpose so that we could suggest some known values in the JSON schema (and not only in the docs), but still accept others. We are aware that validation can in fact only validate that values are a string but we use this as a way to document well known values.

The initial approach was to only have an enum with other as one of the values, which was limiting because it would not let users carry the value they needed in their event. An alternative I considered was keeping other and adding another field, to be used only when other is used. With this option, a consumer does not know in advance which field contains the expected value, which is not ideal. Additionally this would be harder to validate.

davidB commented 1 month ago

If the purpose is to document, why not use the field description or examples (or a custom field x-...)?

https://json-schema.org/understanding-json-schema/reference/annotations

You should not only consider those schemas as a way to "validate" (in this case there is no validation), but also as definitions used to:

What it will look like in an API? A anyOf is often translated into an inheritance or an Union or a Enum. And parser will need a way to select the correct (sub)type. Eg for the value "low" in json is it a enum's member or a string? and same question for "LOW".

xibz commented 1 month ago

@davidB does this definition affect some tooling or something you are using? Im trying to understand what the exact problem is other than it can be more efficiently written.

And it isn't just documenting. It's specifically saying, this is a string, and here are the list of possible values. The latter point is very important. If we just say string, then that means anything, but we want to limit that.

If there is tooling that is being used that is having trouble with this definition, can we get some more information on that? That may help us figure out a way to approach the problem.

davidB commented 1 month ago

@xibz, Yes this definition affect tooling (currently I customize it to say String only). Try to answer the question:

With a tool like https://app.quicktype.io/, it becomes a simple String, but if I do word to word manual translation of this definition in rust it should be something like:

Enum Priority { // anyOf
  PriorityEnum(PriorityEnum),
  String(String),
}

Enum PriorityEnum {
  Low,
  Medium,
  High,
}

But on parsing, deserialization PriorityEnum will never be used, because it requires custom code to say if the string value in ... then use enum, else use string. (My issue is the need of this custom code for each type/field)

And for user it also complexity the usage.

json enum are for exhausted list.

xibz commented 1 month ago

After playing around some with the JSON schema, I believe anyOf should be oneOf. I noticed you did mention that oneOf would complain about the issue, but would still be fine. Would that work for you, @davidB?

davidB commented 1 month ago

oneOf is better IMHO in term of types and how to deserialize data. oneOf is like XOR, a value can be A or B but not both. I guess that a json validator will failed with the current definition because the enum and the string are not mutually exclusive (eg with a "low"). oneOf use discriminator or "duck typing" to check that there is no overlap between possible type.

IMO, oneOf highlights the issue. Using the enum to show some possible value is not the right approach.

To be clear what will work for me (and for tool and reader) is something like:

    "priority": {
      "description": "An indicator of the importance of the ticket",
      "examples": [
          "low",
          "medium",
          "high"
      ],    
      "type": "string"
    }
xibz commented 1 month ago

I think anyOf is still valid in this case, but I could see why it is confusing. I think the more confusing portion is here

"anyOf": [
  {
    "type": "string",
    "enum": [
      "low",
      "medium",
      "high"
    ]
  },
  { 
    "type": "string" <- This is not needed
  }
]

I will go ahead and mark this as a bug, and we can fix it to look like,


"oneOf": [
  {
    "type": "string",
    "enum": [
      "low",
      "medium",
      "high"
    ]
  }
]

Would that work for you? I think this will solve all the issues you are seeing at least with the tool

davidB commented 1 month ago

in this case why using a oneOf? (there is only one possible type the enum)

xibz commented 1 month ago

@davidB Because a string'd enum is different than an normal enum, e.g. C where it is an integer, and we also want to explicitly say it is a string, and not anything else. It's mostly for clarity.

davidB commented 1 month ago

in json-schema enum are string. What I mean is why

"oneOf": [
  {
    "type": "string",
    "enum": [
      "low",
      "medium",
      "high"
    ]
  }
]

and not

    "type": "string",
    "enum": [
      "low",
      "medium",
      "high"
    ]

Sorry I don't understand the value of oneOf

xibz commented 1 month ago

@davidB No, they don't have to be strings. We are explicitly saying that they are strings. This is important because anyone wanting to make a change to the spec, will know that it is explicitly as a string.

"oneOf": [
  {
    "enum": [
      1, <- this is valid
      "low",
      "medium",
      "high"
    ]
  }
]
davidB commented 1 month ago

My question is why using a "oneOf" with an array of 1 value (the enum)?

xibz commented 1 month ago

@davidB Oh, sorry, I misunderstood what you are asking. It's not an array in practice. That's just part of the spec.

Basically it is saying, we have some enum value, and it can be a, b, or c. The a, b, or c is just a list in the schema, but the list itself is not represented in code to be generated, etc.

I hope that makes sense?

davidB commented 1 month ago

oneOf and enum are unrelated.

xibz commented 1 month ago

Oh! I completely mistook what you were saying! Gotcha! hahah, now I see the confusion. Okay, yea, this is definitely not modeled correctly. Okay, I will have this fixed with

"enum": [
      "low",
      "medium",
      "high"
    ]
davidB commented 1 month ago

if a, b, c should not be represented in code, then enum is not the right json-schema medium. It's why I suggest examples. Also note that using examples will also help to be in sync in the md files (and maybe to generate part of them) and tool like schemathesis,...

xibz commented 1 month ago

Okay, I think I understand what was trying to be modeled. So basically we want any string, but want to provide some defaults, like low, medium, high in the SDKs. I think this is primarily used to generate some defaults in the SDK.

Just providing string is fine, and having it documented that the string is low, medium, or high, but that isn't as useful as providing that in an SDK.

afrittoli commented 1 month ago

oneOf is better IMHO in term of types and how to deserialize data. oneOf is like XOR, a value can be A or B but not both. I guess that a json validator will failed with the current definition because the enum and the string are not mutually exclusive (eg with a "low"). oneOf use discriminator or "duck typing" to check that there is no overlap between possible type.

IMO, oneOf highlights the issue. Using the enum to show some possible value is not the right approach.

To be clear what will work for me (and for tool and reader) is something like:

    "priority": {
      "description": "An indicator of the importance of the ticket",
      "examples": [
          "low",
          "medium",
          "high"
      ],    
      "type": "string"
    }

Interesting - is examples here a custom field? Does jsonschema allow for that? I would like the SDK to include several constants which correspond to the values in examples, so that SDK users can benefit from those or add their own value - would it be possible to achieve that with a custom schema field?

davidB commented 1 month ago

examples is part of json-schema draft 6, see https://json-schema.org/understanding-json-schema/reference/annotations

and used by some OpenAPI and contract-testing tool like schemathesis to validate API or generate samples or validate API

xibz commented 1 month ago

@davidB Examples are nice, but I think it's really important to have the values modeled in the SDKs. I believe examples, will not be used in code generation? So it may not be as useful

davidB commented 1 month ago

@xibz Can you provide an example (in a programming language) about how you think those values could be modeled in the SDKs?

xibz commented 1 month ago

@davidB

type Priority string

var Low Priority = "low"
var Medium Priority = "medium"
var High Priority = "high"

Something like this would be sufficient

Then if a user wanted a custom value they could do something like

fmt.Println(cdevents.Priority("my-custom-value"))
// or use a predefined value we've specified to help consistency
fmt.Println(cdevents.Low)
afrittoli commented 1 month ago

@xibz We discussed this during the working group today and came to this proposal, let us know what you think:

"priority": {
      "description": "An indicator of the importance of the ticket",
      "examples": [
          "low",
          "medium",
          "high"
      ],    
      "type": "string"
    }
type Priority string

var Low Priority = "low"
var Medium Priority = "medium"
var High Priority = "high"

FYI @e-backmark-ericsson @rjalander

xibz commented 1 month ago

@afrittoli I think that makes sense except for generating based on examples. Mostly that examples does not seem like the appropriate place to generate defaults from. However, if we want to change the semantics of examples then Im fine with it.

This isn't used for validation, but may help with explaining the effect and purpose of the schema to a reader

While we could generate from there, other enums from examples may cause issues if we do not plan to have them as generated. So it feels hacky.

Im personally fine with the current definition, as it is valid. However, if there is a more correct way to write it, I'd be for it. I don't know if examples is the correct way though. With that said, though, Im not too overly worried about it and I think if it unblocks @davidB, we can move forward with it in v0.5.