CRDGenerator: Add support for generic schema definition with @Schema annotation

baloo42 commented 7 months ago

Is your enhancement related to a problem? Please describe

At the moment the CRDGenerator does not support some schema properties like format, example, externalDocs (see #5859).

Describe the solution you'd like

Introduce @Schema annotation which allows to define such leftovers:

package io.fabric8.crd.generator.annotation;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/**
 * Java representation of JSONSchemaProps.
 *
 * <p>The annotation may be used to define a Schema for a set of elements of the OpenAPI spec, and/or
 * to define additional properties for the schema. The annotation has the highest precedence,
 * which means it always wins if the class is annotated with other annotations.</p>
 */
@Target({ ElementType.ANNOTATION_TYPE, ElementType.FIELD, ElementType.METHOD })
@Retention(RetentionPolicy.RUNTIME)
public @interface Schema {

  /**
   * The name of the schema or property.
   *
   * @return the name of the schema
   **/
  String name() default "";

  /**
   * A title to explain the purpose of the schema.
   *
   * @return the title of the schema
   **/
  String title() default "";

  /**
   * A description of the schema.
   *
   * @return the schema's description
   **/
  String description() default "";

  /**
   * Constrains a value such that when divided by the multipleOf, the remainder must be an integer.
   * Ignored if the value is 0.
   *
   * @return the multiplier constraint of the schema
   **/
  double multipleOf() default 0;

  /**
   * Provides an optional override for the format.
   * <p>The following formats are validated:</p>
   * <ul>
   * <li>{@code bsonobjectid}: a bson object ID, i.e. a 24 characters hex string</li>
   * <li>{@code uri}: an URI as parsed by Golang net/url.ParseRequestURI</li>
   * <li>{@code email}: an email address as parsed by Golang net/mail.ParseAddress</li>
   * <li>{@code hostname}: a valid representation for an Internet host name, as defined by RFC 1034, section 3.1 [RFC1034].</li>
   * <li>{@code ipv4}: an IPv4 IP as parsed by Golang net.ParseIP</li>
   * <li>{@code ipv6}: an IPv6 IP as parsed by Golang net.ParseIP</li>
   * <li>{@code cidr}: a CIDR as parsed by Golang net.ParseCIDR</li>
   * <li>{@code mac}: a MAC address as parsed by Golang net.ParseMAC</li>
   * <li>{@code uuid}: an UUID that allows uppercase defined by the regex
   * (?i)^[0-9a-f]{8}-?[0-9a-f]{4}-?[0-9a-f]{4}-?[0-9a-f]{4}-?[0-9a-f]{12}$</li>
   * <li>{@code uuid3}: an UUID3 that allows uppercase defined by the regex
   * (?i)^[0-9a-f]{8}-?[0-9a-f]{4}-?3[0-9a-f]{3}-?[0-9a-f]{4}-?[0-9a-f]{12}$</li>
   * <li>{@code uuid4}: an UUID4 that allows uppercase defined by the regex
   * (?i)^[0-9a-f]{8}-?[0-9a-f]{4}-?4[0-9a-f]{3}-?[89ab][0-9a-f]{3}-?[0-9a-f]{12}$</li>
   * <li>{@code uuid5}: an UUID5 that allows uppercase defined by the regex
   * (?i)^[0-9a-f]{8}-?[0-9a-f]{4}-?5[0-9a-f]{3}-?[89ab][0-9a-f]{3}-?[0-9a-f]{12}$</li>
   * <li>{@code isbn}: an ISBN10 or ISBN13 number string like "0321751043" or "978-0321751041"</li>
   * <li>{@code isbn10}: an ISBN10 number string like "0321751043"</li>
   * <li>{@code isbn13}: an ISBN13 number string like "978-0321751041"</li>
   * <li>{@code creditcard}: a credit card number defined by the regex
   * ^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|6(?:011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\d{3})\d{11})$
   * with any non digit characters mixed in</li>
   * <li>{@code ssn}: a U.S. social security number following the regex ^\d{3}[- ]?\d{2}[- ]?\d{4}$</li>
   * <li>{@code hexcolor}: an hexadecimal color code like "#FFFFFF: following the regex ^#?([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$</li>
   * <li>{@code rgbcolor}: an RGB color code like rgb like "rgb(255,255,2559"</li>
   * <li>{@code byte}: base64 encoded binary data</li>
   * <li>{@code password}: any kind of string</li>
   * <li>{@code date}: a date string like "2006-01-02" as defined by full-date in RFC3339</li>
   * <li>{@code duration}: a duration string like "22 ns" as parsed by Golang time.ParseDuration or compatible with Scala duration
   * format</li>
   * <li>{@code date-time}: a date time string like "2014-12-15T19:30:20.000Z" as defined by date-time in RFC3339.</li>
   * </ul>
   * <p>
   * Unknown formats are ignored by Kubernetes and if another consumer is unaware of the meaning of the format,
   * they shall fall back to using the basic type without format.
   * </p>
   *
   * @return the schema's format
   **/
  String format() default "";

  /**
   * Provides an example of the schema.  When associated with a specific media type, the example string shall be parsed by the consumer to be treated as an object or an array.
   *
   * @return an example of this schema
   **/
  String example() default "";

  /**
   * Additional external documentation for this schema.
   *
   * @return additional schema documentation
   **/
  ExternalDocumentation externalDocs() default @ExternalDocumentation();

  /**
   * Annotates an array/list to further describe its topology.
   * <p>
   * Emits {@code x-kubernetes-list-type}
   * </p>
   */
  ListType listType() default ListType.ATOMIC;

  /**
   * Annotates an array/list to further describe its topology.
   * <p>
   * Emits {@code x-kubernetes-list-map-keys}
   * </p>
   *
   * <p>x-kubernetes-list-map-keys annotates an array with the x-kubernetes-list-type map by specifying
   * the keys used as the index of the map.</p>
   *
   * <p>This tag MUST only be used on lists that have the “x-kubernetes-list-type” extension set to “map”.
   * Also, the values specified for this attribute must be a scalar typed field
   * of the child structure (no nesting is supported).</p>
   *
   * <p>The properties specified must either be required or have a default value, to ensure
   * those properties are present for all list items.</p>
   */
  String[] listMapKeys() default "";

  /**
   * Annotates an object to further describe its topology.
   * <p>
   * Emits {@code x-kubernetes-map-type}
   * </p>
   */
  MapType mapType() default MapType.ATOMIC;

  /**
   * Annotates an object that the value is an embedded Kubernetes runtime.Object, with TypeMeta and ObjectMeta.
   * <p>
   * Emits {@code x-kubernetes-embedded-resource}
   * </p>
   * <p>
   * The type must be object. It is allowed to further restrict the embedded object.
   * kind, apiVersion and metadata are validated automatically.
   * x-kubernetes-preserve-unknown-fields is allowed to be true, but does not have to be
   * if the object is fully specified (up to kind, apiVersion, metadata).
   * </p>
   */
  boolean embeddedResource() default false;

  enum ListType {
    /**
     * The list is treated as a single entity, like a scalar.
     * Atomic lists will be entirely replaced when updated.
     * This extension may be used on any type of list (struct, scalar, ...).
     */
    ATOMIC,
    /**
     * Sets are lists that must not have multiple items with the same value.
     * Each value must be a scalar, an object with x-kubernetes-map-type atomic
     * or an array with x-kubernetes-list-type atomic.
     */
    SET,
    /**
     * These lists are like maps in that their elements have a non-index key used to identify them.
     * Order is preserved upon merge. The map tag must only be used on a list with elements of type object.
     */
    MAP
  }

  enum MapType {
    /**
     * These maps are actual maps (key-value pairs) and each field is independent
     * of each other (they can each be manipulated by separate actors). This is
     * the default behaviour for all maps.
     */
    GRANULAR,

    /**
     * The map is treated as a single entity, like a scalar.
     * Atomic maps will be entirely replaced when updated.
     */
    ATOMIC
  }

}

@Target({ElementType.METHOD, ElementType.TYPE, ElementType.ANNOTATION_TYPE})
@Retention(RetentionPolicy.RUNTIME)
@Inherited
public @interface ExternalDocumentation {

    /**
     * A short description of the target documentation.
     *
     * @return the documentation description
     **/
    String description() default "";

    /**
     * The URL for the target documentation. Value must be in the format of a URL.
     *
     * @return the documentation URL
     **/
    String url() default "";

}

The @Schema annotation always wins if there are other annotations defined, which would result in the same property.

For example:

public class Test {
  @JsonPropertyDescription("This description is omitted")
  @Schema(description = "This description wins")
  private String myValue;
}

Describe alternatives you've considered

This approach avoids the explosion of annotations in the first place. But additional annotations can be easily added in the future as it would not be a breaking change.

For example, the @ListType could be added later:

/**
 * Annotates an array/list to further describe its topology.
 * <p>
 * Emits {@code x-kubernetes-list-type}
 * </p>
 */
public @interface ListType {

  Type value() default Type.atomic;

  enum Type {
    /**
     * The list is treated as a single entity, like a scalar.
     * Atomic lists will be entirely replaced when updated.
     * This extension may be used on any type of list (struct, scalar, ...).
     */
    atomic,
    /**
     * Sets are lists that must not have multiple items with the same value.
     * Each value must be a scalar, an object with x-kubernetes-map-type atomic
     * or an array with x-kubernetes-list-type atomic.
     */
    set,
    /**
     * These lists are like maps in that their elements have a non-index key used to identify them.
     * Order is preserved upon merge. The map tag must only be used on a list with elements of type object.
     */
    map
  }
}

Additional context

andreaTP commented 7 months ago

@baloo42 thanks as always for those issues, keep them coming!

I like the proposal overall! A few comments:

non-semantic fields (e.g. example, externalDocs etc.) I think is a no-brainer to go ahead with this or a similar proposal
fields involving validation (e.g. multipleOf) I would like to understand how this will play with #5859 and how we can integrate in the java-generator too
types (e.g. x-kubernetes-list-type) fields, I think that I'm missing the full picture and I'll need to research a little more, I don't know if we have to interpret and validate/do-something with the values or if it's transparent behavior
embeddedResource I'm not sure how this plays with the rest, pretty convinced that it has implications

baloo42 commented 7 months ago

In general: The idea comes from the official Java swagger-annotations:

https://github.com/swagger-api/swagger-core/blob/master/modules/swagger-annotations/src/main/java/io/swagger/v3/oas/annotations/media/Schema.java

It contains any possible option a Schema-Object allows. This approach follows the same principle but allows only those options which are allowed in Kubernetes.

`multipleOf`

This is an option to define that a given integer / floating-point number must be the multiple of the another number.

For example:

properties:
  example:
    type: integer
    multipleOf: 2
    min: 2

Kubernetes would validate this to allow only the following values for the example property: 2, 4, 6, 8... The CRDGenerator would just use it to generate the Schema like we do with others e.g. pattern, format...

how we can integrate in the java-generator too

I think that's something which the java-generator should support only optionally. Jackson, for example, doesn't need to know multipleOf to deserialize data. (The value is not changed, the resulting datatype is also the same).

But a possible feature could be to let the user somehow configure additional annotions for such cases, so that custom annotations are generated into the Java code if a field is annoted with such a property:

For example, if the Java-Generator gets as input the following snippet:

properties:
  example:
    type: integer
    multipleOf: 2

And is configured as follows:

annotationMappings:
  multipleOf: "com.example.ValidMultipleOf(value={{value}})"

The Java-Generator generates:


class Test {
  com.example.ValidMultipleOf(2)
  private Integer example;
}

This could be really useful, because many tools would get support out of the box and the decision of choosing the tool is left out to the user. (And doing the actual validation is left to the tool). I think for example at JSR-380, JSR-349, JSR-303 or a cel-java lib.

`x-kubernetes-list-type`

This property can be used additionally and doesn't change the type or the rest of the schema definiton. It's only an indicator how tools should behave on merging two objects. As such it's not important to implement it specifically in the Java-Generator. But it would also fit into the idea from above.

`embeddedResource`

This property indicates that an object is an embedded resource. An embedded resource is a value that has apiVersion, kind and metadata fields. They are validated implicitly according to the semantics of the currently running apiserver. It is not necessary to add any additional schema for these field, yet it is possible. This can be combined with PreserveUnknownFields.

The CRDGenerator could detect it and could validate the given structural schema. But I think that's not important for the first implementation.

andreaTP commented 7 months ago

@baloo42 I thought a little about this proposal. I think that the fastest way would be to start merging PRs with separate annotations for each field (i.e. name, title, example), this is how we have done so far, and following the path of least resistance would be the most straightforward way of having progress over this issue.

In this way, we can have scoped discussions when time comes to more "semantically meaningful" field integration. For full disclosure, we are about to start developing on a 7.X branch and we can re-evaluate later on what to do if we feel that a breaking change would improve the user experience.

wdyt?

baloo42 commented 7 months ago

I don't think it would be the fastest way. The problem is that we can't work on it in parallel, because each PR requires changes at the same places which would result in quite a few merge conflicts. Especially if we reorganize the tests or the CRDGenerator itself every time a little bit. (That's why I created #5896 first, to have some easy to use structural tests which can be extended in parallel)

At the same time, most of the additions are no-brainers and easy to implement. In my opinion it's more a question on how we design it aka which strategy we want to follow.

Can we discuss the overall strategy in #5859? I don't think we should talk in every PR about the same topics. I would rather create some more issues and describe the overall solution more in detail before we start implementing it.

Because then, it might be possible to implement a whole feature set at once...

andreaTP commented 7 months ago

Can we discuss the overall strategy in https://github.com/fabric8io/kubernetes-client/issues/5859?

Agreed, let's move this discussion there.

stale[bot] commented 4 months ago

This issue has been automatically marked as stale because it has not had any activity since 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions!

fabric8io / kubernetes-client