Arnavion / k8s-openapi

Rust definitions of the resource types in the Kubernetes client API
Apache License 2.0
383 stars 41 forks source link

Add optional schema #104

Closed kazk closed 3 years ago

kazk commented 3 years ago

Closes #86

Output of pretty printed Pod::schema(): https://gist.github.com/kazk/e4a56b7ff9431416a1c29a1c602a7c8a

Arnavion commented 3 years ago

The "Files changed" tab lags both Firefox and Chromium to death, so this is going to be a great PR to review :D

Arnavion commented 3 years ago

I know this has been blocked on me for some time, but I've been busy and will probably be busy for another two weeks due to IRL stuff. Sorry.

Arnavion commented 3 years ago

Okay, sorry for the delay. I should have time more consistently now.

I'll continue the discussion from https://github.com/Arnavion/k8s-openapi/pull/104#discussion_r659357386 here so that it's easier to find.


You said:

Methods used in derived JsonSchema to help are private too. Maybe I'm missing something, though. I haven't manually implemented JsonSchema for complex types. We'll need to generate code to build SchemaObject from swagger20::Schema. For each $ref, we'll need to do <T as schemars::JsonSchema>::add_schema_as_property, but this is not part of the public API.

But AFAICT this isn't a problem. JsonSchema::json_schema needs to return a Schema, specifically a Schema::Object which contains a SchemaObject, for which all the fields we would set are public.

And specifically, for $refs you'd create schemars::schema::SchemaObject { reference: Some("io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta".to_owned()), ..Default::default() }


Anyway, going through your change, I remembered why I suggested implementing schemars::JsonSchema instead of returning a serde_json::Value:

https://github.com/kazk/k8s-openapi/blob/c624cce2148ace17a5728b6edc9c4c595dc819fb/src/v1_21/apiextensions_apiserver/pkg/apis/apiextensions/v1/json_schema_props.rs#L725-L741

Each of these is an infinite recursion. Currently this is only a problem for JsonSchemaProps (which is why there's also a circular-reference workaround for this type in another place), but in the future there might be more.

Impling schemars::JsonSchema allows us to use gen.subschema_for for the field types, which solves the circular reference problem.

If we want something generic that isn't tied to schemars but nevertheless leaves refs unflattened to allow recursive types, we can have our custom traits to have an agnostic equivalent of subschema_for:

trait SchemaGenerator {
    // Unlike schemars::gen::SchemaGenerator::subschema_for, this doesn't need to return a value.
    fn visit_subschema<T: Schema>(&mut self);
}

impl Schema for Pod {
    fn schema(gen: &mut SchemaGenerator) -> (&'static str, serde_json::Value, BTreeMap<&'static str, fn() -> serde_json::Value>) {
        let schema_name = "io.k8s.api.core.v1.Pod";

        gen.visit_subschema::<crate::apimachinery::pkg::apis::meta::v1::ObjectMeta>();

        let schema = serde_json! {{
            "properties": {
                "metadata": { "$ref": "io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta" }
            },
        }};
    }
}

But even with this, the impl of k8s_openapi::SchemaGenerator for schemars::gen::SchemaGenerator has to come from k8s_openapi itself (because of orphan rules), so k8s_openapi will end up having a dep on schemars anyway.

kazk commented 3 years ago

Okay, sorry for the delay. I should have time more consistently now.

No problem, thanks for reviewing.

JsonSchema::json_schema needs to return a Schema, specifically a Schema::Object which contains a SchemaObject, for which all the fields we would set are public.

And specifically, for $refs you'd create schemars::schema::SchemaObject { reference: Some("io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta".to_owned()), ..Default::default() }

Can that SchemaObject resolve those $refs? I didn't think so, but haven't looked into it much. My comment was based on looking at the output of #[derive(JsonSchema)] for structs with JsonSchema impled fields ($refs), and finding uses of #[doc(hidden)] methods.


Yeah, the circular reference in JSONSchemaProps* is a problem.

Impling schemars::JsonSchema allows us to use gen.subschema_for for the field types, which solves the circular reference problem.

I originally thought infinite recursion is still a problem even if we use gen.subschema_for because the schema cannot have $refs. And with this restriction, calling T::schema() is basically the same as gen.subschema_for::<T>(). But I noticed inline_subschemas documentation has "Some references may still be generated in schemas for recursive types.", so I guess it does prevent infinite recursion (the schema will be invalid for us, though). For the use case of generating a schema for CRD, circular references should never be a problem because it's impossible to describe with the limited schema.

If we go with schemars route, it'll prevent infinite recursion and produce invalid schema for us instead when that happens. If what you described works, then it might be worth the effort. We should change the feature to schemars.

If it's not as easy to do, maybe replace self referencing fields in JSONSchemaProps* (and anything else in the future) with {} (any type)? Or skip them entirely? Not great, but I don't think anyone is interested in supporting them anyway because the main use case is CRD schema.

Arnavion commented 3 years ago

and finding uses of #[doc(hidden)] methods.

Right, they're #[doc(hidden)] but they're just convenience wrappers for the most part. Essentially the flow is:

https://github.com/GREsau/schemars/blob/master/schemars_derive/src/lib.rs#L121

-> https://github.com/GREsau/schemars/blob/master/schemars_derive/src/schema_exprs.rs#L12

-> https://github.com/GREsau/schemars/blob/master/schemars_derive/src/schema_exprs.rs#L436

-> https://github.com/GREsau/schemars/blob/master/schemars/src/_private.rs#L29

-> https://github.com/GREsau/schemars/blob/master/schemars/src/_private.rs#L36

... and for our codegen we can just do the same expansion ourselves.


I originally thought infinite recursion is still a problem even if we use gen.subschema_for because the schema cannot have $refs.

The way it works with schemars is:

impl JsonSchema for Metadata {
    fn json_schema(gen) -> Schema {
        // (1)
        Schema::Object(SchemaObject {
            object: Some(...),
        })
    }
}

impl JsonSchema for Pod {
    fn json_schema(gen) -> Schema {
        Schema::Object(SchemaObject {
            object: Some(ObjectValidation {
                properties: {
                    "metadata": gen.subschema_for("io...Metadata"), // (2)
                },
            }),
        })
    }
}

(1) is the actual schema of Metadata, and as part of (2) gen.subschema_for() does get this actual Schema and persist it in its internal list of subschemas, but the Schema it returns from that function is a synthesized one that is a Schema::Object(SchemaObject { reference: Some("io...Metadata") }) instead. So the Schema returned by Pod::json_schema does contain $refs, but that's fine, because the targets of the refs have been registered with the generator inside subschema_for.


If we go with schemars route, it'll prevent infinite recursion and produce invalid schema for us instead when that happens.

Can you clarify why it would be invalid? Do you mean that you'd consider a schema with $refs to be invalid? Why is that?

kazk commented 3 years ago

Thanks for expanding on schemars. I'll look into it. They seem to be considering making #[doc(hidden)] in methods in JsonSchema public, so that helps (https://github.com/GREsau/schemars/blob/55b860428e4526ef34886cb877af58256de6b046/schemars/src/lib.rs#L364-L374).

Do you mean that you'd consider a schema with $refs to be invalid? Why is that?

It's a valid JSON Schema/Open API, but it's invalid for our use case because $ref is not allowed in CRD schema. https://github.com/kubernetes/kubernetes/issues/62872 We use inline_subschemas when generating.

Arnavion commented 3 years ago

inline_subschemas will still work for your use case, then. It's the SchemaGenerator's job to handle that.

kazk commented 3 years ago

inline_subschemas will still work for your use case, then.

No, I knew that. I meant the recursive types won't be supported (for our use case) either way. The current version has infinite recursion, and schemars version will have $ref.

kazk commented 3 years ago

For Pod, the generated code should look like the following:

impl crate::schemars::JsonSchema for Pod {
    fn schema_name() -> String {
        "io.k8s.api.core.v1.Pod".to_owned()
    }

    fn json_schema(__gen: &mut crate::schemars::gen::SchemaGenerator) -> crate::schemars::schema::Schema {
        crate::schemars::schema::Schema::Object(crate::schemars::schema::SchemaObject {
            metadata: Some(Box::new(crate::schemars::schema::Metadata {
                description: Some("Pod is a collection of containers that can run on a host. This resource is created by clients and scheduled onto hosts.".to_owned()),
                ..Default::default()
            })),
            object: Some(Box::new(crate::schemars::schema::ObjectValidation {
                properties: std::array::IntoIter::new([
                    (
                        "apiVersion".to_owned(),
                        crate::schemars::schema::Schema::Object(crate::schemars::schema::SchemaObject {
                            metadata: Some(Box::new(crate::schemars::schema::Metadata {
                                description: Some("APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources".to_owned()),
                                ..Default::default()
                            })),
                            instance_type: Some(std::convert::Into::into(crate::schemars::schema::InstanceType::String)),
                            ..Default::default()
                        }),
                    ),
                    (
                        "kind".to_owned(),
                        crate::schemars::schema::Schema::Object(crate::schemars::schema::SchemaObject {
                            metadata: Some(Box::new(crate::schemars::schema::Metadata {
                                description: Some("Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds".to_owned()),
                                ..Default::default()
                            })),
                            instance_type: Some(std::convert::Into::into(crate::schemars::schema::InstanceType::String)),
                            ..Default::default()
                        }),
                    ),
                    (
                        "metadata".to_owned(),
                        __gen.subschema_for::<crate::apimachinery::pkg::apis::meta::v1::ObjectMeta>(),
                    ),
                    (
                        "spec".to_owned(),
                        __gen.subschema_for::<crate::api::core::v1::PodSpec>(),
                    ),
                    (
                        "status".to_owned(),
                        __gen.subschema_for::<crate::api::core::v1::PodStatus>(),
                    ),
                ]).collect(),
                required: std::array::IntoIter::new([
                    "metadata",
                ]).map(std::borrow::ToOwned::to_owned).collect(),
                ..Default::default()
            })),
            ..Default::default()
        })
    }
}
JSON ```json { "type": "object", "description": "Pod is a collection of containers that can run on a host. This resource is created by clients and scheduled onto hosts.", "properties": { "apiVersion": { "description": "APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources", "type": "string" }, "kind": { "description": "Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds", "type": "string" }, "metadata": { "$ref": "#/definitions/io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta", "description": "Standard object's metadata. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata" }, "spec": { "$ref": "#/definitions/io.k8s.api.core.v1.PodSpec", "description": "Specification of the desired behavior of the pod. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status" }, "status": { "$ref": "#/definitions/io.k8s.api.core.v1.PodStatus", "description": "Most recently observed status of the pod. This data may not be up to date. Populated by the system. Read-only. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status" } } } ```

gen.subschema_for::<T>() should work, but we'll lose the overrides (description and extensions) set in the reference object in the upstream spec. To be fair, that's the correct behavior.

When an object contains a $ref property, the object is considered a reference, not a schema. Therefore, any other properties you put there will not be treated as JSON Schema keywords and will be ignored by the validator.

https://json-schema.org/understanding-json-schema/structuring.html#ref

This object cannot be extended with additional properties and any properties added SHALL be ignored. https://swagger.io/specification/#reference-object

Arnavion commented 3 years ago

I started a thing here if you want to build off of it. k8s-openapi-codegen-common/src/templates/impl_schema.rs needs to be filled out to handle all the SchemaKinds.

kazk commented 3 years ago

I got it mostly working locally. Output of JSONSchemaProps with inline_subschemas: https://gist.github.com/kazk/0de1996d49018d53557dab4b22b0c5f7 (some types are still missing)

kazk commented 3 years ago

I was able to override description for $refs (e.g., fixes port field having the description of IntOrString). PodSpec output: https://gist.github.com/kazk/33bb9f9b22da93349372748ae84a1ede

Extensions can be added similarly, so there's no downsides now. I'll open a new PR later.

kazk commented 3 years ago

Superseded by #105.