feat(crd-generator): Add CRD-Generator Maven plugin

fabric8io / kubernetes-client

Java client for Kubernetes & OpenShift

http://fabric8.io

Apache License 2.0

3.33k stars 1.44k forks source link

feat(crd-generator): Add CRD-Generator Maven plugin #5979

Open baloo42 opened 4 weeks ago

baloo42 commented 4 weeks ago

Description

Add CRD-Generator Maven plugin (using api-v2).

Fixes #5944

Examples: https://github.com/baloo42/crd-generator-maven-examples

mvn crd-generator:help -Ddetail=true -Dgoal=generate

crd-generator:generate

  Available parameters:

    classpath (Default: WITH_RUNTIME_DEPENDENCIES)
      The classpath which should be used during the CRD generation.
      Choice of:
      * PROJECT_ONLY: Only classes in the project.
      * WITH_RUNTIME_DEPENDENCIES: Classes from the project and any runtime
      dependencies.
      * WITH_COMPILE_DEPENDENCIES: Classes from the project and any compile time
      dependencies.
      * WITH_ALL_DEPENDENCIES: Classes from the project, compile time and
      runtime dependencies.
      * WITH_ALL_DEPENDENCIES_AND_TESTS: Classes from the project (including
      tests), compile time, runtime and test dependencies.
      User property: fabric8.crd-generator.classpath

    customResourceClasses
      Custom Resource classes, which should be considered to generate the CRDs.
      If set, scanning is disabled.
      User property: fabric8.crd-generator.customResourceClasses

    dependenciesToScan
      Dependencies which should be scanned for Custom Resources.
      User property: fabric8.crd-generator.dependenciesToScan

    exclusions
      Exclusions, used to filter Custom Resource classes after scanning.
      User property: fabric8.crd-generator.exclusions

    forceIndex (Default: false)
      If true, a Jandex index will be created even if the directory or JAR file
      contains an existing index.
      User property: fabric8.crd-generator.forceIndex

    forceScan (Default: false)
      If true, directories and JAR files are scanned even if Custom Resource
      classes are given.
      User property: fabric8.crd-generator.forceScan

    implicitPreserveUnknownFields (Default: false)
      If true, x-kubernetes-preserve-unknown-fields: true will be added on
      objects which contain an any-setter or any-getter.
      User property: fabric8.crd-generator.implicitPreserveUnknownFields

    inclusions
      Inclusions, used to filter Custom Resource classes after scanning.
      User property: fabric8.crd-generator.inclusions

    outputDirectory (Default:
    ${project.build.outputDirectory}/META-INF/fabric8/)
      The output directory where the CRDs are emitted.
      User property: fabric8.crd-generator.outputDirectory

    parallel (Default: true)
      If true, the CRDs will be generated in parallel.
      User property: fabric8.crd-generator.parallel

    skip (Default: false)
      If true, execution will be skipped.
      User property: fabric8.crd-generator.skip

Type of change

[ ] Bug fix (non-breaking change which fixes an issue)
[x] Feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change
[ ] Chore (non-breaking change which doesn't affect codebase; test, version modification, documentation, etc.)

Checklist

[x] Code contributed by me aligns with current project license: Apache 2.0
[ ] I Added CHANGELOG entry regarding this change
[x] I have implemented unit tests to cover my changes
[x] I have added/updated the javadocs and other documentation accordingly
[ ] No new bugs, code smells, etc. in SonarCloud report
[ ] I tested my code in Kubernetes
[ ] I tested my code in OpenShift

baloo42 commented 3 weeks ago

@manusa Can you have second look on this? Is this now going into the right direction?

The logic to collect and load custom resource classes is now included in an own module "crd-generator-collector" intended to be shared between maven-plugin, gradle-plugin, cli etc.

TBD:

Do we have to care about Zip Bombs, if we collect custom resource classes from a jar file? (I have added some safeguards but sonar is not recognizing it)
Do we need reset methods? (the collector classes contain a state) --> added
Do we want to include the examples in some way? --> https://github.com/baloo42/crd-generator-maven-examples --> added as integration tests

A prototype for the CLI which is using the same base can be found here: https://github.com/baloo42/kubernetes-client/pull/9

baloo42 commented 3 weeks ago

@shawkins, @andreaTP Maybe you can have a look on this too.

Marc has mentioned that he is not sure about the final / future purpose of the new module (me too), so I'll try to explain a few thoughts I had.

To use the CRD-Generator from CLI, Maven, Gradle etc. we need in general two main features which are not implemented in the api-v2 module:

Search / Collect / Filter custom resources classes
Customizable class loading

The new crd-generator-collector module implements both features via the main component CustomResourceCollector which can be used in the following way:

CustomResourceCollector customResourceCollector = new CustomResourceCollector()
        .withClasspathElements(<additional classpath elements>)
        .withFilesToIndex(<e.g. directory with class files or jar file with class files>)

CustomResourceInfo[] customResourceInfos = customResourceCollector.findCustomResources();

CRDGenerator crdGenerator = new CRDGenerator()
        .customResources(customResourceInfos)
        .inOutputDir(outputDirectory);

crdGenerator.detailedGenerate()

In conclusion this approach is some kind of "pre-processor" for the CRDGenerator.

Another approach would be to implement it as a facade ("processor"):

CRDGeneratorFacade facade = new CRDGeneratorFacade()
        .withClasspathElements(<additional classpath elements>)
        .withFilesToIndex(<e.g. directory with class files or jar file with class files>)
        .inOutputDir(outputDirectory)

facade.detailedGenerate()

This approach is nicer from a user perspective but we need to passthrough every option (or use some kind of nested builder pattern).

A third approach would be to implement both features in the api-v2 module itself and make jandex an optional dependency (or allow to extend it by using a service loader).

CRDGenerator crdGenerator = new CRDGenerator()
        .withClasspathElements(<additional classpath elements>)
        .withFilesToIndex(<e.g. directory with class files or jar file with class files>)
        .inOutputDir(outputDirectory)

crdGenerator.detailedGenerate()

This approach might be the nicest but it would also be the most advanced: the api-v2 module has now to deal somehow with additional dependencies like Jandex and the code base of the api-v2 would grow.

Which approach do you prefer? Other thoughts?

shawkins commented 3 weeks ago

Which approach do you prefer? Other thoughts?

I'd like to keep the core api as simple as possible - whatever is needed to satisfy Quarkus should be good.

The first option seems good enough. I don't really see much need for coupling between the collector and the crd generator - it doesn't really need to provide CustomResourceInfo, just classes. I mentioned on a pr somewhere that the collector to me just looks like something that you could provide a list of interfaces / base classes, and a list of annotations, and it will just find everything that matches that.

baloo42 commented 3 weeks ago

I'd like to keep the core api as simple as possible - whatever is needed to satisfy Quarkus should be good. :+1:

The first option seems good enough. I don't really see much need for coupling between the collector and the crd generator - it doesn't really need to provide CustomResourceInfo, just classes.

Yes, you are right. Providing only classes could basically do the job but then we cannot implement high level filtering (e.g. by group or version). Do we need high level filtering? If yes: High level filtering could also be implemented in api-v2 so that we can stay by the strategy to return only classes instead of CustomResourceInfo's.

I mentioned on a pr somewhere that the collector to me just looks like something that you could provide a list of interfaces / base classes, and a list of annotations, and it will just find everything that matches that.

Yes, at the moment the base interface (HasMetadata) and the additional required annotations (Version, Group) to identify a custom resource class are hardcoded. But this can be easily changed to be configurable (with sane defaults) if you think this is useful. Do we need to search for other Interfaces/Annotations?

andreaTP commented 3 weeks ago

Completely agree with what @shawkins said, let's keep the core API as simple as possible. I don't think that filtering is required, and can be implemented in the tooling when necessary.

shawkins commented 3 weeks ago

Do we need high level filtering? If yes: High level filtering could also be implemented in api-v2 so that we can stay by the strategy to return only classes instead of CustomResourceInfo's.

Right I'd vote for putting that into the api if needed.

It would probably be good to consider whether having a public constructor on CustomResourceInfo makes sense - that opens up a different usage model for the CRDGenerator that isn't bound to the typical annotations.

Do we need to search for other Interfaces/Annotations?

Not yet. I was just highlighting that it doesn't really need to be seen as that specific to CRs.

baloo42 commented 3 weeks ago

Thanks for the feedback. To sum it up, I have the following todo's so far:

[x] Return a list of custom resource classes instead of a list of CustomResourceInfo's
[x] Remove high level filtering from crd-generator-collector module with no replacement atm
[ ] ~~Try to generalize the used interfaces / annotations to identify a custom resource~~

shawkins commented 3 weeks ago

Try to generalize the used interfaces / annotations to identify a custom resource

Don't worry about this step just yet, what you currently have should work fine for the first iteration.

metacosm commented 1 day ago

Which approach do you prefer? Other thoughts?

I'd like to keep the core api as simple as possible - whatever is needed to satisfy Quarkus should be good.

The first option seems good enough. I don't really see much need for coupling between the collector and the crd generator - it doesn't really need to provide CustomResourceInfo, just classes. I mentioned on a pr somewhere that the collector to me just looks like something that you could provide a list of interfaces / base classes, and a list of annotations, and it will just find everything that matches that.

I need to look deeper at this PR but one thing that is raising my interest is the use of Jandex. The Quarkus extension for the Java Operator SDK is using classes to get to the needed CustomResourceInfo… If we'd have a generator that worked directly off of Jandex' ClassInfo, that'd be a win. That said, the Jackson-based approach requires classes so… 🤷🏼 What we really need is a Jackson processor that works from Jandex indices! 😀

sonarcloud[bot] commented 1 day ago

Quality Gate failed

Failed conditions
1 Security Hotspot
75.4% Coverage on New Code (required ≥ 80%)

See analysis details on SonarCloud

baloo42 commented 1 day ago

I need to look deeper at this PR but one thing that is raising my interest is the use of Jandex. The Quarkus extension for the Java Operator SDK is using classes to get to the needed CustomResourceInfo… If we'd have a generator that worked directly off of Jandex' ClassInfo, that'd be a win. That said, the Jackson-based approach requires classes so… 🤷🏼 What we really need is a Jackson processor that works from Jandex indices! 😀

I agree, using directly such ClassInfo objects would be a nice solution. As you said, the underlying JsonSchemaGenerator from Jackson requires classes atm. We would need Jackson to directly work with Jandex or create synthetically classes from ClassInfo objects. But I'm really unsure if this is possible or worth it.

For now Jandex is only used to "scan" for Custom Resource classes. If a project already uses Jandex to create an index (or a dependency, which should be scanned, contains a Jandex index) then those existing indices will be used instead of generating an own index, so that finding Custom Resource classes should be little bit faster.

metacosm commented 23 hours ago

I need to look deeper at this PR but one thing that is raising my interest is the use of Jandex. The Quarkus extension for the Java Operator SDK is using classes to get to the needed CustomResourceInfo… If we'd have a generator that worked directly off of Jandex' ClassInfo, that'd be a win. That said, the Jackson-based approach requires classes so… 🤷🏼 What we really need is a Jackson processor that works from Jandex indices! 😀

I agree, using directly such ClassInfo objects would be a nice solution. As you said, the underlying JsonSchemaGenerator from Jackson requires classes atm. We would need Jackson to directly work with Jandex or create synthetically classes from ClassInfo objects. But I'm really unsure if this is possible or worth it.

Yes, it's probably not worth it at this point as that would be a major undertaking, not to mention that we want the CRD generator to follow Jackson's conventions…

For now Jandex is only used to "scan" for Custom Resource classes. If a project already uses Jandex to create an index (or a dependency, which should be scanned, contains a Jandex index) then those existing indices will be used instead of generating an own index, so that finding Custom Resource classes should be little bit faster.

In a Quarkus extension, like is the case for the Quarkus extension for the Java Operator SDK, we already have an index, so being able to pass the index to the generator would be good, indeed.

baloo42 commented 22 hours ago

For now Jandex is only used to "scan" for Custom Resource classes. If a project already uses Jandex to create an index (or a dependency, which should be scanned, contains a Jandex index) then those existing indices will be used instead of generating an own index, so that finding Custom Resource classes should be little bit faster.

In a Quarkus extension, like is the case for the Quarkus extension for the Java Operator SDK, we already have an index, so being able to pass the index to the generator would be good, indeed.

As far as I understand atm the code from QOSDK and Quarkus, I assume the following (please correct me if it's wrong):

A jandex index will be always generated for the current project (Because it's a core Quarkus feature for CDI)
In QOSDK, the index is used to find reconcilers and associated CRDS, as well as non-reconciler associated CRDs.
The results of the CRD generation are important to further generate the CSV manifests

The current code of the CustomResourceCollector already supports contributing existing indices:

https://github.com/baloo42/kubernetes-client/blob/crd-generator-maven-plugin/crd-generator/collector/src/main/java/io/fabric8/crd/generator/collector/CustomResourceCollector.java#L98

Therefore it should be possible to reuse the collector module from within QOSDK. At least for the non-reconciler associated CRDs:

https://github.com/quarkiverse/quarkus-operator-sdk/blob/next-fabric8-version/core/deployment/src/main/java/io/quarkiverse/operatorsdk/deployment/CRDGenerationBuildStep.java#L78

This would result in the exact same behaviour to find a custom resource class if we compare qosdk, crd-generator maven plugin, gradle plugin etc. At the same time it wouldn't need too much changes on QOSDK side.

On CustomResourceCollector side, we only have to ensure that we can disable creating own indices entirely e.g. by adding a flag.

What do you think?

metacosm commented 19 hours ago

For now Jandex is only used to "scan" for Custom Resource classes. If a project already uses Jandex to create an index (or a dependency, which should be scanned, contains a Jandex index) then those existing indices will be used instead of generating an own index, so that finding Custom Resource classes should be little bit faster.

In a Quarkus extension, like is the case for the Quarkus extension for the Java Operator SDK, we already have an index, so being able to pass the index to the generator would be good, indeed.

As far as I understand atm the code from QOSDK and Quarkus, I assume the following (please correct me if it's wrong):
* A jandex index will be always generated for the current project (Because it's a core Quarkus feature for CDI)

* In QOSDK, the index is used to find reconcilers and associated CRDS, as well as non-reconciler associated CRDs.

* The results of the CRD generation are important to further generate the CSV manifests
The current code of the CustomResourceCollector already supports contributing existing indices:

https://github.com/baloo42/kubernetes-client/blob/crd-generator-maven-plugin/crd-generator/collector/src/main/java/io/fabric8/crd/generator/collector/CustomResourceCollector.java#L98

Therefore it should be possible to reuse the collector module from within QOSDK. At least for the non-reconciler associated CRDs:

https://github.com/quarkiverse/quarkus-operator-sdk/blob/next-fabric8-version/core/deployment/src/main/java/io/quarkiverse/operatorsdk/deployment/CRDGenerationBuildStep.java#L78

This would result in the exact same behaviour to find a custom resource class if we compare qosdk, crd-generator maven plugin, gradle plugin etc. At the same time it wouldn't need too much changes on QOSDK side.

On CustomResourceCollector side, we only have to ensure that we can disable creating own indices entirely e.g. by adding a flag.

What do you think?

Sounds reasonable.