elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana
Other
19.79k stars 8.19k forks source link

[Security Solution] Collect the first feedback on documentation bundling approaches #183019

Closed maximpn closed 4 months ago

maximpn commented 5 months ago

Epic: https://github.com/elastic/security-team/issues/9407

Summary

Introduction of a versioned router allows to version Kibana's API endpoints with different granularity. It has a direct impact on bundling OpenAPI specifications (OAS) for API documentation purposes since documentation reflects the version. Each bundle should group API endpoints of the same version.

The goal of this ticket is to collect the first feedback from Docs Engineering and Kibana Core teams on OAS bundling approaches after opening PRs for

Details

Kibana is a large monorepo consisting of different plugins. From business point of view it's separated into Enterprise Search, Observability and Security solutions. Currently Kibana's API endpoints are explicitly versioned (as it's required for zero downtime Serverless upgrades) so public endpoints have version 2023-10-31 (there is only one version at the moment) and internal endpoints have version 1 (there is only one version at the moment too). We consider only public Kibana API endpoints for the purpose of public API documentation.

API consumers have to specify API version via an HTTP header elastic-api-version as it's required by versioned router. OAS doesn't provide a way to declare different request and response schemas per version in one OAS document when an HTTP header like elastic-api-version is used to discriminate API versions . Technically it requires creation of OAS per API endpoint version.

Docs Engineering team prefers to have a simple and straightforward approach to consume Kibana OAS. One of the ideas is to have a single OAS per Kibana so it can be uploaded to the documentation platform. Obviously this approach isn't feasible taking into account limitations of OAS regarding API versioning via elastic-api-version HTTP header.

Additionally there is some discrepancy for schema's info.version between OpenAPI specification and practical usage. Official documentation states that info.version is an OAS document version

The version of the OpenAPI document (which is distinct from the OpenAPI Specification version or the API implementation version).

while the majority of practical examples (1, 2, 3) use info.version as an API implementation version.

Security Solution relies on OAS for code generation (Zod schemas, TS types, API clients) where schema's info.version is used as an API implementation version which is currently 2023-10-31. Taking into account this approach it's definitely problematic to bundle the all Kibana OAS because

Taking above into account OAS bundles should be segregated. The main question is "What should be granularity of this segregation?". Wether bundles should be segregated by solution (e.g. Observability or Security) or domain (e.g. Security Endpoint or Security Detection Engine).

The goal of this ticket is to get is to collect the first feedback from Docs Engineering and Kibana Core teams on OAS bundling approaches where

From DX point of view it's convenient and logical to have a bundle per API domain. Each domain/bundle would version independently.

Action items

elasticmachine commented 5 months ago

Pinging @elastic/security-solution (Team: SecuritySolution)

jloleysens commented 5 months ago

Taking above into account OAS bundles should be segregated. The main question is "What should be granularity of this segregation?". Wether bundles should be segregated by solution (e.g. Observability or Security) or domain (e.g. Security Endpoint or Security Detection Engine).

This is a good question. Technically I don't see a problem with either approach for security solution, but my gut feel is that starting with a single bundle per version would be simpler.

Taking a step back: would it make sense to bundle Kibana Core generated OAS and your existing OAS into a single file for Kibana?

We still need to persist generated OAS to a file somewhere, and once it is we should probably bundle all our JSON together into a single, global file. Maybe a new location like <kibana-root>/oas_docs/<version>_bundle.json. This will likely need to all form part of the same sequence of steps where:

(1) We generate OAS from Kibana /api/oas to some file (per this issue) (2) We take any post-steps (like bundling in security solution), we don't currently support generating OAS from Zod

WDYT @lcawl @maximpn ?

lukeelmers commented 5 months ago

From DX point of view it's convenient and logical to have a bundle per API domain. Each domain/bundle would version independently.

Separate bundles would enable independent versioning, but not require it. At this point I don't think it is likely we will have a future with small subsets of APIs being versioned separately as their own domain. The plan as of now is still to just have solution-level domain groupings (platform + 3 solutions). Whether or not they live as separate bundles is an implementation detail, as long as they aren't being versioned in a way that's out of sync with each other.

would it make sense to bundle Kibana Core generated OAS and your existing OAS into a single file for Kibana?

I think a "post-build" step like this worth exploring, especially if having a single OAS file makes things easier for the docs team.

maximpn commented 5 months ago

Hi @jloleysens and @lukeelmers,

Thanks for the review!

I see your point that you lean towards solution bundles and justification has valid points. At the same time domain bundles might be much more flexible. Let me share my thoughts

image

With bigger scope bundles there is a question who is gonna be pinged when all Kibana or solution bundle is changed? Who is gonna review it? An answer may be we don't need to review it since it's an automatically produced artifact but in some rare case we'll have troubles with that.

On top of that I'd like to highlight how ESS documentation looks like. Security Solution APIs are listed by domains.

image

It looks like we need an extra folding level. So documentation doesn't have to present the whole Kibana APIs but rather split them to core, platform and solutions.


And there are answers on questions above.

Taking a step back: would it make sense to bundle Kibana Core generated OAS and your existing OAS into a single file for Kibana?

To answer this question it's good to understand why it might be beneficial to bundle Kibana Core and Security Solution generated OAS bundles. I see the following

(1) We generate OAS from Kibana /api/oas to some file (per https://github.com/elastic/kibana/issues/181992 issue)

What will /api/oas output? Is it the whole Kibana's OAS?

@jloleysens @kbn/openapi-bundler supports functionality you need in https://github.com/elastic/kibana/issues/181992 via custom x- properties.

lcawl commented 5 months ago

OAS doesn't provide a way to declare different request and response schemas per version in one OAS document when an HTTP header like elastic-api-version is used to discriminate API versions . Technically it requires creation of OAS per API endpoint version.

I don't think this is accurate, since when I generate the automated output related to https://github.com/elastic/kibana/issues/180056, I happily see the versioning represented in the header parameter and the request and response bodies. For example:

...
"parameters": [
          {
            "in": "header",
            "name": "elastic-api-version",
            "description": "The version of the API to use",
            "schema": {
              "type": "string",
              "enum": [
                "2023-10-31"
              ],
              "default": "2023-10-31"
            }
          },
...
        "requestBody": {
          "content": {
            "application/json; Elastic-Api-Version=2023-10-31": {
              "schema": {
                "anyOf": [
                  {
                    "type": "object",
...

From a docs point of view, this level of version-specific detail is great! Per side discussions with @jloleysens, if we want to be able to say which versions support specific API parameters, we might need to request support for a new custom extension like "x-availability-since" or something like that, but otherwise I'm happy with the automated Kibana API docs progress so far.

On a related note, I don't see the header parameter values or version qualifiers in the request/response bodies in either https://github.com/elastic/kibana/pull/183021 or https://github.com/elastic/kibana/pull/183026 yet, but ideally they'd contain the same level of detail.

On top of that I'd like to highlight how ESS documentation looks like. Security Solution APIs are listed by domains.

The tool that we'll use to publish our API documentation can group the endpoints either by tag or else automatically by deriving a grouping from the path. So far we've been advocating that teams explicitly add tags rather than relying on auto-generated groups. In particular, we advise that teams "define a set of tags that group related endpoints in a meaningful way, such as by feature or logical objects. Use human-readable tag names and add tag descriptions". Right now in https://github.com/elastic/kibana/pull/183021 or https://github.com/elastic/kibana/pull/183026 the tags don't align with the types of groupings in the old docs (e.g. tags are "Rules API", "Import/Export API", etc whereas it would be more consistent with what we're doing in other areas to use tag names like "detections", "exceptions" or maybe even "security detections", etc) @natasha-moore-elastic would likely know this content area better than I do to offer specific tag name suggestions.

Taking a step back: would it make sense to bundle Kibana Core generated OAS and your existing OAS into a single file for Kibana?

That's what I'm currently doing to get a single Kibana Serverless OpenAPI document for documentation purposes and it would be great to have this accomplished automatically. I use the redocly join tool, which can qualify any components with duplicated names, but there are likely lots of alternative bundle/join tools. From a docs point of view, we'd just ask that the end document (1) be a valid OAS file, and (2) pass our linting rules for the minimum things required for good docs. If you're taking multiple generated and non-generated files and joining them together automatically, it likely makes sense to have the linting performed on each team's individual file and any failures fixed at that stage rather than having any open-endedness about who "owns" the final file.

If we follow the same as ESS API documentation approach documentation platform should be able to display subcategories under Kibana like Core, Observability or Security...

Our plan at this point is for the docs to categorize APIs via tags that reflect the feature or domain, rather than the solution.

Serverless is deployed per project so it's solution API's + core API's. In this case merging bundles will be necessary to display documentation on one page if documentation platform won't support uploading bundles individually.

Ditto previous answer. At this time we do not intend to subdivide the Kibana Serverless API docs by solution. If there are errors that occur if you try to run an Kibana Security (serverless) API against a Search or Observability project, my expectation is that the OpenAPI document would contain that error code and a clear explanation.

If I've missed addressing any questions in this thread, happy to meet and discuss further.

maximpn commented 4 months ago

@lcawl

I tried different variants in Bump.sh to discover it's capabilities. Bump.sh allows to create hubs and additionally groups API documentations inside hubs via Hub category

image

Domain level documentation

I used hubs and hub category to create Kibana (Serverless) example documentation based on [Security Solution][POC] Bundle Security Solution OAS per domain.

image

https://github.com/elastic/kibana/assets/3775283/d2a79dfb-e91c-4413-8b28-a6d91113d881

Versions for Detections Engine API documentation are handled via branches

version 2023-10-31 image

version 2024-05-15 image

Solution level documentation

I also created a solution level documentation example by created Kibana (Serverless) hub based on [Security Solution][POC] Bundle Security Solution OAS(without using hub categories).

https://github.com/elastic/kibana/assets/3775283/94c061e6-717b-4b5c-a8c7-13c850fea7f1

version 2023-10-31 image

version 2024-05-15 image

Please don't pay attention to Observability API documentation since it's just for grouping demonstration purposes.

Why are there just two API endpoints having 2024-05-15 version?

You may notice that there are just two endpoints with a new 2024-05-15 version. This reflects real Kibana server behavior. There are route handler registrations for /api/detection_engine/rules/_import and /api/detection_engine/rules/_find with the new 2024-05-15 version. It becomes clear that the bigger domain the harder to bump its version. It's gonna be immense amount of work to go over all API endpoints even in Security Solution.

This is important to understand that solution level API versioning can easily lead to bad DX where hundreds files need to be changed even if there are not so much changes. Check out how many files added just for bumping two API endpoints version.

cc @jloleysens @lukeelmers

maximpn commented 4 months ago

@lcawl

I don't think this is accurate, since when I generate the automated output related to https://github.com/elastic/kibana/issues/180056, I happily see the versioning represented in the header parameter and the request and response bodies.

I'm afraid there is misunderstanding here. Let me elaborate my arguments.

OAS Response Media type

Checking OAS specification you may find Response Object documentation. It has content field which has Map[string, [Media Type Object](https://swagger.io/specification/v3/#media-type-object)] type that is a map from Media type to Media Type Object. There are some well known and popular media types like application/json, text/html and text/xml.

I doubled checked the documentation and it states that Media types can have a parameter like type/subtype;parameter=value.

Considering OAS produced by /api/oas we get media types like application/json; Elastic-Api-Version=2023-10-31. It means there is a media type application/json with Elastic-Api-Version parameter.

Taking into account Kibana's API versioning works via HTTP header Elastic-Api-Version it has nothing in common with Media types. It just adds a discrepancy between real behavior and OAS describing this behavior. For example importing such an OAS to Postman gives me what you can see below

image

You can notice that version is duplicated twice. In fact Media type parameter Elastic-Api-Version gets ignored by Kibana server and only the HTTP header Elastic-Api-Version matters.

When Media type parameter approach does not work?

I added a breaking change example with a necessary version bump to the PRs above ([Security Solution][POC] Bundle Security Solution OAS and [Security Solution][POC] Bundle Security Solution OAS per domain). It demonstrates really close to real life API changes where

It may be possible to find some tricks (maybe with x-availability-since) to describe changes to /api/detection_engine/rules/_find. But I don't see how can Media type parameters guarantee that a specific request body corresponds to a specific response body in /api/detection_engine/rules/_import.

Let's look at an example:

version 2023-10-31

/api/detection_engine/rules/_import:
    post:
      ...
      requestBody:
        required: true
        content:
          multipart/form-data:
            schema:
              type: object
              properties:
                file:
                  type: string
                  format: binary
                  description: The `.ndjson` file containing the rules.
      ...
      responses:
        200:
          description: Indicates a successful call.
          content:
            application/json:
              schema:
                type: object
                additionalProperties: false
                required:
                  - exceptions_success
                  - exceptions_success_count
                  - exceptions_errors
                  - rules_count
                  - success
                  - success_count
                  - errors
                  - action_connectors_errors
                  - action_connectors_warnings
                  - action_connectors_success
                  - action_connectors_success_count
                properties:
                  exceptions_success:
                    type: boolean
                  exceptions_success_count:
                    type: integer
                    minimum: 0
                  exceptions_errors:
                    type: array
                    items:
                      $ref: '../../model/error_schema.schema.yaml#/components/schemas/ErrorSchema'
                  rules_count:
                    type: integer
                    minimum: 0
                  success:
                    type: boolean
                  success_count:
                    type: integer
                    minimum: 0
                  errors:
                    type: array
                    items:
                      $ref: '../../model/error_schema.schema.yaml#/components/schemas/ErrorSchema'
                  action_connectors_errors:
                    type: array
                    items:
                      $ref: '../../model/error_schema.schema.yaml#/components/schemas/ErrorSchema'
                  action_connectors_warnings:
                    type: array
                    items:
                      $ref: '../../model/warning_schema.schema.yaml#/components/schemas/WarningSchema'
                  action_connectors_success:
                    type: boolean
                  action_connectors_success_count:
                    type: integer
                    minimum: 0

and a new added version 2024-05-15

/api/detection_engine/rules/_import:
    post:
      ...
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: array
              items:
                $ref: '../../model/rule_schema/rule_schemas.schema.yaml#/components/schemas/RuleCreateProps'
      ...
      responses:
        200:
          description: Indicates a successful call.
          content:
            application/json:
              schema:
                type: object
                additionalProperties: false
                required:
                  - imported_rules_count
                  - success
                  - errors
                properties:
                  imported_rules_count:
                    type: integer
                    minimum: 0
                  success:
                    type: boolean
                  errors:
                    type: array
                    items:
                      $ref: '../../model/error_schema.schema.yaml#/components/schemas/ErrorSchema'

You can notice that request body and response have changed. Request body became application/json Media type in 2024-05-15 version and response has changed as well.

When it's described with Media type parameter Elastic-Api-Version it becomes

/api/detection_engine/rules/_import:
    post:
      ...
      requestBody:
        required: true
        content:
          'multipart/form-data; Elastic-Api-Version=2023-10-31':
            schema:
              type: object
              properties:
                file:
                  type: string
                  format: binary
                  description: The `.ndjson` file containing the rules.
          'application/json; Elastic-Api-Version=2024-05-15':
              schema:
                type: array
                items:
                  $ref: '../../model/rule_schema/rule_schemas.schema.yaml#/components/schemas/RuleCreateProps'
      ...
      responses:
        200:
          description: Indicates a successful call.
          content:
            'application/json; Elastic-Api-Version=2023-10-31':
              schema:
                type: object
                additionalProperties: false
                required:
                  - exceptions_success
                  - exceptions_success_count
                  - exceptions_errors
                  - rules_count
                  - success
                  - success_count
                  - errors
                  - action_connectors_errors
                  - action_connectors_warnings
                  - action_connectors_success
                  - action_connectors_success_count
                properties:
                  exceptions_success:
                    type: boolean
                  exceptions_success_count:
                    type: integer
                    minimum: 0
                  exceptions_errors:
                    type: array
                    items:
                      $ref: '../../model/error_schema.schema.yaml#/components/schemas/ErrorSchema'
                  rules_count:
                    type: integer
                    minimum: 0
                  success:
                    type: boolean
                  success_count:
                    type: integer
                    minimum: 0
                  errors:
                    type: array
                    items:
                      $ref: '../../model/error_schema.schema.yaml#/components/schemas/ErrorSchema'
                  action_connectors_errors:
                    type: array
                    items:
                      $ref: '../../model/error_schema.schema.yaml#/components/schemas/ErrorSchema'
                  action_connectors_warnings:
                    type: array
                    items:
                      $ref: '../../model/warning_schema.schema.yaml#/components/schemas/WarningSchema'
                  action_connectors_success:
                    type: boolean
                  action_connectors_success_count:
                    type: integer
                    minimum: 0
            'application/json; Elastic-Api-Version=2024-05-15':
              schema:
                type: object
                additionalProperties: false
                required:
                  - imported_rules_count
                  - success
                  - errors
                properties:
                  imported_rules_count:
                    type: integer
                    minimum: 0
                  success:
                    type: boolean
                  errors:
                    type: array
                    items:
                      $ref: '../../model/error_schema.schema.yaml#/components/schemas/ErrorSchema'

The latter example makes it impossible (according to OAS) to match specific request body (version 2023-10-31 or 2024-05-15) to a specific response body (version 2023-10-31 or 2024-05-15) since it represents many to many relationship.

Taking all the above into account using Media type parameter Elastic-Api-Version looks like a hack to overcome bump.sh's limitations in documenting different API versions. It adds discrepancy between OAS and real Kibana's behavior which makes it harder to reuse the same OAS to consume Kibana's API (for example via Postman). On top of that It reduces documentation UX forcing users to track which request body version corresponds to which response body. Ideal UX would be to have tabs shown for each API endpoint having multiple versions where each tab's content shows only specific API version request/response.

FYI @jloleysens

lcawl commented 4 months ago

Taking all the above into account using Media type parameter Elastic-Api-Version looks like a hack to overcome bump.sh's limitations in documenting different API versions...

To be clear, none of the guidelines we're proposing for publishing our API docs are at all limited by who we're partnering with. We are very excited about this opportunity to fully document our APIs to a degree that we've never managed to date. We'll get to address some long-standing user pain points and be strategic about how we handle the ever-growing API docs landscape and increasingly divergent versioning strategy.

The difficulty of relating specific requests and responses to query/header parameter values seems to be a long-standing issue (e.g. https://stackoverflow.com/questions/74305458/openapi-3-x-x-having-different-response-with-the-change-of-optional-parameters). I'm hopeful that it will be revisited in the API signature improvements in Moonwalk, but until then mapping Elastic-Api-Version values to requests and responses with matching qualifiers seems sufficient from a docs point of view.

banderror commented 4 months ago

@maximpn Could you please summarize the feedback and the decisions made and close this ticket?

maximpn commented 4 months ago

After discussing the first feedback on documentation bundling approaches with Docs Engineering and Kibana Core teams the following decisions have been made

@lcawl (Docs Engineering team) highlighted that it's an already made decision on their side to have a whole Kibana OAS. Kibana Core team doesn't have objections here. Kibana Core team's effort in this direction is made in https://github.com/elastic/kibana/pull/183338.

responses:
    '200':
      content:
        application/json; Elastic-Api-Version=2023-10-31:
          schema:
            ...