Apicurio / apicurio-registry

An API/Schema registry - stores APIs and Schemas.
https://www.apicur.io/registry/
Apache License 2.0
606 stars 269 forks source link

Documentation tab makes unauthenticated requests for referenced schemas #3369

Open forsberg opened 1 year ago

forsberg commented 1 year ago

Description

Registry Version: 2.4.2.Final Persistence type: sql

Environment

Running Apicurion in Kubernetes behind ingress controller. Apicurio configured with oidc authentication with Keycloak as IdP. Relevant environment vars:

Steps to Reproduce

  1. Upload Avro Artifact to test-ref group. I used this example artifact:
{
    "type": "record",
    "name": "SomeRecord",
    "namespace": "com.example",
    "doc": "Yaba, daba, doo",
    "fields": [
        {
            "name": "field",
            "type": "string",
            "doc": "A string field"
        }
    ]
}
  1. Upload AsyncAPI artifact that references the previously uploaded Avro artifact with a $ref, for example this:
{
  "asyncapi": "2.4.0",
  "id": "urn:com.example:test-ref",
  "info": {
    "title": "Hello world With Reference",
    "version": "0.2.0"
  },
  "channels": {
    "hello": {
      "publish": {
        "message": {
          "payload": {
            "$ref": "https://apicurio-registry.example.com/apis/registry/v2/groups/test-ref/artifacts/avro/versions/1"
          }
        }
      }
    }
  }
}

Expected vs Actual Behaviour

Expected to see nicely formatted AsyncAPI documentation with payload being expandable under the Documentation tab.

Instead, under Documentation, I get a blank page with the error message

Error: Error downloading https://apicurio-registry.example.com/apis/registry/v2/groups/test-ref/artifacts/avro/versions/1 HTTP ERROR 401

Trying this on a test apicurio instance without authentication, it works like a charm.

Inspecting the HTTP requests made, there's no Authorization header being sent in the request for the Avro artifact.

EricWittmann commented 1 year ago

Agree that this is a problem. However, I think there are a couple of different solutions/approaches.

First, I don't think the $ref in the AsyncAPI should actually point to the registry location. Instead, I think it should be a local file reference or some sort of logical reference. And then when uploading the AsyncAPI document you should provide the artifact reference mapping (mapping from the logical $ref value to artifact coordinates) using the registry's references feature.

We've made some changes to registry for this next release (TBD today or tomorrow) that should make this easier if you are using our maven plugin to register the artifacts. But we recognize that the artifact references feature is harder to use than it could be and we'll continue improving it. But it's the right thing to do IMO, rather than polluting the content of your artifacts with absolute registry URLs.

Next, getting the documentation tab to work once you've done the above. :) Here I think there are two options we need to explore:

  1. De-reference the content. This basically turns multiple artifacts into a single big artifact so that downstream tools can more easily handle them. I think this is the right thing to do for this use-case.
  2. Customize the AsyncAPI (and OpenAPI) visualizers with custom $ref resolvers that can properly resolve the reference and load it (with auth) from the correct registry coordinates.

I think (2) would be the right approach if the visualizers supported custom resolvers. I'll need to look into that. But I know that (1) will work. :)

EricWittmann commented 1 year ago

Now it occurs to me that (1) is problematic when the reference is to an Avro schema. I'm not sure those can be dereferenced in this way. So (2) might be the only option. Perhaps I will ask how this can be done in the AsyncAPI repo...

EricWittmann commented 1 year ago

https://github.com/asyncapi/asyncapi-react/issues/731

They do have a way to add authentication information to the request already - so leveraging that would result in a fix for your current use-case, @forsberg

forsberg commented 1 year ago

It's unclear to me if there is a way to reference Avro schemas using the registry's reference feature in current releases? According to Redhat Documentation, it's only available for Avro, Protobuf and JSON Schemas?

If it's possible, I would be happy to use it. In my git repository, the references to Avro schemas are by relative path in the filesystem, and I then transform that to URLs before pushing AsyncAPI to the registry. I could instead transform the $ref into a format Apicurio understands. But I don't know if this would work much better when it comes to documentation rendering?

forsberg commented 1 year ago

OK, I see now in Apicurio docs that I can supply references if I use a custom content-type. I guess that should work regardless of type of Schema.

Will try that out, but it will probably still not solve this problem. It will help me in other ways, though.

EricWittmann commented 1 year ago

It will not solve your problem, but it's the "right" way to do it. But we'll also need to improve the handling of those refs in the Documentation tab (and other similar use-cases). Without support for customizable $ref handling in the AsyncAPI react component, we would need to come up with alternatives. I have some ideas, but will need to play around a bit.

EricWittmann commented 1 year ago

Hey @forsberg I've started working on this issue. My approach for now is to support Artifact References - I'm adding a new option to automatically translate any $ref from a logical reference to an Apicurio Registry REST API URL. So if your AsyncAPI content looks like this:

{
  "asyncapi": "2.4.0",
  "id": "urn:com.example:test-ref",
  "info": {
    "title": "Hello world With Reference",
    "version": "0.2.0"
  },
  "channels": {
    "hello": {
      "publish": {
        "message": {
          "payload": {
            "$ref": "./SomeRecord.avsc"
          }
        }
      }
    }
  }
}

But you have an artifact reference configured for ./SomeRecord.avsc that points to GAV test-ref/SomeRecord/1 then the server will (optionally) return modified content like this:

{
  "asyncapi": "2.4.0",
  "id": "urn:com.example:test-ref",
  "info": {
    "title": "Hello world With Reference",
    "version": "0.2.0"
  },
  "channels": {
    "hello": {
      "publish": {
        "message": {
          "payload": {
            "$ref": "http://localhost:8080/apis/registry/v2/groups/test-ref/artifacts/SomeRecord/versions/1"
          }
        }
      }
    }
  }
}

I have an early exploration of this working, and the AsyncAPI docs generator appears to be loading the content properly (I have auth disabled so I'll need to come back to that later). However, the AsyncAPI doc component fails on validation of the example you included above:

image

Does that specific example work for you in some other context? Or do you have a fully working example I can test with?

EricWittmann commented 1 year ago

Also I wonder if you have tested any Avro schemas that use types from other schemas. In other words, AsyncAPI -> Avro Schema 1 -> Avro Schema 2. I'm not sure how the AsyncAPI doc component would ever know how to handle that, under any circumstance.

forsberg commented 1 year ago

I'm in a bit of a hurry, so only some cursory answers, hopefully I can come back with more tomorrow.

About the design with schema references:

I think that makes sense, but please make it's possible to refer to schemas in different artifact groups, not only the same artifact group as the AsyncAPI artifact.

Although we haven't decided on exact structure yet, I think we'll end up with most Avro schemas in one group, and the AsyncAPIs in several other groups.

About the validation error

I started seeing that validation error on the latest version of Apicurio Registry, the version before it did not complain. Perhaps I have provided you with a syntactically incorrect example? I just took something off the AsyncAPI blog to provide you with an example on what I meant. Will have a look tomorrow to see if I can provide you with a better example.

About Avro schemas referring to Avro schemas

No, haven't tested that, but we're actively thinking about using that capability. Perhaps the way forward is to make the dereferencing functionality (https://github.com/Apicurio/apicurio-registry/issues/2865) of Apicurio actually work, i.e have the schema registry expand all the referred Avro schemas into one big Avro schema, and let the $ref link point to the dereferenced version.

EricWittmann commented 1 year ago

About the design with schema references:

I think that makes sense, but please make it's possible to refer to schemas in different artifact groups, not only the same artifact group as the AsyncAPI artifact.

Don't worry - artifact references work across groups just fine.

About the validation error

I started seeing that validation error on the latest version of Apicurio Registry, the version before it did not complain. Perhaps I have provided you with a syntactically incorrect example? I just took something off the AsyncAPI blog to provide you with an example on what I meant. Will have a look tomorrow to see if I can provide you with a better example.

If you have one, that would be helpful. I'm surprised by the validation errors, and I can spend some time digging into the AsyncAPI spec to figure out exactly how it should look (they might have a working example in the AsyncAPI GH org) - I was just hoping you might have something working that you could contribute. If not don't worry about it.

About Avro schemas referring to Avro schemas

No, haven't tested that, but we're actively thinking about using that capability. Perhaps the way forward is to make the dereferencing functionality (#2865) of Apicurio actually work, i.e have the schema registry expand all the referred Avro schemas into one big Avro schema, and let the $ref link point to the dereferenced version.

This was my only solution as well. I'm planning on working on that as part of this effort. Dereferencing should work pretty well for some types like Avro and OpenAPI, but poorly or not at all for others like Protobuf. Will know more as this progresses.

EricWittmann commented 1 year ago

OK good news and bad news on this. I started on this a couple weeks ago and immediately went into a bit of a rat-hole getting our Maven plugin to automatically register AsyncAPI artifacts with references. That took me a bit of time but I got it sorted out. That was just because I wanted an easy way to test this the way I think it should ideally work (see my arguments above).

After that, we needed a way to automatically rewrite the content of an artifact to change whatever the values of the $ref properties are in the actual content, to the registry REST API Urls they should be for proper external resolution. So I updated the REST API to replace the existing (never implemented) dereference query parameter that some of our API calls had to a new references query parameter (with possible values DEREFERENCE, REWRITE, or PRESERVE). Note that PRESERVE is the default. This allows the Documentation tab to request the content with references "rewritten" using valid REST API URLs. This is as I described above in https://github.com/Apicurio/apicurio-registry/issues/3369#issuecomment-1599402743

So both of those are done, which means you can use the Maven plugin to register local artifacts which contain references to other local artifacts using local file path references. For example, you could have an AsyncAPI document like this:

{
    "asyncapi": "2.5.0",
    "channels": {
        "hello_world": {
            "publish": {
                "message": {
                    "$ref": "common/cmn-messages.json#/components/messages/HelloWorld"
                }
            }
        }
    }
}

That has an external reference to a local file called common/cmn-messages.json which contains the HelloWorld message definition. The maven plugin will figure that out and register both the above artifact and the cmn-messages.json file (if it exists locally) and it will automatically configure the Artifact Reference mapping for the above artifact. The artifact reference mapping in this case would be something like:

common/cmn-messages.json#/components/messages/HelloWorld --> default/cmn-messages.json/1 (group, artifact-id, version)

Once that is done, you can then make a REST API call to this endpoint:

http://localhost:8080/apis/registry/v1/groups/default/artifacts/example-async-api/versions/1?references=REWRITE

The result of that will be:

{
    "asyncapi": "2.5.0",
    "channels": {
        "hello_world": {
            "publish": {
                "message": {
                    "$ref": "http://localhost:8080/apis/registry/v1/groups/default/artifacts/cmn-messages.json/versions/1?references=REWRITE#/components/messages/HelloWorld"
                }
            }
        }
    }
}

Notice that the value of the $ref was rewritten from a local/logical path to a Registry URL. This allows downstream tooling to actually make a real fetch() call to get that content.

However, that just leads us to @forsberg 's actual problem in this ticket, which is that the above will fail with an authentication error (the URL above will require credentials when Auth is enabled in Registry).

So finally, I was able to get the appropriate headers (and enable CORS) injected using configuration options available in the React component provided by AsyncAPI.

Note: this potentially opens up an interesting information leak potential. If the content of the artifact contains an absolute URL for a $ref and that specific ref is not mapped using the Registry's Artifact References feature, the user's auth token will be included in the request to that random external server. This has the effect of leaking the user's auth token, which is obviously very bad. Users can enable the Referential Integrity rule in Registry to disallow unmapped references, which will take care of this. But that is probably not enough - we need to be more secure by default I think. I'm open to ideas on this point.

In any case, I do have everything working for AsyncAPI (although I haven't tested with Avro schemas yet). I'll get everything cleaned up and some PRs opened for review.

Happy for any feedback!

Note 1: @forsberg I do believe that using artifact references for what you're trying to accomplish is (at least in the long term) a much better approach than changing the references in your actual content to reflect Registry locations. Of course, if the Registry is actually the source of truth for that content, then that's different. But if that's the case, I would expect the "Anonymous read access" feature of Registry to be enabled, in which case this entire issue is moot. :)

Note 2: I have not found a way to get this same functionality working with the OpenAPI Documentation component (we use ReDoc for that). I haven't really dug into it yet though - so there may be something we can do. If not I will explore contributing such functionality.

Note 3: Interestingly this "rewrite" approach works well for AsyncAPI, OpenAPI, and JSON Schema but not for Avro or (I suspect) Protobuf. I think for Avro we'll need to dereference/inline the content, and no idea what to do about Proto yet. Because AsyncAPI can reference Avro, and then Avro could reference another Avro, we may need to add a "mixed" mode whereby the choice between REWRITE and DEREFERENCE is made (by the server) depending on the artifact type.

forsberg commented 1 year ago

Sound great! So, how's the cleaning up and PR work going? :-)

EricWittmann commented 1 year ago

I'm just back at work today after about 4 weeks off (unexpected). But I stalled out on this because the security leak issue I described above is a blocker. Everything I wrote about that is already implemented and committed. So you could start using artifact references now and they should work as described above. But you'll still have the problem that the UI documentation tab will fail with an auth error (unless you enable Anonymous Read Only access).

There are really only two solutions to this that I can think of:

  1. Implement the DEREFERENCE option for AsyncAPI/OpenAPI. This will result in the "inlining" of all referenced schemas, so there is no need to make additional fetch() calls on the part of the UI component.
  2. Add support for custom resource fetching to both the AsyncAPI and OpenAPI documentation components.

The problem with (1) is that I don't think it will support the AsyncAPI + Avro use-case, since I don't think you can inline an Avro schema in an AsyncAPI document. I'd love confirmation of that fact though.

The problem with (2) is that it obviously requires me to contribute functionality to upstream projects that I am not currently a contributor to (the AsyncAPI react component and the ReDoc project).

And finally, I have no idea what will happen if the AsyncAPI document refers to an Avro schema that itself depends on one or more additional Avro schema. That's still a big unknown.

EricWittmann commented 1 year ago

To be clear: I can't commit the work I've done on enabling authentication support in the AsyncAPI React component because it introduces an unacceptable security token leak, as discussed above.

forsberg commented 1 year ago

Re "Secure by default" - I agree that's a good principle.

Random idea: Let UI code check if referential integrity is enabled, and only then enable fetching references?

forsberg commented 1 year ago

Or stricter: Only allow ?references=REWRITE in API calls if referential integrity is enabled. Would require storing that referential integrity was enabled when artifact version was written.

EricWittmann commented 1 year ago

Interesting ideas for sure. The part about requiring that we store the fact that the RI rule was enabled when the version was written is particularly interesting. Definitely would be required for this case. I wonder if that's a general enhancement that might be useful: recording which rules were enabled when a version is written (and the config for those rules).