apple / swift-openapi-generator

Generate Swift client and server code from an OpenAPI document.
https://swiftpackageindex.com/apple/swift-openapi-generator/documentation
Apache License 2.0
1.23k stars 89 forks source link

Support external JSON references #25

Open czechboy0 opened 1 year ago

czechboy0 commented 1 year ago

Support external JSON references to allow breaking up a large doc into smaller ones.

"External" can mean either:

  1. another file on local disk
  2. a remote file

To start, let's consider other local files first, as remote files might have other implications (security, privacy, etc).

Fab1n commented 11 months ago

Without this feature this tool is extremely limited. Most real world projects make use of external references. Minimum requirement is local references, remotes would be a nice addon.

czechboy0 commented 11 months ago

To help inform how this should work (with the considerations of sandboxing, security, usability, and so on), can you provide links to some real-world OpenAPI documents that are split across multiple local files? To see which parts are commonly outlined, how duplicates are handled, and so on.

And do you have some opinions on how you'd like it to work?

Thoughts, from anyone, appreciated below, as we first gather requirements, before someone proposes a plan.

andrewse02 commented 11 months ago

Glad to see this issue addressed. Here are some of my thoughts below.

Examples

Unfortunately, I don't have real-world schemas to share, as I cannot share my company's, and I don't have any from personal projects. Here is one example I've come up with that illustrates why this feature, or lack thereof, is so important

With external refs

index.yaml

openapi: 3.0.0
info:
  title: Example API
  version: 1.0.0
paths:
  /users:
    get:
      summary: Get a list of users
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: './schemas/UserList.yaml'
  /users/{id}:
    get:
      summary: Get a user by ID
      parameters:
        - $ref: './parameters/UserId.yaml'
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: './schemas/User.yaml'

schemas/UserList.yaml

type: array
items:
  $ref: './User.yaml'

schemas/User.yaml

type: object
properties:
  id:
    type: string
  name:
    type: string

parameters/UserId.yaml

name: id
in: path
description: User ID
required: true
schema:
  type: string

Without external refs

index.yaml

openapi: 3.0.0
info:
  title: Example API
  version: 1.0.0
paths:
  /users:
    get:
      summary: Get a list of users
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/UserList'
  /users/{id}:
    get:
      summary: Get a user by ID
      parameters:
        - $ref: '#/components/parameters/UserId'
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/User'

components:
  schemas:
    User:
      type: object
      properties:
        id:
          type: string
        name:
          type: string
    UserList:
      type: array
      items:
        $ref: '#/components/schemas/User'
  parameters:
    UserId:
      name: id
      in: path
      description: User ID
      required: true
      schema:
        type: string

Results

While this may be a reasonably light example, we still have 350 fewer lines when using external refs in index.yaml. Here are some reasons why writing specs with external refs is practically a necessity.

Modularity and Reusability

External references allow you to define schemas and parameters in separate files that can be reused across multiple API endpoints or even different API specifications. This promotes modularity, reduces duplication, and improves maintainability.

Simplified Main File

By using external references, the main OpenAPI specification file remains concise and focused on defining the high-level structure and behavior of your API. The actual schema and parameter definitions reside in separate files, making the main file easier to read, understand, and update. You would know exactly where to look for everything. Eventually, you would have to call it "The Monolith", speaking from experience, that's really bad.

Easier Collaboration

External references facilitate collaboration among team members working on different aspects of the API. With separate schema and parameter files, different team members can work on those specific components independently without stepping on each other's toes. This leads to smoother collaboration and reduces merge conflicts.

Enhanced Reusability Across APIs

When you have a library of reusable schemas and parameters defined as external files, you can leverage them across multiple API projects. This promotes consistency in data models, validation rules, and API contract definitions, saving development time and effort.

How should this work?

And do you have some opinions on how you'd like it to work?

@czechboy0 I will be honest, I'm not quite sure from what perspective I should answer your question.

User

As a general developer of an API that wants to use this tool in my apps, I wouldn't even want to think about how to use it. I would expect to point to external files and for it to just work. It should be able to be completely seamless. If I had blindly copied a random spec into my project, I wouldn't want to be able to tell which models are external refs or not.

Project contributor

Speaking in terms of this project, I've looked into it on a surface level. From what I can tell (correct me if I'm wrong), we parse the openapi.yaml file, convert its contents to Data, and translate that data. My idea would be to check for external references and replace "$ref" with the contents of the file it's pointing to recursively. Probably easier said than done, I haven't looked that far into it yet. I did also look into TypeAssigner, and noticed that we use OpenAPIKit to check if the reference is internal or external. It's pretty hardcoded to avoid external references. It seems like that's not an OpenAPIKit problem and something we just simply don't support yet.

Conclusion

Overall, this is a much-needed feature. I'd be more than happy to help. I'd love to hear anyone's thoughts on this. If I could get some guidance just as to where to look to get started I'd feel confident in making a PR. Also please let me know if I need to reformat this as a proposal, or create one separately. Thanks!

czechboy0 commented 11 months ago

Thank you @andrewse02 for the details, this is great.

A few questions that come to mind that we'll need to think about:

Let's figure out more of this behavior first, and then we'll see if a proposal would be useful to let others chime in on the design.

andrewse02 commented 11 months ago

Great questions, @czechboy0!

I'm not quite sure how feasible any of this is, these are just my thoughts from a user's perspective.

czechboy0 commented 11 months ago

All of these are feasible, as far as I can tell, so it's really about us choosing how this should work.

I'd like to avoid configuration options if at all possible, and choose a solution that is deterministic and doesn't lose data, so that folks can build conveniences on top, in line with the general design principle of the project.

If later it genuinely seems necessary to support multiple common variants, we can consider a configuration option then, but for the first version, we should be able to do without it.

At this point it might make sense to draft a proposal and discuss the details in a PR. It might also help to try to do a prototype implementation and see what other questions come up.

But don't feel obligated, the information gathered above is already very helpful to whomever ends up implementing it, so all of this is part of the process - and is much appreciated 🙂

andrewse02 commented 11 months ago

I think those are great points. I agree, it would be better to start somewhere and build on top of it later. Maybe introducing so much complexity all at once isn't a good idea. Thanks for that!

I'll start to dive deeper into how we can implement this. I would actually like to create a proposal, so I'll get started on preparing that.

andrewse02 commented 11 months ago

I apologize, it looks like I was incorrect earlier. OpenAPIKit actually does not support external references. See the below function from OpenAPIKit's Components+JSONReference. How should we proceed from here?

    /// - Important: Looking up an external reference (i.e. one that points to another file)
    ///     is not currently supported by OpenAPIKit and will therefore always throw an error.
    ///
    /// - Throws: `ReferenceError.cannotLookupRemoteReference` or
    ///     `MissingReferenceError.referenceMissingOnLookup(name:)` depending
    ///     on whether the reference points to another file or just points to a component in
    ///     the same file that cannot be found in the Components Object.
    public func lookup<ReferenceType: ComponentDictionaryLocatable>(_ reference: JSONReference<ReferenceType>) throws -> ReferenceType {
        guard case .internal = reference else {
            throw ReferenceError.cannotLookupRemoteReference
        }
        guard let value = self[reference] else {
            throw ReferenceError.missingOnLookup(name: reference.name ?? "unnamed", key: ReferenceType.openAPIComponentsKey)
        }
        return value
    }
czechboy0 commented 11 months ago

Let's summon @mattpolzin for thoughts here 🙂

mattpolzin commented 11 months ago

This topic is definitely layered. Some of that was peeled apart above, so I'll do my best to strike a balance between being clear but not repeating things already said.

First, I'll definitely agree that external references are incredibly useful. Where I work we maintain an OpenAPI document across more than a hundred YAML files easily.

Second, I'll point out a dereferencing consideration I saw touched on above: how do you avoid generating distinct types instead of one shared type where a reference would imply the latter behavior. I point this out because if I didn't miss anything the conversation above didn't quite mention something I think is important. Setting aside heuristics this project may use or configurations this project may support to determine when two parts of a document should become the same generated code, the OpenAPI document creator has one really useful tool for specifying this themselves as well: JSON references to the components object. In my eyes, a reference to a shared schema within the components object is at least potentially a clear indication of intent that the schema should remain shared conceptually if a tool generates API resources from it. I'll revisit this later because it plays into pros and cons of strategies I've used to work with OpenAPI documents.

By in large, I've been able to work with OpenAPI Documents by using a tool chain where the OpenAPI document is fully dereferenced early on. The advantage to this is later tools having to deal with less complexity and this has allowed me (and I assume others as well) to use OpenAPIKit very effectively without needing it to follow external references because those references can be inlined by a tool earlier in the tool chain. I won't mention the tools here because there is a plethora of them out there in various ecosystems, many of which are linked to by the OpenAPI website's list of implementations. This may not be as ideal as a one stop shop, but it does give the person using the tooling quite a bit of control, albeit with added effort. Still, I am not opposed to OpenAPIKit supporting external references, it's just quite a step up in complexity so I always figured I would wait for it to be properly motivated. With that, I'll mention one big drawback to dereferencing a document prior to passing it into OpenAPIKit based tooling: if you fully dereference then you lose the component references, whether they were local or not. This defeats attempts by the OpenAPI Document author to define shared schema's explicitly as mentioned above.

Let's say you could specifically combine an OpenAPI Document spread across multiple files but leave local references (e.g. to the components object) in place. Would it be especially more useful to do this within OpenAPIKit as opposed to prior to the OpenAPI Document being processed by OpenAPIKit? Of course it is useful not to need to add prior steps to a toolchain, but does a pipeline approach otherwise limit OpenAPIKit's information about the document (or tools like this code generator)? I've not had the need to look, but I wonder if a tool out there already does this selective inlining of references as a builtin option.

Those are my musings on behavior. My thoughts on implementation start with the fact that currently OpenAPIKit is able to make itself agnostic of the parsing details, letting the project using it decide between e.g. YAML and JSON or which decoder or encoder to use regardless of the file format. That makes it difficult for me to imagine OpenAPIKit using a simple "inline" strategy because OpenAPIKit doesn't work with the JSON or YAML at all so it cannot inline things without worrying about the semantic meaning of those things beyond the structure of the text. That abstraction over parsing details might be worth maintaining if this is brought into OpenAPIKit which makes the API a certain way (probably essentially supported by a function supplied by the project using OpenAPIKit to read new files and provide parsed data from them). I suspect a new decodable type is needed somehow because these individual files are not themselves OpenAPI Documents but rather "fragments" of documents that can only be validated as OpenAPI once combined into a single document. This is where I kind of get stuck in my head and end up walking back my statement that OpenAPIKit should handle this with a new type because it is by far simpler to have never left the JSON or YAML semantics since you really don't want to do anything but treat the individual files as arbitrary parts of the whole Document. [Future self, came back and added a bit of clarity] I may have already realized I was conflating two unrelated things in my head and each individual file possibly should be passable to some existing OpenAPIKit type so perhaps a handler approach is tenable.

Maybe I am overthinking this, though. I'll give it some space and mull on it more.

Here's what I'd want if I could snap my fingers and have it: I'd want to convert external references into internal references to the components object, therefore getting a single OpenAPI Document but retaining an explicit representation of the intention that some part of the document is shared. The components object doesn't support all parts of a document that can be referenced (I think that actually isn't true anymore in 3.1.x but I haven't checked that thoroughly) so you'd still get a mix of things thrown into the components object and things inlined entirely.

Maybe I'm starting to imagine a reasonable way to do this inside OpenAPIKit, though at the same time the approach I am imagining is not easier to do inside OpenAPIKit than outside of it, so it's more a matter of convenience that it be built within. Nothing wrong with that, it's a good argument for doing it inside OpenAPIKit and letting everyone benefit from the implementation.

I'd like to let the conversation here as to what ideally happens to the generated code play out a bit more before solidifying opinions on how OpenAPIKit might help because I don't want to see OpenAPIKit's approach accidentally throw away useful information found in the way that files are used. One example would be the file name of a referenced file -- is that important information for a dereferencing step to hold onto and pass to a code generator?

czechboy0 commented 11 months ago

Thank you @mattpolzin for the thorough explanation – it really resonates with me.

this has allowed me (and I assume others as well) to use OpenAPIKit very effectively without needing it to follow external references because those references can be inlined by a tool earlier in the tool chain

We took the same approach of encouraging composability, explicitly documented here, to allow more flexibility for adopters, and to integrate, rather than replace, other OpenAPI tools (even written in other languages). So any features that can be achieved by modifying the OpenAPI document before passing it to the generator is considered out-of-scope (by default, with us being open to reasonable exceptions) – that's why there are very few configuration options, as the generator aims to maintain the fidelity of the OpenAPI document, without losing information, and allowing adopters to build convenience layers on top.

So if we are to support external references in Swift OpenAPI Generator (presumably through OpenAPIKit) will be influenced by whether we can do so without losing fidelity and throwing away potentially important information for the generated code.

I'll mention one big drawback to dereferencing a document prior to passing it into OpenAPIKit based tooling: if you fully dereference then you lose the component references, whether they were local or not.

💯, staying faithful to the source OpenAPI document is one of the design principles of Swift OpenAPI Generator, which is why it doesn't dereference anything, and instead respects reusable versus inlined types as accurately as possible. Supporting external references shouldn't change that, so any support for external references should, at most, make the OpenAPI document appear to have "copied" all the external references into #/components/..., while preserving the relationships between types. But even that might be throwing away the file boundary, which might be unacceptable. That's why I proposed an approach like this, that would preserve the information about which file a type came from:

/// Gets generated today
enum Operations {
  // ...
}
enum Components {
  enum Schemas {
    // ...
  }
  // Parameters, Headers, Responses, ...
}

/// Could be generated when external references are found.
/// First example shows a schema whose contents are the file my-external-object.yaml itself.
/// Second example shows referencing a schema from another OpenAPI document.
enum External {
  enum Components {
    enum Schemas {
      // - Remark: Generated from ./my-external-object.yaml
      struct MyExternalObject: ... {
        // ...
      }
    }
  }

  // - Remark: Generated from ./shared-api.yaml
  struct Files {
    enum shared_api_yaml {
      enum Components {
        enum Schemas {
          // - Remark: Generated from ./shared-api.yaml#/components/schemas/SharedError
          struct SharedError: ... {
            // ...
          }
        }
      }
    }
  }
}

perhaps a handler approach is tenable

I think so, I imagine the lowest level of resolving external references could be done as a (String) async throws -> Something function, which then could have an implementation for resolving using the local file system, but potentially even from memory in tests (and from the network in the future). I imagine the handler would continue to be agnostic to which JSON/YAML is used, so it'd both get the bytes and parse them into a type-safe OpenAPIKit object.

I don't want to see OpenAPIKit's approach accidentally throw away useful information found in the way that files are used

👍

One example would be the file name of a referenced file -- is that important information for a dereferencing step to hold onto and pass to a code generator?

I think so, as the generator would likely use the file name to inform the Swift type name. So for the generator, more than dereferencing (which isn't used today at all), we're interested in resolving (or, lookup, in OpenAPIKit's terminology) external references, to maintain fidelity.

mattpolzin commented 10 months ago

I've begun work on this feature for OpenAPIKit, but my work so far is very experimental. Although I am still playing with the interface, I wanted to put it out there as soon as possible to encourage feedback before I've implemented most of the necessary machinery.

Please comment on the following draft PR if you have any thoughts about the public API (does it make sense? is it clunky? is it too limiting? etc.): https://github.com/mattpolzin/OpenAPIKit/pull/287

Note that the PR gives an example integration with the new API as a discussion starter so hopefully you won't need to dive too deep into the new code.

Very important to note that most of the machinery is unimplemented so if you play with the new draft PR yourself you should not expect most references to load (yet). I've really just implemented enough to support the example on the tin.

czechboy0 commented 10 months ago

Hi @mattpolzin, thanks for sharing your early progress!

Since we don't use dereferencing in Swift OpenAPI Generator today (only for detecting cycles during a validation phase), I suspect we wouldn't need the externallyDereferenced function (but it totally makes sense, to pair well with the locallyDereferended function).

We'd just detect external references and use something similar to lookup with references today, and load the contents in manually.

That's because we'll likely need to keep track of which files what content came from, and allow adopters to only generate some parts but not all, so that services A and B could share common currency types in C, instead of each getting its own copy of C.

That's a long way to say - I think this design should work for us, but please once you have it working, even if still in a PR, let me know and I'll do a prototype on our side to ensure I haven't missed anything. Thanks! 🙏

czechboy0 commented 10 months ago

@mattpolzin One thing that comes to mind - should the load function be async? To potentially allow network loads?

mattpolzin commented 10 months ago

[...] We'd just detect external references and use something similar to lookup with references today, and load the contents in manually.

@czechboy0 I'd like to drill into this a bit if you don't mind. The current design exploration (as you've noticed) focuses on what I would call "fully loading external references" (i.e. if you externally deference a JSONReference<JSONSchema> then you not only load the referenced JSONSchema but also all external references found within it). What I am understanding you to really want (sticking with the JSONReference<JSONSchema> example) is a way to load in just the JSONReference<JSONSchema> itself which results in a JSONSchema that may still have external references within it.

If I am reading that right, what behavior would you want that "lookup-like" function to include? The bare-minimum would be loading a remote file, parsing it, and deserializing it to a JSONSchema, but this is the work that I was currently thinking I would not handle within OpenAPIKit (by requiring an integration to provide a load<T>(_: URL) throws -> T function). The work that my externallyDereference functions do additionally includes (a) storing the newly loaded JSONSchema in the Components object, (b) turning the JSONReference<JSONSchema> within the document into a local reference to the Components object, and (c) seeking out any nested external references to dereference within the new JSONSchema. Given that (c) does not sound desirable, would you want any of (a) and (b) to be incorporated or do you see other benefits OpenAPIKit could provide you in the process of loading external references? I can imagine a function on the Components type that loaded and stored an external reference (i.e. (a) from above) and returned a local reference that could be used however (i.e. (b) from above). Perhaps loadExternal<Context, T>(_: JSONReference<T>, in context: Context) throws -> JSONReference<T> where Context: ExternalLoaderContext.


[...] we'll likely need to keep track of which files what content came from, and allow adopters to only generate some parts but not all, so that services A and B could share common currency types in C, instead of each getting its own copy of C.

As a separate discussion from the above one, I am curious what my current proposed API lacks when it comes to these goals. Understanding this might allow me to offer a better API. I'll separate the above goals out below so I can ask for specific clarifications.

we'll likely need to keep track of which files what content came from

I was hoping this need would be satisfied by the ExternalLoaderContext protocol's load() function taking a URL as its argument. An external JSONReference ends up just being a Foundation URL, so this argument to the load() function is all the information there is about an external reference prior to loading the referenced file. The nextComponentKey() function allows an integration to determine what to "name" a new component loaded from an external JSONReference based on its type, URL, other existing components, and any other context the integration stores on its type that implements the ExternalLoaderContext protocol. Additionally, an implementation of the load() function could choose to store the URL as a vendor extension (these are supported by all types that can be references if I am not mistaken, though I have yet to implement vendor extensions for JSONSchema). I updated the example on my OpenAPIKit PR to show a load() function setting x-source-url to the URL of an externally referenced object.

allow adopters to only generate some parts but not all

Would it help if the load() function returned an optional value so it could indicate that it chose not to load a particular remote reference by returning nil?

services A and B could share common currency types in C

If an ExternalLoaderContext implementation has a nextComponentKey() function that returns the same value for the same URL each time it is called, loading the same external reference from two different parts of the document will result in one entry in the Components object and multiple new local references to that same entry in the Components. Is this roughly the functionality you are describing or are you getting at something a bit different?

mattpolzin commented 10 months ago

One thing that comes to mind - should the load function be async? To potentially allow network loads?

My understanding of async/await in Swift is not great since I stopped working with Swift at my day-job prior to its introduction, but because OpenAPIKit supports back to Swift 5.1 currently I believe my APIs for loading external references would need to all be implemented twice (once as async and once not) using compiler & OS version conditions in order to support async for the load() function. At least, when I played with the idea, that seemed to be inescapable.

My plan is to bump the minimum Swift version of OpenAPIKit with the next release (4.0.0) rather than the current release (3.0.0) because I don't want to just use modern Swift syntax in one small part of the codebase -- I want to make a concerted effort to update the codebase everywhere newer Swift versions have made nicer code possible.

Version 4.0.0 will likely come fast on the heels of 3.0.0, though, unlike version 3.0.0 which has been a long time coming since OpenAPI v3.1 is such a beast to implement fully.

czechboy0 commented 10 months ago

Thanks Matt, I'll need to prototype external ref support in order to be able to answer your questions above 🙂 I'll get back to you once I've done so, but might be a few weeks.

mattpolzin commented 10 months ago

No worries and no rush. Cheers!

mattpolzin commented 1 month ago

I've made substantial progress on loading external references as of the latest v4 alpha of OpenAPIKit. I've only merged support for it in OpenAPKit so far (OpenAPIKit30 is planned next). Importantly, this is not a feature where you can load an OAS 3.0 document, convert it to OAS 3.1 and then externally dereference (because an OAS 3.1 document expects to decode OAS 3.1 objects from the other file), so OpenAPIKit30 module support for external dereferencing will be very important.

mattpolzin commented 1 month ago

v4-alpha.5 is out. Now external dereferencing is supported by the OpenAPIKit30 module in addition to the OpenAPIKit module. Perhaps more importantly, both modules of OpenAPIKit have a more flexible ExternalLoader interface that allows the load() function to pass messages back to whatever call site is dereferencing the Document (or some component within it). The implementation supports async and the example ExternalLoader in the README should give some hints as to how one might extract useful information from the process of external loading for later.

czechboy0 commented 1 month ago

Thanks so much @mattpolzin, we'll take a look 🙏