Generate initial business logic/database types for a list of schemas

czechboy0 commented 10 hours ago

Motivation

When writing a client or a server, in addition to the API (defined in the OpenAPI document), oftentimes we want to use similar (or initially even identical) types in a) our business logic and b) storage/database.

For example, consider the following note-taking service we'd like to implement (a client would look similar; its "database" might just be a local cache):

openapi: 3.0.3
info:
  title: Notes Service
  version: 1.0.0
paths:
  /notes:
    post:
      operationId: createNote
      description: Create a new note
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/NoteInputs'
      responses:
        '201':
          description: A new note was successfully created.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Note'
components:
  schemas:
    NoteInputs:
      type: object
      properties:
        text:
          type: string
      required:
        - text
    Note:
      type: object
      properties:
        id:
          type: string
          format: uuid
        text: 
          type: string
      required:
        - id
        - text

It has a single operation "createNote", which sends the POST /notes request containing the text of the note in the request payload, and receives a fully formed note, which includes a unique identifier generated by the server and the text itself.

This is a very common pattern for services, where some schemas are exclusively sent from clients to the server (NoteInputs in our example), and other schemas are exclusively sent from the server to its clients (Note, in our example).

Assume our service wants to process the incoming note inputs before saving the new note to the database. And in turn, when retrieving a note from the database, it wants to process it before returning it over the API.

[new note creation]: API note inputs -> note inputs model -> note inputs database object
[note retrieval]: API note <- note model <- note database object

In total, there can be 6 distinct Swift types to represent a service resource, and let's give them Swift type names:

APINoteInputs: API note inputs (NoteInputs in our OpenAPI doc, generated by Swift OpenAPI Generator as Components.Schemas.NoteInputs, but let's use the name APINoteInputs as a typealias here, instantiated from the request JSON)
ModelNoteInputs: note inputs model (hand-written by developer, instantiated from APINoteInputs on the request path)
DBNoteInputs: note input database object (hand-written by developer, instantiated from ModelNoteInputs on the request path, serialized into the database)
DBNote: note database object (hand-written by developer, deserialized from the database on the response path, sometimes DBNote and DBNoteInputs can be represented by the same Swift type, but we keep them separate here to work through the fully general case)
ModelNote: note model (hand-written by developer, instantiated from DBNote on the response path)
APINote: API note (Note in our OpenAPI doc, generated by Swift OpenAPI Generator as Components.Schemas.Note, but let's use the name APINote as a typealias here, instantiated from ModelNote on the response path and serialized as JSON)

Note: The above separation in 3 layers (API, model, DB objects) is important for flexibility and the ability to evolve each tier separately. The API should be able to change without rewriting the business logic using the model or changing the database schema, for example. The database schema should be able to be migrated without changing the API, and so on. Simple services might be able to get away with fewer layers, by coupling some of them, but I'd consider that an anti-pattern that ends up being more costly in the long term. However, it's attractive when creating a new service, as it speeds up the bringup - and this issue proposes making that less time-consuming while maintaining the 3-layer approach (best of both worlds).
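
To make the point about independent evolution concrete, here is a minimal sketch in which the database type gains a field while the API type stays untouched. Everything in it is an assumption for illustration: the `createdAt` column is hypothetical, and `APINote` is a hand-written stand-in for the generated type.

```swift
import Foundation

// Hypothetical stand-in for the generated API type; its shape is fixed by the OpenAPI doc.
struct APINote: Equatable {
    var id: String
    var text: String
}

// The database object gains a (hypothetical) `createdAt` column after a migration.
struct DBNote {
    var id: String
    var text: String
    var createdAt: Date
}

// The model picks up the new field so business logic can use it.
struct ModelNote {
    var id: String
    var text: String
    var createdAt: Date

    init(_ db: DBNote) {
        id = db.id
        text = db.text
        createdAt = db.createdAt
    }
}

extension APINote {
    // The API mapping ignores the new field, so the API contract does not change.
    init(_ model: ModelNote) {
        self.init(id: model.id, text: model.text)
    }
}

let row = DBNote(id: "A1B2", text: "Buy milk", createdAt: Date())
let apiNote = APINote(ModelNote(row))
```

Only the DB and model layers changed here; clients of the API would not notice the migration.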

From the 6 Swift types we need, 2 are generated and kept in-sync automatically with the OpenAPI doc (APINoteInputs and APINote), while 4 are hand-written and kept in sync manually (ModelNoteInputs, DBNoteInputs, DBNote, ModelNote). All 6 have some downstream impact when they change, for example changing DBNote or DBNoteInputs probably requires a database migration, and so on.

Further, to focus on how data flows between these Swift types, let's list the required transformations:

T1: JSON -> APINoteInputs
T2: APINoteInputs -> ModelNoteInputs
T3: ModelNoteInputs -> DBNoteInputs
T4: DBNoteInputs -> DB
T5: DB -> DBNote
T6: DBNote -> ModelNote
T7: ModelNote -> APINote
T8: APINote -> JSON

Note: Generally, we can skip implementing transformations in the opposite direction, for example we don't need to be able to create APINoteInputs from ModelNoteInputs.

T1 and T8 come from Decodable and Encodable conformances of code generated from the OpenAPI document; however, T2-T7 all have to be hand-written today, which is a lot of work that can feel like boilerplate, especially when getting started. (Once again, the value of the separation becomes obvious once the service grows, the layers evolve independently, and more business logic is written to work with the models. But when bringing up a new service from just the OpenAPI document, it can be tedious.) Also, T4 and T5 might be handled by the database library automatically using custom annotations in the form of property wrappers or macros, so we won't discuss them further.
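
As a rough illustration of T1 and T8, the sketch below decodes and encodes with Foundation's JSONDecoder and JSONEncoder. The nested `Components.Schemas` types are hand-written stand-ins assumed to mirror the generated ones, and the sample id is made up; in a real service, the generated code and the transport perform these steps for you.

```swift
import Foundation

// Hypothetical stand-ins for the types Swift OpenAPI Generator would emit
// for the NoteInputs and Note schemas.
enum Components {
    enum Schemas {
        struct NoteInputs: Codable, Equatable {
            var text: String
        }
        struct Note: Codable, Equatable {
            var id: String
            var text: String
        }
    }
}

typealias APINoteInputs = Components.Schemas.NoteInputs
typealias APINote = Components.Schemas.Note

// T1: JSON -> APINoteInputs (via the Decodable conformance)
let requestBody = Data(#"{"text":"Buy milk"}"#.utf8)
let inputs = try JSONDecoder().decode(APINoteInputs.self, from: requestBody)

// T8: APINote -> JSON (via the Encodable conformance)
let note = APINote(id: "A1B2", text: inputs.text)
let encoder = JSONEncoder()
encoder.outputFormatting = [.sortedKeys]  // deterministic key order for the example
let responseJSON = String(decoding: try encoder.encode(note), as: UTF8.self)
// responseJSON is {"id":"A1B2","text":"Buy milk"}
```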

The question thus is - when getting started, what can we do to help with creation of:

types: ModelNoteInputs, DBNoteInputs, DBNote, ModelNote
initializers on types: T2, T3, T6, T7

Proposed solution

This issue is meant to start a larger discussion about this topic, and feel free to suggest (even wildly) different approaches here. Below is the current idea I've been thinking about, which we can use and evolve or abandon if a better one comes along. This solution assumes we are in agreement that the Motivation is sound, that the problem exists, and that it's large enough to be worth solving. (If you disagree with that, please elaborate in replies first, it doesn't make sense for us to debate solutions until we agree on the problem statement 🙂.)

TL;DR: Run a one-time code-generation step that emits the 4 types and 4 methods required above.

Note: Whether this step is part of the swift-openapi-generator or if it should be implemented as external plugins is to be discussed here, I don't have a strong opinion on that yet. Though my preference would be to keep it in separate repos, to continue the open-ended approach we've established with the transport and middleware abstractions.

One important detail here is that it'd be one-time, and unlike the Swift OpenAPI Generator plugin (which runs every time the OpenAPI document changes and continually keeps the generated API types up-to-date), this generation step would be run at most once in the lifecycle of a service.

The reason is that the 4 types and 4 transformations should be maintained by hand as the app evolves. This code-generation step would just get you started with the initial OpenAPI document, so you can start writing business logic within minutes instead of first creating these 3 layers by hand. That manual work can be time-consuming if you're implementing a service for an OpenAPI document that already has many operations and types. (Spec-driven development encourages exactly this: design the initial API together with stakeholders before any code is written, on either the client or the server, and then iterate.)

For example, we could generate code like this:

struct ModelNoteInputs {
    var text: String

    // T2
    init(_ api: APINoteInputs) {
        text = api.text
    }
}

struct DBNoteInputs {
    var text: String

    // T3
    init(_ model: ModelNoteInputs) {
        text = model.text
    }
}

struct DBNote {
    var id: String
    var text: String
}

struct ModelNote {
    var id: String
    var text: String

    // T6
    init(_ db: DBNote) {
        id = db.id
        text = db.text
    }
}

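extension APINote {

    // T7
    // Delegates to the generated memberwise initializer (assumed to be
    // init(id:text:)), which also compiles if the generated types live
    // in a different module.
    init(_ model: ModelNote) {
        self.init(id: model.id, text: model.text)
    }
}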

This way, we could potentially generate a whole "starter package" given only the OpenAPI document (the developer would just need to fill in the database, and even that code could be generated for some popular database frameworks).
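
For a sense of how the generated pieces would fit together, here is a minimal end-to-end sketch of the createNote path. It assumes simple stand-ins for the two generated API types, and the `InMemoryNoteStore` class and `createNote` function are hypothetical names that play the role of the database (T4 and T5) and the handler.

```swift
import Foundation

// Stand-ins for the generated API types (assumed shapes).
struct APINoteInputs { var text: String }
struct APINote { var id: String; var text: String }

struct ModelNoteInputs {
    var text: String
    init(_ api: APINoteInputs) { text = api.text }        // T2
}

struct DBNoteInputs {
    var text: String
    init(_ model: ModelNoteInputs) { text = model.text }  // T3
}

struct DBNote { var id: String; var text: String }

struct ModelNote {
    var id: String
    var text: String
    init(_ db: DBNote) { id = db.id; text = db.text }      // T6
}

extension APINote {
    init(_ model: ModelNote) { self.init(id: model.id, text: model.text) }  // T7
}

// Hypothetical in-memory "database" standing in for a real storage library.
final class InMemoryNoteStore {
    private var rows: [String: String] = [:]

    // T4: DBNoteInputs -> DB; the store assigns the identifier.
    func insert(_ inputs: DBNoteInputs) -> String {
        let id = UUID().uuidString
        rows[id] = inputs.text
        return id
    }

    // T5: DB -> DBNote
    func fetch(id: String) -> DBNote? {
        rows[id].map { DBNote(id: id, text: $0) }
    }
}

// Hypothetical handler body: request path (T2-T4), then response path (T5-T7).
func createNote(_ apiInputs: APINoteInputs, store: InMemoryNoteStore) -> APINote? {
    let modelInputs = ModelNoteInputs(apiInputs)
    let dbInputs = DBNoteInputs(modelInputs)
    let id = store.insert(dbInputs)
    guard let dbNote = store.fetch(id: id) else { return nil }
    return APINote(ModelNote(dbNote))
}

let store = InMemoryNoteStore()
let created = createNote(APINoteInputs(text: "Buy milk"), store: store)
```

Business logic would slot in between the model-layer steps, which is where the layers would start to diverge from the API types over time.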

And again, this would be run once, and the generated code would be checked in and maintained by hand from this point on.

What do y'all think? Is there something here?

Alternatives considered

We could let developers hand-write this code, unfortunately that presents them with two not-great options:

1. Don't create the 3 layers right away, save time in the short term, but probably violate the layering as e.g. you write business logic using the database type, or even worse, you couple your database schema with your API, not allowing each to evolve independently.
2. Put in potentially a significant amount of time just to get started, before they get to write the fun part (the business logic of their service).

Additional information

Let's discuss here to start; if there's interest, parts of this could be turned into proposals. If anyone is passionate about this area and would like to drive this effort, let us know in the replies! 🙏

simonjbeaumont commented 8 hours ago

Thanks for the writeup @czechboy0! My initial thoughts below:

Note: The above separation in 3 layers (API, model, DB objects) is important for flexibility and the ability to evolve each tier separately. The API should be able to change without rewriting the business logic using the model or changing the database schema, for example. The database schema should be able to be migrated without changing the API, and so on. Simple services might be able to get away with fewer layers, by coupling some of them, but I'd consider that an anti-pattern that ends up being more costly in the long term.

Strongly agree.

However, it's attractive when creating a new service, as it speeds up the bringup - and this issue proposes making that less time-consuming while maintaining the 3-layer approach (best of both worlds).

My concern is that by providing a solution here we are introducing coupling of a kind. While it's true that the schemas in an OpenAPI document can be used to generate types for arbitrary use, the OpenAPI document has likely been written with the API in mind, and types designed accordingly. It's possible that these types aren't the best fit for persistence at all.

From the 6 Swift types we need, 2 are generated and kept in-sync automatically with the OpenAPI doc (APINoteInputs and APINote), while 4 are hand-written and kept in sync manually (ModelNoteInputs, DBNoteInputs, DBNote, ModelNote). All 6 have some downstream impact when they change, for example changing DBNote or DBNoteInputs probably requires a database migration, and so on.

Right, but that's going to need to be the case for any service that's evolving its API and has persistent data.

T1 and T8 come from Decodable and Encodable conformances of code generated from the OpenAPI document; however, T2-T7 all have to be hand-written today, which is a lot of work that can feel like boilerplate, especially when getting started. (Once again, the value of the separation becomes obvious once the service grows, the layers evolve independently, and more business logic is written to work with the models. But when bringing up a new service from just the OpenAPI document, it can be tedious.) Also, T4 and T5 might be handled by the database library automatically using custom annotations in the form of property wrappers or macros, so we won't discuss them further.

The question thus is - when getting started, what can we do to help with creation of:

types: ModelNoteInputs, DBNoteInputs, DBNote, ModelNote

initializers on types: T2, T3, T6, T7

I like that you've once again drawn a distinction between "getting started" and "the value of the separation" for a finished implementation. My current position is that, if all you're doing is getting started, you just don't need to be so strict, which you suggested elsewhere in the motivation.

This issue is meant to start a larger discussion about this topic, and feel free to suggest (even wildly) different approaches here. Below is the current idea I've been thinking about, which we can use and evolve or abandon if a better one comes along. This solution assumes we are in agreement that the Motivation is sound, that the problem exists, and that it's large enough to be worth solving. (If you disagree with that, please elaborate in replies first, it doesn't make sense for us to debate solutions until we agree on the problem statement 🙂.)

Yeah, I think we should definitely discuss the motivation and requirements here before we start looking at any solution.

Alternatives considered

We could let developers hand-write this code, unfortunately that presents them with two not-great options:

1. Don't create the 3 layers right away, save time in the short term, but probably violate the layering as e.g. you write business logic using the database type, or even worse, you couple your database schema with your API, not allowing each to evolve independently.

2. Put in potentially a significant amount of time just to get started, before they get to write the fun part (the business logic of their service).

I'll start by saying that I think these aren't that bad. IMO you've presumed an outcome from (1) which I don't think is fair. Even outside of this domain, prototyping something often means doing the simplest thing first and doing it "properly" later. You've presumed that people won't do that, which is a reach.

My summary to this point is that I'm not yet convinced by the motivation. However, the presentation of a solution may have biased me, because I'm even more skeptical of the solution. I really dislike the idea that the generation is not idempotent.

OpenAPI provides a great way to define HTTP APIs and Swift OpenAPI strives to provide value here.

I do concede that there is a very analogous problem with database persistence and I'm sympathetic to the need for adopters to have a good end-to-end experience building a service. But I'd like to make sure that we continue to provide a composable ecosystem of things that do their job well. How much precedent is there in the OpenAPI ecosystem for using the spec in this way?

I do concede that the JSONSchema types used in the #/components/schemas section can be an attractive single source of truth and that it's an acceptable use case to use these to generate just the types. This is a happy side-effect of supporting the (only) types configuration. We've also deliberately structured the project to decouple the types, the universal HTTP client and servers, and the client and server transports.

If we were to consider doing something in this space, maybe we could extend this layering further where we have the idea of a "database transport" that can take care of putting types into databases. I could see folks structuring a project with separate types for their storage, which could even be in a distinct file if we supported OpenAPI cross-file references. Another solution could be a different mode for configuration, which could be idempotent.

czechboy0 commented 7 hours ago

But I'd like to make sure that we continue to provide a composable ecosystem of things that do their job well. How much precedent is there in the OpenAPI ecosystem for using the spec in this way?

Thanks for your thoughts, @simonjbeaumont. I most strongly resonate with your point about whether this is strictly tied to OpenAPI - after thinking about this more, I think the answer is "no". I hadn't realized before that we can achieve all this outside of the OpenAPI generation step, by using swift-syntax to iterate over the list of types in the generated Types.swift (the ones generated from #/components/schemas).

If we were to consider doing something in this space, maybe we could extend this layering further where we have the idea of a "database transport" that can take care of putting types into databases. I could see folks structuring a project with separate types for their storage, which could even be in a distinct file if we supported OpenAPI cross-file references. Another solution could be a different mode for configuration, which could be idempotent.

Yes, that'd work great if we expected adopters to keep their database types in sync with their API types over time. But I'm suggesting only starting with them in sync, then letting them diverge. So when it comes to maintaining your model/DB types, I'm suggesting that always happens by hand. The only possible improvement here would be in who writes the initial form of the model/DB types, and how.

And while I still believe there's value in the model/DB types being able to just start off matching the API types, I now believe this is the wrong project to try to solve this in, so I'll close this and we should see if folks in the community can think of a better place to solve it.

czechboy0 commented 7 hours ago

I suspect the Swift Forums might be a better place to discuss this more, if anyone lands on this issue in the future.

simonjbeaumont commented 4 hours ago

I suspect the Swift Forums might be a better place to discuss this more, if anyone lands on this issue in the future.

If you create a Forums thread, could you post the link back for folks who land here?

apple / swift-openapi-generator