RFC: A tool that helps enforce consistency & rigor in responses

atuttle commented 2 years ago

A problem I've seen consistently throughout my time using Taffy and helping others is that we all —myself included— struggle to manually make our API responses consistent with themselves. E.g. /customers/1 and /customers both return customer data, but one might use firstName while the other uses first_name, because CFML's weak typing gives us plenty of opportunities to do the wrong thing.

I do not want to create something that feels like an ORM.

But it would be really nice to have something like this:

var customers = queryExecute("select firstname, lastname, email, id from customers where active = 1");

return rep(
    queryToArrayOf( 
        data: customers,
        type: types.customer,
        fields: [ "firstName", "lastName", "email", "id" ],
        callback: function(record){
            //you could manually massage records in here like you currently can with queryToArray
            return record;
        }
    )
);

👆🏻This assumes some sort of "Customer" type already exists, and understands how to pull the appropriate data from the query.

That's about as far as I've been able to develop this thought in my head so far. I will have the availability to spend time on this effort at work in the coming months, so now is a great time to get your ideas and feedback in so that they can be considered for the effort.

The proposal above should also work reasonably well for cases where you've got nested types.

var customers = queryExecute("select firstname, lastname, email, id from customers where active = 1");
var orders = queryExecute("
    select id, orderDateTime, itemCount, orderTotalCents 
    from orders 
    where customerId in (:idList)
"), {
    idList: { cfsqltype: "numeric", list: true, value: valueArray( customers.id ) }
});

return rep(
    queryToArrayOf( 
        data: customers,
        type: types.customer,
        fields: [ "firstName", "lastName", "email", "id" ],
        callback: function(record){
            record['orders'] = queryToArrayOf(
                data: 
                    // use QoQ to pull each customer's orders from the shared order lookup
                    queryExecute(
                        "select * from orders where customerId = :id", 
                        { id: { cfsqltype: "numeric", value: record.id } },
                        { dbtype: "query" }
                    ),
                type: types.order,
                fields: [ "orderDateTime", "itemCount", "orderTotalCents" ]
            );
            return record;
        }
    )
);

We may not need the fields argument. Each type should be aware of all possible fields it would contain, and capable of loading them from queries by column name. If a column is missing from the query, it should be excluded from the result object.

The returned objects would be simple structs, with the properties defined by the types. The type class would enforce key name CaSe sEnSiTiViTy. So if you don't want a key in the result, don't select it in the query. I guess there might be a use case where you'll want to select more columns than you want in the result, so we'll make the fields argument optional.

You might have special handling you want to do for certain columns. E.g. if the column stores json data and you want it represented as real objects in the result, your implementation might look like this:

function parseRow( row ) {
    row['metadata'] = deserializeJson( row.metadata );
}

Given that you can specify custom functionality like this at the type-class level, do we need to leave leeway for you to add/override column type handlers at the resource-implementation level?

var customers = queryExecute("select firstname, lastname, email, id from customers where active = 1");

return rep(
    queryToArrayOf( 
        data: customers,
        type: types.customer,
        fields: [ "firstName", "lastName", "email", "id" ],
        customize: {
            email: function(e) { return lcase( arguments.e ); }
        }
    )
);

In addition to enforcing consistency in our outputs, other benefits of defining your types separately would include:

Ability to generate a data dictionary, so your consumers could know that a customer first name is a string with a max length of 100 characters
Maybe we could use this to enforce input formatting, if you're accepting complex inputs?
Can you think of more?

This is where you come in

What are your thoughts after reading the above?
What ideas do you have to make this even better?
What obvious flaws have I overlooked?
What use-cases won't this approach work for?

pfreitag commented 2 years ago

Why not utilize Swagger / JSON Schema?

timmixell commented 2 years ago

One of the things that originally drew me to Taffy was the absence of anything trying to solve my non-API problems for me. Why is this Taffy's problem to solve? Can't onTaffyRequestEnd be leveraged as an opportunity to adapt a response in case of poor interface-planning?

atuttle commented 2 years ago

@pfreitag: Why not utilize Swagger / JSON Schema?

I'm not sure that solves the problem of multiple resources that return the same data types (e.g. customers, above) with different formatting. If I'm wrong, please elaborate.

@timmixell: One of the things that originally drew me to Taffy was the absence of anything trying to solve my non-API problems for me. Why is this Taffy's problem to solve? Can't onTaffyRequestEnd be leveraged as an opportunity to adapt a response in case of poor interface-planning?

I did consider making this a separate project, actually. And maybe it still will be. But since it's inspired by Taffy usage, I figured we'd start here.

The problem I'm attempting to solve is that it's too easy for developers to be inconsistent with their former decisions. I don't want to have to track down every time I previously returned customer data and make sure I use the same object property names (and CaSe)... But I might need to add a new customer resource a year or more after the last time I worked on them. And the DB column names may not always exactly match what we want to be in the response.

And maybe for that reason ☝🏻 it should stay as part of Taffy. The only time it really matters that object key names are consistent is when you're pushing data outside the edges of your app (e.g. an API). Maybe it would be useful for generating files to export, too, but otherwise, I don't see any utility beyond API's. Am I wrong?

So, instead, if my past self had written a Customer type that was aware of all of its (current at that time) columns and was capable of transforming rows from a query selecting from the customers table, then I know I can simply select from the customers table for my new resource and pass it off to the Customer type and know that it will be consistent.

Likewise, if my changes also involve adding a new column to the DB, I can add that to the type and the response would be consistent any time the developer used it.

This also enables generation of a data dictionary (which is maybe why @pfreitag suggested swagger/json-schema?), but my experiences with those things has been universally negative. Firstly, they're so complex and bloated that any customer I try to hand it to and say, "here, my responses are defined by this document" is going to take 1 look at it and know instantly that they'll never look at it again.

pfreitag commented 2 years ago

In JSON Schema / Swagger / OpenAPI you can define a type and then say the endpoint returns a type. Swagger can actually generate nice HTML docs that you can share rather than the JSON file (example: https://petstore.swagger.io/), and you can use it to validate server side as well.

Here's an example of referencing a definition "$ref": "#/definitions/User" any time you return user you can just specify that it returns the User type:

{
  "paths": {
    "/user/{username}": {
      "get": {
        "summary": "Get user by user name",
        "description": "",
        "operationId": "getUserByName",
        "produces": [
          "application/json",
          "application/xml"
        ],
        "parameters": [
          {
            "name": "username",
            "in": "path",
            "description": "The name that needs to be fetched. Use user1 for testing. ",
            "required": true,
            "type": "string"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "$ref": "#/definitions/User"
            }
          },
          "400": {
            "description": "Invalid username supplied"
          },
          "404": {
            "description": "User not found"
          }
        }
      },
    }

  },

  "definitions": {

    "User": {
      "type": "object",
      "properties": {
        "id": {
          "type": "integer",
          "format": "int64"
        },
        "username": {
          "type": "string"
        },
        "email": {
          "type": "string"
        }
      }
    }
  }
}

Maybe this isn't the right direction for what you are trying to accomplish, but I just wanted to point it out incase it is useful.

atuttle commented 2 years ago

@pfreitag thanks for sharing. That might be useful as an output from what I'm thinking of, but what you provided there doesn't enforce the consistency in our code. We'd still have to carefully check to make sure we're outputting responses that match the contract.

What I have in mind is (hopefully!) something terse that not only would generate that sort of useful output, but also would take care of getting from "I've selected some columns from a table" to "Here's a JSON response that's guaranteed to match the contract".

atuttle commented 2 years ago

I've created a new branch named typed-responses for working on this. In it, I've made some (👷🏻 very rough! 🚧 ) changes and additions to illustrate what I have in mind, and added an example showcasing what the changes are capable of right now.

I'll highlight a few things here:

A type is created in a new cfc that extends taffy.core.type. It defines variables.columns (not in love with that name!) which is a simple DSL for describing the columns that the type is aware of.

https://github.com/atuttle/Taffy/blob/6a39402bf909f0df4fa221a792b145049282cee4/examples/api_TypedResponses/types/Customer.cfc#L1-L9

The collection of types is injected into every resource as types, so that returning a typed dataset could work like this. Note that it's not passing the string "types.Customer", it's the actual Customer type instance. It felt important to me to do it this way so that the developer will know as soon as possible if they've got the wrong name because they'll get a clear error message that the type name is wrong.

https://github.com/atuttle/Taffy/blob/6a39402bf909f0df4fa221a792b145049282cee4/examples/api_TypedResponses/resources/customerCollection.cfc#L15

Lastly, compare the type definition from (1) above to the example collection GET body below and note that the column name case doesn't always match the type column name case. Also there's no implementation instructions required to map db columns to type columns. If they have the same name (case-insensitive) then they'll map automatically. Oh, and a few other odds & ends like the date mask is applied automatically to the date data.

https://github.com/atuttle/Taffy/blob/6a39402bf909f0df4fa221a792b145049282cee4/examples/api_TypedResponses/resources/customerCollection.cfc#L4-L15

There's still a lot to do before this would be ready, but hopefully this illustrates one of the goals. Another goal that might be within sight now is the automatic generation of something that describes the responses, whether that's swagger json or json-schema, or something custom.

atuttle commented 2 years ago

I've started to dig into JSON Schema and I think there's a possibility we should support it as a way to define types. That said, is there a good json schema validator or something like that we can make use of? I'm even just a touch leery of using a Java library, because I don't want to put that dependency on our users.

Trying to be as objective as possible, on one hand:

JSON Schema is a standard and a reasonable one that could also save us the trouble of generating something like its output if we are getting the developers to provide it. (E.g. the generated data dictionary is the collection of json schemas from each type)

But on the other hand, JSON Schema is an evolving spec and without a library to abstract that away, I'm afraid we'll get sucked into maintaining tooling around different drafts of JSON Schema.

From what I see so far, there's no obvious answer. Please tell me there's a library out there to help with this?

pfreitag commented 2 years ago

@atuttle that is certainly a downside to json schema, there is no builtin support for it in CFML so you'd have to use a Java Library, or roll your own implementation in CFML.

atuttle commented 2 years ago

I suppose we can always list specific draft versions that we support, and if someone desperately needs support for a newer draft they can contribute it. :)

atuttle / Taffy

RFC: A tool that helps enforce consistency & rigor in responses #419

This is where you come in