asyncapi / shape-up-process

This repo contains pitches and the current cycle bets. More info about the Shape Up process: https://basecamp.com/shapeup
https://shapeup.asyncapi.io

Data models generation #21

Closed fmvilas closed 3 years ago

fmvilas commented 4 years ago

Summary

Problem Overview

  1. Generating strong-typed data models is a rather complex task.
  2. Complex and unmaintainable template files.
  3. Creating and maintaining data model generators for many languages is an arduous task.

Solution Overview

  1. Design an SDK to generate strong-typed data models from AsyncAPI / JSON Schema.
  2. The SDK must be suitable for Nunjucks as well as for React.
  3. The SDK must allow for the extensibility of existing data model definitions.

Positive Side-Effects

  1. Less code fragmentation and duplication across templates.

Problem Details

Generating strong-typed data models is a rather complex task

As of today, generating data models is a purely manual task that every template developer should take care of. There are lots of things to take into account when creating a generic solution even when it's for a specific programming language. To name a few:

Complex and unmaintainable template files

The complexity described above also has some bad side-effects, such as making a data model template difficult to read and maintain. Using React may improve readability, but that would leave out the developers who prefer Nunjucks.

Creating and maintaining data model generators for many languages is an arduous task

If creating and maintaining a data model generator for a single programming language is hard, imagine doing the same for many programming languages. Keeping them updated and in sync in terms of features is another challenge, e.g., making sure all of them support annotations, different visibility levels, etc.

Solution guidelines

Since we want to solve these problems for Nunjucks and React templates, the solution goes through an SDK that both frameworks would use. It may be used behind the scenes by providing filters in Nunjucks and components in React. In any case, there must be a framework-agnostic solution.
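To make the "framework-agnostic core plus thin adapters" idea more concrete, here is a minimal sketch. All names in it (`modelFields`, `formatFields`, `modelFieldsFilter`, `ModelFields`) are hypothetical and not part of any existing SDK; the adapters are plain functions that only mimic the shape of a Nunjucks filter and a function component, so the snippet runs without either framework installed.

```javascript
// Framework-agnostic core (hypothetical): extracts the fields a model needs.
function modelFields(schema) {
  return Object.entries(schema.properties || {}).map(([name, prop]) => ({
    name,
    type: prop.format ? `${prop.type}:${prop.format}` : prop.type,
  }));
}

// Shared formatting step, also framework-agnostic.
function formatFields(fields) {
  return fields.map((field) => `${field.name}: ${field.type}`).join('\n');
}

// Adapter shaped like a Nunjucks filter: value in, string out.
function modelFieldsFilter(schema) {
  return formatFields(modelFields(schema));
}

// Adapter shaped like a function component: props in, rendered text out.
function ModelFields({ schema }) {
  return formatFields(modelFields(schema));
}

const exampleSchema = {
  type: 'object',
  properties: {
    displayName: { type: 'string' },
    email: { type: 'string', format: 'email' },
  },
};

console.log(modelFieldsFilter(exampleSchema));
console.log(ModelFields({ schema: exampleSchema }));
```

Both adapters delegate to the same core, so dropping one framework later would not touch the model logic.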

For now, the solution is to invest some time in thinking and designing that SDK. Let me outline a few things I found out during my research:

My proposed "solution"

Let's start with an example. Assuming we have a schema like the following:

{
  type: 'object',
  properties: {
    displayName: {
      type: 'string'
    },
    email: {
      type: 'string',
      format: 'email'
    },
    createdAt: {
      type: 'string',
      format: 'date-time'
    }
  }
}

...and this code using the SDK:

// `Schema` and `UserSignedUp` are assumed to be in scope already.
const { generateModelFor, Constants, buildMethod, buildGetters, buildSetters, typeNameFor, varNameFor, annotate } = require('@asyncapi/generator-sdk')
const schema = new Schema(UserSignedUp)

return generateModelFor('java', schema, {
  package: 'com.asyncapi',
  dependencies: ['com.fasterxml.jackson.annotation.JsonFormat'],
  annotations: {
    properties: (property, propertyName) => {
      // Only date-time properties get the extra annotation.
      if (property.format() !== 'date-time') return property

      return annotate(property, 'JsonFormat', [{
        shape: 'JsonFormat.Shape.STRING',
        pattern: '"dd-MM-yyyy hh:mm:ss"',
      }])
    },
    methods: (method) => {
      // `arguments` is aliased because it cannot be a binding name in strict mode.
      const { name, visibility, isStatic, returnType, arguments: args } = method
      return annotate(method, 'SomeAnnotation', [{
        someAnnotationKey: '"someAnnotationValue"',
      }])
    }
  },
  methods: {
    hashCode: null, // Removes hashCode method from the generated result
    findEvenOdd: buildMethod({
      name: 'findEvenOdd',
      visibility: Constants.Visibility.PUBLIC,
      static: true,
      returnType: 'String',
      parameters: [{
        name: 'num',
        type: 'int',
        // default: 0,
      }],
      body: '// method implementation here...'
    }),
    // getters
    ...buildGetters(schema),
    // setters
    ...buildSetters(schema),
  },
})

The following should be generated:

package com.asyncapi;

import com.fasterxml.jackson.annotation.JsonFormat;
import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.annotation.JsonProperty;
import java.util.Date;

@JsonInclude(JsonInclude.Include.NON_NULL)
public class UserSignedUp {

  @JsonProperty("displayName")
  private String displayName;

  @JsonProperty("email")
  private String email;

  @JsonProperty("createdAt")
  @JsonFormat(shape = JsonFormat.Shape.STRING, pattern = "dd-MM-yyyy hh:mm:ss")
  private Date createdAt;

  public UserSignedUp (String displayName, String email, Date createdAt) {
    this.displayName = displayName;
    this.email = email;
    this.createdAt = createdAt;
  }

  @SomeAnnotation(someAnnotationKey = "someAnnotationValue")
  public String getDisplayName () {
    return displayName;
  }

  @SomeAnnotation(someAnnotationKey = "someAnnotationValue")
  public String getEmail () {
    return email;
  }

  @SomeAnnotation(someAnnotationKey = "someAnnotationValue")
  public Date getCreatedAt () {
    return createdAt;
  }

  @SomeAnnotation(someAnnotationKey = "someAnnotationValue")
  public void setDisplayName (String newDisplayName) {
    displayName = newDisplayName;
  }

  @SomeAnnotation(someAnnotationKey = "someAnnotationValue")
  public void setEmail (String newEmail) {
    email = newEmail;
  }

  @SomeAnnotation(someAnnotationKey = "someAnnotationValue")
  public void setCreatedAt (Date newCreatedAt) {
    createdAt = newCreatedAt;
  }

  @SomeAnnotation(someAnnotationKey = "someAnnotationValue")
  public static String findEvenOdd (int num) {
    // method implementation here...
  }
}
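As a rough illustration of the first step such an SDK would need, the sketch below maps the example schema's properties to the private fields of the generated class above. It is only a sketch under assumptions: `javaTypeFor` and `javaFieldsFor` are hypothetical names, the type table is deliberately tiny, and the schema is a plain object rather than a parsed `Schema` instance.

```javascript
// Hypothetical helper: picks a Java type name for a JSON Schema property.
function javaTypeFor(property) {
  if (property.type === 'string' && property.format === 'date-time') return 'Date';
  const byType = { string: 'String', integer: 'int', number: 'double', boolean: 'boolean' };
  return byType[property.type] || 'Object';
}

// Hypothetical helper: renders one annotated private field per schema property.
function javaFieldsFor(schema) {
  return Object.entries(schema.properties || {})
    .map(([name, property]) =>
      `  @JsonProperty("${name}")\n  private ${javaTypeFor(property)} ${name};`)
    .join('\n\n');
}

const userSignedUp = {
  type: 'object',
  properties: {
    displayName: { type: 'string' },
    email: { type: 'string', format: 'email' },
    createdAt: { type: 'string', format: 'date-time' },
  },
};

console.log(javaFieldsFor(userSignedUp));
```

Annotations, getters, setters, and custom methods would be layered on top of a step like this; they are left out here to keep the sketch short.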

Afterward, this SDK can power Nunjucks filters and React components. See the following React example:

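import { Model, Constants } from 'generator-react-sdk'
import { annotate } from '@asyncapi/generator-sdk'
// `Schema` and `UserSignedUp` are assumed to be in scope already.
const schema = new Schema(UserSignedUp)

return (
  <Model
    language="java"
    schema={schema}
    annotateProperties={prop => {
      // Only date-time properties get the extra annotation.
      if (prop.format() !== 'date-time') return prop

      return annotate(prop, 'JsonFormat', [{
        shape: 'JsonFormat.Shape.STRING',
        pattern: '"dd-MM-yyyy hh:mm:ss"',
      }])
    }}
    annotateMethods={method => {
      // `arguments` is aliased because it cannot be a binding name in strict mode.
      const { name, visibility, isStatic, returnType, arguments: args } = method
      return annotate(method, 'SomeAnnotation', [{
        someAnnotationKey: '"someAnnotationValue"',
      }])
    }}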
    methods={{
      hashCode: null, // Removes hashCode method from the generated result
      findEvenOdd: {
        visibility: Constants.Visibility.PUBLIC,
        static: true,
        returnType: 'String',
        parameters: [{
          name: 'num',
          type: 'int',
          // default: 0,
        }],
        body: '// method implementation here...'
      }
    }}
    buildGetters={true}
    buildSetters={true}
  />
)

These examples are suggestions

This pitch/bet is actually about ideating and designing the SDK (excluding the React one). Let's make sure it works with any kind of programming language (object-oriented, functional, etc.)

A good starting list of languages to test against is:

And a good starting list of artifacts that programming languages may offer:

Boundaries

Don’t do

Watch out for

Long-Term Vision

In the short term, the idea behind this SDK is to power the Generator templates, making it super easy to create new ones. However, long-term applications are also interesting. A declarative way of generating code will also help us with:

That said, focus on the short-term goals for now. We don't want to end up creating a super generic solution.

Happy ~coding~ designing! :blush:

jonaslagoni commented 3 years ago

Just to mention this as a possible solution: would it really make sense to reinvent the wheel? QuickType is already open source, does exactly this, and uses the same license as us.

A quick intro to QuickType: it generates strongly-typed models and serializers for multiple languages from a JSON file or URL, JSON Schema, TypeScript, or GraphQL queries. There is already an easy wrapper in a template filter, which is used in the quicktype template.

Choosing an existing open source solution would give us a big head start, since a wrapper alone would be sufficient.

Some downsides of quicktype I see (as of this writing)

Reading material for further understanding:

fmvilas commented 3 years ago

Yeah, Quicktype was my first choice when I started investigating it. I wanted us to have something more customizable. For instance, you can add custom annotations, custom methods, remove methods from the default output, etc. This pitch is about figuring out how we're going to do this so if you guys decide Quicktype is the way to go (even if it's behind the scenes), I'm all for it.

Main reasons I've been hesitant to mention Quicktype here:

  1. Only 2 maintainers. I offered help but got no response. Tried to reach out on Twitter private messages and email but got no response 🤷‍♂️ On issues, they're not replying or replying super late so I'm not sure it's a good idea to rely on a library like this. It hurts me to say it because the result is neat.
  2. We need more levels of customization (as noted above). It's true we can ship something faster with Quicktype but we're not in a rush, right? :) Whatever we decide, let's make sure it is for the long term because once we have lots of templates using our SDK it will not be easy to change.

Let me remind you that this pitch is exactly about figuring out the best solution, not implementing it. What I wrote here are just suggestions based on my findings while investigating how to do it. If we select this pitch for the cycle, the expected output is a thoroughly documented solution and zero code.

fmvilas commented 3 years ago

Just recorded a video explaining the concept/idea I had in mind. It is not mutually exclusive with Quicktype: https://www.youtube.com/watch?v=lAx0ZzmWUVI

derberg commented 3 years ago

I also have mixed feelings about quicktype. I was a fan of it, especially when I found out the generated output can be modified. But recently, to me, the quicktype situation = nunjucks: in the end, a tool with limits that you cannot overcome quickly, or at all. I hate reinventing the wheel too, though.

Regarding research: when working on the design, you probably need to take into account how indentation will be handled during generation for languages like Python. I know that Michael had to write special Nunjucks filters to handle it properly, which was pretty cumbersome.
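One way the SDK could take indentation off the template author's hands is to indent nested blocks itself. The following is a minimal sketch of that idea, not a description of the existing Nunjucks filters; `indent` and `pythonClass` are hypothetical helpers, and four spaces per level is an assumption.

```javascript
// Hypothetical helper: prefixes every non-empty line with the given indentation.
function indent(text, levels = 1, unit = '    ') {
  const prefix = unit.repeat(levels);
  return text
    .split('\n')
    .map((line) => (line.trim() === '' ? '' : prefix + line))
    .join('\n');
}

// Hypothetical helper: renders a Python class; nesting is handled by `indent`,
// so method bodies can be written without any leading whitespace.
function pythonClass(name, methods) {
  const body = methods
    .map((method) => `def ${method.name}(self):\n${indent(method.body)}`)
    .join('\n\n');
  return `class ${name}:\n${indent(body)}`;
}

console.log(pythonClass('UserSignedUp', [
  { name: 'get_email', body: 'return self.email' },
  { name: 'has_email', body: 'if self.email:\n    return True\nreturn False' },
]));
```

Because each level only indents the block it wraps, deeper nesting composes without the template having to count spaces.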

Also, after watching the video, it looks like you have to look at the syntax of every possible language, or at least those that are already supported by quicktype. Good luck 🤞

magicmatatjahu commented 3 years ago

My few words:

return generateModelFor('java', schema, { ...{some object} });

and in this example you infer the language name (java), so {some object} must include only properties that are valid for Java. So for Java you could use annotations, but for another language like TypeScript you would get a compile error saying that {some object} is not a valid type when you use the annotations field. I can describe it better if someone doesn't understand.

jonaslagoni commented 3 years ago

I think I need a more detailed explanation of what you mean by the first suggestion @magicmatatjahu :smile:

* we should check tools like `swagger-codegen`. As I know it is written in Java, but maybe we'll find something interesting in implementation, how these guys handled common problems like interface for other (custom) languages/frameworks.

Might be a good idea, yeah. We just need to keep in mind that there is no officially maintained codegen from Swagger, but we should look at the OpenAPI Generator.

* rather than thinking about AsyncAPI schema, we should think only about handling JSONSchema. If sdk will work for pure JSONSchema, then it should also works for AsyncAPI.

After reading some more about what QuickType does under the hood, I had an idea, or rather a question: would we really care what the input to the SDK is? :thinking: If we take any typed language, wouldn't you be able to map types between them "easily"? I.e., you take any class from Java as input and feed it into the SDK, which is then able to convert it to a type in any language of your choosing.

Whether the SDK reads a JSON Schema or a Java class, it is only a matter of where it finds the details it needs to generate the type files. Just as QuickType takes inputs such as JSON Schema, TypeScript, or GraphQL queries, I think we should design the SDK to support this scenario.

magicmatatjahu commented 3 years ago

@jonaslagoni

Think I need a more detailed explanation about what you mean by the first suggestion @magicmatatjahu 😄

Let's imagine the situation where the user calls our function this way:

return generateModelFor('java', schema, { decorators: [...] });

then TS should throw an error, because Java doesn't have decorators; they are called annotations. Conversely, when you use the function like this:

return generateModelFor('python', schema, { decorators: [...] });

TS should accept the call, because Python has decorators (not annotations). It's called type inference :) We can change the signature of the third argument based on the first one (the language name).
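The check described here would happen at compile time in TypeScript. As a rough runtime analog in plain JavaScript (a sketch only, not the proposed typing), the allowed option keys can be looked up from the language name. `OPTIONS_BY_LANGUAGE` and `checkOptions` are hypothetical, and the key lists are assumptions made for illustration.

```javascript
// Hypothetical table: which option keys each target language accepts.
const OPTIONS_BY_LANGUAGE = {
  java: ['package', 'dependencies', 'annotations', 'methods'],
  python: ['decorators', 'methods'],
};

// Hypothetical runtime check mirroring what the type system would enforce:
// the accepted shape of `options` depends on the `language` argument.
function checkOptions(language, options) {
  const allowed = OPTIONS_BY_LANGUAGE[language];
  if (!allowed) throw new Error(`Unsupported language: ${language}`);
  const unknown = Object.keys(options).filter((key) => !allowed.includes(key));
  if (unknown.length > 0) {
    throw new Error(`Options not valid for ${language}: ${unknown.join(', ')}`);
  }
  return options;
}

checkOptions('python', { decorators: [] }); // accepted
try {
  checkOptions('java', { decorators: [] }); // rejected: Java has annotations instead
} catch (error) {
  console.log(error.message);
}
```

A typed version would move the same table into the signature, so the mistake surfaces in the editor rather than when the template runs.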

Might be a good idea, yeah. We just need to keep in mind that there is no officially maintained codegen from Swagger, but we should look at the OpenAPI Generator.

As far as I can see, swagger-codegen is the predecessor of openapi-generator (which started as a fork of it) :) OpenAPI is the new name for the Swagger specification :)

After reading some more about what QuickType does under the hood, I had an idea, or rather a question: would we really care what the input to the SDK is? 🤔 If we take any typed language, wouldn't you be able to map types between them "easily"? I.e., you take any class from Java as input and feed it into the SDK, which is then able to convert it to a type in any language of your choosing.

Whether the SDK reads a JSON Schema or a Java class, it is only a matter of where it finds the details it needs to generate the type files. Just as QuickType takes inputs such as JSON Schema, TypeScript, or GraphQL queries, I think we should design the SDK to support this scenario.

Good point! But we should support only static data, I mean JSON Schema or GraphQL SDL. Handling, e.g., a Java Spring class with annotations and then converting it to Haskell types (for example) would be very hard.

magicmatatjahu commented 3 years ago

I have an idea to support React JSX in data model generation, and also in templating, as a kind of AST (abstract syntax tree). React JSX is meant for creating the tree for the virtual DOM, but it's also very good for creating static data like XML. Where am I going with this? For example, we have a class definition written in JSX like:

<Class name="SomeClass">
  <Property static private name="email" type="string" value="someEmail">
    <Annotation name="JsonProperty" value="email" />
  </Property>
  <Method public name="someMethod" returnType="boolean">
    <Argument type="string" name="displayName" />
    <Argument type="string" name="email" />
    <Body>
      return email == displayName;
    </Body>
  </Method>
</Class>

and then the generator should return the following for Java:

class SomeClass {
  @JsonProperty("email")
  static private String email;

  public boolean someMethod(String displayName, String email) {
    return email == displayName;
  }
}

Why is this approach very good? Because you can then override the Class, Method, Argument, Property, etc. components for another language like Python, JS, or C++. Imagine that you wrote one definition for your Java template, but a developer can switch from java to python in the generator CLI; the Class, Method, Argument, and Property components then change behaviour and render something like:

class SomeClass:
    email = "someEmail"

    def someMethod(self, displayName, email):
        return email == displayName

In Python, types and annotations are omitted because Python doesn't support them. (I know Python has type hints and that annotations roughly correspond to decorators, but they are not the same as in other languages.)
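The core of this idea, one language-neutral tree with swappable per-language renderers, can be sketched without JSX at all. The example below uses a plain object in place of the component tree and covers only properties; the node shape, the `renderers` table, and `render` are hypothetical and exist only for illustration.

```javascript
// A plain-object stand-in for the JSX tree (hypothetical node shape).
const someClass = {
  name: 'SomeClass',
  properties: [
    { name: 'email', type: 'string', value: 'someEmail', isStatic: true, visibility: 'private' },
  ],
};

// One renderer per target language; each decides what to keep from the tree.
const renderers = {
  java(node) {
    const types = { string: 'String' };
    const props = node.properties.map((prop) =>
      `  ${prop.isStatic ? 'static ' : ''}${prop.visibility} ${types[prop.type]} ${prop.name};`);
    return `class ${node.name} {\n${props.join('\n')}\n}`;
  },
  python(node) {
    // Types and visibility are dropped; the default value is kept.
    const props = node.properties.map((prop) =>
      `    ${prop.name} = ${JSON.stringify(prop.value)}`);
    return `class ${node.name}:\n${props.join('\n')}`;
  },
};

function render(language, node) {
  const renderer = renderers[language];
  if (!renderer) throw new Error(`No renderer for ${language}`);
  return renderer(node);
}

console.log(render('java', someClass));
console.log(render('python', someClass));
```

Switching the target language only changes which renderer is picked; the definition of `someClass` stays the same, which is the property the JSX components would need to preserve.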

@jonaslagoni @fmvilas @derberg What do you think, guys? It probably won't be very generic and in some cases will break compatibility between languages, but simple JSON Schema types (string, array, etc.) should be handled for each language without huge boilerplate.

derberg commented 3 years ago

I'm not that deep into the topic, so it's hard for me to share a good opinion here. Tbh, I think this is such a complex topic that I would prefer to have components per language rather than one component for all. I'm just afraid of how complex it will be 😅

fmvilas commented 3 years ago

I think it's great but remember this is not just for React, it should also work for Nunjucks, that's why we have to go to a lower level. Eventually, what we achieve here will be what backs these components you defined. Feel free to open another issue in the Generator so we don't forget!

derberg commented 3 years ago

@fmvilas are you 100% sure we should take Nunjucks into consideration? Shouldn't we consider that React and model generation are for 2.0, and that Nunjucks will stay (or maybe not even that) in 2.0 just to give more time for migration?

fmvilas commented 3 years ago

It's more of an exercise than wanting to keep Nunjucks around. If we make things framework-agnostic, it will be easy for us to get rid of React if that's needed. I don't think it will be but who knows 🤷‍♂️

jonaslagoni commented 3 years ago

Wups 😇

I think it's great but remember this is not just for React, it should also work for Nunjucks, that's why we have to go to a lower level. Eventually, what we achieve here will be what backs these components you defined.

I agree. The way I see it, we are talking about 2 problems that need to be figured out:

  1. We need a renderer behind the scenes, and IMO @magicmatatjahu's suggestion with React makes perfect sense to check out and see how it works. This would also benefit all our React template developers with reusable helper components for templating code in various languages, because we build the stepping stones.
  2. We need to figure out (based on the type renderer used behind the scenes) how this can be integrated into templates regardless of whether they use React or Nunjucks, or used as a library. This could be as you suggested with the "wrapper functions" with customization options.

Further, I think it would be a good idea to state the functional requirements for both the wrapper and the underlying type renderer to get an idea of the range of customization that should be possible. Do we care about non-functional requirements? 🤔

What do you think? Am I missing something?

Edit

Just to add something we have not discussed: will we develop it as a standalone library (the generated code), or are we expecting people to handle things like project setup on their own 🤔?

jonaslagoni commented 3 years ago

Closing this bet after a finished research cycle, which culminated in the new pitch #43.