Closed: wing328 closed this issue 6 years ago
We will need to store MySQL-specific attributes/properties (e.g. `AUTO_INCREMENT`) in the model defined in the OpenAPI spec.
One option is to use a vendor extension (e.g. `x-mysql-auto-increment`).
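For example, such a vendor extension might look like this in a spec (a sketch only: `x-mysql-auto-increment` is a proposed name at this point, and the `User` model is illustrative):

```yaml
definitions:
  User:
    type: "object"
    properties:
      id:
        type: "integer"
        format: "int64"
        # proposed vendor extension, consumed only by a MySQL generator:
        x-mysql-auto-increment: true
```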
If you do so, you could also consider a more general database-generation option in different languages, e.g. JPA in Java.
Is that similar to the `databaseAdapter` option in the ruby-on-rails generator?
CONFIG OPTIONS
`databaseAdapter`: The adapter for the database (e.g. mysql, sqlite). (Default: sqlite)
I've made significant progress working on it. The prototype name for the new codegen is `MysqlSchemaServerCodegen`. If anyone has a better name, drop it in the comments.
I would also welcome any suggestions for the OpenAPI Data Type -> MySQL Data Type mapping.
OpenAPI Data Type | OpenAPI Data Format | MySQL Data Type
---|---|---
integer | int32 | INT
integer | int64 | BIGINT
number | float | FLOAT
number | double | DOUBLE
string | | VARCHAR(255)
string | byte | :question:
string | binary | :question:
boolean | | BOOLEAN or TINYINT(1)
string | date | DATE
string | date-time | DATETIME
string | password | VARCHAR(255)
file | | :question:
Here are my suggestions:
OpenAPI Data Type | OpenAPI Data Format | MySQL Data Type
---|---|---
string | byte | VARCHAR(255)
string | binary | MEDIUMBLOB
file | | MEDIUMBLOB
Ref: https://stackoverflow.com/questions/5959043/how-to-insert-a-file-in-mysql-database
(`file` in OAS2 has been deprecated and replaced by `type: string, format: binary` in OAS3)
@wing328 What about `float` and `double`? Do you agree with `FLOAT` and `DOUBLE`, respectively?
I agree with William's addition.
My comments are below.
OpenAPI string types with no data format specified should not have a maximum specified for varchar, unless there is an OpenAPI `maxLength` constraint. See maxLength in JSON Schema validation for strings, defined in Schema Object (spec 2.0 and 3.0.x).
`varchar` is generally used in MySQL for smaller bits of text. There's another data type, `TEXT`, which is used for larger text fields.
Consider, for example, if the specification defines an API for a blogging system:
```yaml
---
swagger: "2.0"
info:
  version: "1.0.0"
  title: "Bloggy"
  description: "A blog API sample"
  contact:
    name: "Jim Schubert"
  license:
    name: "MIT"
host: "blog.example.com"
basePath: "/api"
schemes:
  - "https"
consumes:
  - "application/json"
produces:
  - "application/json"
paths:
  /posts:
    get:
      description: "Returns all posts (not realistic for large blogs)"
      produces:
        - "application/json"
      responses:
        "200":
          description: "A list of blog posts."
          schema:
            type: "array"
            items:
              $ref: "#/definitions/Post"
definitions:
  Post:
    type: "object"
    required:
      - "id"
      - "title"
      - "content"
    properties:
      id:
        type: "integer"
        format: "int64"
      title:
        type: "string"
        maxLength: 255
      content:
        type: "string"
      related:
        type: array
        items:
          type: "integer"
          format: "int64"
```
edit: I didn't finish this thought. Sorry.
In the above spec, `title` has a max length, and so `varchar(255)` would be appropriate. `content`, on the other hand, is an unbounded `string` type, and something like a 1,000-word blog post wouldn't fit into a `varchar(255)` if this is used as the default for all strings.
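To illustrate, here is a hypothetical sketch of the DDL such a generator might emit for the `Post` model above (the table and column naming convention is assumed, not decided):

```sql
-- Hypothetical DDL for the Post definition above:
--   `title` has maxLength: 255        -> VARCHAR(255)
--   `content` is an unbounded string  -> TEXT, not VARCHAR(255)
CREATE TABLE post (
  id BIGINT NOT NULL,          -- integer/int64
  title VARCHAR(255) NOT NULL,
  content TEXT NOT NULL
);
```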
I'd be hesitant to use `FLOAT` and `DOUBLE` in MySQL. These are inexact data types (approximations). `FLOAT` uses 4 bytes (if precision <= 24) or 8 bytes (if precision is between 24 and 53), and `DOUBLE` uses 8 bytes of storage.
From Wikipedia:

> An IEEE 754 32-bit base-2 floating-point variable has a maximum value of (2 − 2<sup>−23</sup>) × 2<sup>127</sup> ≈ 3.402823 × 10<sup>38</sup>
From MySQL docs:

> Because floating-point values are approximate and not stored as exact values, attempts to treat them as exact in comparisons may lead to problems.
Examples of errors with Float/Double can be seen on MySQL docs B.5.4.8 Problems with Floating-Point Values.
A huge concern with this is that if you use mysqldump to replicate between a master and a slave, you can end up with different values between the two.
Conversely, `DECIMAL` (and `NUMERIC`) calculates the storage space based on the number stored. The left and right parts of the number each use up to 4 bytes for every 9 digits. So, the following storage requirements would occur:
MySQL Type | Left | Right | Total Storage
---|---|---|---
DECIMAL(0,9) | 4 bytes | 4 bytes | 8 bytes
DECIMAL(9,9) | 4 bytes | 4 bytes | 8 bytes
DECIMAL(12,2) | 6 bytes | 4 bytes | 10 bytes
The above could be off by a little, but you can verify against 11.8 Data Type Storage Requirements. Disclaimer: I have stopped using MySQL because of inconsistencies and confusing documentation.
This is probably controversial, but my recommendation would be to use `DECIMAL` or `NUMERIC` with precision and scale set to reasonable values, potentially with generator options to use float and double as opt-ins. Users should also still be able to use type mappings to update float/double defaults. I make this recommendation to protect users from one of MySQL's many "gotchas" with replication and querying (I've been bitten too many times by MySQL), with the assumption that users requiring imprecise larger values would be able to override our generation defaults or modify existing columns to meet their needs.
@jimschubert Thanks for the descriptive answer; it will save me a lot of time.
Maybe you're right that `TEXT` is better when `maxLength` isn't specified. What do you think about `array` and `object` types? I ignore these properties right now, because to me `array` implies an additional table with a one-to-many relationship, and `object` likewise implies a foreign key to a record in another table.
Sure, I want to add CLI options in future versions. Right now I want to provide a pre-stable version without any config.
Off-topic: @jimschubert, what relational database do you use instead of MySQL now? I've never had trouble with floats and doubles, though I haven't used decimal columns much (only for geo coordinates and money storage). I've encountered some issues with foreign keys and the varchar storage limit per table, and sometimes it's frustrating that InnoDB doesn't support full-text search, so you need to create an additional MyISAM table to store large article texts.
Any suggestions on how to calculate the total number of digits and the number of digits after the decimal point for decimal types? MySQL sets `DECIMAL(10,0)` by default, and 0 digits after the decimal point looks weird in our case.
@ybelenko I don't have a clear answer. I did a keyword search ("mysql decimal") in GitHub and found a few occurrences of `DECIMAL(20, 8)`.
I suggest we use it as a starting point and later provide instructions to users on how to customize it to meet their requirements.
@ybelenko I use PostgreSQL now. No relational database system is perfect, but Postgres has way fewer "gotchas" than MySQL (at least in my use cases). MySQL, up until 5.8/8.0 or whatever the recent release was, didn't support common SQL functionality that I use, such as CTEs and window functions.
It also has very unintuitive transactional behavior. As an example, if you mix InnoDB and MyISAM in a single query wrapped in a transaction, you can end up with data inserted into the MyISAM table that doesn't get rolled back, because MyISAM doesn't support transactions.
Also, if you're like most engineers/companies, you're probably stuck on an older version of MySQL. They had a bug open for 14 years which was pretty critical, but which was obviously not a high priority for them: you could insert data into a table with an auto-incrementing primary key, delete the record, then restart your database, and the sequence would restart at the maximum existing record value. This wasn't only an issue of data corruption, replication, and the like; it was a potential security issue for applications running on MySQL. Their "fix" was to document it and write it off as documented behavior.
Then, there's the issue that utf8 encoding doesn't mean UTF-8. People had a field day on Reddit with that bug.
There are numerous other issues I've encountered with MySQL over the years, but those are the main ones that would lead me to recommend against using MySQL. People used to argue that it was dead-simple to set up in comparison to Postgres and had better performance. I've found Postgres to be much simpler to set up over the last couple of years (magnitudes simpler than it was 17 or 18 years ago). But I would question the "speed" advantage of MySQL pretty heavily, considering it takes some shortcuts to eke out that speed. In my own experience, we recently migrated from MySQL 5.7 to PostgreSQL 9.6, and our database queries dropped from 200-300ms to 50-100ms (both in AWS, with the same data and same queries).
Also, I agree with @wing328 that `DECIMAL(20, 8)` would be a reasonable default for decimal, float, and double. This gives 12 integer digits to the left of the decimal point and 8 to the right. So the maximum value here would be:

`999999999999.99999999`

Or:

`999,999,999,999.99999999`

Realistically, there are probably very few applications working with values of 1 trillion and higher.
Some people might disagree with that as a default, so providing steps in the generated README for how to modify these and regenerate should address those concerns.
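As a sketch, the `DECIMAL(20, 8)` default discussed above would come out like this (table and column names are made up for illustration):

```sql
-- number/float and number/double both defaulting to DECIMAL(20, 8):
-- 12 digits left of the decimal point, 8 to the right,
-- i.e. a maximum of 999999999999.99999999.
CREATE TABLE price_sample (
  amount DECIMAL(20, 8) NOT NULL
);
```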
Well, the default value is important, but what I mean is: is there any rational way to calculate that value based on the `example`, `minimum`, `maximum`, and `default` properties? Should we try to analyze those properties to pick the best-matching value?
We can use additional information provided in the spec to try to come up with a better default, but I would suggest looking into that as a day-2 requirement if users complain about the default value we chose.
@wing328 Of course I'm not going to implement it right now. This thread was created to think about how we can generate a perfect MySQL schema. I want to get the maximum benefit from optional properties like `minLength`, `maximum`, `default`, and `example` without any additional vendor extensions. So any suggestions for the next versions of the codegen are welcome. :bulb:
I would only set a default if `default` is provided. Max length (on strings) helps define the data type. Min/max would be more like constraints. For pattern constraints, we may need to resort to triggers.
For older versions of MySQL, I don't think triggers are enabled by default; at least they weren't in 5.6 with the AWS default configuration (see this SO post). If we use triggers in the generator, we might want to explain how to enable them in AWS.
If defining triggers, also be aware that prior to MySQL 5.7.2, you couldn't have multiple triggers with the same triggering event and action (e.g. BEFORE UPDATE). We might want to limit the generator to a minimum of 5.7.2. The alternative for older versions is to create a single trigger and wrap the multi-statement body in `BEGIN ... END`.
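A rough sketch of the trigger idea for a min/max constraint (the table, column, and bounds here are hypothetical, standing in for an OpenAPI `minimum: 0` / `maximum: 5` pair):

```sql
-- Hypothetical: enforce minimum/maximum on a `rating` column via a trigger,
-- since older MySQL (pre-8.0.16) parses but ignores CHECK constraints.
DELIMITER $$
CREATE TRIGGER post_rating_check
BEFORE INSERT ON post
FOR EACH ROW
BEGIN
  IF NEW.rating < 0 OR NEW.rating > 5 THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'rating out of range';
  END IF;
END$$
DELIMITER ;
```

A matching `BEFORE UPDATE` trigger would be needed as well, which is where the pre-5.7.2 single-trigger-per-event limitation mentioned above bites.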
@ybelenko understood. I know you're trying to start the discussion earlier as you always think ahead 👍
General question to all: it's not clear what to do with complex model vars (`array`, `object`, `\Model\User`). Should we skip them in table creation for now? @wing328 is suggesting to set them as `VARCHAR(255)` as a temporary solution.
@ybelenko I would think defined models (`\Model\User`) could be a property referring to the table by id (`user_id references user(id)`). For array/object, I would choose something that's capable of storing these (like JSON, see docs).
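A minimal sketch of that reference-by-id idea (assuming both tables use an `id` primary key, which is exactly the open question about primary keys raised in this thread):

```sql
-- Hypothetical: a property typed as \Model\User becomes a user_id column
-- referencing the user table's primary key.
CREATE TABLE post (
  id BIGINT PRIMARY KEY,
  user_id BIGINT,
  FOREIGN KEY (user_id) REFERENCES `user` (id)
);
```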
> I would think defined models (`\Model\User`) could be a property referring to the table by id (`user_id references user(id)`). For array/object, I would choose something that's capable of storing these (like JSON, see docs).
Good point about foreign keys; I think the same. But it's obvious that I cannot implement it right now, because we know nothing about the primary keys in either table (current and referenced). I don't know what workaround to use for now... thinking about a simple `INT` or maybe `VARCHAR(255)`.
And about `JSON`: it's a great column type, but too modern. `JSON` requires MySQL 5.7.8. As a fallback solution we should use `LONGTEXT`. Ref quote:

> The space required to store a JSON document is roughly the same as for LONGBLOB or LONGTEXT;

It's obvious that I should add a CLI option like `useJsonDataType: true`. What do you think, guys: keep this option on or off by default? I think off, because it produces valid SQL syntax for outdated MySQL, and if a user needs the `JSON` data type, they can turn this option on.
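A sketch of what the proposed `useJsonDataType` option (name as suggested above, not yet implemented) would change in the output:

```sql
-- useJsonDataType=true (requires MySQL >= 5.7.8):
CREATE TABLE user_prefs (
  prefs JSON
);

-- useJsonDataType=false (fallback for older MySQL):
-- CREATE TABLE user_prefs (
--   prefs LONGTEXT
-- );
```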
My recommendation would be to not start by trying to support very old versions of the database. 5.7.8 was released 2015-08-03. It's doubtful that someone starting a new schema will be using that old of a database. If so, they can use type overrides to change `JSON` to `LONGTEXT`.
@ybelenko Hi, sorry, I'm a little lost here. Do we have a MySQL schema generator from OpenAPI specs now? If yes, where can I find it and how do I use it, please?
Thanks, I found it at https://openapi-generator.tech.
@ybelenko is there a page that lists the features which are yet to be built? e.g. identification of `FOREIGN KEY` in both `1-1` and `1-*` scenarios?
@DivsDibs no, we don't have the page you need, but you can read the conversation about undescribed features in #2587.
`FOREIGN KEY` is a very difficult feature as far as I can see right now (we don't even know how to specify primary keys in an OpenAPI schema). I agree that the current MySQL generator is far from perfect, and MySQL keys and indexes are a major feature request, but I'm going to start with small enhancements, like `snake_case` table names etc.
Description
As discussed with @ybelenko, we want to create a MySQL schema generator so that the MySQL database/tables can be created more easily given an OpenAPI spec.
openapi-generator version
Latest master
Suggest a fix/enhancement
If anyone wants to help on this, please reply to let us know.