Closed: wing328 closed this issue 6 years ago
We will need to store MySQL-specific attributes/properties (e.g. `AUTO_INCREMENT`) in the model defined in the OpenAPI spec.
One option is to use a vendor extension (e.g. `x-mysql-auto-increment`).
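For example, such a vendor extension might look like this in a spec (a sketch only: `x-mysql-auto-increment` is a proposed name at this point, and the `User` model is illustrative):

```yaml
definitions:
  User:
    type: "object"
    properties:
      id:
        type: "integer"
        format: "int64"
        # proposed vendor extension, consumed only by a MySQL generator:
        x-mysql-auto-increment: true
```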
If you do so, you could also consider a more general database-generation option in different languages, e.g. JPA in Java.
Is that similar to the `databaseAdapter` option in the ruby-on-rails generator?
CONFIG OPTIONS
`databaseAdapter`: The adapter for the database (e.g. mysql, sqlite). (Default: sqlite)
I've made significant progress working on it. The prototype name for the new codegen is `MysqlSchemaServerCodegen`. If anyone has a better name, drop it in the comments.
I would also welcome any suggestions for the OpenAPI Data Type -> MySQL Data Type mapping.
OpenAPI Data Type | OpenAPI Data Format | MySQL Data Type
---|---|---
integer | int32 | INT
integer | int64 | BIGINT
number | float | FLOAT
number | double | DOUBLE
string | | VARCHAR(255)
string | byte | :question:
string | binary | :question:
boolean | | BOOLEAN or TINYINT(1)
string | date | DATE
string | date-time | DATETIME
string | password | VARCHAR(255)
file | | :question:
Here are my suggestions:
OpenAPI Data Type | OpenAPI Data Format | MySQL Data Type
---|---|---
string | byte | VARCHAR(255)
string | binary | MEDIUMBLOB
file | | MEDIUMBLOB
Ref: https://stackoverflow.com/questions/5959043/how-to-insert-a-file-in-mysql-database
(`file` in OAS2 has been deprecated and replaced by `type: string, format: binary` in OAS3)
@wing328 What about `float` and `double`? Do you agree with `FLOAT` and `DOUBLE`, respectively?
I agree with William's addition.
My comments are below.
OpenAPI string types with no data format specified should not have a maximum specified for varchar, unless there is an OpenAPI `maxLength` constraint. See maxLength in JSON Schema validation for strings, defined in Schema Object (spec 2.0 and 3.0.x).
`varchar` is generally used in MySQL for smaller bits of text. There's another data type, `TEXT`, which is used for larger text fields.
Consider, for example, if the specification defines an API for a blogging system:
```yaml
---
swagger: "2.0"
info:
  version: "1.0.0"
  title: "Bloggy"
  description: "A blog API sample"
  contact:
    name: "Jim Schubert"
  license:
    name: "MIT"
host: "blog.example.com"
basePath: "/api"
schemes:
  - "https"
consumes:
  - "application/json"
produces:
  - "application/json"
paths:
  /posts:
    get:
      description: "Returns all posts (not realistic for large blogs)"
      produces:
        - "application/json"
      responses:
        "200":
          description: "A list of blog posts."
          schema:
            type: "array"
            items:
              $ref: "#/definitions/Post"
definitions:
  Post:
    type: "object"
    required:
      - "id"
      - "title"
      - "content"
    properties:
      id:
        type: "integer"
        format: "int64"
      title:
        type: "string"
        maxLength: 255
      content:
        type: "string"
      related:
        type: array
        items:
          type: "integer"
          format: "int64"
```
edit: I didn't finish this thought. Sorry.
In the above spec, `title` has a max length, and so `varchar(255)` would be appropriate. `content`, on the other hand, is an unbounded `string` type, and something like a 1,000-word blog post wouldn't fit into a `varchar(255)` if this is used as the default for all strings.
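To illustrate, here is a hypothetical sketch of the DDL such a generator might emit for the `Post` model above (the table and column naming convention is assumed, not decided):

```sql
-- Hypothetical DDL for the Post definition above:
--   `title` has maxLength: 255        -> VARCHAR(255)
--   `content` is an unbounded string  -> TEXT, not VARCHAR(255)
CREATE TABLE post (
  id BIGINT NOT NULL,          -- integer/int64
  title VARCHAR(255) NOT NULL,
  content TEXT NOT NULL
);
```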
I'd be hesitant to use `FLOAT` and `DOUBLE` in MySQL. These are inexact data types (approximations). `FLOAT` uses 4 bytes (if precision <= 24) or 8 bytes (if precision is between 24 and 53), and `DOUBLE` uses 8 bytes of storage.
From Wikipedia:

> An IEEE 754 32-bit base-2 floating-point variable has a maximum value of (2 − 2<sup>−23</sup>) × 2<sup>127</sup> ≈ 3.402823 × 10<sup>38</sup>
From MySQL docs:

> Because floating-point values are approximate and not stored as exact values, attempts to treat them as exact in comparisons may lead to problems.
Examples of errors with Float/Double can be seen on MySQL docs B.5.4.8 Problems with Floating-Point Values.
A huge concern with this is that if you use mysqldump to replicate between a master and a slave, you can end up with different values between the two.
Conversely, `DECIMAL` (and `NUMERIC`) calculates the storage space based on the number stored. The left and right parts of the number each use up to 4 bytes for every 9 digits. So, the following storage requirements would occur:
MySQL Type | Left | Right | Total Storage
---|---|---|---
DECIMAL(0,9) | 4 bytes | 4 bytes | 8 bytes
DECIMAL(9,9) | 4 bytes | 4 bytes | 8 bytes
DECIMAL(12,2) | 6 bytes | 4 bytes | 10 bytes
The above could be off by a little, but you can verify against 11.8 Data Type Storage Requirements. Disclaimer: I have stopped using MySQL because of inconsistencies and confusing documentation.
This is probably controversial, but my recommendation would be to use `DECIMAL` or `NUMERIC` with precision and scale set to reasonable values, potentially with generator options to use float and double as opt-ins. Users should also still be able to use type mappings to update float/double defaults. I make this recommendation to protect users from one of MySQL's many "gotchas" with replication and querying (I've been bitten too many times by MySQL), with the assumption that users requiring imprecise larger values would be able to override our generation defaults or modify existing columns to meet their needs.
@jimschubert Thanks for the descriptive answer; it will save me a lot of time.
Maybe you're right that `TEXT` is better when `maxLength` isn't specified. What do you think about `array` and `object` types? I ignore these properties right now, because to me `array` implies an additional table with a one-to-many relationship, and `object` likewise implies a foreign key to a record in another table.
Sure, I want to add CLI options in future versions. Right now I want to provide a pre-stable version without any config.
Off-topic: @jimschubert, what relational database do you use instead of MySQL now? I've never had trouble with floats and doubles, though I haven't used decimal columns much (only for geo coordinates and money storage). I've encountered some issues with foreign keys and the varchar storage limit per table, and sometimes it's frustrating that InnoDB doesn't support full-text search, so you need to create an additional MyISAM table to store large article texts.
Any suggestions on how to calculate the total number of digits and the number of digits after the decimal point for decimal types? MySQL sets `DECIMAL(10,0)` by default, and 0 digits after the decimal point looks weird in our case.
@ybelenko I don't have a clear answer. I did a keyword search ("mysql decimal") in GitHub and found a few occurrences of `DECIMAL(20, 8)`.
I suggest we use it as a starting point and later provide instructions to users on how to customize it to meet their requirements.
@ybelenko I use PostgreSQL now. No relational database system is perfect, but Postgres has way fewer "gotchas" than MySQL (at least in my use cases). MySQL, up until 5.8/8.0 or whatever the recent release was, didn't support common SQL functionality that I use, such as CTEs and window functions.
It also has very unintuitive transactional behavior. As an example, if you mix InnoDB and MyISAM in a single query wrapped in a transaction, you can end up with data inserted into the MyISAM table that doesn't get rolled back, because MyISAM doesn't support transactions.
Also, if you're like most engineers/companies, you're probably stuck on an older version of MySQL. They had a bug open for 14 years which was pretty critical, but which was obviously not a high priority for them: you could insert data into a table with an auto-incrementing primary key, delete the record, then restart your database, and the sequence would restart at the maximum existing record value. This wasn't only an issue of data corruption, replication, and the like; it was a potential security issue for applications running on MySQL. Their "fix" was to document it and write it off as documented behavior.
Then, there's the issue that utf8 encoding doesn't mean UTF-8. People had a field day on Reddit with that bug.
There are numerous other issues I've encountered with MySQL over the years, but those are the main ones that would lead me to recommend against using MySQL. People used to argue that it was dead-simple to set up in comparison to Postgres and had better performance. I've found Postgres to be much simpler to set up over the last couple of years (magnitudes simpler than it was 17 or 18 years ago). But I would question the "speed" advantage of MySQL pretty heavily, considering it takes some shortcuts to eke out that speed. In my own experience, we recently migrated from MySQL 5.7 to PostgreSQL 9.6, and our database queries dropped from 200-300ms to 50-100ms (both in AWS, with the same data and same queries).
Also, I agree with @wing328 that `DECIMAL(20, 8)` would be a reasonable default for decimal, float, and double. This gives 12 integer digits to the left of the decimal point and 8 to the right. So the maximum value here would be:

`999999999999.99999999`

Or:

`999,999,999,999.99999999`

Realistically, there are probably very few applications working with values of 1 trillion and higher.
Some people might disagree with that as a default, so providing steps in the generated README for how to modify these and regenerate should address those concerns.
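As a sketch, the `DECIMAL(20, 8)` default discussed above would come out like this (table and column names are made up for illustration):

```sql
-- number/float and number/double both defaulting to DECIMAL(20, 8):
-- 12 digits left of the decimal point, 8 to the right,
-- i.e. a maximum of 999999999999.99999999.
CREATE TABLE price_sample (
  amount DECIMAL(20, 8) NOT NULL
);
```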
Well, the default value is important, but what I mean is: is there any rational way to calculate that value based on the `example`, `minimum`, `maximum`, and `default` properties? Should we try to analyze those properties to pick the best-matching value?
We can use additional information provided in the spec to try to come up with a better default, but I would suggest looking into that as a day-2 requirement if users complain about the default value we chose.
@wing328 Of course I'm not going to implement it right now. This thread was created to think about how we can generate a perfect MySQL schema. I want to get the maximum benefit from optional properties like `minLength`, `maximum`, `default`, and `example` without any additional vendor extensions. So any suggestions for the next versions of the codegen are welcome. :bulb:
I would only set a default if `default` is provided. Max length (on strings) helps define the data type. Min/max would be more like constraints. For pattern constraints, we may need to resort to triggers.
For older versions of MySQL, I don't think triggers are enabled by default; at least they weren't in 5.6 with the AWS default configuration (see this SO post). If we use triggers in the generator, we might want to explain how to enable them in AWS.
If defining triggers, also be aware that prior to MySQL 5.7.2, you couldn't have multiple triggers with the same triggering event and action (e.g. BEFORE UPDATE). We might want to limit the generator to a minimum of 5.7.2. The alternative for older versions is to create a single trigger and wrap the multi-statement body in `BEGIN ... END`.
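A rough sketch of the trigger idea for a min/max constraint (the table, column, and bounds here are hypothetical, standing in for an OpenAPI `minimum: 0` / `maximum: 5` pair):

```sql
-- Hypothetical: enforce minimum/maximum on a `rating` column via a trigger,
-- since older MySQL (pre-8.0.16) parses but ignores CHECK constraints.
DELIMITER $$
CREATE TRIGGER post_rating_check
BEFORE INSERT ON post
FOR EACH ROW
BEGIN
  IF NEW.rating < 0 OR NEW.rating > 5 THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'rating out of range';
  END IF;
END$$
DELIMITER ;
```

A matching `BEFORE UPDATE` trigger would be needed as well, which is where the pre-5.7.2 single-trigger-per-event limitation mentioned above bites.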
@ybelenko understood. I know you're trying to start the discussion earlier as you always think ahead 👍
General question to all: it's not clear what to do with complex model vars (`array`, `object`, `\Model\User`). Should we skip them in table creation for now? @wing328 is suggesting to set them as `VARCHAR(255)` as a temporary solution.
@ybelenko I would think defined models (`\Model\User`) could be a property referring to the table by id (`user_id references user(id)`). For array/object, I would choose something that's capable of storing these (like JSON, see docs).
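A minimal sketch of that reference-by-id idea (assuming both tables use an `id` primary key, which is exactly the open question about primary keys raised in this thread):

```sql
-- Hypothetical: a property typed as \Model\User becomes a user_id column
-- referencing the user table's primary key.
CREATE TABLE post (
  id BIGINT PRIMARY KEY,
  user_id BIGINT,
  FOREIGN KEY (user_id) REFERENCES `user` (id)
);
```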
> I would think defined models (`\Model\User`) could be a property referring to the table by id (`user_id references user(id)`). For array/object, I would choose something that's capable of storing these (like JSON, see docs).
Good point about foreign keys; I think the same. But it's obvious that I cannot implement it right now, because we know nothing about the primary keys in either table (current and referenced). I don't know what workaround to use for now... thinking about a simple `INT` or maybe `VARCHAR(255)`.
And about `JSON`: it's a great column type, but too modern. `JSON` requires MySQL 5.7.8. As a fallback solution we should use `LONGTEXT`. Ref quote:

> The space required to store a JSON document is roughly the same as for LONGBLOB or LONGTEXT;

It's obvious that I should add a CLI option like `useJsonDataType: true`. What do you think, guys: keep this option on or off by default? I think off, because it produces valid SQL syntax for outdated MySQL, and if a user needs the `JSON` data type, they can turn this option on.
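A sketch of what the proposed `useJsonDataType` option (name as suggested above, not yet implemented) would change in the output:

```sql
-- useJsonDataType=true (requires MySQL >= 5.7.8):
CREATE TABLE user_prefs (
  prefs JSON
);

-- useJsonDataType=false (fallback for older MySQL):
-- CREATE TABLE user_prefs (
--   prefs LONGTEXT
-- );
```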
My recommendation would be to not start by trying to support very old versions of the database. 5.7.8 was released 2015-08-03. It's doubtful that someone starting a new schema will be using that old of a database. If so, they can use type overrides to change `JSON` to `LONGTEXT`.
@ybelenko Hi, sorry, I'm a little lost here. Do we have a MySQL schema generator from OpenAPI specs now? If yes, where can I find it and how do I use it, please?
Thanks, I found it at https://openapi-generator.tech.
@ybelenko is there a page that lists the features which are yet to be built? e.g. identification of `FOREIGN KEY` in both `1-1` and `1-*` scenarios?
@DivsDibs no, we don't have the page you need, but you can read the conversation about undescribed features in #2587.
`FOREIGN KEY` is a very difficult feature as far as I can see right now (we don't even know how to specify primary keys in an OpenAPI schema). I agree that the current MySQL generator is far from perfect, and MySQL keys and indexes are a major feature request, but I'm going to start with small enhancements, like `snake_case` table names etc.
Description
As discussed with @ybelenko, we want to create a MySQL schema generator so that the MySQL database/tables can be created more easily given an OpenAPI spec.
openapi-generator version
Latest master
Suggest a fix/enhancement
If anyone wants to help on this, please reply to let us know.