doctrine / dbal

Doctrine Database Abstraction Layer
https://www.doctrine-project.org/projects/dbal.html
MIT License
9.47k stars 1.34k forks source link

Get rid of hard-coded default values of maximum field lengths #3263

Closed morozov closed 5 years ago

morozov commented 6 years ago
Q A
New Feature no
RFC yes
BC Break yes

Default max column length on supported platforms

Below is the summary of the default maximum lengths of different column types on supported platforms. A dash means that the platform doesn't support column declaration without an explicitly specified maximum length.

Platform CHAR VARCHAR BINARY VARBINARY
MySQL 1 1
SQL Server 1 1 1 1
Oracle 1
IBM DB2 1 1
PostgreSQL 1 n/a (same as TEXT) n/a (BYTEA) n/a (BYTEA)
SQLite n/a n/a n/a (BLOB) n/a (BLOB)

Downsides of the existing API

  1. The default length of CHAR and VARCHAR is the same for all platforms — 255 (except for SQLAnywhere), however in reality they are never the same and never default to 255.

  2. A single method AbstractPlatform::getVarcharTypeDeclarationSQLSnippet() is responsible for creating both CHAR and VARCHAR columns, however, on most platforms they are represented by different types and have different requirements for the default length within the same platform. It causes most platforms to have code like if ($fixed).

  3. Unlike VARCHAR, a default CHAR length provided by DBAL doesn't make sense because it's fixed and should be defined by the domain.

  4. A lot of hard-coded values like 254, 255 in the code base which in some cases don't reflect the reality (see #3133):

    $ git grep 255 -- lib/ | wc -l
    24   

Proposal

The idea is to have the API which on the one hand doesn't provide any hidden defaults, on the other, doesn't impose any artificial limitations. For those platforms, which require length, the length should be required in the column definition. For those which don't require or don't support it, the length is optional.

  1. Remove get(Var)?(Char|Binary)(Default|Max)Length() methods since they may be inaccurate and not reflect the real limitations of a given platform, its version and other platform-specific details (e.g. MySQL storage engine).
  2. Split AbstractPlatform::getVarcharTypeDeclarationSQLSnippet($length, $fixed) into:
    1. AbstractPlatform::getVarcharTypeDeclarationSQLSnippet(?int $length) (nullable length for VARCHAR ). The NULL value will be handled by the concrete platform. It will either create a field with non-limited length (PostgreSQL, SQLite) or throw an exception (the rest).
    2. AbstractPlatform::getCharTypeDeclarationSQLSnippet(?int $length).
  3. Split AbstractPlatform::getBinaryTypeDeclarationSQLSnippet($length, $fixed) in the same way as above.
  4. Get rid of hard-coded lengths in the rest of the code (e.g. AbstractPlatform::getCreateTableSQL()).

Benefits

  1. Platform API becomes cleaner and closer to the domain.
  2. Platform implementations become simpler and less specific (e.g. the CHAR and VARCHAR column declaration is the same almost on all platforms).
  3. Developers won't mistakenly omit the maximum length if they test their code on the platforms which enforce it.

Downsides

  1. Developers will be required to specify the max length of the column manually.
  2. The needed for the length on a specific column will be detected only at runtime when performing DDL against a platform which requires the length.

References

  1. Original discussion: https://github.com/doctrine/dbal/pull/3255#discussion_r211107019.
morozov commented 6 years ago

There's also a reason for developers to not rely on the default field length (especially, 255). Amongst supported platforms, at least MySQL has a row size limitation of 64KB. Given that a Unicode character can take up to 4 bytes, you can have at most ~64 VARCHAR(255) fields in a table using utf8mb4.

If you use utf8 which reserves 3 bytes per characters and doesn't yet exceed the row length, at some moment you may decide to do ALTER TABLE [...] CONVERT TO CHARACTER SET utf8mb4 and exceed the row length. You'll have to shrink your columns before you can proceed.

sidz commented 6 years ago

I've started to investigate and implement this approach and faced with the issue that StringType won't be possible to determine which field type it should use (char, varchar, text) without maxLength method anymore.

https://github.com/doctrine/dbal/blob/3979efeedfda1eda12d30781344583b0ece0a644/lib/Doctrine/DBAL/Platforms/AbstractPlatform.php#L296-L313

does it mean that it need to add CharType, VarcharType, TextType and so one?

or what did I miss?

morozov commented 6 years ago

Hold on. I already did some work on it and it turns out being a huge one (look at my issues/3263 branch).

StringType won't be possible to determine which field type it should use (char, varchar, text) without maxLength method anymore

The underlying column type will be determined from the field definition. If it's "fixed", it's CHAR, otherwise it's VARCHAR. StringType will not fall back to using TEXT anymore same as Binary won't fall back to BLOB (see https://github.com/doctrine/dbal/pull/3188).

Please look at the API changes. I can document them if something's not clear but it's not yet finished.

sidz commented 6 years ago

then no help is required here

morozov commented 5 years ago

Fixed by #3586.

Tobion commented 2 years ago

Cool this is already resolved in 4.x. The default length of varchar columns in mysql and postgres to be 255 was not reflecting current best practices anymore. For example. with utf8 character columns in mysql which is the standard nowadays, there is no point to use a length of 255 by default. With latin1 charset there was at least the argument that you save a byte of storage size when limiting to 255.

So I think this is the right thing to do. But one thing might be unfortunate. When inferring the type from property types (feature https://github.com/doctrine/orm/issues/7939), string type is mapped to varchar. So when people using a string property but do not specify a length, this will work in Postgress (varchar without length equals text type) but will fail in MySQL. Given that this is a common case and people usually don't really care about the length of a string unless there are some specific requirements, I think this could be improved.

So I would suggest to change the inferred type for a string property to TEXT type when there a no length. If there is a length and it is supported by VARCHAR, it could stay to be varchar. This way inferred type is more clever depending on the length and it works without a failure by default.

Tobion commented 2 years ago

I've opened https://github.com/doctrine/orm/issues/9678 for the above proposal.

github-actions[bot] commented 2 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.