Open morozov opened 3 years ago
Really nice RFC! Here are the few remarks I could come up with:
Hence, the corresponding method will not be implemented and isn't supposed to be used.
Wouldn't it be easier for static analysis to catch that if the method was really not implemented? Technically, it is.
That would require splitting Name
, having either
interface A
{
/**
* Returns object name representation as an SQL identifier.
*/
public function toIdentifier(NameBuilder $builder): string;
}
interface Name extends A
{
/**
* Returns object name representation as an SQL literal.
*/
public function toLiteral(NameBuilder $builder): string;
}
final class QualifiedName implements A
{
// …
}
or the more symmetrical
interface A
{
/**
* Returns object name representation as an SQL identifier.
*/
public function toIdentifier(NameBuilder $builder): string;
}
interface B
{
/**
* Returns object name representation as an SQL literal.
*/
public function toLiteral(NameBuilder $builder): string;
}
interface Name extends A,B
{
}
final class QualifiedName implements A
{
// …
}
The that is not fully upper-cased
A noun seems to be missing in this sentence.
I think I overall agree with what I've read here, but it would be nice to see the impact on the docs
directory to better understand what impact this will have on users. For instance, I'm unsure how much they are going to use the interfaces proposed above (and I'm hoping they won't have to do it unless required).
The that is not fully upper-cased
A noun seems to be missing in this sentence.
It should be "The name that is not fully upper-cased". Thanks!
Wouldn't it be easier for static analysis to catch that if the method was really not implemented?
It definitely would. I'll do more prototyping and will try to take this into account.
FWIW, the RFC is far from being complete. There are more things to take care of, e.g. comparing the names that are now objects for equality and using them as keys in collections of named objects (e.g. table columns or schema tables). I'll continue adding more details once I get them sorted out.
Do you see the ORM using this in the metadata drivers to convert each field to the target platforms identifier once during parsing for example? I'd like to keep this out of "regular" runtime and cache the results. It looks like that would be feasible.
So far, I've been thinking about this in the context of schema management. If the ORM uses this API for building SQL (which would make perfect sense), it can cache the names as long as the schema remains unchanged.
Motivation
There's a bunch of issues around handling object names that exist due to the existing design assumptions (e.g. https://github.com/doctrine/dbal/issues/4357). The assumptions are:
.
).`
).The above assumptions lead to the API where any name is represented as a string which then has to be parsed to identify its meaning. E.g.:
Proposal
This design proposal attempts to replace parsing the string with representing names as domain objects.
The name interface
The corner-stone interface describing object names will be
Name
:It will represent any name: qualified or unqualified; quoted or unquoted and will be used by all SQL objects like tables, views, columns, databases, etc.
Note, currently, there's no plan to expose the raw value of the name. This will make it easier to abuse it in the raw form which may be the source of all sorts of issues (e.g. SQL injections).
A name can be used in two forms in SQL, hence the methods:
The SQL Builder interface family
The string representations of the name will be built by an SQL builder that is specific to the given database platform. For the sake of simplicity, we can consider the platform and the SQL builder synonymous for now.
The need to have different methods for representing names as literals and identifiers should be clear from the above. The need to have different methods for quoted and unquoted names will be explained below.
Unqualified and qualified names
Before we get to the details of the unquoted and quoted names, we need to establish a foundation regarding the unqualified and qualified names.
An unqualified name is the "given" name object without the reference to its schema or catalog (e.g. "accounts"):
A qualified name consists of the qualifier (which can be a qualified name itself) and an unqualified name. For instance, a table named "accounts" contained in the "accounting" schema can be referenced as
accounting.accounts
, or even more specifically,customer1.accounting.accounts
where "customer1" is the name of the database or a catalog to which the "accounting" schema belongs.Note that a qualified name is a compound object, so it can be used as an identifier but there's no scenario where it would be used as a literal. Hence, the corresponding method will not be implemented and isn't supposed to be used.
Unquoted and quoted names
An unqualified name can be unquoted or quoted. There are two reasons why a name may need to be quoted in SQL:
The name represents an SQL keyword. The quoting may be needed for the SQL parser on a certain database platform to interpret the token as an identifier rather than a keyword.
The following query is syntactically invalid in MySQL:
But this one is valid:
The that is not fully upper-cased must retain its original case on the platforms like Oracle and IBM DB2.
In the the following query, the name will be internally converted to the upper case:
But in this one it will remain lower-cased:
Proper design and implementation of the casing aspect will allow us to quote identifiers unconditionally (i.e. always) which will remove the requirement to maintain the keywords lists and improve the stability and security aspects of the API.
Unquoted name normalization
Depending on the destination database platform, an unquoted name may need to be normalized before getting quoted. Currently, this is implemented in an obscure and non-systematic way like the following:
Instead, we can introduce an interface that would describe this concern and be implemented by the platforms as required:
The implementations of the
NameBuilder
interface will depend onUnquotedNameNormalizer
.Object sets
There is a common component missing in the Schema API that would allow implementing a set of named database objects. Currently, the
Schema
and theTable
classes have their own implementations of the set ofTable
s andColumn
s. A unified API is needed because the object names will be considered not only depending on their value but also depending on the platform in which the objects will be created.Breaking API changes
Once the above APIs are implemented and solidified, we need to transition the APIs under the
Schema
namespace to use the names expressed as objects instead of plain strings:string $name
will require anUnqualifiedName $name
in their constructor. The same applies to the objects referencing such objects (e.g. the referenced table of a foreign key).UnqualifiedName $name
.To implement that, some breaking API changes are needed around schema introspection. Specifically, in the implementation of the
AbstractSchemaManager::list*()
. See the "Reasoning behind the design changes" section in https://github.com/doctrine/dbal/pull/4548.This is where we're trying squeeze a qualified name into a string:
https://github.com/doctrine/dbal/blob/cc378c34815cf3377312888a71047b7e167f7a76/src/Schema/PostgreSQLSchemaManager.php#L185
And then parse: https://github.com/doctrine/dbal/blob/dc4ad2a0d7f72e9e55dd1f9e2a341c068fd2b89f/src/Schema/AbstractAsset.php#L47
And miserably fail: https://github.com/doctrine/dbal/pull/4708.
Get rid of
AbstractAsset