FirebirdSQL / firebird

Firebird server, client and tools
https://www.firebirdsql.org/
1.23k stars 212 forks source link

Foreign keys - Pointer based [CORE2548] #2958

Open firebird-automations opened 15 years ago

firebird-automations commented 15 years ago

Submitted by: sqlguru (sqlguru)

The relational model has no place for surrogate keys. When you use natural keys as primary keys, sometimes it may require one or more natural keys. Any relational tables must also have the same natural keys for foreign keys. This increases maintenance time and decreases performance.

The solution is to have pointer-based foreign keys. When the DBA creates a PK on Table A, they should be able to tell Firebird to "relate" Table B to Table A. Firebird would then automatically create a pointer between the two tables (the foreign keys do not have to be carried over manually by the DBA). Anytime the PK of Table A changes (column added, datatype change, datalength change), the changes would be propagated to ALL foreign keys automatically.

Cascade updates should also be done pointer based so it doesn't cause massive updates if a natural key value changes etc.

firebird-automations commented 15 years ago
Modified by: Sean Leyne (seanleyne) priority: Critical \[ 2 \] =\> Minor \[ 4 \]
firebird-automations commented 15 years ago

Commented by: @mrotteveel

I don't want to make this into a purity discussion, but surrogate keys DO have a place in the application of the relation model. And in essence your proposal is the introduction of HIDDEN surrogate keys, which IMHO is worse.

Firebird should not implement functionality that deviates in a big way from the SQL standards and as far as I know none of the standards define a solution like you mention.

firebird-automations commented 15 years ago

Commented by: sqlguru (sqlguru)

That's not what I'm suggesting. This is more of an "administration" thing. Basically, when you have 3-4 candidate keys for primary keys, firebird should automatically put them in the foreign tables etc. Any changes to the primary key would reflect on the foreign tables automatically. It makes management easier.

firebird-automations commented 15 years ago

Commented by: @asfernandes

That's a field for tools, where user can do changes and may know of what will happen under the hood.

For datatype changes, there is already domains.

firebird-automations commented 15 years ago

Commented by: @mrotteveel

It is not an 'administration' thing, it is a significant deviation of the SQL standard and IMHO does not give any new benefits.

Using pointers would result in (hidden) performance degradation since the server would need to join (hidden) to the source table to get the data from the primary key, also using pointers seems almost like a throwback to the hierarchical or network models. Also changes in primary key are already reflected in foreign keys automatically with on update cascade, only then it is explicit (and you also have other options like restrict, set null et).

If you want simple keys, use surrogate keys, there is nothing wrong with using them, and the performance degradation of getting the data of the actual (candidate) key by joining is explicit.

firebird-automations commented 15 years ago

Commented by: sqlguru (sqlguru)

You misunderstood me.

What I mean is that the foreign key is physically there in the foreign key table, it's just not "editable" by the DBA. Firebird instead manages it. Managing in the sense that if the DBA modified the primary key, it replicates the changes to all the foreign keys in the relation tables. The advantage is that the DBA doesn't manually have to replicate the changes which can be a very tedious process.

Regarding surrogate keys, you can read Chris Date, Codd, or Celko's works on why surrogates are not in the relational model.

firebird-automations commented 15 years ago

Commented by: sqlguru (sqlguru)

This is what expert on the relational model, Celko had to say:

"Oh, other SQLs handle references with pointer chain (with all the safety we learned to put into Network DBs) under the covers. The referenced value is in one physical locations, so CASCADE is cheap. SQL Server is one of the older RDBMS architectures that repeat the values in both the referenced and referencing tables."

BTW, just so you know the SQL language itself is non-relational. If you want an example of relational language, look at Codd's alpha language, Date's Tutorial D and look at REAL implementations of RDBMS (currently experimental) like dataphor and rel.

firebird-automations commented 15 years ago

Commented by: @asfernandes

Sorry, but I still didn't understand what you want.

Is it something like Oracle clusters?

From they docs: "Use the CREATE CLUSTER statement to create a cluster. A cluster is a schema object that contains data from one or more tables, all of which have one or more columns in common. Oracle Database stores together all the rows from all the tables that share the same cluster key."

firebird-automations commented 15 years ago

Commented by: sqlguru (sqlguru)

When designing according to the relational model, I want to use natural keys for primary keys. Some primary keys will contain 3-4 candidate keys. The problem with this is that the 3-4 candidate keys will have to be replicate for foreign keys to all relation tables. If I ever decide to add another candidate key to the primary key, I'd manually have to add the candidate key to all the foreign tables. This can be a very tedious process when you have many foreign tables.

The solution to this is to have a "pointer" for management purposes. So if you do add an extra column to the primary key, Firebird automatically replicates the change to all foreign tables.

There is a particular database product, according to celko, that successfully implements pointers. The result is that the foreign keys are not added to the foreign tables, instead it's pointer based. Cascades are "instant" and maintenance of the primary key is very easy.

Let me show you an example of the problem (look at how forum_name, forum_created has to be carried over to the other tables, it would be easier if it was just a pointer):

CREATE TABLE forums ( forum_name VARCHAR(25), forum_created DATETIME, forum_topics INT, forum_views INT, PRIMARY KEY (forum_name, forum_created) );

CREATE TABLE topics ( topic_name VARCHAR(25), topic_created DATETIME, forum_name VARCHAR(25), -- Firebird should make this a pointer forum_created DATETIME,-- Firebird should make this a pointer topic_posts INT, topic_views INT PRIMARY KEY (topic_name, topic_created), FOREIGN KEY (forum_name, forum_created) REFERENCES forums (forum_name, forum_created) )

CREATE TABLE forum_subscriptions ( member_username VARCHAR(25) forum_name VARCHAR(25),-- Firebird should make this a pointer forum_created DATETIME-- Firebird should make this a pointer PRIMARY KEY (member_username, forum_name, forum_created), FOREIGN KEY (forum_name, forum_created) REFERENCES forums (forum_name, forum_created) )

firebird-automations commented 15 years ago

Commented by: sqlguru (sqlguru)

To clarify this: In RDBMS, when you want to relate a table to another table, you must duplicate the PK of the referenced table as foreign keys in the referencing table. Why do we have to do this? This is something an RDBMS should "maintain internally". This creates a massive maintenance issue when we decide to use natural keys for PK where a PK can involve 3-4 candidate keys.

There is absolutely no point to duplicate the FK in the referencing tables. SQL queries should not be affected because the FK is there in the referencing tables "internally".

The current SQL queries will be unaffected by this modification but the performance benefit will definetly be notable. All cascades would be "instant".

You also add to the SQL: SELECT * FROM [referencing] AS a INNER JOIN [referenced] AS b USING designated_pk

This is where Firebird should separate itself from fake RDBMS.

ANSI committee member and world renowned SQL expert Joe Celko speaks on the matter:

"In the Sybase model, a PK value appears only once in a base table. FK references to it are proper pointer chains to that base table. They do at the engine level what you are faking manually in code with those IDENTITY columns. I change one value, in one table and it cascades thru the entire schema instantly. My only worry is that an application might have read the old value, kept it in local program storage and then tried to use it because the application allowed dirty reads."

He is talking about the "new sybase" architecture, not the old sybase architecture that fake RDBMS like SQL Server is using.