SeaQL / sea-schema

🌿 SQL schema definition and discovery
https://docs.rs/sea-schema
Apache License 2.0
186 stars 39 forks source link

Incorrect Discovery for Composite Foreign Keys in PostgreSQL #100

Closed julmaxi closed 8 months ago

julmaxi commented 1 year ago

Description

Postgres schema discovery produces incorrect results for composite foreign keys. Specifically, for a key with $n$ columns each key is repeated $n$ times in both columns and foreign_columns.

Steps to Reproduce

On a Postgres database:

  1. Create a schema with a composite primary key and a corresponding foreign key. The following is a minimal example:

    
    CREATE TABLE Parent (
    id1 int,
    id2 int,
    
    PRIMARY KEY (id1, id2)
    );

CREATE TABLE Child ( id INT PRIMARY KEY, parent_id1 INT, parent_id2 INT,

FOREIGN KEY (parent_id1, parent_id2) REFERENCES Parent(id1, id2)

);

2. Run schema discovery

### Expected Behavior

The references section for the example should like this:

References { name: "child_parent_id1_parent_id2_fkey", columns: [ "parent_id1", "parent_id2", ], table: "parent", foreign_columns: [ "id1", "id2", ], on_update: Some( NoAction, ), on_delete: Some( NoAction, ), }


### Actual Behavior

References { name: "child_parent_id1_parent_id2_fkey", columns: [ "parent_id1", "parent_id1", "parent_id2", "parent_id2", ], table: "parent", foreign_columns: [ "id1", "id2", "id1", "id2", ], on_update: Some( NoAction, ), on_delete: Some( NoAction, ), }


### Reproduces How Often

Always

## Versions

sea-schema=0.11.0
Postgres 14.6

## Additional Information

I am relatively confident that the root cause is this query:
https://github.com/SeaQL/sea-schema/blob/e70964453462f91c3228db81f3cdc130b27dae44/src/postgres/query/constraints/mod.rs#L134-L160

Specifically, the inner select will produce all primary keys targeted by a foreign key. Each of these will be joined with *all* referencing foreign key columns in a matching child table. This produces all $n \times n$ possible combinations of primary and foreign key columns which is of course fine when $n=1$ but not in any other case.

One potential way to fix this is to additionally query the `ordinal_position` of each column in the primary key in the inner select.
This requires joining with the `key_column_usage` table in the inner select to retrieve the primary key.
`ordinal_position` can then be matched with `position_in_unique_constraint` in the `key_column_usage` row for the foreign key.

The overall query would look like this (changes commented)

```rust
.join_subquery(
        JoinType::LeftJoin,
        Query::select()
            .distinct()
            .columns(vec![
                (Schema::ReferentialConstraints, RefC::ConstraintName),
                (Schema::ReferentialConstraints, RefC::UniqueConstraintSchema),
                (Schema::ReferentialConstraints, RefC::UniqueConstraintName),
                (Schema::ReferentialConstraints, RefC::MatchOption),
                (Schema::ReferentialConstraints, RefC::UpdateRule),
                (Schema::ReferentialConstraints, RefC::DeleteRule),
            ])
            .columns(vec![
                (Schema::ConstraintColumnUsage, Kcuf::TableName),
                (Schema::ConstraintColumnUsage, Kcuf::ColumnName),
            ])
            .columns(vec![
                // Extract the ordinal position of the referenced primary keys
                (Schema::KeyColumnUsage, Kcuf::OrdinalPosition)
            ])
            .from((Schema::Schema, Schema::ReferentialConstraints))
            .left_join(
                (Schema::Schema, Schema::ConstraintColumnUsage),
                Expr::col((Schema::ReferentialConstraints, RefC::ConstraintName))
                    .equals((Schema::ConstraintColumnUsage, Kcuf::ConstraintName)),
            )
            .left_join(
                // Join the key_column_usage rows for the referenced primary keys
                (Schema::Schema, Schema::KeyColumnUsage),
                Condition::all()
                .add(
                    Expr::col((Schema::ConstraintColumnUsage, Kcuf::ColumnName))
                    .equals((Schema::KeyColumnUsage, Kcuf::ColumnName)),
                ).add(
                    Expr::col((Schema::ReferentialConstraints, RefC::UniqueConstraintName))
                    .equals((Schema::KeyColumnUsage, Kcuf::ConstraintName))
                ).add(
                    Expr::col((Schema::ReferentialConstraints, RefC::UniqueConstraintSchema))
                    .equals((Schema::KeyColumnUsage, Kcuf::ConstraintSchema))
                )
            )
            .take(),
        rcsq.clone(),
        Condition::all().add(
            Expr::col((Schema::TableConstraints, Tcf::ConstraintName))
            .equals((rcsq.clone(), RefC::ConstraintName))
        ).add(
            // Only join when the referenced primary key position matches position_in_unique_constraint for the foreign key column
            Expr::col((Schema::KeyColumnUsage, Kcuf::PositionInUniqueConstraint))
            .equals((rcsq.clone(), Kcuf::OrdinalPosition))
        ))

However, I am not sure if I overlook any side effects of this fix.

WinLinux1028 commented 8 months ago

Ping for @julmaxi I tested your code, and It fixed this and https://github.com/SeaQL/sea-orm/issues/1690 Please send your code as pull request. If you don't do it, I'll submit a pull request on behalf of you.

julmaxi commented 8 months ago

Thanks for testing. I will create a pull request.

WinLinux1028 commented 8 months ago

@julmaxi May I send this as pull request?

bouk commented 1 week ago

I've run into a semi-related issue where I still couldn't get the code to generate correctly in a composite foreign key but the problem was that I had a foreign key that referenced something that has a unique index but not a unique constraint, also see this: https://stackoverflow.com/questions/61249732/null-values-for-referential-constraints-unique-constraint-columns-in-informati

The fix was to run code like this:

ALTER TABLE table_name
ADD CONSTRAINT constraint_name UNIQUE USING INDEX index_name;

And then the constraint is correctly referenced in the information_schema