collerek / ormar

python async orm with fastapi in mind and pydantic validation
https://collerek.github.io/ormar/
MIT License
1.67k stars 89 forks source link

[Feature request] Allow for composite primary keys, including foreign key columns as part of the PK #137

Open tillspeicher opened 3 years ago

tillspeicher commented 3 years ago

It would be quite useful to be able to use composite primary keys, i.e. primary keys consisting of multiple DB columns. Supporting this feature would mean supporting composite foreign keys as well. And ideally, in addition, composite primary keys should be able to also span foreign key columns.

Concretely, I would like to be able to define models as follows:

class User(ormar.Model):
    id: uuid.UUID = ormar.UUID(primary_key=True)
    name: str = ormar.String(max_length=100)

# Note that this model has two primary key columns, one of which is a foreign key.
class UserItem(ormar.Model):
    user: User = ormar.ForeignKey(User, primary_key=True)
    id: int = ormar.Integer(primary_key=True)
    description: str = ormar.String(max_length=200)

# Implicitly this model has three primary key columns, because the user_item foreign
# key is a composite foreign key with both the User's id and the UserItem's id.
class UserItemStatus(ormar.Model):
    user_item: UserItem = ormar.ForeignKey(UserItem, primary_key=True)
    id: int = ormar.Integer(primary_key=True)
    status: str = ormar.String(max_length=100)

Usecase: The reason for why I would like to use composite foreign keys is to combine the benefits of UUID and integer primary keys. Using the example above, UUID PKs allow me to hide how many users are registered in my system, which would be visible with integer (auto-increment) keys. But then, I would like to use integer keys for UserItem and UserItemStatus that increment on a per-user basis. I want to avoid using UUID keys for those items for reasons of column size and non-sequential data-layout, but also avoid a single integer primary key column because that would reveal how many items exist in the system when the ids are sent over an API. Using integer keys that increment on a per-user basis is ideal here, but it requires to combine them with another column into the primary key, because they would not be unique otherwise.

Is this feature something you would like to support? If yes, I could also try to implement this myself and open a PR.

SQLAlchemy has support for this feature: https://docs.sqlalchemy.org/en/14/core/constraints.html

An additional consideration: The UserItemStatus model above has a foreign key relation user_item, but ideally it should also have a foreign key relation user since user is part of the user_item foreign key. So there should be a way to define a user foreign key relationship that uses the column already present in the user_item relationship. If you are interested in supporting this feature, I can think about how to define these two relationships in an efficient way.

collerek commented 3 years ago

Composite pks should be declared in Model Meta class, otherwise the fields would have to be aware of each other and pk creation would have to be delayed after all of them are parsed (i.e. pk columns can be first and last in model) what complicates things, and is unnecessary in most common one column pks.

I'm not sure how to handle composite fks, as you can customize the db names of columns in foreign keys, and i think i don't really like the idea of creating multiple columns in the background, as with your sample there is really no way to tell apart normal and composite fks, maybe it should also be declared in Meta.

After all this is how sql in general and sqlalchemy handles the composites (you don't declare it on field but as a separate constraint).

In general I like the concept of composite pks/fks and thought about it before, if you feel like it feel free to issue a PR, but be "warned" that it's gonna be quite complicated, as pk is used in many places and a lot of refactor/change is needed. There is a reason why tortoise, peewee etc. don't support them, yet in long term I do want ormar to support this, so if you decide to tackle this issue I'm more than happy to help.

tillspeicher commented 3 years ago

Thanks for being interested in supporting this! Defining PKs and composite foreign keys in the Meta class makes sense.

I can try implementing this as a PR, but before getting started I would like to get you on board with the interface that this functionality would expose. I guess following an API similar to what SQLAlchemy uses is probably the safest bet.

Composite PK API: So for defining composite PKs, following SQLAlchemy's PK constraint, I would propose the following:

class A(ormar.Model):
    class Meta:
        database = database
        metadata = metadata
        constraints = [ormar.PrimaryKeyConstraint("id_1", "id_2")]

    id_1: uuid.UUID = ormar.UUID() # Automatically part of the PK
    id_2: int = ormar.Integer(primary_key=True) # Also part of the PK
    col_3: int = ormar.Integer(primary_key=True) # Raises a ModelDefinitionError

The PrimaryKeyConstraint here would simply take a list of all columns that should be part of the PK. Columns can still be marked as primary_key=True to support the more common single PK case, but this is optional if a PrimaryKeyConstraint is defined. Columns that set primary_key=True will cause a ModelDefinitionError if a PrimaryKeyConstraint is defined and they are not also listed as one of the columns there.

Composite FK API: Following SQLAlchemy's FK constraint this could look as follows:

class B(ormar.Model):
    class Meta:
        database = database
        metadata = metadata
        constraints = [
            ormar.PrimaryKeyConstraint("id", "a_id_1", "a_id_2"),
            ormar.ForeignKeyConstraint(
                "a", ["a_id_1", "a_id_2"], A, ["id_1", "id_2"],
            ),
        ]

    id: int = ormar.Integer(primary_key=True)
    a_id_1: uuid.UUID = ormar.UUID(primarky_key=True)
    a_id_2: int = ormar.Integer()

The ForeignKeyConstraint here would be defined as:

T = TypeVar("T", bound=ormar.Model)

class ForeignKeyConstraint(object):
    def __init__(
        self,
        name: str,
        columns: List[str],
        ref_model: T,
        ref_columns: List[str],
    ):
        # ...

And it could be used like this:

async def create_objects():
    a = await A.objects.create()
    b = await B.objects.create(a=a)
    print(b.a.id_1)
    # Note that the name defined in the foreign key constraint
    # becomes a property of B.

The name of the FK constraint could then also be used to do generate reverse relationships by appending an "s".

Does an API like this make sense to @collerek or is there something that you would like to change?

collerek commented 3 years ago

Composite PK

I think it should be clear and clearly separated:

Especially that marking a field pk=True if it's already part of the composite one doesn't actually do anything, but seeing them both used at the same time might confuse users. So the first one (id_2) should also raise exc.

Note also that in ormar pk is required so either field property or composite have to be used.

class A(ormar.Model):
    class Meta:
        database = database
        metadata = metadata
        constraints = [ormar.PrimaryKeyConstraint("id_1", "id_2")]

    id_1: uuid.UUID = ormar.UUID() # Automatically part of the PK
    id_2: int = ormar.Integer(primary_key=True) # Also exception is raised, without primary_key=True added automatically
    col_3: int = ormar.Integer(primary_key=True) # Raises a ModelDefinitionError

Composite FK

There already is ForeignKeyConstraint which is a simple dataclass, so it would have to be either renamed/ refactored as it's used only internally. The key is that you cannot resolve constraints too soon, as sqlalchemy bind them to specific table and later they fail with inheritance. That's why ormar one are needed (and those should be kept in fields definition) that are resolved only when sqlalchemy table/column is actually created. Those constraints should also be copied to child models.

To keep names more or less unified I suggest:

T = TypeVar("T", bound=ormar.Model)

class ForeignKeyConstraint(object):
    def __init__(
        self,
        to: T, # changed name and order
        columns: List[str],
        related_columns: List[str],  # changed name
        name: str = None, # this one should be optional, `ormar` already generates names of constraints, 
                          # need to add those new pk/fk constraints to the generation, 
                          # it's needed i.e. for proper alembic migrations
    ):
        # ...

Both FK & PK

Note that you can pass a name to field, that later becomes the database column name. That way you can have different name in ormar and different one in sql table.

In existing UniqueColumns you have to pass the actual database names, and I think it should be the same in all constraints. So in example that follows, note the names in PrimaryKeyConstraint. I don't remember if I already added this to UniqueColumns/ metaclass but the names used in constraint should be checked against the own model (pk) or related model (fk), and if appropriate column does not exist, ModelDefinitionError should be raised.

class A(ormar.Model):
    class Meta:
        database = database
        metadata = metadata
        constraints = [ormar.PrimaryKeyConstraint("my_pk_1", "my_pk_2")] # <- actual db names

    id_1: uuid.UUID = ormar.UUID(name="my_pk_2") # Different db names passed
    id_2: int = ormar.Integer(name="my_pk_1") 

Same with FK:

class B(ormar.Model):
    class Meta:
        database = database
        metadata = metadata
        constraints = [
            ormar.PrimaryKeyConstraint("special_pk_3", "a_id_1", "a_id_2"), # actual db names
            ormar.ForeignKeyConstraint(
                "a", ["a_id_1", "a_id_2"], A, ["my_pk_1", "my_pk_2"], # actual db names
            ),
        ]

    id: int = ormar.Integer(primary_key=True, name="special_pk_3") # db name changed
    a_id_1: uuid.UUID = ormar.UUID(primarky_key=True)
    a_id_2: int = ormar.Integer()

The name argument passed to field is stored as db_alias in BaseField definition (in the meantime the name is changed from name to alias, as in ormar name refers always to the actual name of the column in ormar which is same as in underlying pydantic model). To retrieve it use fields get_alias() method from fields and not db_alias directly, that way it can be changed if needed (which already happened before).

Note that ormar.BaseField inherits from pydantic.FieldInfo, so there is also alias property on a field, but that's pydantic one so do not use or set this one.

Each model has also helpers to translate names to aliases or aliases to names, by one or all of them at once, they are located in AliasMixin in models.mixins.

Usage

I'm not sure I follow. In example you use, a is the property of B, and id_1 is property of A. Relations are kept inside of RelationManager (separate per instance) as one time you need the Relation itself and other time the actual related model from the relation.

async def create_objects():
    a = await A.objects.create()
    b = await B.objects.create(a=a)
    print(b.a.id_1)
tillspeicher commented 3 years ago

Thank you for the detailed reply!

All the points you raised make sense to me, i.e. only allowing either a PrimaryKeyConstraint or primary_key=True, the ForeignKeyConstraint interface you suggested and using the actual database column names as well as checking that they exist on the target model.

What I wanted to show in the usage part was just that composite FK relations should be usable by their name, rather than the columns that they are defined over. So to allow B.objects.create(a=a) and not having to write B.objects.create(a_id_1=a.id_1, a_id_2=a.id_2) (though I think this should be valid as well) and then enable accessing A's id_1 column via the named relation from B.

I will go ahead and try to implement this then, though I'll have to familiarize myself a bit more with the codebase.

collerek commented 3 years ago

Ah, I just got what you meant :D

Yes, the relation should be registered with ForeignKeyConstraint.name and be accessible/settable with this name (a in example). Thinking about this I guess we should allow also setting the name of the relation. I assumed the name in the definition of FKC would be the database name of constraint itself, but this as I mentioned should be generated to allow for consistent migrations. So I propose following signature:

T = TypeVar("T", bound=ormar.Model)

class ForeignKeyConstraint(object):
    def __init__(
        self,
        to: T, 
        columns: List[str],
        related_columns: List[str],  # changed name
        name: str = None, # name of the relation, still optional, defaults to `Model.get_name(lower=True)` so if T=Truck, name=truck 
        db_name: str = None, # this one should be optional, `ormar` already generates names of constraints, 
                          # need to add those new pk/fk constraints to the generation, 
                          # it's needed i.e. for proper alembic migrations
    ):
        # ...

But note that a_id_1, a_id_2 fields will also be accessible in ormar and pydantic, which is not really possible with single columns pks, as __getattribute__ in ormar.NewBaseModel hijacks the call and directs it to RelationManager, from which the actual related model is returned.

Even if only pk value is returned from db ormar constructs the so called pk_only Model, that is the target model (to model) with only primary key populated that skipped validation of other fields (since they are gone), but you can still load() that model and it's populated from db (btw. pk_only Models is something that composite pks also need to handle).

I'm wondering if this should be allowed (access to columns of composite fk), since you can populate relation with pk value only and with dict:

class A(ormar.Model)
    ....
    a = ormar.ForeignKey(B)

b = B(id=1)
# all of them are equivalent
a = A(a=b)  # pass related model
a = A(a=1)  # pass related model pk value
a = A(a={"id": 1})  # pass related model dict

So maybe in composites you can pass model or dict (cannot pass pk value anymore, I don't like the idea of passing list/tuple), and direct access to a_id_1 and a_id_2 should be forbidden. If you want to update the relation pass new instance or dictionary with new values instead of changing parts of fk value at once.

tillspeicher commented 3 years ago

Ok, again that makes sense.

The only thing to consider wrt. accessing and setting the FK columns are overlapping FKs. What I mean is this:

class A(ormar.Model):
    class Meta:
        database = database
        metadata = metadata

    id: uuid.UUID = ormar.UUID(primary_key=True)

class B(ormar.Model):
    class Meta:
        database = database
        metadata = metadata
        constraints = [ormar.PrimaryKeyConstraint("id", "a")] # Is this fine or would "a" have to be defined as a normal
                                                              # Column with a separate FK constraint on it?

    id = ormar.Integer()
    a = ormar.ForeignKey(A) # Should this be allowed? See comment above.

class C(ormar.Model):
    class Meta:
        database = database
        metadata = metadata
        constraints = [
            ormar.PrimaryKeyConstraint("id", "a_id", "b_id),
            ormar.ForeignKeyConstraint(A, ["a_id"], ["id"], "a"),
            ormar.ForeignKeyConstraint(B, ["a_id", "b_id"], ["a", "id"], "b"),
        ]

    id = ormar.Integer()
    a_id = ormar.UUID()
    b_id = ormar.Integer()

In this case, setting the FK columns directly might make sense, i.e. c = C.objects.create(a_id=a.id, b_id=b.id). Though it should always be possible to set all the FK columns with the "largest" model, i.e. the one that has the other FK references as well, i.e. c = C.objects.create(b=b) (initializes a_id as well).

collerek commented 3 years ago

The PrimaryKeyConstraint fields should be accessible anyway as those are own model fields, and you can set them quite freely, so in B class both a and id fields should be accessible. Also the pk is changed through direct access not like with fk (through relation).

Also although pk should be either in Meta or in Fields, I think the fks can be in both.

Especially that you want to allow making fks part of the other fks and pks.

If other fk is part of the ForeignKeyConstraint I guess we should check is_relation status of the field and allow accessing relation fields. So in example like this:

class C(ormar.Model):
    class Meta:
        database = database
        metadata = metadata
        constraints = [
            ormar.PrimaryKeyConstraint("id", "a_id", "b_id),
            ormar.ForeignKeyConstraint(B, ["a", "b_id"], ["a", "id"], "b"),  # one fk consists of other fk and other fields
        ]

    id = ormar.Integer()
    a = ormar.ForeignKey(A)  # fks can be both in field and Meta
    b_id = ormar.Integer()

Since a is a separate relation and can exists without b, I guess a should be accessible. But b_id should not be accessible, since it has no meaning without b set-> without b whole relation is incorrect so b_id is None and should not be directly changed.

Or maybe we should leave it up to users, we can always restrict the use of those columns later and there might be a use case we don't think about here.

The points that matters are:

tillspeicher commented 3 years ago

Wrt. allowing access to the underlying FK columns, what I thought of was to disallow direct column access and only allow setting these columns via the FK constraint's name. I.e. in the example, a is accessible because it's actually the name of a relationship and not the column itself and b_id is not accessible because it's a column, but it can be set via the b FK constraint, e.g. c.b = b. So the way to implement it would be to intercept the __getattribute__ and __settattr__ calls on the model and if they access a column that's part of an FK constraint to block them but to let them through if there's a corresponding relation with that name registered (a corresponding FK constraint). This should take care as well of the other issues you mentioned if I'm not wrong regarding FastAPI compatibility and preventing partial FK updates.

But I'm wondering whether it would make sense to make an exception for columns that are part of the PK or to handle them in the same way. That's because setting these columns to anything other than values corresponding to a valid FK relationship would anyway be rejected by the corresponding FK constraint, so these columns might as well only be set using the FK constraints' names, as suggested above. I.e. in the example this would mean that only the id column is accessible directly because it's not part of a FK, but the a and b_id columns wouldn't be, even though they're part of the PK. They would have to be set using the a and b relations.

The most consistent approach wrt. Pydantic models if we would implement FKs that way would then probably be to not include columns that are part of FKs directly as you said but rather include the referenced models under their FK constraint's name. So the Pydantic model for C would have the fields id, a (-> A) and b (-> B) but not b_id.

One more point we should clarify: Right now models have a pk property which as far as I can see acts like a normal column, the one that's the PK column. However, what should that property be if there are composite PKs? Should it be a scalar in the case where only one PK is set and a dataclass/Pydantic model if there are multiple ones with all the PK columns? Or always a dataclass/Pydantic model? But exposing the PK columns this way would probably be against the earlier idea of only allowing access to columns that are part a FK via the FK's name. One solution might be to return a dataclass/Pydantic model but make the FK columns in it read-only.

collerek commented 3 years ago

Direct access to fields

Yeah, I was imprecise, by direct access I meant user direct access (so a.b or a.b=whatever) not that you can access the underlying column under the hood, it still goes through relation.

Note that pydantic accepts only None (if relation is optional), Model or PkOnlyModel otherwise the validation will fail, so whether you set a.b = b or a.b = {"b_id": 1} or a.b=1 it's always going through relation field (in __init__ and __setattr__/__getattribute__) and ormar instantiates an actual related Model from those values if what you pass is not already a Model (it goes by expand_relationship in ForeignKeyField where proper constructor is selected based on passed value).

So I think we talk about the same, when part of pk/fk is a relation you can access it like a relation, when other columns are part of pk, you can also set them and access them, but when other columns are part of fk you cannot access them and exception should be raised like 'Direct access to field {field_name} is not allowed, use {relation_name} to set/get the value'.

Reverse relations

Note that registration of related Models is mutual -> a.b=b causes that b.a = [a], as relations are registered on both sides. That's why in __init__ of NewBaseModel the related models are first build with expand_relationship but not registered, cause validation of the Model in which __init__ we are can still fail, and only after the validation passes the related Models are registered in RelationManager. (again with expand_relationship but now models are already instantiated, so they are simply forwarded to relationship).

Cause registration of relation and models is mutual, you also need to create the second side of the relationship (technically it happens in expand_reverse_relations in metaclass, or during parsing of ForwardRefs in NewBaseModel), so in previous sample:

class A(ormar.Model):
    class Meta:
        database = database
        metadata = metadata

    id: uuid.UUID = ormar.UUID(primary_key=True)
    # here is auto registered ForeignKey with default name bs (B.get_name()+'s') that allows access to
    # Bs from A. Now this field also have to be ForeignKeyConstraint as B's pk is multi column.

class B(ormar.Model):
    class Meta:
        database = database
        metadata = metadata
        constraints = [ormar.PrimaryKeyConstraint("id", "a")] 

    id = ormar.Integer()
    a = ormar.ForeignKey(A)

So thinking about this the signature of ForeignKeyConstraint should contain also related_name, under which the relationship will be registered .

T = TypeVar("T", bound=ormar.Model)

class ForeignKeyConstraint(object):
    def __init__(
        self,
        to: T, 
        columns: List[str],
        related_columns: List[str],  # changed name
        name: str = None, # name of the relation, still optional, defaults to `Model.get_name(lower=True)` so if T=Truck, name=truck 
        related_name: str = None,  # optional, defaults to owner model name.lower() + 's' (so if declared on model 'Car' it's 'cars')
        db_name: str = None, # this one should be optional, `ormar` already generates names of constraints, 
                          # need to add those new pk/fk constraints to the generation, 
                          # it's needed i.e. for proper alembic migrations
    ):
        # ...

Consistency with pydantic model

Note that underlying pydantic models is kept consistent with ormar model, so all fields including the relation one goes to pydantic.BaseModel.__fields__ (so ormar.Model.__fields__ as it's a subclass) as it's needed for validation.

To keep the __repr__/__str__ nice too, which is useful for debugging etc. the related models are not only registered in RelationManager but also are added/removed from __dict__ (so without intercepting __getattribute__ you could actually get them through pydantic).

Since the fields need to be there for validation anyway it's cleaner (and probably faster) than overloading __repr__/__str__ in ormar.

ManyToMany

Since you want to add composed pks, when you would declare many to many relation to such model, the ManyToMany also need to support compose pks/fks. There is also related Through model which is either generated for you or you can specify it yourself, that keeps fks to both sides of relation.

There is also ThroughField but this one explicitly excludes all relations and keeps just own fields (relations are in ManyToMany anyway and ThroughField is used to expose additional fields from user defined Through model.

pk on Model

Yes, pk right now is just an alias of the actual primary_key column, so you don't have to extract the pkname from Model.Meta all the time (which also need to handle the multi columns with this change). It's also useful in implementation details cause you don't have to getattr/setattr all the time. Since now pk can only be single column it simply forwards the call to actual field/column whatever the real name is. So there is no actual column with this name neither in pydantic nor in ormar

Now you can both set/get the value this way, and despite that under the hood this is a massive task, from user perspective you just add new functionality, that's why I would like to keep it non-breaking change if possible.

If you make it dataclass/pydantic Model it would mean you have to initialize it before setting the value which I think I'm not very fond of (and you would have to import the model/class to set it in your code?). Or maybe I'm not fully grasping your idea, but the way I get it is both breaking and not very convinient.

I think that simply adding the property and property.setter, that can accept either single value or dict would solve this, and not break the current code/tests. That would allow something like follows:

ormar.Model (or probably arther in NewBaseModel but for simplicity here):

@property
def pk(self):
    ... # return value or dict

@pk.setter
def pk(self):
    ... # accepts value or dict

class Old(ormar.Model):
    ...
    id = ormar.Integer(primary_key=True)

class New(ormar.Model):
    class Meta:
        database = database
        metadata = metadata
        constraints = [ormar.PrimaryKeyConstraint("id", "a")]

    id = ormar.Integer()
    a = ormar.ForeignKey(Old)

# access like old way
old = await Old().save()
old.pk # return 1

# set like old way
old.pk = 2 
await old.update() # now pk in db is 2

# get new way
new = New(a=old).save()
new.pk # return {"id":1, "a": old}

# set new way
second_old = await Old().save() # pk is 2 as id = Integer pk which is autoincrement by default
new.pk = {"id":2, "a": second_old} # here property.setter assign the values to corresponding columns (so they are validated this way) in Model
assert new.id == 2
assert new.a == second_old
await new.update() # now actual db values are id=2 and a=2 (pk value of second_old)
tillspeicher commented 3 years ago

These points make sense to me!

I tried to cover all the points we discussed so far through tests and started PR #138 to get going with the implementation. So far it only has those tests and no implementation yet. Please let me know if you would like to change anything there.

kokroo commented 2 years ago

@collerek Did we ever get to finally implementing composite keys in Ormar?

Mexflubber commented 1 year ago

So what happened with this feature request? What's a good work-around to mapping tables with composite primary keys?

nagaem commented 1 year ago

Has any more thought been given to this? We're stuck on version 0.11.2 because we need composite PKs.