Open tillspeicher opened 3 years ago
Composite pks should be declared in Model Meta class, otherwise the fields would have to be aware of each other and pk creation would have to be delayed after all of them are parsed (i.e. pk columns can be first and last in model) what complicates things, and is unnecessary in most common one column pks.
I'm not sure how to handle composite fks, as you can customize the db names of columns in foreign keys, and i think i don't really like the idea of creating multiple columns in the background, as with your sample there is really no way to tell apart normal and composite fks, maybe it should also be declared in Meta.
After all this is how sql in general and sqlalchemy handles the composites (you don't declare it on field but as a separate constraint).
In general I like the concept of composite pks/fks and thought about it before, if you feel like it feel free to issue a PR, but be "warned" that it's gonna be quite complicated, as pk is used in many places and a lot of refactor/change is needed. There is a reason why tortoise, peewee etc. don't support them, yet in long term I do want ormar to support this, so if you decide to tackle this issue I'm more than happy to help.
Thanks for being interested in supporting this! Defining PKs and composite foreign keys in the Meta class makes sense.
I can try implementing this as a PR, but before getting started I would like to get you on board with the interface that this functionality would expose. I guess following an API similar to what SQLAlchemy uses is probably the safest bet.
Composite PK API: So for defining composite PKs, following SQLAlchemy's PK constraint, I would propose the following:
class A(ormar.Model):
class Meta:
database = database
metadata = metadata
constraints = [ormar.PrimaryKeyConstraint("id_1", "id_2")]
id_1: uuid.UUID = ormar.UUID() # Automatically part of the PK
id_2: int = ormar.Integer(primary_key=True) # Also part of the PK
col_3: int = ormar.Integer(primary_key=True) # Raises a ModelDefinitionError
The PrimaryKeyConstraint
here would simply take a list of all columns that should be part of the PK. Columns can still be marked as primary_key=True
to support the more common single PK case, but this is optional if a PrimaryKeyConstraint
is defined.
Columns that set primary_key=True
will cause a ModelDefinitionError
if a PrimaryKeyConstraint
is defined and they are not also listed as one of the columns there.
Composite FK API: Following SQLAlchemy's FK constraint this could look as follows:
class B(ormar.Model):
class Meta:
database = database
metadata = metadata
constraints = [
ormar.PrimaryKeyConstraint("id", "a_id_1", "a_id_2"),
ormar.ForeignKeyConstraint(
"a", ["a_id_1", "a_id_2"], A, ["id_1", "id_2"],
),
]
id: int = ormar.Integer(primary_key=True)
a_id_1: uuid.UUID = ormar.UUID(primarky_key=True)
a_id_2: int = ormar.Integer()
The ForeignKeyConstraint
here would be defined as:
T = TypeVar("T", bound=ormar.Model)
class ForeignKeyConstraint(object):
def __init__(
self,
name: str,
columns: List[str],
ref_model: T,
ref_columns: List[str],
):
# ...
And it could be used like this:
async def create_objects():
a = await A.objects.create()
b = await B.objects.create(a=a)
print(b.a.id_1)
# Note that the name defined in the foreign key constraint
# becomes a property of B.
The name of the FK constraint could then also be used to do generate reverse relationships by appending an "s".
Does an API like this make sense to @collerek or is there something that you would like to change?
I think it should be clear and clearly separated:
primary_key=True
, constaraints
, and they should be mutually exclusive. Especially that marking a field pk=True
if it's already part of the composite one doesn't actually do anything, but seeing them both used at the same time might confuse users. So the first one (id_2
) should also raise exc.
Note also that in ormar
pk is required so either field property or composite have to be used.
class A(ormar.Model):
class Meta:
database = database
metadata = metadata
constraints = [ormar.PrimaryKeyConstraint("id_1", "id_2")]
id_1: uuid.UUID = ormar.UUID() # Automatically part of the PK
id_2: int = ormar.Integer(primary_key=True) # Also exception is raised, without primary_key=True added automatically
col_3: int = ormar.Integer(primary_key=True) # Raises a ModelDefinitionError
There already is ForeignKeyConstraint
which is a simple dataclass
, so it would have to be either renamed/ refactored as it's used only internally. The key is that you cannot resolve constraints too soon, as sqlalchemy
bind them to specific table and later they fail with inheritance. That's why ormar
one are needed (and those should be kept in fields definition) that are resolved only when sqlalchemy
table/column is actually created. Those constraints should also be copied to child models.
To keep names more or less unified I suggest:
T = TypeVar("T", bound=ormar.Model)
class ForeignKeyConstraint(object):
def __init__(
self,
to: T, # changed name and order
columns: List[str],
related_columns: List[str], # changed name
name: str = None, # this one should be optional, `ormar` already generates names of constraints,
# need to add those new pk/fk constraints to the generation,
# it's needed i.e. for proper alembic migrations
):
# ...
Note that you can pass a name
to field, that later becomes the database column name. That way you can have different name in ormar
and different one in sql table.
In existing UniqueColumns
you have to pass the actual database names, and I think it should be the same in all constraints. So in example that follows, note the names in PrimaryKeyConstraint
. I don't remember if I already added this to UniqueColumns
/ metaclass but the names used in constraint should be checked against the own model (pk) or related model (fk), and if appropriate column does not exist, ModelDefinitionError
should be raised.
class A(ormar.Model):
class Meta:
database = database
metadata = metadata
constraints = [ormar.PrimaryKeyConstraint("my_pk_1", "my_pk_2")] # <- actual db names
id_1: uuid.UUID = ormar.UUID(name="my_pk_2") # Different db names passed
id_2: int = ormar.Integer(name="my_pk_1")
Same with FK:
class B(ormar.Model):
class Meta:
database = database
metadata = metadata
constraints = [
ormar.PrimaryKeyConstraint("special_pk_3", "a_id_1", "a_id_2"), # actual db names
ormar.ForeignKeyConstraint(
"a", ["a_id_1", "a_id_2"], A, ["my_pk_1", "my_pk_2"], # actual db names
),
]
id: int = ormar.Integer(primary_key=True, name="special_pk_3") # db name changed
a_id_1: uuid.UUID = ormar.UUID(primarky_key=True)
a_id_2: int = ormar.Integer()
The name
argument passed to field is stored as db_alias
in BaseField definition (in the meantime the name is changed from name
to alias
, as in ormar
name refers always to the actual name of the column in ormar
which is same as in underlying pydantic
model). To retrieve it use fields get_alias()
method from fields and not db_alias
directly, that way it can be changed if needed (which already happened before).
Note that ormar.BaseField
inherits from pydantic.FieldInfo
, so there is also alias
property on a field, but that's pydantic
one so do not use or set this one.
Each model has also helpers to translate names to aliases or aliases to names, by one or all of them at once, they are located in AliasMixin
in models.mixins.
I'm not sure I follow.
In example you use, a
is the property of B
, and id_1
is property of A
.
Relations are kept inside of RelationManager
(separate per instance) as one time you need the Relation
itself and other time the actual related model from the relation.
async def create_objects():
a = await A.objects.create()
b = await B.objects.create(a=a)
print(b.a.id_1)
Thank you for the detailed reply!
All the points you raised make sense to me, i.e. only allowing either a PrimaryKeyConstraint
or primary_key=True
, the ForeignKeyConstraint
interface you suggested and using the actual database column names as well as checking that they exist on the target model.
What I wanted to show in the usage part was just that composite FK relations should be usable by their name, rather than the columns that they are defined over. So to allow B.objects.create(a=a)
and not having to write B.objects.create(a_id_1=a.id_1, a_id_2=a.id_2)
(though I think this should be valid as well) and then enable accessing A
's id_1
column via the named relation from B
.
I will go ahead and try to implement this then, though I'll have to familiarize myself a bit more with the codebase.
Ah, I just got what you meant :D
Yes, the relation should be registered with ForeignKeyConstraint.name
and be accessible/settable with this name (a
in example). Thinking about this I guess we should allow also setting the name of the relation. I assumed the name
in the definition of FKC would be the database name of constraint itself, but this as I mentioned should be generated to allow for consistent migrations. So I propose following signature:
T = TypeVar("T", bound=ormar.Model)
class ForeignKeyConstraint(object):
def __init__(
self,
to: T,
columns: List[str],
related_columns: List[str], # changed name
name: str = None, # name of the relation, still optional, defaults to `Model.get_name(lower=True)` so if T=Truck, name=truck
db_name: str = None, # this one should be optional, `ormar` already generates names of constraints,
# need to add those new pk/fk constraints to the generation,
# it's needed i.e. for proper alembic migrations
):
# ...
But note that a_id_1, a_id_2
fields will also be accessible in ormar
and pydantic
, which is not really possible with single columns pks, as __getattribute__
in ormar.NewBaseModel
hijacks the call and directs it to RelationManager
, from which the actual related model is returned.
Even if only pk value is returned from db ormar
constructs the so called pk_only Model, that is the target model (to
model) with only primary key populated that skipped validation of other fields (since they are gone), but you can still load()
that model and it's populated from db (btw. pk_only Models is something that composite pks also need to handle).
I'm wondering if this should be allowed (access to columns of composite fk), since you can populate relation with pk value only and with dict:
class A(ormar.Model)
....
a = ormar.ForeignKey(B)
b = B(id=1)
# all of them are equivalent
a = A(a=b) # pass related model
a = A(a=1) # pass related model pk value
a = A(a={"id": 1}) # pass related model dict
So maybe in composites you can pass model or dict (cannot pass pk value anymore, I don't like the idea of passing list/tuple), and direct access to a_id_1
and a_id_2
should be forbidden. If you want to update the relation pass new instance or dictionary with new values instead of changing parts of fk value at once.
Ok, again that makes sense.
The only thing to consider wrt. accessing and setting the FK columns are overlapping FKs. What I mean is this:
class A(ormar.Model):
class Meta:
database = database
metadata = metadata
id: uuid.UUID = ormar.UUID(primary_key=True)
class B(ormar.Model):
class Meta:
database = database
metadata = metadata
constraints = [ormar.PrimaryKeyConstraint("id", "a")] # Is this fine or would "a" have to be defined as a normal
# Column with a separate FK constraint on it?
id = ormar.Integer()
a = ormar.ForeignKey(A) # Should this be allowed? See comment above.
class C(ormar.Model):
class Meta:
database = database
metadata = metadata
constraints = [
ormar.PrimaryKeyConstraint("id", "a_id", "b_id),
ormar.ForeignKeyConstraint(A, ["a_id"], ["id"], "a"),
ormar.ForeignKeyConstraint(B, ["a_id", "b_id"], ["a", "id"], "b"),
]
id = ormar.Integer()
a_id = ormar.UUID()
b_id = ormar.Integer()
In this case, setting the FK columns directly might make sense, i.e. c = C.objects.create(a_id=a.id, b_id=b.id)
.
Though it should always be possible to set all the FK columns with the "largest" model, i.e. the one that has the other FK references as well, i.e. c = C.objects.create(b=b)
(initializes a_id
as well).
The PrimaryKeyConstraint
fields should be accessible anyway as those are own model fields, and you can set them quite freely, so in B class both a
and id
fields should be accessible. Also the pk is changed through direct access not like with fk (through relation).
Also although pk should be either in Meta or in Fields, I think the fks can be in both.
Especially that you want to allow making fks part of the other fks and pks.
If other fk is part of the ForeignKeyConstraint
I guess we should check is_relation
status of the field and allow accessing relation fields.
So in example like this:
class C(ormar.Model):
class Meta:
database = database
metadata = metadata
constraints = [
ormar.PrimaryKeyConstraint("id", "a_id", "b_id),
ormar.ForeignKeyConstraint(B, ["a", "b_id"], ["a", "id"], "b"), # one fk consists of other fk and other fields
]
id = ormar.Integer()
a = ormar.ForeignKey(A) # fks can be both in field and Meta
b_id = ormar.Integer()
Since a
is a separate relation and can exists without b
, I guess a
should be accessible. But b_id
should not be accessible, since it has no meaning without b
set-> without b
whole relation is incorrect so b_id
is None
and should not be directly changed.
Or maybe we should leave it up to users, we can always restrict the use of those columns later and there might be a use case we don't think about here.
The points that matters are:
dict()
-> should fields like b_id
be present there, or only a
and b
? pydantic
do not support relation between models from several columns (kind of like composite fk). Right now the column in dict becomes the actual nested related model, and the same should be with a
and b
. This needs to be tested with fastapi too as during the request process fastapi dicts/parses model several times, that's why it may be best to exclude those fields, yet it would require some tests. And if we decide to exclude them, then probably they should be also excluded from setting/accessing them directly.__repr__
and __str__
representation (same as above), also schema() is related with a signature for openapi.ormar
in several points checks the existence of pk value to fail early without db access so I guess each model would need property/helper method to check if all (or one as before) pk fields are set.Wrt. allowing access to the underlying FK columns, what I thought of was to disallow direct column access and only allow setting these columns via the FK constraint's name. I.e. in the example, a
is accessible because it's actually the name of a relationship and not the column itself and b_id
is not accessible because it's a column, but it can be set via the b
FK constraint, e.g. c.b = b
.
So the way to implement it would be to intercept the __getattribute__
and __settattr__
calls on the model and if they access a column that's part of an FK constraint to block them but to let them through if there's a corresponding relation with that name registered (a corresponding FK constraint).
This should take care as well of the other issues you mentioned if I'm not wrong regarding FastAPI compatibility and preventing partial FK updates.
But I'm wondering whether it would make sense to make an exception for columns that are part of the PK or to handle them in the same way. That's because setting these columns to anything other than values corresponding to a valid FK relationship would anyway be rejected by the corresponding FK constraint, so these columns might as well only be set using the FK constraints' names, as suggested above. I.e. in the example this would mean that only the id
column is accessible directly because it's not part of a FK, but the a
and b_id
columns wouldn't be, even though they're part of the PK. They would have to be set using the a
and b
relations.
The most consistent approach wrt. Pydantic models if we would implement FKs that way would then probably be to not include columns that are part of FKs directly as you said but rather include the referenced models under their FK constraint's name.
So the Pydantic model for C
would have the fields id
, a
(-> A
) and b
(-> B
) but not b_id
.
One more point we should clarify:
Right now models have a pk
property which as far as I can see acts like a normal column, the one that's the PK column. However, what should that property be if there are composite PKs?
Should it be a scalar in the case where only one PK is set and a dataclass
/Pydantic model if there are multiple ones with all the PK columns? Or always a dataclass
/Pydantic model? But exposing the PK columns this way would probably be against the earlier idea of only allowing access to columns that are part a FK via the FK's name.
One solution might be to return a dataclass
/Pydantic model but make the FK columns in it read-only.
Yeah, I was imprecise, by direct access I meant user direct access (so a.b or a.b=whatever
) not that you can access the underlying column under the hood, it still goes through relation.
Note that pydantic
accepts only None
(if relation is optional), Model
or PkOnlyModel
otherwise the validation will fail, so whether you set a.b = b
or a.b = {"b_id": 1}
or a.b=1
it's always going through relation field (in __init__
and __setattr__/__getattribute__
) and ormar
instantiates an actual related Model from those values if what you pass is not already a Model (it goes by expand_relationship
in ForeignKeyField
where proper constructor is selected based on passed value).
So I think we talk about the same, when part of pk/fk is a relation you can access it like a relation, when other columns are part of pk, you can also set them and access them, but when other columns are part of fk you cannot access them and exception should be raised like 'Direct access to field {field_name} is not allowed, use {relation_name} to set/get the value'
.
Note that registration of related Models is mutual -> a.b=b
causes that b.a = [a]
, as relations are registered on both sides. That's why in __init__
of NewBaseModel
the related models are first build with expand_relationship
but not registered, cause validation of the Model in which __init__
we are can still fail, and only after the validation passes the related Models are registered in RelationManager
. (again with expand_relationship
but now models are already instantiated, so they are simply forwarded to relationship).
Cause registration of relation and models is mutual, you also need to create the second side of the relationship (technically it happens in expand_reverse_relations
in metaclass, or during parsing of ForwardRefs
in NewBaseModel
), so in previous sample:
class A(ormar.Model):
class Meta:
database = database
metadata = metadata
id: uuid.UUID = ormar.UUID(primary_key=True)
# here is auto registered ForeignKey with default name bs (B.get_name()+'s') that allows access to
# Bs from A. Now this field also have to be ForeignKeyConstraint as B's pk is multi column.
class B(ormar.Model):
class Meta:
database = database
metadata = metadata
constraints = [ormar.PrimaryKeyConstraint("id", "a")]
id = ormar.Integer()
a = ormar.ForeignKey(A)
So thinking about this the signature of ForeignKeyConstraint
should contain also related_name
, under which the relationship will be registered .
T = TypeVar("T", bound=ormar.Model)
class ForeignKeyConstraint(object):
def __init__(
self,
to: T,
columns: List[str],
related_columns: List[str], # changed name
name: str = None, # name of the relation, still optional, defaults to `Model.get_name(lower=True)` so if T=Truck, name=truck
related_name: str = None, # optional, defaults to owner model name.lower() + 's' (so if declared on model 'Car' it's 'cars')
db_name: str = None, # this one should be optional, `ormar` already generates names of constraints,
# need to add those new pk/fk constraints to the generation,
# it's needed i.e. for proper alembic migrations
):
# ...
Note that underlying pydantic
models is kept consistent with ormar
model, so all fields including the relation one goes to pydantic.BaseModel.__fields__
(so ormar.Model.__fields__
as it's a subclass) as it's needed for validation.
To keep the __repr__/__str__
nice too, which is useful for debugging etc. the related models are not only registered in RelationManager
but also are added/removed from __dict__
(so without intercepting __getattribute__
you could actually get them through pydantic
).
Since the fields need to be there for validation anyway it's cleaner (and probably faster) than overloading __repr__/__str__
in ormar
.
Since you want to add composed pks, when you would declare many to many relation to such model, the ManyToMany
also need to support compose pks/fks. There is also related Through
model which is either generated for you or you can specify it yourself, that keeps fks to both sides of relation.
There is also ThroughField
but this one explicitly excludes all relations and keeps just own fields (relations are in ManyToMany
anyway and ThroughField
is used to expose additional fields from user defined Through
model.
pk
on ModelYes, pk
right now is just an alias of the actual primary_key
column, so you don't have to extract the pkname
from Model.Meta
all the time (which also need to handle the multi columns with this change). It's also useful in implementation details cause you don't have to getattr/setattr
all the time. Since now pk
can only be single column it simply forwards the call to actual field/column whatever the real name is. So there is no actual column with this name neither in pydantic
nor in ormar
Now you can both set/get the value this way, and despite that under the hood this is a massive task, from user perspective you just add new functionality, that's why I would like to keep it non-breaking change if possible.
If you make it dataclass/pydantic Model it would mean you have to initialize it before setting the value which I think I'm not very fond of (and you would have to import the model/class to set it in your code?). Or maybe I'm not fully grasping your idea, but the way I get it is both breaking and not very convinient.
I think that simply adding the property
and property.setter
, that can accept either single value or dict would solve this, and not break the current code/tests. That would allow something like follows:
ormar.Model (or probably arther in NewBaseModel but for simplicity here):
@property
def pk(self):
... # return value or dict
@pk.setter
def pk(self):
... # accepts value or dict
class Old(ormar.Model):
...
id = ormar.Integer(primary_key=True)
class New(ormar.Model):
class Meta:
database = database
metadata = metadata
constraints = [ormar.PrimaryKeyConstraint("id", "a")]
id = ormar.Integer()
a = ormar.ForeignKey(Old)
# access like old way
old = await Old().save()
old.pk # return 1
# set like old way
old.pk = 2
await old.update() # now pk in db is 2
# get new way
new = New(a=old).save()
new.pk # return {"id":1, "a": old}
# set new way
second_old = await Old().save() # pk is 2 as id = Integer pk which is autoincrement by default
new.pk = {"id":2, "a": second_old} # here property.setter assign the values to corresponding columns (so they are validated this way) in Model
assert new.id == 2
assert new.a == second_old
await new.update() # now actual db values are id=2 and a=2 (pk value of second_old)
These points make sense to me!
I tried to cover all the points we discussed so far through tests and started PR #138 to get going with the implementation. So far it only has those tests and no implementation yet. Please let me know if you would like to change anything there.
@collerek Did we ever get to finally implementing composite keys in Ormar?
So what happened with this feature request? What's a good work-around to mapping tables with composite primary keys?
Has any more thought been given to this? We're stuck on version 0.11.2 because we need composite PKs.
It would be quite useful to be able to use composite primary keys, i.e. primary keys consisting of multiple DB columns. Supporting this feature would mean supporting composite foreign keys as well. And ideally, in addition, composite primary keys should be able to also span foreign key columns.
Concretely, I would like to be able to define models as follows:
Usecase: The reason for why I would like to use composite foreign keys is to combine the benefits of UUID and integer primary keys. Using the example above, UUID PKs allow me to hide how many users are registered in my system, which would be visible with integer (auto-increment) keys. But then, I would like to use integer keys for
UserItem
andUserItemStatus
that increment on a per-user basis. I want to avoid using UUID keys for those items for reasons of column size and non-sequential data-layout, but also avoid a single integer primary key column because that would reveal how many items exist in the system when the ids are sent over an API. Using integer keys that increment on a per-user basis is ideal here, but it requires to combine them with another column into the primary key, because they would not be unique otherwise.Is this feature something you would like to support? If yes, I could also try to implement this myself and open a PR.
SQLAlchemy has support for this feature: https://docs.sqlalchemy.org/en/14/core/constraints.html
An additional consideration: The
UserItemStatus
model above has a foreign key relationuser_item
, but ideally it should also have a foreign key relationuser
sinceuser
is part of theuser_item
foreign key. So there should be a way to define auser
foreign key relationship that uses the column already present in theuser_item
relationship. If you are interested in supporting this feature, I can think about how to define these two relationships in an efficient way.