bcksl opened 1 year ago
I think it would be really interesting to support this, but it is important to note that we would need to make some pretty serious changes to our expression language to do so. We'd need to support things like fragments that are only true on specific types, and we'd also lose a lot of the optimizations we might take for granted in some cases (e.g. the fact that a given list is always homogeneous in how it relates back to the thing that loaded it means we can Enum.group_by(results, & &1.destination_attribute) to get the related values).
It's worth exploring, but we may need to start small and work our way there in a branch to tease out all the hidden issues therein.
How would these concerns be affected by first introducing the concept of resource behaviours/protocols, and allowing relationships to be polymorphic only according to a single one of those?
Could you give me an example of what you have in mind?
If we take the payments example from above, all of the App.Payment.* options would implement an App.Payment protocol that specifies the attributes, actions, etc. that the engine can expect to exist on each of them.
The relationship would then look like:
defmodule App.User do
  relationships do
    has_many :payment_methods, App.Payment
  end

  # ...
end
🤔 that would solve for some of it. But even then, we'd have to limit available filters to things contained in the protocol until we figure out a way to sort of "switch on the type" in our expression language.
I think it's worth exploring, though, and it could solve the bulk of the issues.
Absolutely. This covers a lot of the domain modeling use-cases pretty well, though. "If it's not something they all have in common, you can't filter on it," seems to be a good place to start. I'm not opposed to exploring options for type switching in expressions either, but I think it might make sense to start with limiting the scope first.
Works for me!
My initial thinking for the protocol specification would be writing a "bare" resource, with essentially the same DSL used for existing resources, and having a validator that ensures that all resources that implement it are isomorphic. Something cute to auto-register all the resources that implement a protocol so you don't have to list them would be nice as well :)
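For instance, a minimal sketch of such a "bare" resource, reusing the existing DSL (the attributes and actions here are purely illustrative):

defmodule App.Payment do
  # a "bare" resource that only specifies the common shape;
  # implementors would be validated against it
  use Ash.Resource

  attributes do
    attribute :amount, :decimal
    attribute :currency, :string
  end

  actions do
    create :create
    read :read
  end
end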
Would it then be straightforward from an engine perspective to have it ignore struct tags and just pretend each thing it gets back is an instance of that bare resource for internal purposes?
I think so? At least for most cases, and we can tackle the others one by one.
As for automatically registering them, honestly that kind of thing is almost universally harder than it seems like it should be (and it may actually be impossible to solve all of the ergonomics issues around it). But power to you if you can find a way. Just know that others have tried and ultimately had to back down from that path.
Ok, I was thinking that there would be the introduction of an implementations block:
implementations do
  implements App.Payment
  implements App.OtherFunStuff
end
And that this would essentially append those resources to the module being implemented, for purposes of constructing the tagged joins.
We can do that :) But if we want to get the list of all things that implement App.Payment, that is a very hard problem.
Can we push it up to the registry level to scan the resources and provide the...registry? :)
Potentially, but they might be cross-registry. They might even be cross-API? I guess if we prevent that, then yes we can do it that way. Basically require that the api option be passed, and that all relevant resources be part of the provided api.
Tbh I think it's probably ok most of the time to require that a protocol and all the resources that implement it are within the same registry, but I can imagine cases where it would be nice not to have the limitation. But in the same way it's required that cross-API relationships have an api specifier now, I don't see any issue with requiring that cross-API implements clauses have the same. How weird does that get on the engine side of things if a relationship is multiply-cross-API?
Performance optimizations can be eschewed a little bit at first, for sure. For example, I was also thinking that it might be an issue for the engine if some of the implementing resources are in different data layers. More efficient pathfinding for that kind of stuff can come later though, and we can just treat it like that whole relationship is in a different data layer and split.
In the former case with the api specifier, it sounds like the implementation registry would need to be pushed up into the engine, like I guess is done with cross-API relationships.
If we start with the restriction that all implementors must be in the API's registry, then it's fine. It's easy to filter a list of modules to see which ones have a property. As long as the list is static somewhere (which it is in the API's registry).
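A sketch of that filtering, assuming the API can list its resources (something like Ash.Api.Info.resources/1) and that implementors expose an __implements__/0 function (both assumptions for illustration):

defp implementors(api, protocol) do
  api
  |> Ash.Api.Info.resources()
  |> Enum.filter(fn resource ->
    function_exported?(resource, :__implements__, 0) and
      protocol in resource.__implements__()
  end)
end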
The protocol (we should figure out what this kind of thing should be named, and really hash this aspect of it out, I'm not sure how I feel about it yet) wouldn't need to be in a registry, just the resources.
Somehow this is a combination of structs and protocols in nature, if we consider those to be close to attributes and actions, respectively. "Behaviour" is a term I've used for this kind of thing in the past. "Structure" or "specification" are also descriptive of the situation.
I think it's no problem to make this single API-only at the start, particularly if the limitation is only on implementors being in the same registry, as I can see many cases where you might want to have a cross-API relationship to such a collection of implementors, but fewer where it's a big deal to have implementors of one "thing" across multiple APIs.
Ash.Resource.Spec could work.
Ash.Resource.Spec sounds like a good idea, potentially. There is a lot of theory involved with doing this kind of thing declaratively, to the point that I wonder if we should just make it functional:

use Ash.Resource.Spec

def complies?(resource, opts) do
  ...
end

and just check it functionally. It would be an easy enough starting point, and then we could provide a built-in spec like HasAttribute, name: :foo_id, type: type.
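For instance, a sketch of that built-in spec, assuming attribute introspection along the lines of Ash.Resource.Info.attribute/2:

defmodule Ash.Resource.Spec.HasAttribute do
  # checks that the resource defines an attribute with the given name/type
  def complies?(resource, opts) do
    case Ash.Resource.Info.attribute(resource, opts[:name]) do
      nil -> false
      %{type: type} -> type == opts[:type]
    end
  end
end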
That would certainly leave things maximally open-ended with minimal code for the validation aspect, so we could get going on the engine part ASAP. The Spec—or something—would still need to define an Ash.Resource according to the existing Ash.Resource DSL, so that the engine can figure out what it is allowed to do with the resource using the existing introspection methodology. We can simply have it feed implementing resources to complies? and generate explosive error messages at runtime if you didn't implement complies? strictly enough (;
The engine for now doesn't have to be particularly concerned about whether a given resource is compliant to a spec—it can be on you for the time being if things blow up—but it does need a way to do the things it already does re: figuring out how it can use/optimize interactions with that resource.
For a declarative validator, it's a matter of going through and figuring out which parts of the Ash.Resource DSL are conflicting/overridable. attribute.default might be a good example of something that is overridable, but attribute.type would not be. Stuff such as attribute.allow_nil? falls somewhere in the middle, but is probably only relevant to the parts of the engine that check it, so could potentially be allowed to be overridden.
This also marks all the touch points in the engine that will need to be updated if a declarative validation strategy is defined, or equally well allows for that strategy to be a validation extension that auto-implements Spec, so I'm all for it.
I think the declarative validator is even more complex than just overridable or not. It is for basic things but gets complex for more complex applications of this kind of pattern...but we probably won't find out without just going for it. I think we would just start off only supporting attributes, and even then only supporting attributes/types.
For sure, attributes are the easy bit. I think we turn off all the other forms of optimization for these types of relationship to start.
Evaluating beforehand how complex it will become would begin with laying down the whole Ash.Resource DSL tree and just going through it. But a lot of that same information would come from starting with complete splitting and getting the various existing optimizations that are applicable working one by one, and documenting what will break if various constraints aren't satisfied. They can grow together.
I still think it's more complicated than just deciding what is/isn't mergeable, but that's beside the point. Attributes with exact matching types are good enough for now.
No doubt; rather, each optimization that you are performing has a minimum set of constraints that satisfies its assumptions, which right now is checked by the engine using introspection on an individual resource. To understand whether a resource complies? to an implementation declaratively, with least strictness, we would need to determine what constraints each value of each DSL item actually implies to the engine.
Doing this might require introducing new DSL to Spec for hinting, which would then similarly need to be validated against implementors. It's entirely possible that there are optimizations that wouldn't be decidable from a "bare" resource definition as we've been discussing, in which case those need to be disabled for now.
I'd like to dig into some of the more complex compile-time optimizations that the engine is performing to get a lay of the land on what we'd be looking at going forward. Do you have an example of one of the optimizations you think would be challenging to evaluate fitness for against a declarative spec?
🤔 Honestly, we do very little compile time optimization. What I'm getting at with my comments is that we're talking about introducing a generic Spec concept, designed to ensure that some resource meets some required behavior. In basic cases, like "does it have an attribute with this type", we can define that like this:
attributes do
  attribute :name, :type
end
and that is enough. We can just see if every entity/option in the spec is also present in the resource in question. But imagine a spec that wanted to say "a create action called :create that accepts an input :foo".
Here are two different actions that match that requirement:
create :create do
  accept [:foo]
end

create :create do
  accept []
  argument :foo, :string
end
So what I'm saying is that the definition of a Spec will not always mean "do X items at X position in the spec DSL and the resource DSL match". And I think that may be problematic enough (i.e. this method not being theoretically sound in advanced use cases) for us to need to come up with some other methodology for expressing this concept of "adoptable constraints on a resource". There is a lot of inspiration to look to for this kind of thing (protocols/behaviours/typeclasses), but Ash has hybrid characteristics of a type system and protocols and behaviours.
This is why I think it might be better to instead just have functional specs and/or an entirely different DSL for expressing these things.
dsl
|> require_attribute(:name, :type)
|> require_action(:create, inputs: [foo: :string])
For example. I'm still not sure how we'd hook it up, because the idea is that you'd want a relationship to be given a spec and to know that the spec enforces the things that the relationship needs, i.e.
has_many :things, MyApp.Specs.SomeSpecOfPolymorphicTypes do
  destination_attribute :foo_id
end
we'd want the has_many relationship to raise if the spec given doesn't enforce the requirements of the relationship.
So that might lead us to a DSL:
use Ash.Spec

require_attributes do
  attribute :foo, :string
end
Wrapping this all up, though, I think the word we are looking for here is an Interface that defines a common set of behavior/shape of resources. Perhaps I'm wrong and we can do that by just defining a resource and like... "figuring it out". I.e. if the spec says:
create :create do
  accept [:foo]
end
and the implementor does
create :create do
  argument :foo, :string
end
then we say "yes, these match".
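Concretely, that equivalence might boil down to a check like this (the helper itself is hypothetical; accept and arguments are fields on the action struct):

# treat an accepted attribute and an argument of the same name
# as equivalent inputs when matching a spec against an action
defp accepts_input?(action, name) do
  name in (action.accept || []) or
    Enum.any?(action.arguments, &(&1.name == name))
end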
Nice, that matches well with what I was thinking, both the separate DSL or repurposing of Ash.Resource's DSL.
In the case where there are two things that are sufficiently isomorphic for the engine—your example of accept and argument—we could for sure make it less magical that they are equivalent by explicitly saying:
action :create do
  input :arg, :type
end
My thought to repurpose the Ash.Resource DSL had primarily two goals in mind:
It seems totally reasonable to sidestep the validation part at the start, so we could start with having the engine recognize tagged relationships (_id and _type), and choose the correct resource accordingly for return/further steps.
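For example, once the engine has read the tag, picking the struct module could be as simple as this (the mapping shape is illustrative):

# e.g. mapping = %{"card" => App.Payment.Card, "cash" => App.Payment.Cash}
defp destination_for(%{payment_method_type: tag}, mapping) do
  Map.fetch!(mapping, tag)
end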
👍 at this point we're probably best off just giving it a shot. The recent Union type addition could likely be used to do type generation in ash_graphql, meaning we likely won't have that much work there. Do you want to give it a shot?
Yep, I'm going to start digging into the engine.
I wanted to revisit this and take some steps towards making a concrete plan of attack.
My feeling is that, of the two options initially listed, having a polymorphic relationship add an additional attribute (*_type by default) is probably the least invasive and most datalayer-agnostic approach.
Regardless of the datalayer, I am somewhat loath to go the route of stringifying the module name to *_type as I don't find this stable enough, so either we should add a new entity to the resource block so users can specify a stable type name if using the auto-discovery approach ("protocols", "specs", complies?), or require the user to supply a map of type name to module name when defining the relationship if doing something more explicit, akin to Ash unions.
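The explicit variant might read something like this (the option names are straw-man, not settled):

# a stable type-name -> module map supplied on the relationship,
# akin to Ash unions; destination_attribute as usual
has_many :payment_methods,
  types: [card: App.Payment.Card, cash: App.Payment.Cash],
  destination_attribute: :user_id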
There are a bunch of things that this touches in Ash core. Here are some of the highlights, very roughly in an order that I think would make sense to implement:

- Relationship DSL for naming the *_type attribute, alongside the existing {source,destination}_attribute options.
- Load the *_type attribute wherever the engine would load *_id, and create the correct struct once stuff is returned from the datalayer.
- Switch on *_type when loading polymorphic relationships. Assuming core uses ResourceType |> Ash.Query.for_action(...) to do this already, this should be reasonably straightforward, but could be complicated if there are datalayer optimizations that expect to be able to introspect chains of actions across multiple relationship boundaries up-front.
- Add a __type__ field to the base resource struct definition, or add __metadata__.type.
- manage_relationship needs to be updated to additionally write out *_type when appending a relationship, read it when checking for existence, etc. This is another argument in favor of simply using a string type in Postgres, which I address further below.
- manage_relationship needs a default field name to switch on when being passed a map, which should be configurable in the resource DSL.
- Other inputs to manage_relationship need to be able to additionally accept a type name for polymorphic relationships (sketched below).
- Existing uses of manage_relationship may require additional inspection to determine what it would take to make them conformant.

I don't expect that this is a complete list, but hopefully it's a place to get started. Feel free to add anything else that comes to mind.
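To make the manage_relationship items concrete, the input shape might look like this (hypothetical; the map's type field is the new part):

user
|> Ash.Changeset.for_update(:update)
|> Ash.Changeset.manage_relationship(
  :payment_methods,
  [%{type: "card", id: card_id}],
  type: :append
)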
A large part of the initial proposal was about making this efficient for AshPostgres. Looks like that process would culminate in adding clauses here that additionally filter on *_type: https://github.com/ash-project/ash_postgres/blob/062e67392a3299c0e2e96f1393319d1d29e3018c/lib/join.ex#L760
Haven't had time to dig into how much of the binding code would need to be updated, or any other parts of AshPostgres that might be touched. As I mentioned above, most likely wherever AshPostgres is applying the relationship's load action filter would need to be updated to switch on *_type, probably with nested SELECTs if it's doing chaining. Thoughts here are welcome.
However we choose to do this, the migration generator will need to be updated to add/alter the *_type column as necessary.
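In Ecto terms, the emitted migration might be as simple as this (table and column names illustrative):

defmodule MyApp.Repo.Migrations.AddPaymentMethodType do
  use Ecto.Migration

  def change do
    alter table(:users) do
      # a string tag for now; an ENUM is the alternative discussed below
      add :payment_method_type, :text
    end
  end
end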
Choosing a column type for *_type is an open question. Either of the two approaches mentioned that provide stable type naming could reasonably make use of the ENUM type, though that's more to add to the migration generator, and even Ash.Type.Enums aren't backed by Postgres ENUMs :D
I haven't dug into these datalayers yet to see what optimizations these are performing for joins; again, thoughts welcome.
Couple of thoughts:
1. We can't have the polymorphic relationship automatically add the type field, because the destination resource is what is polymorphic. So, at least at first, what we'll have to do is just validate that the type field exists.
2. The relationship loader is data-layer agnostic, so if we prevent filtering on and aggregating over polymorphic relationships to start, then we can actually just avoid doing anything with this in the relevant data loaders. We'd only need to do this in ash core. Ash core can validate that no polymorphic relationship is referenced in filters and/or aggregates.
3. AshPostgres has to join relationships for filters/aggregates, not for data loading.
Regarding (1), wouldn't we more or less be following the existing pattern, modulo adding *_type beside *_id in the case of polymorphism? Indeed, belongs_to is the only one that actually implies attributes on the resource where it is defined, and it would stay that way.
has_one, has_many and many_to_many would continue simply to validate that the *_type attribute is where they expect it to be, just as for *_id.
Regarding (2), could you expand a bit on where, e.g., prepare build(load: ...) fits in with this? Is AshPostgres currently unable to turn this into a JOIN?
Either way, I'm on board with limiting the scope as described at the beginning, particularly if the code to add filter and aggregate support later is mostly orthogonal.
For 1. Wouldn’t polymorphic relationships require a type on the destination resource, not the source resource?
For 2. We don’t on purpose. It is good in some cases, but generally the more scalable approach is not to join but to issue multiple queries. We can support joining and choose a heuristic for it later though, as an optimization.
In (1), yes, in the case of has_many and many_to_many. has_one is a grey area, since you could store the type of the destination on the source, but I don't see a strong reason to change the existing shape. Why not keep it in line with the other two and the existing monomorphic versions?
Is there a reason that requiring *_type be defined on the destination resource could cause issues? I'm thinking the pattern would essentially be the same as it already is, defining a belongs_to on the destination resource, but with polymorphic? true (or whatever DSL we choose) to imply *_type in addition to *_id. A polymorphic has_many doesn't itself create the *_id or *_type on the destination resource, but does validate their presence.
Regarding (2), that's perfect for this case.
🤔 I think the type field should always be defined on the destination resource.
I think before we start in on this we should start in on resource interfaces.
Something like:
use Ash.Resource,
  implements: [MyApp.FooType]
Then we can require that the relationship be connected to an interface that requires a type attribute as well as the destination field.
Just to check: are we in agreement that polymorphic relationships should build upon the existing belongs_to, has_one, has_many and many_to_many relationships?
In that case, loading a polymorphic has_many from a given source resource is a union of all the types it could possibly contain. Since the relationship loader is currently using separate queries, this means functionally we are loading the relationship for each of the possible destination types and combining the results.
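In other words, something like this sketch (the candidate modules, field names, and api are all illustrative):

require Ash.Query

# one query per candidate destination type, results concatenated
results =
  Enum.flat_map([App.Goods.Pallet, App.Goods.Container], fn destination ->
    destination
    |> Ash.Query.filter(transport_id == ^transport.id)
    |> MyApp.Api.read!()
  end)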
The type name doesn't ever actually need to be stored in the database except for what I was assuming, which is that the relationship may be bidirectionally polymorphic, in which case the type name of the source in a has_many would be stored on the destination. This is the reason I was mentioning adding a *_type attribute alongside *_id for belongs_to. Possibly this was not very clear from the description above.
The primary reason the type name needs to be defined on the destination resource in a has_many is so that manage_relationship can accept a map with a type field or a (type, id) tuple.
Since many_to_many is just has_many through a pivot under the covers, the same philosophy applies.
has_one is the single case where we might consider having the type on the source resource, enabling the loader to perform only a single lookup. As far as I know, has_one doesn't actually enforce that the candidate set is singular, but makes it so simply by taking the first matching result (or nil if there are none). This means that the current behaviour is essentially a has_many as well, limited to at most one result.
A polymorphic belongs_to is the most straightforward; we clearly need a *_type attribute in addition to *_id so we know from which resource we should be looking for the specified id.
"bidirectionally polymorphic"
I think you're going to need to provide some concrete examples of the different kinds of things you're thinking here.
But yes, I can see a case for having the type on the source resource for has_one and belongs_to relationships.
As to whether or not they should be built on the existing relationship types, I hadn't actually considered the idea of adding different relationship types, i.e. polymorphic_has_one .... Things are going to be hairy however we do it, and the fact that polymorphic relationships won't have a single destination might actually make this a very good idea. Adding them as new relationship types would likely prevent a lot of confusion down the road where things are expecting regular relationships but get polymorphic ones. We could even pick one and start with it, i.e. polymorphic_belongs_to could be step one, and have it only support being loaded.
Sure, with bidirectionally polymorphic I'm referring to the case where there is polymorphism on both sides of the relationship. Take a case of road-hauling cargo: we have a Truck and several classes of Goods that the Truck.contains. This relationship is unidirectionally polymorphic; a Truck may :contain many types of Goods, but all of those goods are :contained_by a Truck.
In this case, Truck.contains is a polymorphic has_many, but Goods.contained_by is a monomorphic belongs_to, and does not require a truck_type attribute—we know that the truck_id will always reference a Truck.
Supposing we want to upgrade the fleet to have a more varied set of road vehicles, there are still benefits to having them all be Trucks, simply with varying cargo capacities, wheels, etc., all of which might be combined into a Truck.type. This is acceptable, because these vehicles behave largely the same.
If the operation grows, and we wish to start carrying freight by air and by sea, it may become burdensome to try to treat these modes of transport as variations of Truck. In this case, we may have Aircraft and Ship, which share characteristics in common with Truck: they are all modes of Transport, and they all :contain Goods, but now those Goods are :contained_by a Transport, which could be any of Truck, Aircraft or Ship.
Now Goods.contained_by becomes a polymorphic belongs_to, and requires a transport_type in addition to a transport_id in order to load the relationship—from either side.
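To put that in straw-man DSL terms (the polymorphic? and types options are hypothetical, nothing settled):

defmodule App.Goods do
  use Ash.Resource

  relationships do
    # implies both transport_id and transport_type on Goods
    # (attribute names would presumably be configurable)
    belongs_to :contained_by,
      polymorphic?: true,
      types: [truck: App.Truck, aircraft: App.Aircraft, ship: App.Ship]
  end
end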
I'm not opposed to having poly_many_to_many, poly_has_many, etc., if you think that would make it clearer to users than something like polymorphic? true. It might make the separation in implementation a bit clearer as well. That said, I don't see a huge difference here, since I think we'd be sticking with the same four relationship types unless you see a compelling reason to do otherwise.
Note that, in the example, the has_many (Transport.contains) also needs to be made aware of whether the belongs_to that it is pointing at is polymorphic in the reverse direction (from Goods.contained_by), as it will need to know whether or not to supply its own type as transport_type when loading. Assuming bidirectional polymorphism avoids this.
I see what you mean. It's going to be an interesting journey for sure 😆
I think, for now, step one is to make them new kinds of relationships. They are different enough structurally, solely due to the fact that relationship.destination is not available in the same way. So to begin, I'd pick one, say polymorphic_belongs_to or belongs_to_polymorphic or something like that, and start at the beginning to get it working. There are a lot of places that will need to be updated. Most of them we can avoid by validating in certain places that relationships aren't polymorphic, i.e. in calculation dependencies, in filter references, that kind of thing. I'll help with those. In the short term, I'd suggest just trying to get loading the relationship working :) The internals of reading and loading are very complex, and although we have plans to change it to something much simpler soon, I wouldn't let that deter you from making progress.
Ok, had a chance to start work on this today. Let's move the discussion of specifics over to https://github.com/ash-project/ash/pull/661.
My feeling is that, of the two options initially listed, having a polymorphic relationship add an additional attribute (*_type by default) is probably the least invasive and most datalayer-agnostic approach.
This is what Rails did and it was "fine".
Regardless of the datalayer, I am somewhat loath to go the route of stringifying the module name to *_type as I don't find this stable enough, so either we should add a new entity to the resource block so users can specify a stable type name if using the auto-discovery approach ("protocols", "specs", complies?), or require the user to supply a map of type name to module name when defining the relationship if doing something more explicit, akin to Ash unions.
There's already short_name which could be used for this purpose; however, I'm okay with storing the module name in there as long as we:
I can't think of any way that someone could execute arbitrary code via this mechanism but hopefully 1 and 2 above make this impossible.
I've had polymorphic relationships on my wishlist, and looked into the Ash.Type.Union + calculation approach, and thought maybe things could go a bit deeper. Realistically, this would combine well with resource behaviors, which is a topic unto itself.

In terms of an efficient AshPostgres implementation, it would not be dissimilar to existing relationships, but, in addition to payment_method_id, there would also be an enum payment_method_type to describe the possible types. load operations follow this pattern:
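Roughly, in Ecto terms (User, Card, and BankAccount are hypothetical schemas standing in for the App.Payment.* resources):

import Ecto.Query

# join each candidate table, discriminated by the enum column
from u in User,
  left_join: c in Card,
  on: c.id == u.payment_method_id and u.payment_method_type == "card",
  left_join: b in BankAccount,
  on: b.id == u.payment_method_id and u.payment_method_type == "bank_account",
  select: %{user: u, card: c, bank_account: b}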
The same transformation is applicable to many_to_many relationships, including those that are bidirectionally polymorphic.

Elixir is quite a good candidate for this kind of thing, because it is type-agnostic enough to allow you to return heterogeneous lists, but makes it easy to work with them by pattern-matching the struct. More than that, first-class support for polymorphism would be a huge boon to describing a lot of domain models.

Nested loads must be handled with care internally (resource behaviors addresses this), but the engine can complain if it doesn't have enough information to determine whether it can perform a further load on the result types; or different loads could be specified for different types, but this starts to get messy quickly.

A less-efficient (non-joined) approach when the relationship is polymorphic might be a first-stage strategy.

In any case, I wanted to open the discussion on this and see what immediate challenges and benefits come to mind.
Other data-layers
Alternative join approach

If desired, an alternative approach is to use a column for each type:
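For the payments example, something like this migration body (names illustrative):

alter table(:users) do
  # one nullable foreign key per possible type
  add :card_id, references(:cards)
  add :bank_account_id, references(:bank_accounts)
end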
Which would require a constraint check (for allow_nil?: true, this would be 1 >=):
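Roughly, in migration terms (num_nonnulls is Postgres; use = 1 instead when allow_nil? is false):

# enforce that at most one of the per-type columns is set
create constraint(:users, :payment_method_exclusive,
  check: "num_nonnulls(card_id, bank_account_id) <= 1"
)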