Polymorphic relationships

bcksl commented 1 year ago

I've had polymorphic relationships on my wishlist, and looked into the Ash.Type.Union + calculation approach and thought maybe things could go a bit deeper. Realistically, this would combine well with resource behaviors, which is a topic unto itself.

defmodule App.User do
  @payment_method_types [
    App.Payment.Ideal
    App.Payment.Swift
    App.Payment.GooglePay
    App.Payment.ApplePay
    App.Payment.Stripe
  ]

  relationships do
    has_many :payment_methods, @payment_method_types
  end

  # ...
end

defmodule App.Payment.Ideal do
  relationships do
    belongs_to :user, App.User
  end

  # ...
end

In terms of an efficient AshPostgres implementation, it would not be dissimilar to existing relationships, but, in addition to payment_method_id, there would also be an enum payment_method_type to describe the possible types. load operations follow this pattern:

select * from users u
left join payment_ideal pi
  on u.payment_method_type = 'App.Payment.Ideal' and u.id = pi.user_id
left join payment_swift ps
  on u.payment_method_type = 'App.Payment.Swift' and u.id = ps.user_id
-- you get the idea
where u.id = 'a-nice-uuid'
   or u.id = 'or-this-one'
   or u.id = 'even-that-one';

The same transformation is applicable to many_to_many relationships, including those that are bidirectionally polymorphic.

Elixir is quite a good candidate for this kind of thing, because it is type-agnostic enough to allow you to return heterogeneous lists, but makes it easy to work with them by pattern-matching the struct. More than that, first-class support for polymorphism would be a huge boon to describing a lot of domain models.

Nested loads must be handled with care internally (resource behaviors addresses this), but the engine can complain if it doesn't have enough information to determine whether it can perform a further load on the result types, or different loads could be specified for different types, but this starts to get messy quickly.

A less-efficient (non-joined) approach when the relationship is polymorphic might be a first stage strategy.

In any case, I wanted to open the discussion on this and see what immediate challenges and benefits come to mind.

Other data-layers

ETS doesn't support native joins anyways.
Mnesia already supports 2-tuple identifiers which fit exactly the purpose.
In general, when crossing data-layer boundaries the operations would need to be split anyways.

Alternative `join` approach

If desired, an alternative approach is to use a column for each type:

select * from users u
left join payment_ideal pi
  on u.payment_method_ideal_id = pi.user_id
left join payment_swift pi
  on u.payment_method_swift_id = pi.user_id
--- etc.

Which would require a constraint check (for allow_nil?: true, this would be 1 >=):

alter table users
add constraint check_one_payment_not_null check
(
  1 =
  (
    case when payment_method_ideal_id is null then 0 else 1 end
  + case when payment_method_swift_id is null then 0 else 1 end
  -- etc.
  )
)

zachdaniel commented 1 year ago

Very against the idea of storing module names in the db. Just had a thought: what if we don’t do polymorphism at the relationship level? But instead expand all of the required internals. For example, we could add a calculation that resolves to a union, and add a handler for union inputs for manage_relationship that kind of thing. So if you had a polymorphic belongs to it would add one belongs to under the hood for each relationship, and synthesize the operations on top of it. Then we may not need to add any additional relationship types, but instead increase the functionality available to things like calculations and add changes to handle union inputs.

bcksl commented 1 year ago

There's already short_name which could be used for this purpose, however I'm okay with storing the module name in there as long as we...

Although stringifying the module name benefits from guaranteed global uniqueness, I would prefer to decouple the module name from the stored type string. This has benefits for module refactoring, connecting legacy data, hinting for the migration generator etc.

I can't think of any way that someone could execute arbitrary code via this mechanism but hopefully 1 and 2 above make this impossible.

I don't see any data or RCE vulns either way, since this section of the query is built entirely by Ash, and isn't really exposed.

The one place I would see potential issues is if we're directly atomizing a stringified module name from the database, which is a bad idea.

Assuming we support the current functionality of filters etc. later these would simply be duplicated for every subquery.

If we were to add a types filter for polymorphic relationships, this would similarly only narrow the set of types from the relationship as an intersection operation (and/or subtraction operation, if we allow not :type), but still wouldn't allow specifying additional arbitrary types, so no data safety issue.

bcksl commented 1 year ago

So if you had a polymorphic belongs to it would add one belongs to under the hood for each relationship, and synthesize the operations on top of it.

This is essentially the alternative solution from the end of the original post. It's workable—and it's actually not mutually exclusive to the *_type solution, whichever road we decide to go down first.

I will say that one major advantage we get from using *_type is removing a lot of migrations that would be necessary with a logical monomorphic relationship for each type.

From a performance perspective, I suppose that a new union calculation type could be optimized by datalayers such as AshPostgres essentially the same way that a new relationship type could. Since I believe many calculations can be turned into subqueries by AshPostgres, maybe this will work even for chains of polymorphic relationships?

From an ergonomics perspective I have some concerns. I suppose we could add sugar on top of the proposed improvements to alleviate some/all of these, but would you mind writing up how you envision the resulting DSL to eventually look for the end user?

zachdaniel commented 1 year ago

I will say that one major advantage we get from using *_type is removing a lot of migrations that would be necessary with a logical monomorphic relationship for each type.

True, but at the same time we haven't really talked about the guarantees that people would lose without doing it that way (AFAIK there is no way to make a foreign key setup to handle this).

From a performance perspective, I suppose that a new union calculation type could be optimized by datalayers such as AshPostgres essentially the same way that a new relationship type could. Since I believe many calculations can be turned into subqueries by AshPostgres, maybe this will work even for chains of polymorphic relationships?

Yeah, still would need to figure out the details, but we have the capability to do things like that.

From an ergonomics perspective I have some concerns. I suppose we could add sugar on top of the proposed improvements to alleviate some/all of these, but would you mind writing up how you envision the resulting DSL to eventually look for the end user?

It could look the same as we have it currently TBH

relationships do
  polymorphic_belongs_to :.... # expands to multiple regular belongs_to relationships under the hood, that we synthesize.
end

Another note: If we integrate a union operation into Ash core/the data layer concept, we could leverage that under the hood potentially?

zachdaniel commented 1 year ago

This is a very sticky problem. I'm down to make it work, but its going to take quite some time to iron out all the details.

bcksl commented 1 year ago

Agreed, for SQL datalayers specifically, we have much stronger data integrity checks with the column-per-type strategy using foreign key and the additional "exactly one"/"at most one" constraints outlined at the bottom of the original post.

Regardless of whether we decide to promote poly_belongs_to to a new relationship type or as belongs_to under the covers, there are tradeoffs.

Considering that these are the only two standard strategies for polymorphism in relational design that don't require the addition of intermediary tables, would you find it reasonable to eventually implement both and allow the user to choose based on the tradeoffs?

For the column-per-type strategy, do you foresee any issues with reifying the list of types in time to perform the transformation to multiple belongs_to relationships if we decide to add auto-discovery of some kind and the concrete types are not specified directly in the relationship definition?

zachdaniel commented 1 year ago

I could see a case for being able to choose between the two, yeah. If we do auto discovery, that would likely cause compile time issues, yeah. We'd need the user to keep a module somewhere separate that describes the options, the way we do Ash.Type.Enum and Ash.Type.NewType today.

bcksl commented 1 year ago

I'm 100% on board with adding a union calculation type, regardless of whether it is the foundation of some/all of the polymorphic relationships or not.

As you folks are moving towards rewriting the Ash engine using Reactor, I suspect adding the *_type attribute-based polymorphic relationship strategies as well as a bunch of other things on the roadmap, such as parameterized relationships will become significantly more straightforward. The same applies to potentially making the column-per-type polymorphism strategy a native relationship type in the future.

In the meantime, if you think it might be reasonable to add a union calculation, we could try to build the column-per-type polymorphic relationships based on that. I guess the union calculation needs to support adding tags with the appropriate value for each subquery?

zachdaniel commented 1 year ago

FWIW, we have the Ash.Type.Union already, and building a calculation that produces a union in that way is actually relatively straightforward today.

defmodule MyApp.UnionOfThings do
  use Ash.Type.NewType, subtype_of: :union, constraints: [
    types: [
      one: [
        type: ResourceOne,
        tag: :type,
        tag_value: "one"
      ],
      two: [
        type: ResourceTwo,
        tag: :type,
        tag_value: "two"
      ]
    ]
  ]
end

defmodule UnionCalculation do
  use Ash.Calculation

  @relationships [:one, :two]

  def load(query, _, _) do
    @relationships
  end

  def calculate(records, _, _) do
    records
    |> Enum.map(fn record -> 
      Enum.find_value(@relationships, fn relationship -> 
        case Map.get(record, relationship) do
          nil -> nil
          value -> %Ash.Union{type: relationship, value: value}
        end
      end)
    end)
  end
end

zachdaniel commented 1 year ago

There is a small bit more you typically want to do if you are using AshGraphql, but otherwise that right there is essentially enough.

bcksl commented 1 year ago

Great, super straightforward. Even fleshing that out, we basically just have:

defmodule Ash.Calculation.BelongsToUnion do
  @opts_schema [
    [
      name: :relationships
      type: {:list, :atom}
      required: true
    ]
  ]

  @impl Ash.Calculation
  def init(opts \\ []), do: Spark.OptionsHelpers.validate(opts, @opts_schema)

  @impl Ash.Calculation
  def load(_query, opts, _context) do
    opts[:relationships]
  end

  @impl Ash.Calculation
  def calculate(records, opts, _args) do
    records
    |> Enum.map(fn record -> 
      Enum.find_value(opts[:relationships], fn relationship -> 
        if Map.get(record, relationship) do
          %Ash.Union{type: relationship, value: value}
        end
      end)
    end)
  end
end

poly_belongs_to can desugar into:

An Ash.Calculation.BelongsToUnion;
The individual Ash.Resource.Relationships.BelongsTo for each of the types specified;
A generated constraint of the form described by the original post for Postgres;
A generated Ash.Type.NewType, subtype_of: :union.

Even ergonomics-wise the user can specify before?/1 for any transformers that would like to add/manipulate poly_* relationships, which was one of my major concerns for doing things this way. Might be worth considering adding some helpers, a :generated tag to the monomorphic relationships' DSLs, or something, to assist in determining whether a relationship is stand-alone or generated.

This gets us to reading, and for writing manage_relationship needs to be extended to recognize poly_belongs_to relationships and do the right thing. Users can still do whatever they want as far as reading from- and writing to the underlying belongs_to relationships etc.

I'll note that manage_relationship needs to enforce the at-most-one constraint as well, i.e. it needs to clear an existing value of another type if there is one when writing to the relationship.

Is there actually more that needs to go into this for poly_belongs_to?

If not, then the calculations for poly_has_{one,many} are nearly the same as Ash.Calculation.BelongsToUnion, except for the fact that they are performing a union of lists rather than single related records, and Ash.Calculation.HasOneUnion would need to re-perform the sort, if specified, outside the datalayer.

We completely avoid the need for manage_relationship to care whether the relationship is bidirectionally polymorphic or not with the column-per-type strategy for poly_belongs_to. We do not, however, avoid the need for manage_relationship to perform the same kind of constraint enforcement when writing to a poly_belongs_to through any of the other polymorphic relationship types, which is another argument in favour of the helpers/DSL tag mentioned above.

zachdaniel commented 1 year ago

We could even just tell people that managed relationships don't work with polymorphic relationships, and that they need to roll their own inputs and handlers.

bcksl commented 1 year ago

I suppose, but surely being able to both read and write using the tools people are used to is a big part of the draw. Do you see some major points of friction for making manage_relationship understand union input and the poly_* relationships?

zachdaniel commented 1 year ago

No issues necessarily, just might be a place to start. I.e read only unions first.

bcksl commented 1 year ago

Right on. I'll start with that and see how we go.

I will say that I would really like there to be some helpers a la strip_metadata that would be able to turn a list of poly_* Ash.Unions from/to the underlying structs rather than a map with a :type key, and some options for code_interface and Ash.Api to perform these transformations automatically.

Given frontends and some internals already understand Ash.Union, the value in basing things on that for everything else is pretty clear.

zachdaniel commented 1 year ago

Honestly I wouldn't suggest trying to automatically (or even w/ an option) pull things out of the union struct TBH. I think it is much better to force consumers to handle the fact that this is a union (and could therefore be one of many different types).

On Mon, Jul 31, 2023 at 8:44 PM, bcksl < @.*** > wrote:

Right on. I'll start with that and see how we go.

I will say that I would really like there to be some helpers a la stripmetadata that would be able to turn a list of poly* Ash.Union s from/to the underlying structs rather than a map with a :type key, and some options for code_interface and Ash.Api to perform these transformations automatically.

Given frontends and some internals already understand Ash.Union , the value in basing things on that for everything else is pretty clear.

— Reply to this email directly, view it on GitHub ( https://github.com/ash-project/ash/issues/563#issuecomment-1659399205 ) , or unsubscribe ( https://github.com/notifications/unsubscribe-auth/ABLVBYYSGTZ3KK6FOXPEYDTXTBGPZANCNFSM6AAAAAAW5Y3Q6Y ). You are receiving this because you commented. Message ID: <ash-project/ash/issues/563/1659399205 @ github. com>

bcksl commented 1 year ago

Ah, could you could elaborate a bit on that? From what I understood, Ash.Type.Union was developed as a convenience for embedded types and Ash.Union is still

@type ‰__MODULE__{
  type: term(),
  value: term()
}

This is obviously a better structure if types can be things like :string or :integer as well, but if this is a relationship they're all guaranteed to be resource instances.

zachdaniel commented 1 year ago

Ash.Type.Union produces an %Ash.Union{}. In general, to support this, we'd need to support a general "calculation union unwrapping" of some kind (because we don't know what calculations are producing unions that can only be resources), which in general I think leads down a path of hiding important information. It's the same reason we retain %Ash.CiString{value: "SoMe StRiNG"}, so that context isn't thrown away.

Regardless, utilities to unwrap these values could be added later, or could be added into someone's app as a tool, like read() |> MyApp.unwrap_unions() |> ...

ash-project / ash