elixir-ecto / ecto

A toolkit for data mapping and language integrated query.
https://hexdocs.pm/ecto
Apache License 2.0
6.15k stars 1.43k forks source link

Proposal: Rich Extensibility for Ecto #4513

Open Moosieus opened 3 days ago

Moosieus commented 3 days ago

Proposal: Rich Extensibility for Ecto

There's an increasingly significant market share of Postgres extensions that are radically changing the way people interact with Postgres. These extensions can provide lucrative and key-enabling features without requiring applications to host and synchronize external data stores. At least two extensions in this category include:

These extensions embed their calling conventions within SQL and even compose quite well within its relational model. It's desirable to extend Ecto to incorporate these calling conventions, so developers can use these extensions with the amazing composition, caching, mapping, and other ammenities Ecto currently provides.

The following SQL query is a deliberately limit-testing example from ParadeDB:

SELECT "p0".*, "a1".*
FROM "posts" AS "p0"
JOIN "authors" AS "a1" ON "a1".id = "p0".author_id
WHERE
  "p0".id @@@ paradedb.boolean(
    must => ARRAY[
      paradedb.disjunction_max(
        disjuncts => ARRAY[
          paradedb.parse($1),
          paradedb.phrase(field => 'body', phrases => ARRAY['is', 'awesome'], slop => $2),
          paradedb.phrase(field => 'body', phrases => ARRAY['are', 'awesome'], slop => $2)
        ]
      ),
      paradedb.range(field => 'likes', range => int4range(9000, NULL, '[)')
      )
    ]
  )
LIMIT 20;

I'm leaning on ParadeBD as it's what I know -- If anyone would like to propose further examples for Apache AGE or other extensions, I highly encourage you to share them here!

A user may want to compose the above SQL query in Ecto like so:

import Ecto.Query

i_think_therefore_i_param = "body:elixir OR body:erlang"
slop = 1

query =
  from(
    p in Post,
    search: disjunction_max([
      parse(p, ^i_think_therefore_i_param),
      phrase(p.body, ["is", "awesome"], ^slop),
      phrase(p.body, ["are", "awesome"], ^slop)
    ]),
    join: a in assoc(p, :author), 
    preload: [author: a],
    limit: 20
  )

some_condition = true

query =
  if some_condition do
    search(query, [p], range(p.rating, 9000, nil, "[)"))
  else
    query
  end

Repo.all(query)

Ecto's existing fragment API can cover simple variations of these queries ad-hoc, but it doesn't provide the desired level of extensibility demonstrated above. Additionally, users may wish to augment Ecto's other modules like Ecto.Schema to provide additional information.

One example includes mapping a ParadeDB search index for a given schema/table, like so:

defmodule App.Post do
  use Ecto.Schema

  schema "posts" do
    field :title, :string
    field :body, :string
    field :likes, :integer

    belongs_to :author, App.User
    has_many :comments, App.Comment

    # Extend schema DSL to encode additional information
    search_index [
      :title, :body, {:likes, :int4}, {:author_id, :int8}, #...
    ]
  end
end

Discussion

I'd like to start a discussion on what a possible extension API for Ecto could look like. It'd ideally allow users to extend Ecto in order to support extensions like the one above, obviating the need to fork Ecto or Ecto SQL to do so.

The primary goals are to:

josevalim commented 2 days ago

Thank you @Moosieus!

Our general approach with queries is to stay closer to the database as possible, this makes it really straight-forward to go from ParadeDB docs to Ecto queries and vice-versa. So I would prefer to translate the SQL query above to:

require ParadeDB

query =
  from(
    p in Post,
    where: ParadeDB.boolean(p.id,
      must: [
        ParadeDB.disjunction_max(
          disjuncts: [
            ParadeDB.parse(^i_think_therefore_i_param),
            ParadeDB.phrase(field: 'body', phrases: ['is', 'awesome'], slop: ^slop),
            ParadeDB.phrase(field: 'body', phrases: ['are', 'awesome'], slop: ^slop)
          ]
        ),
        ParadeDB.range(field: "likes", range: int4range(9000, NULL, '[)'))
      ]
    )
  )

The only thing getting in the way of us supporting the above today is that we don't support keyword lists in Ecto queries. But if we did, we could compile them into the equivalent PostgreSQL, and you should be able to provide all of the functions above on top of fragments. WDYT?

Moosieus commented 2 days ago

Here's my thoughts:

I've asked the ParadeDB team about the above. If that works out, I can start exploring a solution on top of fragments, and see what blockers I run into (if any).

Moosieus commented 1 day ago

Got my answer on the where clauses. ParadeDB wil ultimately combine them transparently. That simplifies things for the fork I've been working on, and the tentative approach with fragments. I'm thinking to try both and see how the DX compares.