elixir-ecto / ecto_sql

SQL-based adapters for Ecto and database migrations
https://hexdocs.pm/ecto_sql
Apache License 2.0
578 stars 312 forks source link

Using column references in `json_extract_path` doesn't work #646

Open KirillKayumov opened 3 weeks ago

KirillKayumov commented 3 weeks ago

Elixir version

1.16.2

Database and Version

PostgreSQL 14.9

Ecto Versions

ecto: 3.11.2, ecto_sql: 3.11.3

Database Adapter and Versions (postgrex, myxql, etc)

postgrex: 0.18.0

Current behavior

json_extract_path or [] variant doesn't support referencing columns from other tables. The Ecto query below fails to compile:

create table(:test_posts) do
  add(:text, :text)
  add(:user_id, references(:users))
end

create table(:test_users) do
  add(:post_stats, :jsonb)
end

from(u in TestUser, join: p in assoc(u, :posts), select: %{stat: u.post_stats[p.id]}) |> Repo.all

Expected behavior

It would be very convenient if we could pass references to other columns in json_extract_path. I'm not sure about other DBs but for PostgreSQL I can see that json_extract_path transforms to this SQL:

users.post_stats #> '{"my", "path", "inside", "json"}'

Because it uses such syntax of array (as a '{}' string) it's clear why it doesn't work today. However, PostgreSQL supports another syntax where it's possible to reference columns:

users.post_stats #> array['my', 'path', test_posts.id, 'inside', 'json']::text[]

I'm wondering if anything stops us from using the array[] syntax and allow referencing columns from other tables? If no, I can try to work on the fix.

greg-rychlewski commented 3 weeks ago

Hi,

I think there is a chance we can do this. The main things I can think of that we'd need to be careful about:

  1. Does it work for both json and jsonb. Right now json_extract_path is agnostic between those types and we probably want to keep it that way.
  2. Does it work for the same postgres versions as the array string.
  3. We do some validation in Ecto to make sure the extraction path is valid by comparing against the fields in the embedded schemas. So we probably have to make some adjustments to ignore the validation if a column is provided.
  4. Do we take any significant performance hit making the change. It seems a bit wasteful to do it even if there are no column references. So maybe we have to track whether a column was provided and change the syntax.
greg-rychlewski commented 3 weeks ago

One more thing we should probably think about is dynamic paths.

Since the query bindings are compile-time constructs I don't think we can allow them in runtime evaluated paths unless we use dynamic. We might want to evaluate how large a change that would be.

For example if it was prohibitively large then we'd need to decide if we want to allow bindings in compile-time paths only even though users would probably really want them in runtime paths.