elixir-ecto / ecto

A toolkit for data mapping and language integrated query.
https://hexdocs.pm/ecto
Apache License 2.0
6.14k stars 1.43k forks source link

New Feature: Adding default `where` clauses to schema #2395

Closed zachdaniel closed 6 years ago

zachdaniel commented 6 years ago

The start of the discussion is in the google group here: https://groups.google.com/forum/#!topic/elixir-ecto/dLsumjvNJ2I

The proposal for the interface:

    schema "foo" |> where(deleted: true) do
      ...
    end

Updates would need to be made to joins, from, and other places to ensure that these where clauses are always added.

I think the way forward there is to first modify everything to use __schema__(:query), and then set it up such that this change alters that query.

josevalim commented 6 years ago

@zachdaniel I am back to thinking that the best solution here is still to use database views. You can get all of the guarantees, including checks on insert/update.

Creating a view is very straight-forward. The only annoying part is that a view is not automatically updated when you add columns to the underlying table, so we need a mechanism to make it easier to update. That can be as simple as putting the views in a separate file that is required on the migration. Some guidance around here is definitely necessary.

sorentwo commented 6 years ago

@josevalim We use views extensively to share logic between a legacy Rails app and the newer Elixir app, and to handle complex queries that aren't possible or easy with Ecto. All migrations are done on the Rails side (because the app is older and the schema_migrations table is incompatible), but that means we've been able to use scenic to handle view migrations.

Scenic does precisely what you've mentioned here:

  1. Create a separate SQL file for the view.
  2. Create a migration and reference the SQL file.
  3. Use existing views to generate a new version or to roll back.

I'm sure this could be done manually in Ecto, or with a package, but it would be nice to have support for view creation built in.

As you mention, the view isn't updated when columns are added, which bites people new to views every single time. I'm not sure if there could be a mechanism to detect when table changes would effect a view, but an automated warning would be useful.

zachdaniel commented 6 years ago

The problem with GROUP BY makes views a deal breaker for us.

josevalim commented 6 years ago

@zachdaniel which problem? That you can't define a view with group_by? That shouldn't apply to this issue though, this issue is exclusively about supporting where.

zachdaniel commented 6 years ago

I mentioned it in a comment above:

There are also some limitations when it comes to using a table vs a view. With a table, postgres knows about primary keys and allows you to only group by the primary key, even when you want additional rows that come with the record.

e.g

SELECT 
  posts.id, posts.name, posts.author_id, COUNT(comments.id) 
AS comment_count 
FROM posts
JOIN comments ON comments.post_id = posts.id
GROUP BY posts.id

That above query would work with a table, but it would not work with a view, even if the definition for that view was just SELECT * FROM posts

Basically postgres does you favors when it knows that your grouping by a primary key, but you lose that ability when it's a view.

This issue isn't really the only deal breaker, but it's an example of how views don't serve as perfect stand-ins for tables.

OvermindDL1 commented 6 years ago

Basically postgres does you favors when it knows that your grouping by a primary key, but you lose that ability when it's a view.

I would not use something so implicit anyway, explicit is always better and that doesn't look very ANSI SQL'y...

zachdaniel commented 6 years ago

But the point is that your queries become resilient to the addition of new columns. If you have a query that is selecting all of the columns of one table, and additionally grouped up data on related rows(a pattern we use all the time), then with a view you have to use GROUP BY every, fieldz, ever which has actual performance impacts, and requires you to change every query when you add a column.

OvermindDL1 commented 6 years ago

which has actual performance impacts, and requires you to change every query when you add a column.

Which in my cases would only be in one location. I use a set of builder modules for building queries (I so cannot wait for named joins whooo!).

zachdaniel commented 6 years ago

Well in our case it would be in considerably more places than one location :)

OvermindDL1 commented 6 years ago

Well in our case it would be in considerably more places than one location :)

Sounds like a redesign is really in order... ^.^;

/me cannot stand to scatter things around all over the place, prefers centralization and module/function APIs

zachdaniel commented 6 years ago

This issue is that redesign. A central set of query logic would have given us no benefit before. Now that we have a requirement for something like this, I would rather have it implemented as an ORM feature than build an interface on top of the existing query language. We have over 70k LoC and it's easy to say that a redesign is necessary without context.

zachdaniel commented 6 years ago

Additionally, @OvermindDL1 by my reading of the ANSI SQL standard for group by + the postgres documentation, it conforms to the standard just fine.

archseer commented 6 years ago

I've been bitten quite a few times by the default_scope in rails, and although I know the intention here isn't to fully replicate it; but if we provide it people will end up misusing it as such.

If materialized views aren't sufficient, what about doing it the simple way and wrapping the Repo module or using a macro? That way you could even pattern-match the query if needed, apply your filters, then delegate.

isaacsanders commented 6 years ago

Have you implemented this as a plugin, @zachdaniel?

Creating a mixin module (mixed-in after Ecto.Schema) overriding the function definition of __MODULE__.__schema__(:query) might solve this problem for you.

zachdaniel commented 6 years ago

@isaacsanders I tried originally, but the problem is that not enough ecto internals use __schema__(:query). That is bypassed often.

zachdaniel commented 6 years ago

What could potentially get us most of the way there (I know a widely used open source library needs to consider more than one use case), would be the ability to modify the query for associations. What that would mean for us, is that we can get the usage of assoc back. Currently we manually write out all the association joins fully, because we need to make sure that the joined table honors our system that we have set up to give each schema its own base query. The system we have is simple, just a behaviour that says that they define a .query(). If we could define a custom query that would be used by assoc/2 in queries like so:

many_to_many :tags, MyApp.Tag, join_through: "posts_tags", query: MyApp.Tag.query()

Then all we have to remember is to ensure that the from of our queries, or any schemas explicitly use the .query() function. So

from post in MyApp.Post.query(),
  join: post_tags in "post_tags",
  on: post_tags.post_id == post.id,
  join: tag in ^MyApp.Tag.query(),
  on: post_tags.tag_id == tag.id

Would become:

from post in MyApp.Post.query(),
  join: tag in assoc(post, :tags)

That would be very nice for us!

josevalim commented 6 years ago

@zachdaniel we would be happy with supporting this. The only limitation is that the given query can only have where clauses and the same source as the one given to the association. Aything else should fail. Would you be interested in tackling this with our guidance?

michalmuskala commented 6 years ago

It's not clear to me, though, if the MyApp.Tag.query() function would be called at runtime or compile-time. As it is right now, I would expect it to be called at compile-time looking at this code.

josevalim commented 6 years ago

@michalmuskala it would be compile-time, yes.

michalmuskala commented 6 years ago

In that case, couldn't we do:

many_to_many :tags, MyApp.Tag.query(), join_through: "posts_tags"

We could extract the struct name from the query.

josevalim commented 6 years ago

@michalmuskala oh, actually yes. The way we have structured associations internally is already meant to support things like that. :)

zachdaniel commented 6 years ago

@josevalim the limitation of only allowing where clauses is fine (especially in the short term). I do think that adapaters that support subqueries shouldn't really have an issue supporting joins to associations that have a select statement, but I can advocate for that at a later time :D

zachdaniel commented 6 years ago

Oh, and yes I'm very interested in taking this on!

zachdaniel commented 6 years ago

How do you generally test equality of queries generated via from? They have information on line numbers and things like that. I could easily nil those out, but that seems a little less than idea. Alternatives could be inspecting and asserting equal, or just asserting the fields about it that I care are equal, but I wanted to get your thoughts on it. I'm looking for other examples currently.

josevalim commented 6 years ago

@zachdaniel get the underlying code expressions and call Macro.to_string/1 on it.

zachdaniel commented 6 years ago

@josevalim for specifically this one, I defined the queries in a separate module and then used them in that in the schema, and then when fetching the assoc I can confirm they are equal via == since they both come from the same place, which works for now.

josevalim commented 6 years ago

We are closing this as we cannot agree on supporting it. There are also alternatives, such as using views, and #2472 brings this feature to associations, which simplifies things.