elixir-ecto / ecto_sql

SQL-based adapters for Ecto and database migrations
https://hexdocs.pm/ecto_sql
Apache License 2.0
578 stars 312 forks source link

Support for subqueries in order_bys and group_bys #607

Closed zachdaniel closed 6 months ago

zachdaniel commented 6 months ago

These are the corresponding ecto_sql changes for functionality added in a separate pull request: https://github.com/elixir-ecto/ecto/pull/4417

zachdaniel commented 6 months ago

Can't say for sure, but that build error for the mysql integration test does not appear related to my changes/may be transient. I can't retry failed steps though.

greg-rychlewski commented 6 months ago

something is happening with the MySQL CI...it seems like the MySQL server is not starting...will try to figure it out

greg-rychlewski commented 6 months ago

it's not your change though. looks like it's been happening for 4-5 commits

warmwaffles commented 6 months ago

looks like it's been happening for 4-5 commits

Flapping integration tests, say it isn't so. 😭

greg-rychlewski commented 6 months ago

i don't believe it's any code change though because the commit it started happening doesn't alter this. so probably the docker stuff we are pulling is not fixed and we got an update that is broken with our methods

josevalim commented 6 months ago

We should probably add a unit test for the other databases.

zachdaniel commented 6 months ago

Okay, so, I've added tests for the other databases. Was (of course) a great idea and highlights some issues.

Two main points

  1. I haven't set up SQL server locally before, so I can't actually test that these queries work. is there a different test I'm not seeing that will actually run the tests?

  2. ecto_sql is actually (currently) building subqueries incorrectly when used in this context. The question here is: should I instead just make mysql raise an error and we can fix the query generation later? Or should I attempt to fix the subquery building as a part of this PR? There are failing tests indicating the incorrect query building currently.

josevalim commented 6 months ago

What do you mean by “in this context”? Does it generate broken queries today? Or you mean in the additions of this PR? If it is in this PR, we should fix it now. If broken today, we should probably fix that before. Thanks :)

zachdaniel commented 6 months ago

Sorry for being unclear. Building subqueries in the context of ORDER BY and DISTINCT must be done slightly differently (apparently). I'm not a mysql expert, but, I guess it has a problem with doubly nested parenthesis.

i.e

  1) test windows window with subquery (Ecto.Adapters.MyXQLTest)
     test/ecto/adapters/myxql_test.exs:1075
     Assertion with == failed
     code:  assert all(query) ==
              ~s"SELECT s0.`x` FROM `schema` AS s0 WINDOW `w` AS (ORDER BY exists(SELECT ss0.`x` AS `result` FROM `schema` AS ss0 WHERE (ss0.`x` = s0.`x`)))"
     left:  "SELECT s0.`x` FROM `schema` AS s0 WINDOW `w` AS (ORDER BY exists((SELECT ss0.`x` AS `result` FROM `schema` AS ss0 WHERE (ss0.`x` = s0.`x`))))"
     right: "SELECT s0.`x` FROM `schema` AS s0 WINDOW `w` AS (ORDER BY exists(SELECT ss0.`x` AS `result` FROM `schema` AS ss0 WHERE (ss0.`x` = s0.`x`)))"
zachdaniel commented 6 months ago

I originally thought the problem was more serious but this may actually be easily fixable so I will take a look right now :)

zachdaniel commented 6 months ago

Okay, I've updated it and now it produces valid subqueries, but not wrapping "top level" subqueries in parenthesis

zachdaniel commented 6 months ago

I actually can't find the code that handles exists queries for mysql? But the queries it generates now work for me.

josevalim commented 6 months ago

I dont think it needs to handle it explicitly? Exists is a regular function call and then just make sure to handle sub queries as their appear in the AST.

zachdaniel commented 6 months ago

Ah, I see, thats this code. With my {:top_level, ...} modification, that resolves the issue of exists((subquery))

    defp expr({fun, _, args}, sources, query) when is_atom(fun) and is_list(args) do
      {modifier, args} =
        case args do
          [rest, :distinct] -> {"DISTINCT ", [rest]}
          _ -> {[], args}
        end

      case handle_call(fun, length(args)) do
        {:binary_op, op} ->
          [left, right] = args
          [op_to_binary(left, sources, query), op | op_to_binary(right, sources, query)]

        {:fun, fun} ->
          [fun, ?(, modifier, Enum.map_intersperse(args, ", ", &expr({:top_level, &1}, sources, query)), ?)]
      end
    end
greg-rychlewski commented 6 months ago

it would probably be a good idea to have integration tests so that we can verify the SQL we are generating is correct

zachdaniel commented 6 months ago

All done 👍

zachdaniel commented 6 months ago

@greg-rychlewski trying to figure out the best place for that kind of test. Where should I put it?

greg-rychlewski commented 6 months ago

we could put it here https://github.com/elixir-ecto/ecto_sql/blob/master/integration_test/sql/subquery.exs.

if there is an adapter that doesn't allow a certain thing we are doing here (like doesn't allow subqueries in group by or something like that) we could add a tag.

then the tag would be filtered out here https://github.com/elixir-ecto/ecto_sql/blob/master/integration_test/pg/test_helper.exs#L106 (and a similar place for mysql/tds)

zachdaniel commented 6 months ago

Well, adding the integration tests has helped flesh out the fact that you just can't do this with mssql :) subqueries are apparently only supported in the where clause. Where should I handle this? Is it kosher to raise an error in Ecto.Adapters.Tds.Connection?

greg-rychlewski commented 6 months ago

Yeah that would be best since the error from TDS might not be too clear. Thanks!

zachdaniel commented 6 months ago

It is indeed not clear 😆

zachdaniel commented 6 months ago

Okay, I think this should be ready now :)

greg-rychlewski commented 6 months ago

Sorry to be annoying. For the last TDS change, since we know we only added subqueries for a few new things, could we not catch them in places like this and raise there:

defp group_by(%{group_bys: group_bys} = query, sources) do
      [
        " GROUP BY "
        | Enum.map_intersperse(group_bys, ", ", fn %ByExpr{expr: expr} ->
            Enum.map_intersperse(expr, ", ", &top_level_expr(&1, sources, query))
          end)
      ]
    end

That would be a smaller change than tagging where the expression came from all the way down the recursive calls. And I believe we just need it for group_by and order_by because distinct is already raising

zachdaniel commented 6 months ago

The subqueries could be detected, but they could also be anywhere in an expression. So I'd have to crawl the expression to check for subqueries. i.e a and exists(...) or (b and exists(...))

greg-rychlewski commented 6 months ago

Oh I see what you mean. TBH I'm not sure if raising in the adapter is worth keeping track of all the locations as the code evolves. It might not be so bad to just let the database raise because it should at least show you the bad query, similar to if you were using a CLI tool to interact with it.

@josevalim what do you think.

zachdaniel commented 6 months ago

The error is pretty tough, you basically get something like syntax error at or near exists(

greg-rychlewski commented 6 months ago

I see what you mean. My opinion might be too unforgiving so we should definitely see what Jose thinks too.

In general I think if you are shown the query Ecto generated and the error from the database then it's pretty fair. It's what you would get if you tried to write the query by hand in the CLI client.

And I think it's important people understand how to make the query by hand in their database before trying to do something in Ecto. Ecto should really just be a translator and not like an excuse not to understand the database.

Databases also evolve and I'm not sure it's feasible for Ecto to keep up with all of those rules instead of letting the database do its thing, given there were no sins committed before the query string was made.

zachdaniel commented 6 months ago

I'm all good undoing that change, just wanted to make sure the options were clear. I leave it up to you two to decide :)

josevalim commented 6 months ago

I agree with @greg-rychlewski! I appreciate the extra mile you went here @zachdaniel but I think we can keep the code simpler and we don't have to check if the subquery is valid. If the user put it there, we assume it is valid, and if it is not, the database will let them know it isn't. :)

zachdaniel commented 6 months ago

No worries! Only took a few minutes :) Will back it out.

zachdaniel commented 6 months ago

Alright, made that change. One thing I also did, however, was remove the tests for that particular thing as well. I can add them back in, but I think they kind of send the wrong message, since they are testing known-to-be-not-valid expressions. I'm happy to add them back in though :D

zachdaniel commented 6 months ago

I think the test failure there is unrelated to my changes.

greg-rychlewski commented 6 months ago

Thanks for the changes! Nice work