bitemyapp / esqueleto

New home of Esqueleto, please file issues so we can get things caught up!
BSD 3-Clause "New" or "Revised" License
370 stars 107 forks source link

Ambiguous column reference when using union result in subquery/CTE with persistent-postgresql #299

Closed parsonsmatt closed 1 year ago

parsonsmatt commented 2 years ago

@mauriciofierrom posted this issue in persistent, but I think it belongs here.

Copying:

Description

With the persistent-postgresql library, when using the result of a union_ in a subquery/CTE, the generated SQL repeats the "v" alias, resulting in an error about an ambiguous column reference. The generated SQL for the persistent-sqlite backend worked fine.

Envrironment

Here's the gist with the result of running the relevant commands listed in the issue template.

Script

Here's the script that results in the exception.

If considered relevant I'd like to work on this and if that would be OK, any guidance would be highly appreciated.

parsonsmatt commented 2 years ago

@mauriciofierrom can you post the generated SQL from sqlite and postgresql?

mauriciofierrom commented 2 years ago

@parsonsmatt sorry for the confusion! Thanks for copying it!

The generated SQL for postgresql:

WITH "cte" AS
(
  SELECT "u"."v", SUM("u"."v2") AS "v", SUM("u"."v3") AS "v2"
  FROM
  (
    SELECT "something"."conditional" AS "v", COUNT(*) AS "v2", SUM("something"."number" * "something"."multiplier") AS "v3"
      FROM "something"
     WHERE ("something"."criteria" = ?) AND (NOT ("something"."conditional" IS NULL) AND NOT ("something"."conditional" = ?))
  GROUP BY "something"."conditional"

  UNION

    SELECT "related"."conditional" AS "v", COUNT(*) AS "v2", SUM("something2"."number" * "something2"."multiplier") AS "v3"
      FROM "something" AS "something2" INNER JOIN "related" ON "something2"."id" = "related"."something_id"
     WHERE ("something2"."criteria" = ?) AND ("something2"."conditional" IS NULL)
  GROUP BY "related"."conditional"
  ) AS "u"
  GROUP BY "u"."v"
)
SELECT "cte"."v", "cte"."v", "cte"."v2"
FROM "cte";

The generated SQL for SQLite:

WITH cte AS (
  SELECT u.v, SUM(u.v2) AS v, SUM(u.v3) AS v2
  FROM (
    SELECT something.conditional AS v,
           COUNT(*) AS v2,
           SUM(something.number * something.multiplier) AS v3
      FROM something
     WHERE (something.criteria = ?) AND (NOT (something.conditional IS NULL) AND NOT (something.conditional = ?))
  GROUP BY something.conditional

  UNION

    SELECT related.conditional AS v,
           COUNT(*) AS v2,
           SUM(something2.number * something2.multiplier) AS v3
      FROM       something AS something2
      INNER JOIN related ON something2.id = related.something_id
     WHERE (something2.criteria = ?) AND (something2.conditional IS NULL)
  GROUP BY related.conditional
  ) AS u
  GROUP BY u.v
) SELECT cte.v, cte.v, cte.v2 FROM cte;
parsonsmatt commented 2 years ago

Ok, cool, so it's the same generated SQL, but sqlite is more permissive about what it is accepting.

Hmm! This is interesting. I'm guessing the thing-to-do here is in the call to union_ to grab the IdentState and ensure that the second query carries on from the end of the first query. This happens in Database.Esqueleto.Experimental.From.SqlSetOperation:

instance (ToSqlSetOperation a c, ToSqlSetOperation b c, res ~ SqlSetOperation c)
  => Union_ (a -> b -> res) where
    union_ = mkSetOperation " UNION "

-- | Helper function for defining set operations
-- @since 3.5.0.0
mkSetOperation :: (ToSqlSetOperation a a', ToSqlSetOperation b a')
               => TLB.Builder -> a -> b -> SqlSetOperation a'
mkSetOperation operation lhs rhs = SqlSetOperation $ \p -> do
    (leftValue, leftClause) <- unSqlSetOperation (toSqlSetOperation lhs) p
    (_, rightClause) <- unSqlSetOperation (toSqlSetOperation rhs) p
    pure (leftValue, \info -> leftClause info <> (operation, mempty) <> rightClause info)

instance (SqlSelect a r, ToAlias a, ToAliasReference a) => ToSqlSetOperation (SqlQuery a) a where
    toSqlSetOperation subquery =
        SqlSetOperation $ \p -> do
            (ret, sideData) <- Q $ W.censor (\_ -> mempty) $ W.listen $ unQ subquery
            state <- Q $ lift S.get
            aliasedValue <- toAlias ret
            Q $ lift $ S.put state
            let aliasedQuery = Q $ W.WriterT $ pure (aliasedValue, sideData)
            let p' =
                  case p of
                    Parens -> Parens
                    Never ->
                      if (sdLimitClause sideData) /= mempty
                          || length (sdOrderByClause sideData) > 0 then
                        Parens
                      else
                        Never
            pure (aliasedValue, \info -> first (parensM p') $ toRawSql SELECT info aliasedQuery)

I suspect this last bit of code is the source of the error.

belevy commented 2 years ago

I think I explicitly wanted it to repeat. Just remove the call to get/put and it shouldnt repeat.

belevy commented 2 years ago

That said, this is a different code path. You are looking at union in general rather than the recursive CTE path.

belevy commented 2 years ago

Actually nevermind I misread the query. Carry on.

belevy commented 2 years ago

So the issue appears to be selecting from a subquery that returns an aliased value isn't re-aliasing it to ensure consistent naming.

EDIT I think union is a red herring and this can be reproduced without it.

dsh commented 1 year ago

When I hit this bug removing the union eliminated the ambiguous column error, so I think the union probably is required.