coleifer / peewee

a small, expressive orm -- supports postgresql, mysql, sqlite and cockroachdb
http://docs.peewee-orm.com/
MIT License
11.1k stars 1.37k forks source link

[HELP] how to union_all subquerys #1699

Closed megachweng closed 6 years ago

megachweng commented 6 years ago
class Wtlog(BaseModel):
    api = CharField(column_name='api_id', constraints=[SQL("DEFAULT ''")])
    ctime = DateTimeField(constraints=[SQL("DEFAULT current_timestamp()")], null=True)
    new_name = CharField(null=True)
    old_name = CharField(null=True)

The raw SQL is

SELECT api_id,id,new_name AS name
FROM ( SELECT * FROM wtlog WHERE checked = 0 ORDER BY id DESC ) x GROUP BY api_id 

UNION ALL

SELECT api_id,id,new_name AS name
FROM ( SELECT * FROM wtlog WHERE checked = 0 ORDER BY id ) x GROUP BY api_id 

ORDER BY api_id, id

really have no idea how to solve this

coleifer commented 6 years ago

I'm not clear why you are writing your queries like that...but here's how you might do it with Peewee:

LA1 = Log.alias()
subq1 = LA1.select(LA1).where(LA1.checked == 0).order_by(LA1.id.desc())

lhs = (Log
       .select(subq1.c.api_id, subq1.c.id, subq1.c.new_name.alias('name'))
       .from_(subq1)
       .group_by(subq1.c.api_id))

LA2 = Log.alias()
subq2 = LA2.select(LA2).where(LA2.checked == 0).order_by(LA2.id)

rhs = (Log
       .select(subq2.c.api_id, subq2.c.id, subq2.c.new_name.alias('name'))
       .from_(subq2)
       .group_by(subq2.c.api_id))

query = lhs.union_all(rhs).order_by(SQL('1, 2'))

The SQL that's generated is:

SELECT "t1"."api_id", "t1"."id", "t1"."new_name" AS "name" 
FROM (
    SELECT "t2"."id", "t2"."api_id", "t2"."timestamp", "t2"."new_name", "t2"."old_name", "t2"."checked" 
    FROM "log" AS "t2" 
    WHERE ("t2"."checked" = ?) 
    ORDER BY "t2"."id" DESC) AS "t1" 
GROUP BY "t1"."api_id" 
UNION ALL 
SELECT "t3"."api_id", "t3"."id", "t3"."new_name" AS "name"
FROM (
    SELECT "t4"."id", "t4"."api_id", "t4"."timestamp", "t4"."new_name", "t4"."old_name", "t4"."checked" 
    FROM "log" AS "t4" 
    WHERE ("t4"."checked" = ?) 
    ORDER BY "t4"."id"
) AS "t3" 
GROUP BY "t3"."api_id" 
ORDER BY 1, 2
megachweng commented 6 years ago

@coleifer I just want to select latest and soonest row of each group which is grouped by api_id,and are there effective way to implement it with peewee,appreciate your help

coleifer commented 6 years ago

Check out this post, which talks about various ways to approach the "top item per group" query: http://charlesleifer.com/blog/techniques-for-querying-lists-of-objects-and-determining-the-top-related-item/

It's a common thing and there are several ways to do it.

You might find it easier to do something like:

LA = Log.alias()
subq = (LA.select(
    LA.api,
    fn.MIN(LA.id).alias('min_id'), 
    fn.MAX(LA.id).alias('max_id'))
.group_by(LA.api)
.where(LA.checked == 0))

query = Log.select().join(subq, on=(
    (Log.api == subq.c.api_id) &
    ((Log.id == subq.c.min_id) | (Log.id == subq.c.max_id)))