Snowflake-Labs / django-snowflake

MIT License
61 stars 16 forks source link

Exists database function generates invalid Snowflake SQL #93

Closed stanorama closed 6 months ago

stanorama commented 7 months ago

Using Django 'Exists' generates invalid SQL:

qa = Notification.objects.filter( ~Exists( NotificationUser.objects.filter(notification_id=OuterRef("notification_id")) ), valid_from_date__lte=now, valid_to_date__gte=now, )

Generates the following SQL:

SELECT "PUBLIC"."NOTIFICATION"."NOTIFICATION_ID",       "PUBLIC"."NOTIFICATION"."MESSAGE",       "PUBLIC"."NOTIFICATION"."URL",       "PUBLIC"."NOTIFICATION"."URL_DESCRIPTION",       "PUBLIC"."NOTIFICATION"."VALID_FROM_DATE",       "PUBLIC"."NOTIFICATION"."VALID_TO_DATE",       "PUBLIC"."NOTIFICATION"."INSERT_TIMESTAMP",       "PUBLIC"."NOTIFICATION"."INSERT_USER",       "PUBLIC"."NOTIFICATION"."UPDATE_TIMESTAMP",       "PUBLIC"."NOTIFICATION"."UPDATE_USER"  FROM "PUBLIC"."NOTIFICATION" WHERE (NOT EXISTS(SELECT 1 AS "A" FROM "PUBLIC"."NOTIFICATION_USER" U0 WHERE U0."NOTIFICATION_ID" = ("PUBLIC"."NOTIFICATION"."NOTIFICATION_ID") LIMIT 1) AND "PUBLIC"."NOTIFICATION"."VALID_FROM_DATE" <= '2024-03-25 11:05:58.376722+00:00' AND "PUBLIC"."NOTIFICATION"."VALID_TO_DATE" >= '2024-03-25 11:05:58.376722+00:00') ORDER BY "PUBLIC"."NOTIFICATION"."VALID_FROM_DATE" ASC

The use of LIMIT in the sub-query is not valid in Snowflake:

SQL Error [2031] [42601]: SQL compilation error: Unsupported subquery type cannot be evaluated

timgraham commented 7 months ago

Are you suggesting that the query in question works on Snowflake without LIMIT 1 and thus django-snowflake should try to omit it?

I'll have to read more about Snowflake's subquery limitations.

stanorama commented 7 months ago

Yes, consider these 2 generic queries:

SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE COLUMNS.TABLE_NAME = TABLES.TABLE_NAME) LIMIT 1;

SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE COLUMNS.TABLE_NAME = TABLES.TABLE_NAME LIMIT 1) LIMIT 1;

The 2nd one will fail due to the LIMIT with the sub query.

Ideally the django-snowflake connector would remove the limit syntax from the generated query. Other databases like SQL Server would not support the LIMIT syntax either so I assume there is a way for the db specific connectors to influence this part of the query syntax.

timgraham commented 7 months ago

Snowflake's documentation states: "The only type of subquery that allows a LIMIT / FETCH clause is an uncorrelated scalar subquery. Also, because an uncorrelated scalar subquery returns only 1 row, the LIMIT clause has little or no practical value inside a subquery."

It sounds like it's safe to remove the LIMIT 1 that Django always adds to EXISTS queries.

PR: https://github.com/Snowflake-Labs/django-snowflake/pull/94