enso-org / enso

Hybrid visual and textual functional programming.
https://enso.org
Apache License 2.0
7.34k stars 323 forks source link

Support for `First`/`Last` aggregations in Snowflake backend #10411

Closed radeusgd closed 1 month ago

radeusgd commented 3 months ago

As part of my work on #9486 I did not manage to easily port our code to support the First and Last aggregations. So I'm creating this ticket to investigate it further.

The first step will be to revert disabling the tests and aggregators:

Subject: [PATCH] fixing COUNT DISTINCT and trying to fix FIRST
---
Index: distribution/lib/Standard/Snowflake/0.0.0-dev/src/Internal/Snowflake_Dialect.enso
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/distribution/lib/Standard/Snowflake/0.0.0-dev/src/Internal/Snowflake_Dialect.enso b/distribution/lib/Standard/Snowflake/0.0.0-dev/src/Internal/Snowflake_Dialect.enso
--- a/distribution/lib/Standard/Snowflake/0.0.0-dev/src/Internal/Snowflake_Dialect.enso (revision c03d9fbd541dcee482170df2370cec7031286267)
+++ b/distribution/lib/Standard/Snowflake/0.0.0-dev/src/Internal/Snowflake_Dialect.enso (date 1719836556251)
@@ -262,7 +262,7 @@
     date_ops = [make_extract_as_int "year", make_extract_as_int "quarter", make_extract_as_int "month", make_extract_as_int "week", make_extract_as_int "day", make_extract_as_int "hour", make_extract_as_int "minute", make_extract_fractional_as_int "second", make_extract_fractional_as_int "millisecond" modulus=1000, make_extract_fractional_as_int "microsecond" modulus=1000, ["date_add", make_date_add], ["date_diff", make_date_diff], ["date_trunc_to_day", make_date_trunc_to_day]]
     special_overrides = []
     other = [["RUNTIME_ERROR", make_runtime_error_op]]
-    my_mappings = text + counts + stats + first_last_aggregators + arith_extensions + bool + date_ops + special_overrides + other
+    my_mappings = text + counts + stats + arith_extensions + bool + date_ops + special_overrides + other
     Base_Generator.base_dialect . extend_with my_mappings

Index: test/Snowflake_Tests/src/Snowflake_Spec.enso
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/test/Snowflake_Tests/src/Snowflake_Spec.enso b/test/Snowflake_Tests/src/Snowflake_Spec.enso
--- a/test/Snowflake_Tests/src/Snowflake_Spec.enso  (revision c03d9fbd541dcee482170df2370cec7031286267)
+++ b/test/Snowflake_Tests/src/Snowflake_Spec.enso  (date 1719836426136)
@@ -568,7 +568,7 @@
     Common_Spec.add_specs suite_builder prefix create_connection_fn

     common_selection = Common_Table_Operations.Main.Test_Selection.Config supports_case_sensitive_columns=True order_by_unicode_normalization_by_default=True allows_mixed_type_comparisons=False fixed_length_text_columns=True removes_trailing_whitespace_casting_from_char_to_varchar=True supports_decimal_type=True supported_replace_params=supported_replace_params run_advanced_edge_case_tests_by_default=False
-    aggregate_selection = Common_Table_Operations.Aggregate_Spec.Test_Selection.Config first_last_row_order=False aggregation_problems=False
+    aggregate_selection = Common_Table_Operations.Aggregate_Spec.Test_Selection.Config first_last=False first_last_row_order=False aggregation_problems=False
     agg_in_memory_table = ((Project_Description.new enso_dev.Table_Tests).data / "data.csv") . read

     agg_table_fn = _->
radeusgd commented 3 months ago

Log of my attempts:

  1. The way we did this for Postgres, e.g.

    SELECT FIRST_VALUE(X) IGNORE NULLS OVER (ORDER BY Y ASC NULLS FIRST) AS "FOO" FROM TEST123 GROUP BY Z

    fails with

    [TEST123.X] is not a valid group by expression

    The thing is - FIRST_VALUE in Snowflake is only a Window function, it is not an aggregate function. So it cannot be used as an aggregate directly. It will work if I add X and Y to GROUP BY but that changes query semantics in a way we don't want here.

  2. Technically, the Window function for each group will yield the same result in that group, so I tried the ANY_VALUE trick:

    SELECT ANY_VALUE(FIRST_VALUE(X) IGNORE NULLS OVER (ORDER BY Y ASC NULLS FIRST)) AS "FOO" FROM TEST123 GROUP BY Z

    but that also fails with

    Window function [FIRST_VALUE(TEST123.X) OVER (ORDER BY TEST123.Y ASC NULLS FIRST)] may not appear inside an aggregate function.
enso-bot[bot] commented 1 month ago

Radosław Waśko reports a new STANDUP for the provided date (2024-08-16):

Progress: Initial work on First/Last in Snowflake - seeing what approaches we can take and how we can integrate it into existing IR. It should be finished by 2024-08-23.

Next Day: Next day I will be working on the #10411 task. Continue, as I did not find a solution yet.

enso-bot[bot] commented 1 month ago

Radosław Waśko reports a new STANDUP for yesterday (2024-08-19):

Progress: Refactoring aggregate logic to allow for more customization needed for First/Last workaround. It should be finished by 2024-08-23.

Next Day: Next day I will be working on the #10411 task. Continue

enso-bot[bot] commented 1 month ago

Radosław Waśko reports a new STANDUP for yesterday (2024-08-20):

Progress: Working on the custom handling for First/Last using row_number. Trying to figure out how to write it in a non-ugly way, but failing. It should be finished by 2024-08-23.

Next Day: Next day I will be working on the same task. Try to implement whatever that works and then see if I can make it a bit cleaner.

enso-bot[bot] commented 1 month ago

Pavel Marek reports a new STANDUP for today (2024-08-21):

Progress: Got it working for Snowflake. SQLite is actually more problematic than I thought, at least if we want to handle ignore_nothing=True. I'm considering abandoning it, or only implementing the ignore_nothing=False case It should be finished by 2024-08-23.

Next Day: Next day I will be working on the same task. Experiment with limited SQLite support. Try to clean up the code a bit. Check if more tests are needed

enso-bot[bot] commented 1 month ago

Radosław Waśko reports a new STANDUP for yesterday (2024-08-22):

Progress: Cleaned up the code. It should be finished by 2024-08-23.

Next Day: Next day I will be working on the same task. Try adding limited SQLite support. Adapt tests.

enso-bot[bot] commented 1 month ago

Radosław Waśko reports a new STANDUP for the provided date (2024-08-23):

Progress: Finishing the PR - added SQLite support for partial First/Last and adapted tests. Implemented #10848 refactor and put PR. It should be finished by 2024-08-23.

Next Day: Next day I will be working on the #9875 task. Next tasks.