carbonfact / lea

🏃‍♀️ Minimalist alternative to dbt
Apache License 2.0
211 stars 7 forks source link

Splitting a query into CTEs #21 #45

Open MohamedBsh opened 3 months ago

MohamedBsh commented 3 months ago

Related to #21

Parsing SQL queries and extraction of CTEs within queries. Add --materialize-ctes option in the CLI to activate this feature. Materializing CTEs so we can analyze them quickly.
 I hope this helps you at carbonfact @MaxHalford 🤞🤞

pytest -s test_examples.py::test_jaffle_shop_materialize_ctes

Original query:                                                                                                                                                                                                                                                         
           with customer_orders as (                                                                                                                                                                                                                                                           

                   select                                                                                                                                                                                                                                                                      
                   customer_id,                                                                                                                                                                                                                                                                

                   min(order_date) as first_order,                                                                                                                                                                                                                                             
                   max(order_date) as most_recent_order,                                                                                                                                                                                                                                       
                   count(order_id) as number_of_orders                                                                                                                                                                                                                                         
               from jaffle_shop_ctes_max.staging.orders                                                                                                                                                                                                                                        

               group by customer_id                                                                                                                                                                                                                                                            

           ),                                                                                                                                                                                                                                                                                  

           customer_payments as (                                                                                                                                                                                                                                                              

               select                                                                                                                                                                                                                                                                          
                   orders.customer_id,                                                                                                                                                                                                                                                         
                   sum(amount) as total_amount                                                                                                                                                                                                                                                 

               from jaffle_shop_ctes_max.staging.payments                                                                                                                                                                                                                                      

               left join jaffle_shop_ctes_max.staging.orders orders using (order_id)                                                                                                                                                                                                           

               group by orders.customer_id                                                                                                                                                                                                                                                     

           )                                                                                                                                                                                                                                                                                   

           select                                                                                                                                                                                                                                                                              
               -- #UNIQUE                                                                                                                                                                                                                                                                      
               customers.customer_id,                                                                                                                                                                                                                                                          
               customers.first_name,                                                                                                                                                                                                                                                           
               customers.last_name,                                                                                                                                                                                                                                                            
               customer_orders.first_order,                                                                                                                                                                                                                                                    
               customer_orders.most_recent_order,                                                                                                                                                                                                                                              
               customer_orders.number_of_orders,                                                                                                                                                                                                                                               
               customer_payments.total_amount as customer_lifetime_value                                                                                                                                                                                                                       

           from jaffle_shop_ctes_max.staging.customers customers                                                                                                                                                                                                                               

           left join customer_orders                                                                                                                                                                                                                                                           
               on customers.customer_id = customer_orders.customer_id                                                                                                                                                                                                                          

           left join customer_payments                                                                                                                                                                                                                                                         
               on  customers.customer_id = customer_payments.customer_id                                                                                                                                                                                                                       
           Split query result:                                                                                                                                                                                                                                                     runner.py:76
           Number of CTEs found: 2                                                                                                                                                                                                                                                 runner.py:76
           CTE 'customer_orders': SELECT customer_id, MIN(order_date) AS first_order, MAX(order_date) AS most_recent_order, COUNT(orde...                                                                                                                                          runner.py:76
           CTE 'customer_payments': SELECT orders.customer_id, SUM(amount) AS total_amount FROM jaffle_shop_ctes_max.staging.payments LE...                                                                                                                                        runner.py:76
           Attempting to materialize CTE: core.customers__customer_orders                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
     ..
           Successfully materialized CTE: core.customers__customer_orders                                                                                                                                                                                                          runner.py:76
           Attempting to materialize CTE: core.customers__customer_payments                                                                                                                                                                                                        runner.py:76

.. 
           Successfully materialized CTE: core.customers__customer_payments                                                                                                                                                                                                        runner.py:76
…

All tables:
analytics.finance__kpis
analytics.kpis
core.customers
core.customers__customer_orders
core.customers__customer_payments
core.orders
core.orders__order_payments
staging.customers
staging.orders
staging.payments

CTE tables:
analytics.finance__kpis
core.customers__customer_orders
core.customers__customer_payments
core.orders__order_payments

core.customers__customer_orders data:
(1, '2018-01-01', '2018-02-10', 2)
(2, '2018-01-11', '2018-01-11', 1)
(3, '2018-01-02', '2018-03-11', 3)
(6, '2018-02-19', '2018-02-19', 1)
(7, '2018-01-14', '2018-01-14', 1)

core.customers__customer_payments data:
(1, 33.0)
(2, 23.0)
(3, 65.0)
(6, 8.0)
(7, 26.0)

core.customers data:
(1, 'Michael', 'P.', '2018-01-01', '2018-02-10', 2, 33.0)
(2, 'Shawn', 'M.', '2018-01-11', '2018-01-11', 1, 23.0)
(3, 'Kathleen', 'P.', '2018-01-02', '2018-03-11', 3, 65.0)
(6, 'Sarah', 'R.', '2018-02-19', '2018-02-19', 1, 8.0)
(7, 'Martin', 'M.', '2018-01-14', '2018-01-14', 1, 26.0)

All CTE materialization tests passed successfully.
PASSED